Contributing to Transmog

Contributions to Transmog are welcome and appreciated. This guide covers everything needed to get started contributing to the project.

Quick Start

Prerequisites

  • Python 3.8 or higher

  • Git

Development Setup

  1. Fork and clone the repository

git clone https://github.com/your-username/transmog.git
cd transmog
  1. Create a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  1. Install development dependencies

pip install -e ".[dev]"
  1. Install pre-commit hooks

pre-commit install

The pre-commit hooks handle code formatting, linting, and style checks automatically. No manual formatting is needed.

  1. Verify the setup

python -m pytest tests/

Development Workflow

Making Changes

  1. Create a feature branch

git checkout -b feature/your-feature-name
  1. Make your changes

Write code, add tests, update documentation as needed.

  1. Run tests locally

# Run all tests
python -m pytest

# Run specific test file
python -m pytest tests/test_specific.py

# Run with coverage
python -m pytest --cov=transmog
  1. Commit your changes

Pre-commit hooks will automatically format and check code.

git add .
git commit -m "Add feature: brief description"
  1. Push and create pull request

git push origin feature/your-feature-name

Then create a pull request on GitHub.

Types of Contributions

Bug Reports

When reporting bugs, include:

  • Python version and operating system

  • Transmog version

  • Minimal code example that reproduces the issue

  • Expected vs actual behavior

  • Full error traceback if applicable

Feature Requests

For feature requests:

  • Describe the use case and problem being solved

  • Provide examples of how the feature would be used

  • Consider if the feature fits Transmog’s scope and design principles

Code Contributions

Good first contributions:

  • Documentation improvements

  • Test coverage improvements

  • Bug fixes with clear reproduction steps

  • Performance optimizations with benchmarks

Larger contributions:

  • Data format support

  • Processing optimizations

  • API enhancements

Codebase Structure

The Transmog codebase is organized into focused, single-responsibility modules:

Core Processing Modules

  • src/transmog/core/ - Core data transformation logic

    • flattener.py - JSON flattening operations

    • extractor.py - Array extraction logic

    • hierarchy.py - Nested structure handling

    • metadata.py - Metadata annotation

    • memory.py - Memory optimization utilities

  • src/transmog/process/strategies/ - Processing strategy implementations

    • base.py - Base strategy class and common utilities

    • shared.py - Shared batch processing logic

    • memory.py, file.py, batch.py, chunked.py, csv.py - Specific strategies

  • src/transmog/process/result/ - Result handling and output

    • core.py - Core result functionality

    • converters.py - Format conversion methods

    • writers.py - File writing operations

    • streaming.py - Streaming functionality

Configuration and Validation

  • src/transmog/config/ - Configuration management

  • src/transmog/validation.py - Unified parameter validation system

  • src/transmog/error/ - Error handling and recovery strategies

This modular structure makes it easier to:

  • Navigate and understand specific functionality

  • Write focused tests for individual components

  • Make changes without affecting unrelated code

  • Maintain and extend the codebase

Development Standards

Code Style

Code style is automatically enforced by pre-commit hooks. The setup includes:

  • Black for code formatting

  • isort for import sorting

  • flake8 for linting

  • mypy for type checking

No manual formatting is required - just commit changes and the hooks handle the rest.

Testing

  • Write tests for added functionality

  • Maintain or improve test coverage

  • Use descriptive test names that explain what is being tested

  • Include both unit tests and integration tests where appropriate

Test Structure

def test_flatten_nested_objects():
    """Test that nested objects are properly flattened."""
    data = {"user": {"name": "Alice", "age": 30}}
    result = tm.flatten(data)

    assert len(result.main) == 1
    assert result.main[0]["user_name"] == "Alice"
    assert result.main[0]["user_age"] == 30

Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=transmog --cov-report=html

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/performance/

Documentation

  • Update docstrings for functions and classes

  • Add examples to demonstrate usage

  • Update relevant documentation files in docs/

  • Ensure examples are runnable and accurate

API Design Principles

When contributing code changes, follow these principles:

Simplicity First

The main API should remain simple and intuitive:

# Good: Simple, clear API
result = tm.flatten(data)

# Avoid: Overly complex APIs
result = tm.flatten(data, complex_config=ComplexConfigObject())

Sensible Defaults

Parameters should have sensible defaults for common use cases:

# Most users should get good results with defaults
result = tm.flatten(data)

# Advanced users can customize as needed
result = tm.flatten(data, arrays="inline", batch_size=5000)

Backward Compatibility

  • Avoid breaking changes to existing APIs

  • Deprecate features before removing them

  • Provide migration paths for major changes

Performance Considerations

  • Optimize for common use cases

  • Provide memory-efficient options for large datasets

  • Include benchmarks for performance-critical changes

Pull Request Process

Before Submitting

  • Tests pass locally

  • Code follows project conventions (enforced by pre-commit)

  • Documentation is updated if needed

  • Changes are focused and atomic

  • Commit messages are clear and descriptive

Pull Request Template

When creating a pull request, include:

Description Brief description of changes and motivation.

Type of Change

  • Bug fix

  • New feature

  • Documentation update

  • Performance improvement

  • Code refactoring

Testing

  • Tests added/updated

  • All tests pass

  • Manual testing performed

Documentation

  • Documentation updated

  • Examples added/updated

  • API documentation updated

Review Process

  1. Automated Checks: CI runs tests and checks

  2. Code Review: Maintainers review the code

  3. Discussion: Address feedback and questions

  4. Approval: Maintainer approves the changes

  5. Merge: Changes are merged to main branch

Advanced Topics

Architecture Overview

Transmog follows a modular design with several core components:

  • Processor: Main entry point for users, orchestrates transformation

  • Core Transformation: Flattener and Extractor for data transformation

  • Configuration System: Hierarchical configuration with factory methods

  • I/O System: Handles reading input and writing output in various formats

  • Error Handling: Configurable strategies for dealing with errors

Extension Points

Transmog provides several extension points for customization:

  1. Custom Recovery Strategies:

    from transmog.error import RecoveryStrategy
    
    class MyRecoveryStrategy(RecoveryStrategy):
        def recover(self, error, context=None):
            # Custom recovery logic
            return recovery_result
    
  2. Custom ID Generation:

    def custom_id_strategy(record):
        # Generate ID based on record contents
        return f"CUSTOM-{record.get('id', 'unknown')}"
    
    # Use with processor
    processor = Processor.with_custom_id_generation(custom_id_strategy)
    
  3. Output Format Extensions:

    from transmog.io import DataWriter, register_writer
    
    class MyCustomWriter(DataWriter):
        def write(self, data, destination):
            # Custom writing logic
            pass
    
    # Register the writer
    register_writer("custom-format", MyCustomWriter)
    

Performance Testing

When making performance-related changes:

  • Use the benchmarking script: python scripts/run_benchmarks.py

  • Compare before and after performance metrics

  • Include benchmark results in pull request description

  • Consider memory usage as well as processing speed

Documentation Updates

When updating documentation:

  • Follow the passive voice style guidelines

  • Avoid personal pronouns (you, we, our, etc)

  • No temporal language (previously, new version, etc)

  • Provide complete, runnable examples

  • Test all code examples for accuracy

Release Process

Releases follow semantic versioning:

  • Patch (1.0.1): Bug fixes and minor improvements

  • Minor (1.1.0): New features, backward compatible

  • Major (2.0.0): Breaking changes

Getting Help

  • Documentation: Check the documentation (available after deployment)

  • Issues: Search existing GitHub issues

  • Questions: Use GitHub issues for questions and support

  • Contact: Reach out to maintainers for guidance on larger contributions

Code of Conduct

Be respectful, inclusive, and constructive in all interactions. Focus on the code and ideas, not the person. Help create a welcoming environment for all contributors.

Recognition

Contributors are recognized in:

  • Release notes for significant contributions

  • GitHub contributors list

  • Project acknowledgments

Thank you for contributing to Transmog!