# Contributing to Transmog Contributions to Transmog are welcome and appreciated. This guide covers everything needed to get started contributing to the project. ## Quick Start ### Prerequisites - Python 3.8 or higher - Git ### Development Setup 1. **Fork and clone the repository** ```bash git clone https://github.com/your-username/transmog.git cd transmog ``` 1. **Create a virtual environment** ```bash python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate ``` 1. **Install development dependencies** ```bash pip install -e ".[dev]" ``` 1. **Install pre-commit hooks** ```bash pre-commit install ``` The pre-commit hooks handle code formatting, linting, and style checks automatically. No manual formatting is needed. 1. **Verify the setup** ```bash python -m pytest tests/ ``` ## Development Workflow ### Making Changes 1. **Create a feature branch** ```bash git checkout -b feature/your-feature-name ``` 1. **Make your changes** Write code, add tests, update documentation as needed. 1. **Run tests locally** ```bash # Run all tests python -m pytest # Run specific test file python -m pytest tests/test_specific.py # Run with coverage python -m pytest --cov=transmog ``` 1. **Commit your changes** Pre-commit hooks will automatically format and check code. ```bash git add . git commit -m "Add feature: brief description" ``` 1. **Push and create pull request** ```bash git push origin feature/your-feature-name ``` Then create a pull request on GitHub. ## Types of Contributions ### Bug Reports When reporting bugs, include: - Python version and operating system - Transmog version - Minimal code example that reproduces the issue - Expected vs actual behavior - Full error traceback if applicable ### Feature Requests For feature requests: - Describe the use case and problem being solved - Provide examples of how the feature would be used - Consider if the feature fits Transmog's scope and design principles ### Code Contributions **Good first contributions:** - Documentation improvements - Test coverage improvements - Bug fixes with clear reproduction steps - Performance optimizations with benchmarks **Larger contributions:** - Data format support - Processing optimizations - API enhancements ## Codebase Structure The Transmog codebase is organized into focused, single-responsibility modules: ### Core Processing Modules - **`src/transmog/core/`** - Core data transformation logic - `flattener.py` - JSON flattening operations - `extractor.py` - Array extraction logic - `hierarchy.py` - Nested structure handling - `metadata.py` - Metadata annotation - `memory.py` - Memory optimization utilities - **`src/transmog/process/strategies/`** - Processing strategy implementations - `base.py` - Base strategy class and common utilities - `shared.py` - Shared batch processing logic - `memory.py`, `file.py`, `batch.py`, `chunked.py`, `csv.py` - Specific strategies - **`src/transmog/process/result/`** - Result handling and output - `core.py` - Core result functionality - `converters.py` - Format conversion methods - `writers.py` - File writing operations - `streaming.py` - Streaming functionality ### Configuration and Validation - **`src/transmog/config/`** - Configuration management - **`src/transmog/validation.py`** - Unified parameter validation system - **`src/transmog/error/`** - Error handling and recovery strategies This modular structure makes it easier to: - Navigate and understand specific functionality - Write focused tests for individual components - Make changes without affecting unrelated code - Maintain and extend the codebase ## Development Standards ### Code Style Code style is automatically enforced by pre-commit hooks. The setup includes: - **Black** for code formatting - **isort** for import sorting - **flake8** for linting - **mypy** for type checking No manual formatting is required - just commit changes and the hooks handle the rest. ### Testing - Write tests for added functionality - Maintain or improve test coverage - Use descriptive test names that explain what is being tested - Include both unit tests and integration tests where appropriate #### Test Structure ```python def test_flatten_nested_objects(): """Test that nested objects are properly flattened.""" data = {"user": {"name": "Alice", "age": 30}} result = tm.flatten(data) assert len(result.main) == 1 assert result.main[0]["user_name"] == "Alice" assert result.main[0]["user_age"] == 30 ``` #### Running Tests ```bash # Run all tests pytest # Run with coverage report pytest --cov=transmog --cov-report=html # Run specific test categories pytest tests/unit/ pytest tests/integration/ pytest tests/performance/ ``` ### Documentation - Update docstrings for functions and classes - Add examples to demonstrate usage - Update relevant documentation files in `docs/` - Ensure examples are runnable and accurate ## API Design Principles When contributing code changes, follow these principles: ### Simplicity First The main API should remain simple and intuitive: ```python # Good: Simple, clear API result = tm.flatten(data) # Avoid: Overly complex APIs result = tm.flatten(data, complex_config=ComplexConfigObject()) ``` ### Sensible Defaults Parameters should have sensible defaults for common use cases: ```python # Most users should get good results with defaults result = tm.flatten(data) # Advanced users can customize as needed result = tm.flatten(data, arrays="inline", batch_size=5000) ``` ### Backward Compatibility - Avoid breaking changes to existing APIs - Deprecate features before removing them - Provide migration paths for major changes ### Performance Considerations - Optimize for common use cases - Provide memory-efficient options for large datasets - Include benchmarks for performance-critical changes ## Pull Request Process ### Before Submitting - [ ] Tests pass locally - [ ] Code follows project conventions (enforced by pre-commit) - [ ] Documentation is updated if needed - [ ] Changes are focused and atomic - [ ] Commit messages are clear and descriptive ### Pull Request Template When creating a pull request, include: **Description** Brief description of changes and motivation. #### Type of Change - [ ] Bug fix - [ ] New feature - [ ] Documentation update - [ ] Performance improvement - [ ] Code refactoring #### Testing - [ ] Tests added/updated - [ ] All tests pass - [ ] Manual testing performed #### Documentation - [ ] Documentation updated - [ ] Examples added/updated - [ ] API documentation updated ### Review Process 1. **Automated Checks**: CI runs tests and checks 2. **Code Review**: Maintainers review the code 3. **Discussion**: Address feedback and questions 4. **Approval**: Maintainer approves the changes 5. **Merge**: Changes are merged to main branch ## Advanced Topics ### Architecture Overview Transmog follows a modular design with several core components: - **Processor**: Main entry point for users, orchestrates transformation - **Core Transformation**: Flattener and Extractor for data transformation - **Configuration System**: Hierarchical configuration with factory methods - **I/O System**: Handles reading input and writing output in various formats - **Error Handling**: Configurable strategies for dealing with errors ### Extension Points Transmog provides several extension points for customization: 1. **Custom Recovery Strategies**: ```python from transmog.error import RecoveryStrategy class MyRecoveryStrategy(RecoveryStrategy): def recover(self, error, context=None): # Custom recovery logic return recovery_result ``` 2. **Custom ID Generation**: ```python def custom_id_strategy(record): # Generate ID based on record contents return f"CUSTOM-{record.get('id', 'unknown')}" # Use with processor processor = Processor.with_custom_id_generation(custom_id_strategy) ``` 3. **Output Format Extensions**: ```python from transmog.io import DataWriter, register_writer class MyCustomWriter(DataWriter): def write(self, data, destination): # Custom writing logic pass # Register the writer register_writer("custom-format", MyCustomWriter) ``` ### Performance Testing When making performance-related changes: - Use the benchmarking script: `python scripts/run_benchmarks.py` - Compare before and after performance metrics - Include benchmark results in pull request description - Consider memory usage as well as processing speed ### Documentation Updates When updating documentation: - Follow the passive voice style guidelines - Avoid personal pronouns (you, we, our, etc) - No temporal language (previously, new version, etc) - Provide complete, runnable examples - Test all code examples for accuracy ## Release Process Releases follow semantic versioning: - **Patch** (1.0.1): Bug fixes and minor improvements - **Minor** (1.1.0): New features, backward compatible - **Major** (2.0.0): Breaking changes ## Getting Help - **Documentation**: Check the documentation (available after deployment) - **Issues**: Search existing [GitHub issues](https://github.com/scottdraper8/transmog/issues) - **Questions**: Use GitHub issues for questions and support - **Contact**: Reach out to maintainers for guidance on larger contributions ## Code of Conduct Be respectful, inclusive, and constructive in all interactions. Focus on the code and ideas, not the person. Help create a welcoming environment for all contributors. ## Recognition Contributors are recognized in: - Release notes for significant contributions - GitHub contributors list - Project acknowledgments Thank you for contributing to Transmog!