# Getting Started This guide provides everything needed to get up and running quickly with data transformation. ## What is Transmog? Transmog transforms complex nested data structures into flat, tabular formats while preserving relationships between parent and child records. Perfect for: - Converting JSON data for database storage - Preparing API responses for analytics - Normalizing document data for SQL queries - ETL pipeline data transformation ## Installation Install Transmog using pip: ```bash pip install transmog ``` Verify the installation: ```python import transmog as tm print(tm.__version__) # Should print "1.1.0" ``` ## 10 Minutes to Transmog ### Basic Data Transformation Transform nested data with a single function call: ```python import transmog as tm # Sample nested data data = { "company": "TechCorp", "location": { "city": "San Francisco", "country": "USA" }, "employees": [ {"name": "Alice", "role": "Engineer", "salary": 95000}, {"name": "Bob", "role": "Designer", "salary": 75000} ] } # Transform the data result = tm.flatten(data, name="companies") # Explore the results print("Main table:") print(result.main) print("\nEmployee table:") print(result.tables["companies_employees"]) ``` **Output:** Main table: ```python [{ 'company': 'TechCorp', 'location_city': 'San Francisco', 'location_country': 'USA', '_id': 'auto_generated_id' }] ``` Employee table: ```python [ { 'name': 'Alice', 'role': 'Engineer', 'salary': '95000', '_parent_id': 'auto_generated_id' }, { 'name': 'Bob', 'role': 'Designer', 'salary': '75000', '_parent_id': 'auto_generated_id' } ] ``` ### How It Works The transformation process: 1. **Flattens nested objects** - `location.city` becomes `location_city` 2. **Extracts arrays** - `employees` array becomes a separate table 3. **Preserves relationships** - Links parent and child records with IDs ### Working with Files Process files directly: ```python # Process a JSON file result = tm.flatten_file("data.json", name="products") # Save results as CSV result.save("output", output_format="csv") # Save results as JSON result.save("output", output_format="json") ``` ### Streaming Large Data For large datasets that don't fit in memory: ```python # Stream process directly to files tm.flatten_stream( large_data, output_path="output/", name="large_dataset", output_format="parquet" ) ``` ## Core Functions Transmog provides three main functions: | Function | Purpose | Use When | |----------|---------|----------| | `tm.flatten(data)` | Transform data in memory | Data fits in memory | | `tm.flatten_file(path)` | Process files directly | Working with files | | `tm.flatten_stream(data, output_path)` | Stream to files | Large datasets | ## Configuration Basics ### Array Handling Control how arrays are processed: ```python # Default: arrays become separate tables result = tm.flatten(data, arrays="separate") # Keep arrays as JSON strings in main table result = tm.flatten(data, arrays="inline") # Skip arrays entirely result = tm.flatten(data, arrays="skip") ``` ### Field Naming Customize how nested fields are named: ```python # Use dots instead of underscores result = tm.flatten(data, separator=".") # Simplify deeply nested paths result = tm.flatten(data, nested_threshold=2) ``` ### ID Management Control identifier fields: ```python # Use existing field as ID result = tm.flatten(data, id_field="product_id") # Custom parent ID field name result = tm.flatten(data, parent_id_field="parent_ref") # Add timestamp metadata result = tm.flatten(data, add_timestamp=True) ``` ## Understanding the Results The `FlattenResult` object provides easy access to transformed data: ```python result = tm.flatten(data, name="products") # Access main table main_data = result.main # Access specific child table reviews = result.tables["products_reviews"] # Get all tables including main all_tables = result.all_tables # Table information info = result.table_info() print(f"Tables: {list(result.keys())}") print(f"Main table records: {len(result)}") # Iterate over main table for record in result: print(record) # Check if table exists if "products_tags" in result: print(result["products_tags"]) ``` ## Error Handling Configure how errors are handled using the unified error handling system: ```python # Raise errors (default) - stops on first error result = tm.flatten(data, errors="raise") # Skip problematic records - continues processing result = tm.flatten(data, errors="skip") # Warn about issues but continue - logs warnings result = tm.flatten(data, errors="warn") ``` The error handling system provides consistent error messages with standardized templates and context information across all processing modules. ## Common Patterns ### JSON API Response Processing ```python # API response with nested user data api_response = { "users": [ { "id": 1, "profile": {"name": "Alice", "email": "alice@example.com"}, "preferences": {"theme": "dark", "notifications": True}, "posts": [ {"title": "Hello World", "likes": 10}, {"title": "Python Tips", "likes": 25} ] } ] } result = tm.flatten(api_response["users"], name="users") ``` ### Log File Processing ```python # Process log entries log_data = [ { "timestamp": "2024-01-01T10:00:00Z", "level": "INFO", "source": {"service": "api", "version": "1.2.0"}, "metadata": {"request_id": "abc123", "user_id": "user456"} } ] result = tm.flatten(log_data, name="logs") ``` ### Configuration Data Normalization ```python # Application configuration config = { "database": { "host": "localhost", "port": 5432, "credentials": {"username": "admin", "password": "secret"} }, "features": { "feature_flags": ["new_ui", "beta_api"], "limits": {"max_users": 1000, "max_requests": 10000} } } result = tm.flatten(config, name="config") ``` ## Next Steps Understanding the basics: 1. **[User Guide](user_guide/file-processing.md)** - Comprehensive task-oriented guides 2. **[API Reference](api_reference/api.md)** - Complete function documentation 3. **[Developer Guide](developer_guide/extending.md)** - Advanced usage and customization ## Quick Reference ```python import transmog as tm # Basic usage result = tm.flatten(data, name="table_name") # File processing result = tm.flatten_file("input.json", name="table_name") # Streaming tm.flatten_stream(data, "output/", name="table_name", output_format="parquet") # Save results result.save("output", output_format="csv") result.save("output.json") # Single file for simple data # Access data main_table = result.main child_tables = result.tables all_tables = result.all_tables ```