# API Reference

## Functions

### flatten()

Transform nested data structures into flat tables.

```python
flatten(
    data: dict[str, Any] | list[dict[str, Any]] | str | Path | bytes | Iterator[dict[str, Any]],
    name: str = "data",
    config: TransmogConfig | None = None,
    progress_callback: Callable[[int, int | None], None] | None = None,
) -> FlattenResult
```

**Parameters:**

- **data** (*dict | list[dict] | str | Path | bytes | Iterator[dict]*): Input data. Can be a dictionary, a list of dictionaries, a JSON string, a file path, bytes, or an iterator/generator yielding dictionaries.
- **name** (*str*, default="data"): Base name for generated tables.
- **config** (*TransmogConfig | None*, default=None): Configuration object. Uses defaults if not provided.
- **progress_callback** (*Callable[[int, int | None], None] | None*, default=None): Optional callable invoked after each batch flush. Receives `(records_processed, total_records)`. `total_records` is the input length for `list` and `dict` inputs, or `None` when unknown (file paths, byte strings). Invocation frequency depends on `batch_size`.

**Returns:**

- **FlattenResult**: Object containing transformed tables.

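
The `progress_callback` contract described above (a processed count plus a total that may be `None`) can be handled with a small formatter. The sketch below is illustrative only: `format_progress` and `make_progress_callback` are hypothetical helper names, not part of Transmog, and the callback collects messages in a list rather than printing them.

```python
from typing import Callable, Optional


def format_progress(processed: int, total: Optional[int]) -> str:
    """Render a progress message, with a percentage only when the total is known."""
    if total:  # None (unknown) and 0 both skip the percentage
        return f"{processed}/{total} records ({processed / total:.0%})"
    return f"{processed} records processed"


def make_progress_callback(sink: list[str]) -> Callable[[int, Optional[int]], None]:
    """Build a callback matching the (records_processed, total_records) signature."""
    def on_progress(processed: int, total: Optional[int]) -> None:
        sink.append(format_progress(processed, total))
    return on_progress


messages: list[str] = []
callback = make_progress_callback(messages)
callback(500, 2000)  # total known, as for list input
callback(500, None)  # total unknown, as for file-path input
```

Passing `callback` as `progress_callback` to `flatten()` would then append one message per batch flush.
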
**Examples:**

```python
import transmog as tm

# Basic usage
result = tm.flatten({"name": "Product", "price": 99.99})

# With configuration
config = tm.TransmogConfig(include_nulls=True, batch_size=10000)
result = tm.flatten(data, config=config)

# Custom configuration
result = tm.flatten(data, config=tm.TransmogConfig(include_nulls=True))

# Progress tracking
def on_progress(processed, total):
    if total:
        print(f"{processed}/{total} records")

result = tm.flatten(data, progress_callback=on_progress)

# Process file directly
result = tm.flatten("data.json")
result = tm.flatten("data.jsonl")
result = tm.flatten("data.json5")
result = tm.flatten("data.hjson")
```

**Supported File Formats:** JSON (`.json`), JSON Lines (`.jsonl`, `.ndjson`), JSON5 (`.json5`, requires `pip install json5`), HJSON (`.hjson`, requires `pip install hjson`). See [Working with Files](working-with-files) for details.

### flatten_stream()

Stream data directly to files.

```python
flatten_stream(
    data: dict[str, Any] | list[dict[str, Any]] | str | Path | bytes | Iterator[dict[str, Any]],
    output_path: str | Path,
    name: str = "data",
    output_format: str = "csv",
    config: TransmogConfig | None = None,
    progress_callback: Callable[[int, int | None], None] | None = None,
    **format_options: Any,
) -> list[Path]
```

**Parameters:**

- **data** (*dict | list[dict] | str | Path | bytes | Iterator[dict]*): Input data (same as `flatten()`).
- **output_path** (*str | Path*): Directory path for output files.
- **name** (*str*, default="data"): Base name for output files.
- **output_format** (*str*, default="csv"): Output format ("csv", "parquet", "orc", "avro").
- **config** (*TransmogConfig | None*, default=None): Configuration object.
- **progress_callback** (*Callable[[int, int | None], None] | None*, default=None): Optional progress callback (same as `flatten()`).
- **\*\*format_options**: Format-specific options.

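
Because `data` accepts an iterator, records can be produced lazily and handed to `flatten_stream()` without first building a list. The following is a minimal sketch under stated assumptions: `generate_orders` and its record shape are hypothetical, and `stream_orders` is an illustrative wrapper that uses only the parameters documented above.

```python
from collections.abc import Iterator
from pathlib import Path
from typing import Any


def generate_orders(count: int) -> Iterator[dict[str, Any]]:
    """Yield hypothetical nested records one at a time."""
    for i in range(count):
        yield {
            "order_id": i,
            "customer": {"name": f"customer-{i}"},
            "items": [{"sku": f"sku-{i}", "qty": 1}],
        }


def stream_orders(count: int, output_dir: str) -> list[Path]:
    """Stream generated records to CSV files in output_dir."""
    import transmog as tm  # imported here so the generator is usable on its own

    return tm.flatten_stream(
        generate_orders(count),
        output_dir,
        name="orders",
        output_format="csv",
    )
```

A call such as `stream_orders(1_000_000, "output/")` would consume the generator as it goes, so the full record set never has to exist as a list.
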
**Output Formats:**

- **"csv"**: CSV files
- **"parquet"**: Parquet files (requires pyarrow)
- **"orc"**: ORC files (requires pyarrow)
- **"avro"**: Avro files (requires fastavro, cramjam)

**Returns:**

- **list[Path]**: List of `Path` objects for each file written.

**Examples:**

```python
# Stream to CSV
files = tm.flatten_stream(large_data, "output/", output_format="csv")
# files: [PosixPath('output/data.csv'), ...]

# Stream to Parquet
files = tm.flatten_stream(data, "output/", output_format="parquet")

# Stream to ORC with configuration
config = tm.TransmogConfig(batch_size=5000)
files = tm.flatten_stream(data, "output/", output_format="orc", config=config)
```

:::{note}
When `config` is not provided, `flatten_stream()` uses `batch_size=100` (instead of the default 1000) for memory efficiency. Pass an explicit config to override.
:::

:::{seealso}
For large datasets that don't fit in memory, use `flatten_stream()` instead of `flatten()`. It writes directly to disk without keeping all data in memory.
:::

## Classes

### TransmogConfig

Configuration class for all processing parameters.

```python
TransmogConfig(
    array_mode: ArrayMode = ArrayMode.SMART,
    include_nulls: bool = False,
    stringify_values: bool = False,
    max_depth: int = 100,
    id_generation: str | list[str] = "random",
    id_field: str = "_id",
    parent_field: str = "_parent_id",
    time_field: str | None = "_timestamp",
    batch_size: int = 1000,
)
```

See {doc}`configuration` for detailed parameter descriptions, usage guidance, and batch size recommendations.

### FlattenResult

Container for flattened data.

#### Properties

**entity_name** (*str*): Name of the entity associated with the main table.

```python
result = tm.flatten(data, name="products")
entity = result.entity_name  # "products"
```

**main** (*list[dict[str, Any]]*): Main flattened table.

```python
result = tm.flatten(data)
main_table = result.main
```

**tables** (*dict[str, list[dict[str, Any]]]*): Child tables dictionary.

```python
child_tables = result.tables
reviews = result.tables["products_reviews"]
```

**all_tables** (*dict[str, list[dict[str, Any]]]*): All tables including main.

```python
all_data = result.all_tables
```

#### Methods

##### save()

Save tables to files.

```python
save(
    path: str | Path,
    output_format: str | None = None,
    **format_options: Any
) -> list[str] | dict[str, str]
```

**Parameters:**

- **path**: Output path (file or directory).
- **output_format**: Output format ("csv", "parquet", "orc", "avro"). Auto-detected from extension if not specified. Defaults to "csv" when no extension is present.
- **\*\*format_options**: Format-specific writer options (e.g., `delimiter`, `quoting` for CSV; `compression` for Parquet; `codec` for Avro). See {doc}`outputs` for codec details and optional dependency requirements.

**Returns:**

- **list[str] | dict[str, str]**: Created file paths. Returns a list for single table output or a dictionary mapping table names to file paths for multiple tables.

**Behavior:**

- **With child tables:** Saves all tables to a directory. If a file path with an extension is given (e.g., `"output/data.csv"`), the extension is stripped and a directory is created instead. Returns `dict[str, str]` mapping table names to paths.
- **Without child tables:** Saves the main table to a single file. If no extension is present, the output format extension is appended automatically. Returns `list[str]`.

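
Since `save()` returns either a list or a dictionary depending on whether child tables exist, calling code may prefer to work with a single shape. Below is a minimal sketch of one way to normalize it. `normalize_saved_paths` is a hypothetical helper, not part of Transmog, and keying single-file output by the file stem is an assumption made purely for illustration.

```python
from pathlib import Path
from typing import Union


def normalize_saved_paths(saved: Union[list[str], dict[str, str]]) -> dict[str, Path]:
    """Return a {name: Path} mapping for either return shape of save()."""
    if isinstance(saved, dict):
        # Multiple tables: keys are already table names.
        return {name: Path(p) for name, p in saved.items()}
    # Single table: no names are supplied, so use each file's stem (assumption).
    return {Path(p).stem: Path(p) for p in saved}


single = normalize_saved_paths(["data.csv"])
multi = normalize_saved_paths({
    "products": "output/products.csv",
    "products_reviews": "output/products_reviews.csv",
})
```

With this in place, downstream code can iterate over `.items()` regardless of whether the result had child tables.
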
**Examples:**

```python
# Save to directory (when child tables exist)
paths = result.save("output/")
# Creates: output/products.csv, output/products_reviews.csv

# Save with explicit format
paths = result.save("output/", output_format="parquet")

# Save single table (when no child tables)
paths = result.save("data.csv")
```

#### Accessing Data

Access result data through properties:

```python
# Main table records
records = result.main
print(f"Main table records: {len(result.main)}")

# Iterate over main table records
for record in result.main:
    print(record)
```

## Error Classes

All exceptions inherit from `TransmogError`. Three are exported in the public API:

| Exception | Available as | Description |
| --------- | ----------- | ----------- |
| `TransmogError` | `tm.TransmogError` | Base exception for all Transmog errors |
| `ValidationError` | `tm.ValidationError` | Input data validation failures |
| `MissingDependencyError` | `tm.MissingDependencyError` | Missing optional dependency (pyarrow, fastavro) |

`ConfigurationError` and `OutputError` exist internally but are not exported. Catch them via `TransmogError`.

See {doc}`errors` for usage examples, troubleshooting, and error handling patterns.

## Type Definitions

### ArrayMode

Enumeration for controlling array handling behavior. Available as `tm.ArrayMode` when importing `transmog as tm`.

```python
import transmog as tm

tm.ArrayMode.SMART     # Default: simple arrays inline, complex extracted
tm.ArrayMode.SEPARATE  # All arrays to child tables
tm.ArrayMode.INLINE    # All arrays as JSON strings
tm.ArrayMode.SKIP      # Ignore arrays
```

## Module Information

```python
import transmog as tm

# Version
print(tm.__version__)

# Exported names
print(tm.__all__)
# ['flatten', 'flatten_stream', 'FlattenResult',
#  'TransmogConfig', 'ArrayMode',
#  'TransmogError', 'ValidationError', 'MissingDependencyError', '__version__']

# All exported types are available directly
result = tm.flatten(data)     # Main function
config = tm.TransmogConfig()  # Configuration
mode = tm.ArrayMode.SMART     # Array handling mode

# Exception handling
try:
    result = tm.flatten(data)
except tm.ValidationError as e:  # Validation errors
    print(f"Validation error: {e}")
except tm.TransmogError as e:  # Base class for all errors
    print(f"Processing error: {e}")
```

## Advanced Usage

```python
# Advanced configuration usage
import transmog as tm

config = tm.TransmogConfig(
    batch_size=1000,
    array_mode=tm.ArrayMode.SEPARATE
)

result = tm.flatten(data, name="products", config=config)
```
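
Optional output formats depend on extra packages, so a caller may want to fall back to CSV when one is missing. The sketch below assumes that `save()` raises `MissingDependencyError` when the required package is absent, which this reference does not state explicitly. `save_with_fallback` is a hypothetical helper, and the exception class is passed in as an argument so the function does not need to import Transmog itself.

```python
from typing import Any


def save_with_fallback(
    result: Any,
    path: str,
    preferred: str,
    missing_dependency_error: type[Exception],
) -> Any:
    """Try the preferred format; fall back to CSV if its dependency is missing."""
    try:
        return result.save(path, output_format=preferred)
    except missing_dependency_error:
        # Assumption: the library signals an absent optional package this way.
        return result.save(path, output_format="csv")
```

A call might look like `save_with_fallback(result, "output/", "parquet", tm.MissingDependencyError)`. Any other `TransmogError` still propagates to the caller.
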