API Reference

Functions

flatten()

Transform nested data structures into flat tables.

flatten(
    data: dict[str, Any] | list[dict[str, Any]] | str | Path | bytes | Iterator[dict[str, Any]],
    name: str = "data",
    config: TransmogConfig | None = None,
    progress_callback: Callable[[int, int | None], None] | None = None,
) -> FlattenResult

Parameters:

  • data (dict | list[dict] | str | Path | bytes | Iterator[dict]): Input data. Can be dictionary, list of dictionaries, JSON string, file path, bytes, or an iterator/generator yielding dictionaries.

  • name (str, default="data"): Base name for generated tables.

  • config (TransmogConfig | None, default=None): Configuration object. Uses defaults if not provided.

  • progress_callback (Callable[[int, int | None], None] | None, default=None): Optional callable invoked after each batch flush. Receives (records_processed, total_records). total_records is the input length for list and dict inputs, or None when unknown (file paths, byte strings). Invocation frequency depends on batch_size.

Returns:

  • FlattenResult: Object containing transformed tables.

Examples:

import transmog as tm

# Basic usage
result = tm.flatten({"name": "Product", "price": 99.99})

# With configuration
config = tm.TransmogConfig(include_nulls=True, batch_size=10000)
result = tm.flatten(data, config=config)

# Custom configuration
result = tm.flatten(data, config=tm.TransmogConfig(include_nulls=True))

# Progress tracking
def on_progress(processed, total):
    if total:
        print(f"{processed}/{total} records")

result = tm.flatten(data, progress_callback=on_progress)

# Process file directly
result = tm.flatten("data.json")
result = tm.flatten("data.jsonl")
result = tm.flatten("data.json5")
result = tm.flatten("data.hjson")

Supported File Formats: JSON (.json), JSON Lines (.jsonl, .ndjson), JSON5 (.json5, requires pip install json5), HJSON (.hjson, requires pip install hjson). See Working with Files for details.

flatten_stream()

Flatten data and stream the results directly to files.

flatten_stream(
    data: dict[str, Any] | list[dict[str, Any]] | str | Path | bytes | Iterator[dict[str, Any]],
    output_path: str | Path,
    name: str = "data",
    output_format: str = "csv",
    config: TransmogConfig | None = None,
    progress_callback: Callable[[int, int | None], None] | None = None,
    **format_options: Any,
) -> list[Path]

Parameters:

  • data (dict | list[dict] | str | Path | bytes | Iterator[dict]): Input data (same as flatten()).

  • output_path (str | Path): Directory path for output files.

  • name (str, default="data"): Base name for output files.

  • output_format (str, default="csv"): Output format ("csv", "parquet", "orc", "avro").

  • config (TransmogConfig | None, default=None): Configuration object.

  • progress_callback (Callable[[int, int | None], None] | None, default=None): Optional progress callback (same as flatten()).

  • **format_options: Format-specific options.

Output Formats:

  • "csv": CSV files

  • "parquet": Parquet files (requires pyarrow)

  • "orc": ORC files (requires pyarrow)

  • "avro": Avro files (requires fastavro, cramjam)

Returns:

  • list[Path]: List of Path objects for each file written.

Examples:

# Stream to CSV
files = tm.flatten_stream(large_data, "output/", output_format="csv")
# files: [PosixPath('output/data.csv'), ...]

# Stream to Parquet
files = tm.flatten_stream(data, "output/", output_format="parquet")

# Stream to ORC with configuration
config = tm.TransmogConfig(batch_size=5000)
files = tm.flatten_stream(data, "output/", output_format="orc", config=config)

Note

When config is not provided, flatten_stream() uses batch_size=100 (instead of the default 1000) for memory efficiency. Pass an explicit config to override.

See also

For large datasets that don’t fit in memory, use flatten_stream() instead of flatten(). It writes directly to disk without keeping all data in memory.

Classes

TransmogConfig

Configuration class for all processing parameters.

TransmogConfig(
    array_mode: ArrayMode = ArrayMode.SMART,
    include_nulls: bool = False,
    stringify_values: bool = False,
    max_depth: int = 100,
    id_generation: str | list[str] = "random",
    id_field: str = "_id",
    parent_field: str = "_parent_id",
    time_field: str | None = "_timestamp",
    batch_size: int = 1000,
)

See Configuration for detailed parameter descriptions, usage guidance, and batch size recommendations.

FlattenResult

Container for flattened data.

Properties

entity_name (str): Name of the entity associated with the main table.

result = tm.flatten(data, name="products")
entity = result.entity_name  # "products"

main (list[dict[str, Any]]): Main flattened table.

result = tm.flatten(data)
main_table = result.main

tables (dict[str, list[dict[str, Any]]]): Child tables dictionary.

child_tables = result.tables
reviews = result.tables["products_reviews"]

all_tables (dict[str, list[dict[str, Any]]]): All tables including main.

all_data = result.all_tables

Methods

save()

Save tables to files.

save(
    path: str | Path,
    output_format: str | None = None,
    **format_options: Any
) -> list[str] | dict[str, str]

Parameters:

  • path: Output path (file or directory).

  • output_format: Output format ("csv", "parquet", "orc", "avro"). Auto-detected from extension if not specified. Defaults to "csv" when no extension is present.

  • **format_options: Format-specific writer options (e.g., delimiter, quoting for CSV; compression for Parquet; codec for Avro). See Output Formats for codec details and optional dependency requirements.

Returns:

  • list[str] | dict[str, str]: Created file paths. Returns a list for single table output or a dictionary mapping table names to file paths for multiple tables.

Behavior:

  • With child tables: Saves all tables to a directory. If a file path with an extension is given (e.g., "output/data.csv"), the extension is stripped and a directory is created instead. Returns dict[str, str] mapping table names to paths.

  • Without child tables: Saves the main table to a single file. If no extension is present, the output format extension is appended automatically. Returns list[str].

Examples:

# Save to directory (when child tables exist)
paths = result.save("output/")
# Creates: output/products.csv, output/products_reviews.csv

# Save with explicit format
paths = result.save("output/", output_format="parquet")

# Save single table (when no child tables)
paths = result.save("data.csv")

Accessing Data

Access result data through properties:

# Main table records
records = result.main
print(f"Main table records: {len(result.main)}")

# Iterate over main table records
for record in result.main:
    print(record)

Error Classes

All exceptions inherit from TransmogError. Three are exported in the public API:

  • TransmogError (available as tm.TransmogError): Base exception for all Transmog errors.

  • ValidationError (available as tm.ValidationError): Input data validation failures.

  • MissingDependencyError (available as tm.MissingDependencyError): Missing optional dependency (pyarrow, fastavro).

ConfigurationError and OutputError exist internally but are not exported. Catch them via TransmogError.

See Error Handling for usage examples, troubleshooting, and error handling patterns.

Type Definitions

ArrayMode

Enumeration for controlling array handling behavior. Available as tm.ArrayMode when importing transmog as tm.

import transmog as tm

tm.ArrayMode.SMART    # Default: simple arrays inline, complex extracted
tm.ArrayMode.SEPARATE # All arrays to child tables
tm.ArrayMode.INLINE   # All arrays as JSON strings
tm.ArrayMode.SKIP     # Ignore arrays
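The distinction SMART draws between simple and complex arrays can be illustrated with a small sketch. This is an interpretation of the documented behavior ("simple arrays inline, complex extracted"); the library's exact classification rules may differ, and classify_array is a name invented here:

```python
from typing import Any

# Sketch of SMART-style array classification (illustrative only):
# arrays of scalars stay inline on the parent record, while arrays
# containing nested objects or arrays would be extracted to a child table.
def classify_array(value: list[Any]) -> str:
    if all(not isinstance(item, (dict, list)) for item in value):
        return "inline"   # simple: scalars only
    return "extract"      # complex: nested structures

record = {"tags": ["a", "b"], "reviews": [{"rating": 5}]}
decisions = {key: classify_array(val) for key, val in record.items()}
print(decisions)  # {'tags': 'inline', 'reviews': 'extract'}
```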

Module Information

import transmog as tm

# Version
print(tm.__version__)

# Exported names
print(tm.__all__)
# ['flatten', 'flatten_stream', 'FlattenResult',
#  'TransmogConfig', 'ArrayMode',
#  'TransmogError', 'ValidationError', 'MissingDependencyError', '__version__']

# All exported types are available directly
result = tm.flatten(data)                    # Main function
config = tm.TransmogConfig()                 # Configuration
mode = tm.ArrayMode.SMART                    # Array handling mode

# Exception handling
try:
    result = tm.flatten(data)
except tm.ValidationError as e:              # Validation errors
    print(f"Validation error: {e}")
except tm.TransmogError as e:                # Base class for all errors
    print(f"Processing error: {e}")

Advanced Usage

# Advanced configuration usage
import transmog as tm

config = tm.TransmogConfig(
    batch_size=1000,
    array_mode=tm.ArrayMode.SEPARATE
)
result = tm.flatten(data, name="products", config=config)