API Reference

Complete reference for all public Transmog functions, classes, and types.

Functions

flatten()

Transform nested data structures into flat tables.

flatten(
    data: Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes],
    *,
    name: str = "data",
    # Naming options
    separator: str = "_",
    nested_threshold: int = 4,
    # ID options
    id_field: Union[str, dict[str, str], None] = None,
    parent_id_field: str = "_parent_id",
    add_timestamp: bool = False,
    # Array handling
    arrays: Literal["separate", "inline", "skip"] = "separate",
    # Data options
    preserve_types: bool = False,
    skip_null: bool = True,
    skip_empty: bool = True,
    # Error handling
    errors: Literal["raise", "skip", "warn"] = "raise",
    # Performance
    batch_size: int = 1000,
    low_memory: bool = False,
) -> FlattenResult

Parameters:

data (Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes]): Input data to transform. Can be:
- Dictionary or list of dictionaries
- JSON string
- File path (str or Path)
- Raw bytes containing JSON
name (str, default="data"): Base name for generated tables

Naming Options:

separator (str, default="_"): Character used to join nested field names
nested_threshold (int, default=4): Maximum nesting depth before simplifying field names
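
To make the effect of separator concrete, here is a minimal stdlib-only sketch of joining nested keys into flat field names. The helper join_nested_keys is hypothetical and is not part of Transmog; it ignores nested_threshold and array handling, so treat it as an illustration of the naming idea rather than the library's behavior.

```python
from typing import Any


def join_nested_keys(
    record: dict[str, Any], separator: str = "_", prefix: str = ""
) -> dict[str, Any]:
    """Hypothetical helper: flatten nested dicts by joining keys with `separator`."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        field = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined name as the prefix.
            flat.update(join_nested_keys(value, separator, field))
        else:
            flat[field] = value
    return flat


record = {"name": "Product", "specs": {"size": {"width": 10}}}
flat_underscore = join_nested_keys(record)
flat_dotted = join_nested_keys(record, separator=".")
```

With the default separator the nested width becomes the field specs_size_width; with separator="." it becomes specs.size.width.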

ID Options:

id_field (str | dict[str, str] | None, default=None): Field(s) to use as record IDs
parent_id_field (str, default="_parent_id"): Name for parent reference fields
add_timestamp (bool, default=False): Add processing timestamp metadata

Array Handling:

arrays (Literal["separate", "inline", "skip"], default="separate"): How to process arrays:
- "separate": Extract arrays into child tables (default)
- "inline": Keep arrays as JSON strings in main table
- "skip": Ignore arrays completely
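
The three modes can be pictured with a small stdlib-only sketch. The function split_arrays is hypothetical and not Transmog's code; it assumes array items are dictionaries, names child tables as the base name plus the field name, and leaves out the ID and parent-ID fields that the real library adds.

```python
import json
from typing import Any


def split_arrays(
    record: dict[str, Any], mode: str = "separate", name: str = "data"
) -> tuple[dict[str, Any], dict[str, list[dict[str, Any]]]]:
    """Hypothetical illustration of the three array modes."""
    main: dict[str, Any] = {}
    children: dict[str, list[dict[str, Any]]] = {}
    for key, value in record.items():
        if not isinstance(value, list):
            main[key] = value
        elif mode == "inline":
            # Keep the array in the main row, serialized as a JSON string.
            main[key] = json.dumps(value)
        elif mode == "separate":
            # Move the array items into a child table named <name>_<key>.
            children[f"{name}_{key}"] = [dict(item) for item in value]
        # Any other mode ("skip") drops the array entirely.
    return main, children


record = {"name": "Product", "reviews": [{"rating": 5}, {"rating": 3}]}
separate = split_arrays(record, "separate", name="products")
inline = split_arrays(record, "inline", name="products")
skipped = split_arrays(record, "skip", name="products")
```

Only the "separate" mode produces child tables; "inline" keeps one row per record at the cost of storing JSON text in a column.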

Data Options:

preserve_types (bool, default=False): Keep original data types instead of converting values to strings
skip_null (bool, default=True): Exclude null values from output
skip_empty (bool, default=True): Exclude empty strings and collections
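
As a rough illustration of the two skip flags, the sketch below filters a single record. The helper drop_values is hypothetical; it assumes "empty" means a zero-length string, list, or dict, which may differ in detail from what Transmog treats as empty.

```python
from typing import Any


def drop_values(
    record: dict[str, Any], skip_null: bool = True, skip_empty: bool = True
) -> dict[str, Any]:
    """Hypothetical filter mirroring the skip_null / skip_empty flags."""
    cleaned: dict[str, Any] = {}
    for key, value in record.items():
        if skip_null and value is None:
            continue
        if skip_empty and isinstance(value, (str, list, dict)) and len(value) == 0:
            continue
        cleaned[key] = value
    return cleaned


record = {"name": "Product", "note": "", "tags": [], "price": None, "stock": 0}
filtered = drop_values(record)
unfiltered = drop_values(record, skip_null=False, skip_empty=False)
```

In this sketch a value such as 0 survives filtering because it is neither null nor an empty string or collection.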

Error Handling:

errors (Literal["raise", "skip", "warn"], default="raise"): Error handling strategy:
- "raise": Stop processing and raise exception
- "skip": Skip problematic records and continue
- "warn": Log warnings but continue processing
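
The three strategies follow a common pattern, sketched below with the standard library only. The function process_records and the logger name are hypothetical, and the sketch assumes that "warn" drops the failing record after logging it, which the text above does not confirm.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger("errors_sketch")


def process_records(
    records: Iterable[Any], transform: Callable[[Any], Any], errors: str = "raise"
) -> list[Any]:
    """Hypothetical loop applying an error strategy per record."""
    results: list[Any] = []
    for index, record in enumerate(records):
        try:
            results.append(transform(record))
        except Exception as exc:
            if errors == "raise":
                raise  # Stop at the first failing record.
            if errors == "warn":
                logger.warning("Record %d failed: %s", index, exc)
            # "skip" and "warn" both continue with the next record.
    return results


def to_price(record: dict[str, str]) -> float:
    return float(record["price"])


records = [{"price": "9.5"}, {"price": "n/a"}, {"price": "3"}]
kept = process_records(records, to_price, errors="skip")
```

Here the second record cannot be converted, so only two prices are returned under "skip", while "raise" would propagate the ValueError.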

Performance:

batch_size (int, default=1000): Records to process in each batch
low_memory (bool, default=False): Use memory-efficient processing (slower)
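
To show what processing "in each batch" means in practice, here is a small generic batching sketch. The generator iter_batches is hypothetical and only demonstrates how batch_size bounds the number of records held at once; it says nothing about how Transmog schedules its work internally.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_batches(records: Iterable[Any], batch_size: int = 1000) -> Iterator[list[Any]]:
    """Hypothetical helper: yield lists of at most `batch_size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batch_lengths = [len(batch) for batch in iter_batches(range(2500), batch_size=1000)]
```

A smaller batch_size lowers peak memory per step but increases the number of steps.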

Returns:

FlattenResult: Object containing transformed tables and metadata

Examples:

import transmog as tm

# Basic usage
data = {"name": "Product", "price": 99.99}
result = tm.flatten(data, name="products")

# Custom configuration
result = tm.flatten(
    data,
    name="products",
    separator=".",
    arrays="inline",
    preserve_types=True
)

# Using existing ID field
result = tm.flatten(data, id_field="product_id")

# Error handling
result = tm.flatten(data, errors="skip")

flatten_file()

Process data directly from files.

flatten_file(
    path: Union[str, Path],
    *,
    name: Optional[str] = None,
    file_format: Optional[str] = None,
    **options: Any,
) -> FlattenResult

Parameters:

path (Union[str, Path]): Path to input file
name (Optional[str], default=None): Table name (defaults to filename without extension)
file_format (Optional[str], default=None): Input format (auto-detected from extension)
**options: All options from flatten() function

Supported Formats:

JSON (.json)
CSV (.csv) - for files containing JSON in cells
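
The defaulting rules for name and file_format can be sketched with pathlib. The function infer_name_and_format and the EXTENSION_FORMATS mapping are hypothetical; they assume detection is a case-insensitive lookup on the file suffix, which is one plausible reading of "auto-detected from extension".

```python
from pathlib import Path
from typing import Optional, Union

# Assumed mapping, based on the two formats listed above.
EXTENSION_FORMATS = {".json": "json", ".csv": "csv"}


def infer_name_and_format(
    path: Union[str, Path],
    name: Optional[str] = None,
    file_format: Optional[str] = None,
) -> tuple[str, str]:
    """Hypothetical helper: fill in name and format from the file path."""
    path = Path(path)
    # Default table name: filename without its extension.
    resolved_name = name if name is not None else path.stem
    resolved_format = file_format or EXTENSION_FORMATS.get(path.suffix.lower())
    if resolved_format is None:
        raise ValueError(f"Cannot detect format from extension: {path.suffix!r}")
    return resolved_name, resolved_format


auto = infer_name_and_format("products.json")
explicit = infer_name_and_format("exports/data.CSV", name="orders")
```

Passing name or file_format explicitly overrides the value inferred from the path.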

Returns:

FlattenResult: Object containing transformed tables

Examples:

import transmog as tm

# Process JSON file
result = tm.flatten_file("data.json", name="products")

# Auto-detect name from filename
result = tm.flatten_file("products.json")  # name="products"

# Pass additional options
result = tm.flatten_file("data.json", arrays="inline", errors="skip")

flatten_stream()

Stream large datasets directly to files without loading into memory.

flatten_stream(
    data: Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes],
    output_path: Union[str, Path],
    *,
    name: str = "data",
    output_format: str = "json",
    # All options from flatten()
    separator: str = "_",
    nested_threshold: int = 4,
    id_field: Union[str, dict[str, str], None] = None,
    parent_id_field: str = "_parent_id",
    add_timestamp: bool = False,
    arrays: Literal["separate", "inline", "skip"] = "separate",
    preserve_types: bool = False,
    skip_null: bool = True,
    skip_empty: bool = True,
    errors: Literal["raise", "skip", "warn"] = "raise",
    batch_size: int = 1000,
    **format_options: Any,
) -> None

Parameters:

data: Input data (same as flatten())
output_path (Union[str, Path]): Directory or file path for output
name (str, default="data"): Base name for output files
output_format (str, default="json"): Output format ("json", "csv", "parquet")
**format_options: Format-specific options

Output Formats:

"json": JSON Lines format for efficient streaming
"csv": CSV files with proper escaping
"parquet": Columnar format for analytics (requires pyarrow)
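
JSON Lines suits streaming because each record is a complete JSON document on its own line, so a writer never needs to hold the whole dataset. The sketch below shows the format with the standard library; write_json_lines is a hypothetical helper, not a Transmog function.

```python
import io
import json
from typing import Any, Iterable, TextIO


def write_json_lines(records: Iterable[dict[str, Any]], stream: TextIO) -> int:
    """Hypothetical writer: one JSON object per line, written as records arrive."""
    count = 0
    for record in records:
        stream.write(json.dumps(record) + "\n")
        count += 1
    return count


buffer = io.StringIO()
written = write_json_lines([{"id": 1}, {"id": 2}], buffer)
lines = buffer.getvalue().splitlines()
```

Each line can later be parsed independently with json.loads, which also makes the output easy to append to or split.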

Returns:

None: Data is written directly to files

Examples:

import transmog as tm

# Stream to JSON files (large_data is any supported input, e.g. a list of dicts)
tm.flatten_stream(large_data, "output/", name="products", output_format="json")

# Stream to Parquet for analytics
tm.flatten_stream(data, "output/", output_format="parquet", batch_size=5000)

# Stream with custom options
tm.flatten_stream(
    data,
    "output/",
    name="logs",
    output_format="csv",
    arrays="skip",
    errors="warn"
)

Classes

FlattenResult

Container for flattened data with convenience methods for access and export.

Properties

main (list[dict[str, Any]]): Main flattened table

result = tm.flatten(data)
main_table = result.main

tables (dict[str, list[dict[str, Any]]]): Child tables dictionary

child_tables = result.tables
reviews = result.tables["products_reviews"]

all_tables (dict[str, list[dict[str, Any]]]): All tables including main

all_data = result.all_tables

Methods

save(path, output_format=None)

Save all tables to files.

save(
    path: Union[str, Path],
    output_format: Optional[str] = None
) -> Union[list[str], dict[str, str]]

Parameters:

path: Output path (file or directory)
output_format: Output format ("json", "csv", or "parquet"); auto-detected from the file extension when omitted

Returns:

Union[list[str], dict[str, str]]: Created file paths

Examples:

# Save as JSON files in directory
paths = result.save("output/")

# Save as CSV with explicit format
paths = result.save("output/", output_format="csv")

# Save single table as JSON file
paths = result.save("data.json")

table_info()

Get metadata about all tables.

table_info() -> dict[str, dict[str, Any]]

Returns:

dict[str, dict[str, Any]]: Table metadata including record counts, fields, and main table indicator

Example:

info = result.table_info()
# {
#     "products": {
#         "records": 100,
#         "fields": ["name", "price", "_id"],
#         "is_main": True
#     },
#     "products_reviews": {
#         "records": 250,
#         "fields": ["rating", "comment", "_parent_id"],
#         "is_main": False
#     }
# }
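
For readers who want to see how metadata of this shape relates to the underlying tables, here is a minimal sketch that builds it from a plain dictionary of tables. The function summarize_tables is hypothetical; it assumes "fields" is the ordered union of keys seen across a table's rows, which may not match how Transmog computes it.

```python
from typing import Any


def summarize_tables(
    tables: dict[str, list[dict[str, Any]]], main_name: str
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper producing metadata in the shape shown above."""
    info: dict[str, dict[str, Any]] = {}
    for table_name, rows in tables.items():
        fields: list[str] = []
        for row in rows:
            for field in row:
                # Keep first-seen order and avoid duplicates.
                if field not in fields:
                    fields.append(field)
        info[table_name] = {
            "records": len(rows),
            "fields": fields,
            "is_main": table_name == main_name,
        }
    return info


tables = {
    "products": [{"name": "A", "price": 1.0}, {"name": "B", "_id": "x"}],
    "products_reviews": [{"rating": 5}],
}
info = summarize_tables(tables, "products")
```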

Container Methods

FlattenResult supports standard container operations:

# Length (main table record count)
count = len(result)

# Iteration (over main table)
for record in result:
    print(record)

# Key access
reviews = result["products_reviews"]
main = result["main"]  # or result[entity_name], where entity_name is the name passed to flatten()

# Membership testing
if "products_tags" in result:
    print("Has tags table")

# Keys, values, items
table_names = list(result.keys())
table_data = list(result.values())
table_pairs = list(result.items())

# Safe access with default
tags = result.get_table("products_tags", default=[])
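
The container behavior above maps onto Python's standard special methods. The class below is a hypothetical stand-in, not FlattenResult itself; it assumes that length and iteration refer to the main table and that "main" is an alias for the main table's name, as the comments in the example suggest.

```python
from typing import Any, Iterator


class TableContainer:
    """Hypothetical stand-in showing the container protocol."""

    def __init__(self, main_name: str, tables: dict[str, list[dict[str, Any]]]) -> None:
        self._main_name = main_name
        self._tables = tables

    def __len__(self) -> int:
        # Length is the record count of the main table.
        return len(self._tables[self._main_name])

    def __iter__(self) -> Iterator[dict[str, Any]]:
        # Iteration yields main-table records.
        return iter(self._tables[self._main_name])

    def __getitem__(self, key: str) -> list[dict[str, Any]]:
        if key == "main":
            key = self._main_name
        return self._tables[key]

    def __contains__(self, key: object) -> bool:
        return key == "main" or key in self._tables


container = TableContainer(
    "products",
    {"products": [{"name": "A"}, {"name": "B"}], "products_reviews": [{"rating": 5}]},
)
names = [record["name"] for record in container]
```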

Error Classes

TransmogError

Base exception class for all Transmog errors.

class TransmogError(Exception):
    """Base exception for Transmog operations."""

ValidationError

Raised when input data or configuration is invalid.

class ValidationError(TransmogError):
    """Raised for validation failures."""

Common Causes:

Invalid configuration parameters
Malformed input data
Unsupported data types
File format issues

Example:

try:
    result = tm.flatten(data, arrays="invalid_option")
except tm.ValidationError as e:
    print(f"Configuration error: {e}")
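
Because ValidationError derives from TransmogError, a handler for the base class also catches validation failures. The sketch below demonstrates that relationship with hypothetical stand-in classes (SketchError, SketchValidationError) and a hypothetical check_arrays_option function, so it runs without Transmog installed.

```python
class SketchError(Exception):
    """Hypothetical stand-in for a library's base exception."""


class SketchValidationError(SketchError):
    """Hypothetical stand-in for a validation failure."""


def check_arrays_option(value: str) -> str:
    """Hypothetical validator for an option with a fixed set of values."""
    allowed = ("separate", "inline", "skip")
    if value not in allowed:
        raise SketchValidationError(f"arrays must be one of {allowed}, got {value!r}")
    return value


try:
    check_arrays_option("invalid_option")
    caught = None
except SketchError as exc:
    # Catching the base class also catches the more specific subclass.
    caught = type(exc).__name__
```

Catching the base class is convenient at application boundaries; catching the subclass is better when configuration mistakes need a different response from data problems.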

Type Aliases

DataInput

Type alias for supported input data formats.

DataInput = Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes]

ArrayHandling

Type alias for array processing options.

ArrayHandling = Literal["separate", "inline", "skip"]

ErrorHandling

Type alias for error handling strategies.

ErrorHandling = Literal["raise", "skip", "warn"]

IdSource

Type alias for ID field specifications.

IdSource = Union[str, dict[str, str], None]
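
Literal aliases like these are checked by static type checkers, but they can also be inspected at runtime with typing.get_args. The sketch below redefines the two Literal aliases locally so it runs on its own; is_valid_option is a hypothetical helper, not part of Transmog.

```python
from typing import Any, Literal, get_args

# Local copies of the aliases documented above.
ArrayHandling = Literal["separate", "inline", "skip"]
ErrorHandling = Literal["raise", "skip", "warn"]


def is_valid_option(value: str, alias: Any) -> bool:
    """Hypothetical helper: check a value against a Literal alias at runtime."""
    return value in get_args(alias)


array_values = get_args(ArrayHandling)
inline_ok = is_valid_option("inline", ArrayHandling)
warn_is_array_mode = is_valid_option("warn", ArrayHandling)
```

This kind of check is useful when option values come from user input or configuration files, where a type checker cannot help.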

Module Information

Version: Access current version

import transmog
print(transmog.__version__)  # e.g. 1.1.0

Public API: List the exported names

print(transmog.__all__)
# ['flatten', 'flatten_file', 'flatten_stream', 'FlattenResult',
#  'TransmogError', 'ValidationError', '__version__']

Advanced Usage

For advanced features like custom processing or configuration objects:

# Import advanced components directly
from transmog.process import Processor
from transmog.config import TransmogConfig

# Create custom processor
processor = Processor()

# Use configuration objects
config = TransmogConfig(
    separator=".",
    array_handling="inline",
    preserve_types=True
)

See the Developer Guide for more advanced usage patterns.