Configuration

Configuration options control processing behavior through the TransmogConfig class.

Parameters

import transmog as tm

config = tm.TransmogConfig(
    # Data Transformation
    array_mode=tm.ArrayMode.SMART,       # How to handle arrays
    include_nulls=False,                 # Include null and empty values
    stringify_values=False,              # Convert all values to strings
    max_depth=100,                       # Maximum recursion depth

    # ID and Metadata
    id_generation="random",              # ID generation strategy
    id_field="_id",                      # Field name for record IDs
    parent_field="_parent_id",           # Field name for parent references
    time_field="_timestamp",             # Field name for timestamps (None to disable)

    # Processing Control
    batch_size=1000,                     # Records to process at once
)

result = tm.flatten(data, config=config)

Core Parameters

These are the parameters most users will configure.

array_mode

Type: ArrayMode Default: ArrayMode.SMART

Controls how arrays are processed. See Array Handling for detailed examples of each mode.

Options: SMART, SEPARATE, INLINE, SKIP.
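A minimal sketch selecting a non-default mode, using only the options listed above (see Array Handling for what each mode actually does):

```python
import transmog as tm

# Route nested arrays into separate child tables (SEPARATE mode)
config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
```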

id_generation

Type: str | list[str] Default: "random"

Controls how record IDs are generated. See ID Management for detailed examples of each strategy.

Options: "random", "natural", "hash", or a list of field names for composite keys.
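For instance, a composite key built from two source fields (the field names region and sku are placeholders for illustration, not part of the API):

```python
import transmog as tm

# Composite key: record IDs are derived from the combination of these
# hypothetical source fields
config = tm.TransmogConfig(id_generation=["region", "sku"])
```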

include_nulls

Type: bool Default: False

Include null and empty values in output. Enable this for CSV output where consistent columns across all rows are needed.

config = tm.TransmogConfig(include_nulls=True)

stringify_values

Type: bool Default: False

Convert all leaf values to strings after flattening:

  • Numbers become strings: 42 → "42", 3.14 → "3.14"

  • Booleans become strings: True → "True", False → "False"

  • Null values remain as None/null (not stringified)

config = tm.TransmogConfig(stringify_values=True)
result = tm.flatten({"price": 19.99, "active": True}, config=config)
# Result: {"price": "19.99", "active": "True"}

Useful when targeting CSV output or when downstream systems expect uniform string types. Uniform strings also eliminate the type-coercion errors that mixed-type columns can trigger in Parquet/ORC writers.

batch_size

Type: int Default: 1000

Number of records to process in each batch. Affects memory usage and throughput.

config = tm.TransmogConfig(batch_size=100)    # Small batches
config = tm.TransmogConfig(batch_size=10000)  # Large batches

Tip

Choosing batch_size

  • Small batches (100-500): Use for memory-constrained environments or very large records. flatten_stream() defaults to 100 for memory efficiency.

  • Medium batches (1000-5000): Default choice, balances memory and throughput.

  • Large batches (10000+): Use when memory is plentiful and throughput is critical. Reduces per-batch overhead.
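Conceptually, batching just splits the input into fixed-size chunks, with a smaller final chunk holding the remainder. A stdlib sketch of the idea (not transmog's actual internals):

```python
def batches(records, batch_size):
    # Yield successive chunks of at most batch_size records
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

chunk_sizes = [len(c) for c in batches(list(range(10)), 4)]
# chunk_sizes == [4, 4, 2]: the last batch holds the remainder
```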

Advanced Parameters

These parameters have sensible defaults and rarely need adjustment.

id_field

Type: str Default: "_id"

Controls two things depending on id_generation:

  • Output field name — the name of the ID field written to every output record, regardless of strategy.

  • Source field name — when id_generation="natural", the field transmog reads from each source record to use as that record’s ID.

For all other strategies ("random", "hash", composite list), the value is only used as the output field name — no source field is read.
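For example, reading natural IDs from a hypothetical order_id field in the source data (the field name is an assumption for illustration):

```python
import transmog as tm

# With "natural" generation, id_field names BOTH the source field to read
# from each record and the output field the ID is written to
config = tm.TransmogConfig(id_generation="natural", id_field="order_id")
```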

Change this only if _id conflicts with your data schema.

parent_field

Type: str Default: "_parent_id"

Controls the output field name written on child records to reference their parent’s ID. This is purely an output concern — it does not read from or target any field in the source data. The parent-child link is established automatically from the nesting structure.

Change this only if _parent_id conflicts with your data schema.

Note

id_field, parent_field, and time_field must all be distinct. Supplying the same name for any two raises a ConfigurationError.

time_field

Type: str | None Default: "_timestamp"

Field name for processing timestamps. Timestamps are UTC in YYYY-MM-DD HH:MM:SS.ssssss format. Set to None to disable timestamp generation entirely.

config = tm.TransmogConfig(time_field=None)  # Disable timestamps
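The documented timestamp layout corresponds to Python's %Y-%m-%d %H:%M:%S.%f strftime pattern. A stdlib sketch, independent of transmog, that produces and round-trips a timestamp in that format:

```python
from datetime import datetime, timezone

# UTC timestamp in the documented YYYY-MM-DD HH:MM:SS.ssssss layout
ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")

# Round-trip: parse the string back into a datetime for validation
parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
```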

max_depth

Type: int Default: 100

Maximum recursion depth for nested structures. The entire subtree below this depth is silently omitted — not just the field at that level, but all of its descendants. This is a safety guard; most JSON data is well under 100 levels deep.

Note

Adjust only if processing unusually deep structures or to intentionally truncate output at a specific nesting level.

Logging

Transmog uses Python’s standard logging module. By default no output is produced (a NullHandler is attached to the root transmog logger). To enable diagnostic output, configure the logger in your application:

import logging

logging.basicConfig()
logging.getLogger("transmog").setLevel(logging.INFO)
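The NullHandler behavior described above can be seen with a plain stdlib logger (a self-contained sketch; the "demo" logger name is arbitrary and unrelated to transmog's logger hierarchy):

```python
import logging

log = logging.getLogger("demo")
log.addHandler(logging.NullHandler())  # default state: swallow records silently
log.info("invisible")                  # dropped: effective level is still WARNING

captured = []

class Capture(logging.Handler):
    # Minimal handler that records formatted messages for inspection
    def emit(self, record):
        captured.append(record.getMessage())

log.addHandler(Capture())
log.setLevel(logging.INFO)             # now INFO records reach the handlers
log.info("visible")
```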

Log Levels

INFO — API entry/exit and streaming batch progress:

INFO:transmog.api:flatten started, name=products, input_type=list
INFO:transmog.api:flatten completed, name=products, main_records=150, child_tables=3
INFO:transmog.streaming:stream started, entity=events, format=parquet
INFO:transmog.streaming:stream batch 1 processed, records_in_batch=100, total_records=100
INFO:transmog.streaming:stream completed, entity=events, total_batches=5, total_records=500

DEBUG — Format detection, schema inference, and batch processing internals:

DEBUG:transmog.iterators:file input detected, path=data.json, extension=.json
DEBUG:transmog.iterators:string format detected as jsonl
DEBUG:transmog.flattening:processing batch, records=100, entity=products
DEBUG:transmog.writers.arrow_base:arrow schema created, fields=12, types={'name': 'string', ...}
DEBUG:transmog.writers.csv:csv schema created, table=main, fields=8

WARNING — Schema drift and data issues:

WARNING:transmog.writers.csv:csv schema drift detected, table=main, unexpected_fields=['new_col']

By default, schema drift raises an OutputError. To drop unexpected fields instead, pass schema_drift="drop" to flatten_stream(). See Schema Drift for details.

Per-Module Loggers

Each module uses its own logger under the transmog namespace. Target specific modules to reduce noise:

import logging

# Only show streaming batch progress
logging.basicConfig()
logging.getLogger("transmog.streaming").setLevel(logging.INFO)

# Only show format detection decisions
logging.getLogger("transmog.iterators").setLevel(logging.DEBUG)

Tip

Enable DEBUG on transmog.writers.csv when troubleshooting schema drift errors. The warning log shows exactly which unexpected fields triggered the error before the exception is raised.