Configuration¶
Configuration options control processing behavior through the TransmogConfig class.
Parameters¶
import transmog as tm
config = tm.TransmogConfig(
# Data Transformation
array_mode=tm.ArrayMode.SMART, # How to handle arrays
include_nulls=False, # Include null and empty values
stringify_values=False, # Convert all values to strings
max_depth=100, # Maximum recursion depth
# ID and Metadata
id_generation="random", # ID generation strategy
id_field="_id", # Field name for record IDs
parent_field="_parent_id", # Field name for parent references
time_field="_timestamp", # Field name for timestamps (None to disable)
# Processing Control
batch_size=1000, # Records to process at once
)
result = tm.flatten(data, config=config)
Core Parameters¶
These are the parameters most users will configure.
array_mode¶
Type: ArrayMode
Default: ArrayMode.SMART
Controls how arrays are processed. See Array Handling for detailed examples of each mode.
Options: SMART, SEPARATE, INLINE, SKIP.
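As a quick sketch (the sample data and the mode chosen here are illustrative; see Array Handling for what each mode actually produces):

```python
import transmog as tm

# Route nested arrays into separate child tables (illustrative choice):
config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
result = tm.flatten(
    {"order": 1, "items": [{"sku": "A"}, {"sku": "B"}]},
    config=config,
)
```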
id_generation¶
Type: str | list[str]
Default: "random"
Controls how record IDs are generated. See ID Management for detailed examples of each strategy.
Options: "random", "natural", "hash", or a list of field names for composite keys.
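A brief sketch of two of the strategies; the field names in the composite example are hypothetical:

```python
import transmog as tm

# Deterministic hash-based IDs:
config = tm.TransmogConfig(id_generation="hash")

# Composite key built from multiple source fields (hypothetical field names):
config = tm.TransmogConfig(id_generation=["region", "order_number"])
```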
include_nulls¶
Type: bool
Default: False
Include null and empty values in output. Enable this for CSV output where consistent columns across all rows are needed.
config = tm.TransmogConfig(include_nulls=True)
stringify_values¶
Type: bool
Default: False
Convert all leaf values to strings after flattening:
Numbers become strings: 42 → "42", 3.14 → "3.14"
Booleans become strings: True → "True", False → "False"
Null values remain as None/null (not stringified)
config = tm.TransmogConfig(stringify_values=True)
result = tm.flatten({"price": 19.99, "active": True}, config=config)
# Result: {"price": "19.99", "active": "True"}
Useful when targeting CSV output or when downstream systems expect uniform string types. Uniform strings also avoid type-coercion errors in Parquet/ORC writers.
batch_size¶
Type: int
Default: 1000
Number of records to process in each batch. Affects memory usage and throughput.
config = tm.TransmogConfig(batch_size=100) # Small batches
config = tm.TransmogConfig(batch_size=10000) # Large batches
Tip
Choosing batch_size
Small batches (100-500): Use for memory-constrained environments or very large records. flatten_stream() defaults to 100 for memory efficiency.
Medium batches (1000-5000): Default choice; balances memory and throughput.
Large batches (10000+): Use when memory is plentiful and throughput is critical. Reduces per-batch overhead.
Advanced Parameters¶
These parameters have sensible defaults and rarely need adjustment.
id_field¶
Type: str
Default: "_id"
Controls two things depending on id_generation:
Output field name: the name of the ID field written to every output record, regardless of strategy.
Source field name: when id_generation="natural", the field transmog reads from each source record to use as that record's ID.
For all other strategies ("random", "hash", composite list), the value is used only as the output field name; no source field is read.
Change this only if _id conflicts with your data schema.
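For example, a sketch of the "natural" case (the field name "sku" is hypothetical):

```python
import transmog as tm

# With "natural" IDs, id_field is both the source field to read
# and the output field to write ("sku" is a hypothetical field):
config = tm.TransmogConfig(id_generation="natural", id_field="sku")
result = tm.flatten([{"sku": "A-100", "price": 5}], config=config)
```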
parent_field¶
Type: str
Default: "_parent_id"
Controls the output field name written on child records to reference their parent’s ID. This is purely an output concern — it does not read from or target any field in the source data. The parent-child link is established automatically from the nesting structure.
Change this only if _parent_id conflicts with your data schema.
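A minimal sketch of the rename (the replacement name is illustrative):

```python
import transmog as tm

# Child records reference their parent via "_parent" instead of "_parent_id":
config = tm.TransmogConfig(parent_field="_parent")
```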
Note
id_field, parent_field, and time_field must all be distinct. Supplying the
same name for any two raises a ConfigurationError.
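A sketch of renaming all three fields at once (the replacement names are illustrative):

```python
import transmog as tm

# All three metadata field names must be distinct:
config = tm.TransmogConfig(
    id_field="record_id",
    parent_field="parent_record_id",
    time_field="processed_at",
)

# Reusing a name for any two raises ConfigurationError, e.g.:
# tm.TransmogConfig(id_field="_meta", parent_field="_meta")
```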
time_field¶
Type: str | None
Default: "_timestamp"
Field name for processing timestamps. Timestamps are UTC in
YYYY-MM-DD HH:MM:SS.ssssss format. Set to None to disable timestamp
generation entirely.
config = tm.TransmogConfig(time_field=None) # Disable timestamps
max_depth¶
Type: int
Default: 100
Maximum recursion depth for nested structures. The entire subtree below this depth is silently omitted — not just the field at that level, but all of its descendants. This is a safety guard; most JSON data is well under 100 levels deep.
Note
Adjust only if processing unusually deep structures or to intentionally truncate output at a specific nesting level.
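For intentional truncation, a minimal sketch:

```python
import transmog as tm

# Truncate at three levels of nesting; everything nested deeper
# than max_depth is silently dropped from the output.
config = tm.TransmogConfig(max_depth=3)
```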
Logging¶
Transmog uses Python’s standard logging module. By default no output is
produced (a NullHandler is attached to the root transmog logger). To
enable diagnostic output, configure the logger in your application:
import logging
logging.basicConfig()
logging.getLogger("transmog").setLevel(logging.INFO)
Log Levels¶
INFO — API entry/exit and streaming batch progress:
INFO:transmog.api:flatten started, name=products, input_type=list
INFO:transmog.api:flatten completed, name=products, main_records=150, child_tables=3
INFO:transmog.streaming:stream started, entity=events, format=parquet
INFO:transmog.streaming:stream batch 1 processed, records_in_batch=100, total_records=100
INFO:transmog.streaming:stream completed, entity=events, total_batches=5, total_records=500
DEBUG — Format detection, schema inference, and batch processing internals:
DEBUG:transmog.iterators:file input detected, path=data.json, extension=.json
DEBUG:transmog.iterators:string format detected as jsonl
DEBUG:transmog.flattening:processing batch, records=100, entity=products
DEBUG:transmog.writers.arrow_base:arrow schema created, fields=12, types={'name': 'string', ...}
DEBUG:transmog.writers.csv:csv schema created, table=main, fields=8
WARNING — Schema drift and data issues:
WARNING:transmog.writers.csv:csv schema drift detected, table=main, unexpected_fields=['new_col']
By default, schema drift raises an OutputError. To drop unexpected fields
instead, pass schema_drift="drop" to flatten_stream(). See
Schema Drift for details.
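A sketch of opting into drop behavior; only schema_drift="drop" comes from this page, and the other flatten_stream() arguments are assumptions about its signature:

```python
import transmog as tm

records = [{"id": 1, "name": "a"}, {"id": 2, "new_col": "x"}]  # sample input

# Drop unexpected fields instead of raising OutputError
# (arguments other than schema_drift are illustrative):
tm.flatten_stream(records, "out/events", format="csv", schema_drift="drop")
```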
Per-Module Loggers¶
Each module uses its own logger under the transmog namespace. Target
specific modules to reduce noise:
import logging
# Only show streaming batch progress
logging.basicConfig()
logging.getLogger("transmog.streaming").setLevel(logging.INFO)
# Only show format detection decisions
logging.getLogger("transmog.iterators").setLevel(logging.DEBUG)
Tip
Enable DEBUG on transmog.writers.csv when troubleshooting schema drift
errors. The warning log shows exactly which unexpected fields triggered the
error before the exception is raised.