# Configuration

The `TransmogConfig` class holds the options that control processing behavior.

## Parameters

```python
import transmog as tm

config = tm.TransmogConfig(
    # Data Transformation
    array_mode=tm.ArrayMode.SMART,       # How to handle arrays
    include_nulls=False,                 # Include null and empty values
    stringify_values=False,              # Convert all values to strings
    max_depth=100,                       # Maximum recursion depth

    # ID and Metadata
    id_generation="random",              # ID generation strategy
    id_field="_id",                      # Field name for record IDs
    parent_field="_parent_id",           # Field name for parent references
    time_field="_timestamp",             # Field name for timestamps (None to disable)

    # Processing Control
    batch_size=1000,                     # Records to process at once
)

data = {"name": "Laptop", "tags": ["sale", "new"]}  # Sample input record
result = tm.flatten(data, config=config)
```

## Core Parameters

These are the parameters most users will configure.

### array_mode

**Type:** `ArrayMode`
**Default:** `ArrayMode.SMART`

Controls how arrays are processed. See [Array Handling](arrays.md) for detailed
examples of each mode.

Options: `SMART`, `SEPARATE`, `INLINE`, `SKIP`.

### id_generation

**Type:** `str | list[str]`
**Default:** `"random"`

Controls how record IDs are generated. See [ID Management](ids.md) for detailed
examples of each strategy.

Options: `"random"`, `"natural"`, `"hash"`, or a list of field names for composite keys.

### include_nulls

**Type:** `bool`
**Default:** `False`

Include null and empty values in output. Enable this for CSV output where
consistent columns across all rows are needed.

```python
config = tm.TransmogConfig(include_nulls=True)
```

### stringify_values

**Type:** `bool`
**Default:** `False`

Convert all leaf values to strings after flattening:

- Numbers become strings: `42` → `"42"`, `3.14` → `"3.14"`
- Booleans become strings: `True` → `"True"`, `False` → `"False"`
- Null values remain as None/null (not stringified)

```python
config = tm.TransmogConfig(stringify_values=True)
result = tm.flatten({"price": 19.99, "active": True}, config=config)
# Result: {"price": "19.99", "active": "True"}
```

Use this when targeting CSV output or when downstream systems expect uniform string
types. It also helps avoid type coercion errors in Parquet/ORC writers, because every
non-null value in a column is a string.

### batch_size

**Type:** `int`
**Default:** `1000`

Number of records to process in each batch. Larger batches use more memory but
reduce per-batch overhead.

```python
config = tm.TransmogConfig(batch_size=100)    # Small batches
config = tm.TransmogConfig(batch_size=10000)  # Large batches
```

:::{tip}
**Choosing batch_size**

- **Small batches (100-500):** Use for memory-constrained environments or very
  large records. `flatten_stream()` defaults to 100 for memory efficiency.
- **Medium batches (1000-5000):** Default choice, balances memory and throughput.
- **Large batches (10000+):** Use when memory is plentiful and throughput is
  critical. Reduces per-batch overhead.

:::
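
To make the memory trade-off concrete, here is a minimal sketch of batching in plain Python. It is a stand-in for illustration, not transmog's implementation, and `batched` is a hypothetical helper name. Only one batch is held in memory at a time, so peak memory grows with `batch_size`.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# A generator source: records are never all materialized at once.
records = ({"n": i} for i in range(2500))
sizes = [len(batch) for batch in batched(records, batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

With 2,500 records, a batch size of 1000 produces three batches, while a batch size of 100 would produce 25 smaller ones.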

## Advanced Parameters

These parameters have sensible defaults and rarely need adjustment.

### id_field

**Type:** `str`
**Default:** `"_id"`

Serves up to two roles, depending on `id_generation`:

- **Output field name** — the name of the ID field written to every output record,
  regardless of strategy.
- **Source field name** — when `id_generation="natural"`, the field transmog reads
  from each source record to use as that record's ID.

For all other strategies (`"random"`, `"hash"`, composite list), the value is only
used as the output field name — no source field is read.

Change this only if `_id` conflicts with your data schema.

### parent_field

**Type:** `str`
**Default:** `"_parent_id"`

Controls the **output field name** written on child records to reference their
parent's ID. This is purely an output concern — it does not read from or target
any field in the source data. The parent-child link is established automatically
from the nesting structure.

Change this only if `_parent_id` conflicts with your data schema.

:::{note}
`id_field`, `parent_field`, and `time_field` must all be distinct. Supplying the
same name for any two raises a `ConfigurationError`.
:::
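
The parent-child link described for `parent_field` can be pictured with a small stand-in written in plain Python. This is a sketch under assumptions, not transmog's code: `split_children` is a hypothetical function, random UUIDs stand in for generated IDs, and only lists of objects are moved into child rows.

```python
import uuid


def split_children(record: dict, id_field: str = "_id",
                   parent_field: str = "_parent_id"):
    """Split a record into a main row and child rows (hypothetical sketch)."""
    main = {id_field: str(uuid.uuid4())}
    children = {}
    for key, value in record.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            # Each child row gets its own ID plus a reference to the parent ID.
            children[key] = [
                {id_field: str(uuid.uuid4()), parent_field: main[id_field], **item}
                for item in value
            ]
        else:
            main[key] = value
    return main, children


order = {"customer": "Ada", "items": [{"sku": "A1"}, {"sku": "B2"}]}
main, children = split_children(order)
print(main)
print(children["items"])
```

The source record contains no `_parent_id` key. The reference appears only on the output rows and is derived from where each item was nested.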

### time_field

**Type:** `str | None`
**Default:** `"_timestamp"`

Field name for processing timestamps. Timestamps are UTC in
`YYYY-MM-DD HH:MM:SS.ssssss` format. Set to `None` to disable timestamp
generation entirely.

```python
config = tm.TransmogConfig(time_field=None)  # Disable timestamps
```
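
Downstream consumers may need to parse these timestamps. The sketch below reproduces the documented format with the standard library. The format string is an assumption derived from the `YYYY-MM-DD HH:MM:SS.ssssss` pattern above, not taken from transmog's source.

```python
from datetime import datetime, timezone

# Assumed strftime equivalent of YYYY-MM-DD HH:MM:SS.ssssss
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Produce a UTC timestamp string in that format.
stamp = datetime.now(timezone.utc).strftime(TIMESTAMP_FORMAT)
print(stamp)

# The string carries no zone marker, so parsing yields a naive datetime;
# reattach UTC explicitly before doing any comparisons or arithmetic.
parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
print(parsed.isoformat())
```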

### max_depth

**Type:** `int`
**Default:** `100`

Maximum recursion depth for nested structures. The entire subtree below this
depth is silently omitted — not just the field at that level, but all of its
descendants. This is a safety guard; most JSON data is well under 100 levels
deep.

:::{note}
Adjust only if processing unusually deep structures or to intentionally
truncate output at a specific nesting level.
:::
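
The truncation behavior can be illustrated with a small recursive flattener in plain Python. This is a sketch, not transmog's implementation: `flatten_with_limit` is a hypothetical function, and both the way depth is counted (top-level fields are depth 1) and the `_` separator are assumptions.

```python
def flatten_with_limit(record: dict, max_depth: int,
                       _depth: int = 1, _prefix: str = "") -> dict:
    """Flatten nested dicts, dropping subtrees nested deeper than max_depth."""
    flat = {}
    for key, value in record.items():
        name = f"{_prefix}{key}"
        if isinstance(value, dict):
            if _depth >= max_depth:
                # The whole subtree is skipped, with no error or warning.
                continue
            flat.update(
                flatten_with_limit(value, max_depth, _depth + 1, f"{name}_")
            )
        else:
            flat[name] = value
    return flat


doc = {"a": 1, "b": {"c": 2, "d": {"e": 3, "f": {"g": 4}}}}
print(flatten_with_limit(doc, max_depth=2))   # {'a': 1, 'b_c': 2}
print(flatten_with_limit(doc, max_depth=10))
```

With `max_depth=2`, the `d` object and everything beneath it, including `g`, disappears from the output.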

## Logging

Transmog uses Python's standard `logging` module. By default, no output is
produced (a `NullHandler` is attached to the top-level `transmog` logger). To
enable diagnostic output, configure the logger in your application:

```python
import logging

logging.basicConfig()
logging.getLogger("transmog").setLevel(logging.INFO)
```
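
`logging.basicConfig()` configures the root logger, which affects every library in the process. As an alternative, the sketch below attaches a handler to the `transmog` logger only. It uses nothing beyond the standard `logging` API, and the format string is an arbitrary choice.

```python
import logging
import sys

logger = logging.getLogger("transmog")

# Send transmog's records to stderr with a timestamped format.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Stop records from also reaching root handlers, which would print them twice
# if the application configures the root logger as well.
logger.propagate = False
```

Child loggers such as `transmog.streaming` propagate to `transmog` by default, so their records reach this handler too.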

### Log Levels

**INFO** — API entry/exit and streaming batch progress:

```text
INFO:transmog.api:flatten started, name=products, input_type=list
INFO:transmog.api:flatten completed, name=products, main_records=150, child_tables=3
INFO:transmog.streaming:stream started, entity=events, format=parquet
INFO:transmog.streaming:stream batch 1 processed, records_in_batch=100, total_records=100
INFO:transmog.streaming:stream completed, entity=events, total_batches=5, total_records=500
```

**DEBUG** — Format detection, schema inference, and batch processing internals:

```text
DEBUG:transmog.iterators:file input detected, path=data.json, extension=.json
DEBUG:transmog.iterators:string format detected as jsonl
DEBUG:transmog.flattening:processing batch, records=100, entity=products
DEBUG:transmog.writers.arrow_base:arrow schema created, fields=12, types={'name': 'string', ...}
DEBUG:transmog.writers.csv:csv schema created, table=main, fields=8
```

**WARNING** — Schema drift and data issues:

```text
WARNING:transmog.writers.csv:csv schema drift detected, table=main, unexpected_fields=['new_col']
```

By default, schema drift raises an `OutputError`. To drop unexpected fields
instead, pass `schema_drift="drop"` to `flatten_stream()`. See
[Schema Drift](schema-drift) for details.

### Per-Module Loggers

Each module uses its own logger under the `transmog` namespace. Target
specific modules to reduce noise:

```python
import logging

# Only show streaming batch progress
logging.basicConfig()
logging.getLogger("transmog.streaming").setLevel(logging.INFO)

# Only show format detection decisions
logging.getLogger("transmog.iterators").setLevel(logging.DEBUG)
```

:::{tip}
Enable `DEBUG` on `transmog.writers.csv` when troubleshooting schema drift
errors. The warning is logged before the exception is raised and lists exactly
which unexpected fields triggered the error.
:::
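
Per-module targeting works because Python loggers inherit their level from the nearest ancestor that has one set. The sketch below uses only the standard library and the logger names shown above to check which messages each module would emit. The chosen levels are arbitrary.

```python
import logging

# Quiet baseline for the whole package, with one module turned up.
logging.getLogger("transmog").setLevel(logging.WARNING)
logging.getLogger("transmog.streaming").setLevel(logging.INFO)

streaming = logging.getLogger("transmog.streaming")
iterators = logging.getLogger("transmog.iterators")

# streaming has its own level, so INFO records pass.
print(streaming.isEnabledFor(logging.INFO))   # True

# iterators has no level set and inherits WARNING from "transmog".
print(iterators.isEnabledFor(logging.INFO))   # False
print(iterators.getEffectiveLevel() == logging.WARNING)  # True
```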