# API Reference

Complete reference for all public Transmog functions, classes, and types.

## Functions

### flatten()

Transform nested data structures into flat tables.

```python
flatten(
    data: Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes],
    *,
    name: str = "data",
    # Naming options
    separator: str = "_",
    nested_threshold: int = 4,
    # ID options
    id_field: Union[str, dict[str, str], None] = None,
    parent_id_field: str = "_parent_id",
    add_timestamp: bool = False,
    # Array handling
    arrays: Literal["separate", "inline", "skip"] = "separate",
    # Data options
    preserve_types: bool = False,
    skip_null: bool = True,
    skip_empty: bool = True,
    # Error handling
    errors: Literal["raise", "skip", "warn"] = "raise",
    # Performance
    batch_size: int = 1000,
    low_memory: bool = False,
) -> FlattenResult
```

**Parameters:**

- **data** (*Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes]*): Input data to transform. Can be:
  - Dictionary or list of dictionaries
  - JSON string
  - File path (str or Path)
  - Raw bytes containing JSON

- **name** (*str*, default="data"): Base name for generated tables

**Naming Options:**

- **separator** (*str*, default="_"): String used to join nested field names
- **nested_threshold** (*int*, default=4): Maximum nesting depth before simplifying field names
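
To illustrate what `separator` controls, here is a minimal sketch of joining nested keys into flat field names. The helper `join_nested_keys` is hypothetical and not part of Transmog; it ignores arrays and `nested_threshold`, so treat it as a mental model only.

```python
from typing import Any


def join_nested_keys(
    record: dict[str, Any], separator: str = "_", prefix: str = ""
) -> dict[str, Any]:
    """Hypothetical helper: flatten nested dicts by joining key paths."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined path along
            flat.update(join_nested_keys(value, separator, path))
        else:
            flat[path] = value
    return flat


record = {"name": "Product", "specs": {"size": {"width": 10}}}
underscored = join_nested_keys(record)            # key: "specs_size_width"
dotted = join_nested_keys(record, separator=".")  # key: "specs.size.width"
```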

**ID Options:**

- **id_field** (*str | dict[str, str] | None*, default=None): Field(s) to use as record IDs
- **parent_id_field** (*str*, default="_parent_id"): Name for parent reference fields
- **add_timestamp** (*bool*, default=False): Add processing timestamp metadata

**Array Handling:**

- **arrays** (*Literal["separate", "inline", "skip"]*, default="separate"): How to process arrays:
  - "separate": Extract arrays into child tables (default)
  - "inline": Keep arrays as JSON strings in main table
  - "skip": Ignore arrays completely
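
The three modes can be pictured with a small sketch. The function `handle_arrays` below is a hypothetical illustration, not Transmog's implementation; in particular it does not generate IDs or parent references for child rows.

```python
import json
from typing import Any


def handle_arrays(record: dict[str, Any], mode: str = "separate"):
    """Hypothetical helper: return (main_row, child_tables) for one record."""
    main: dict[str, Any] = {}
    children: dict[str, list[Any]] = {}
    for key, value in record.items():
        if not isinstance(value, list):
            main[key] = value
        elif mode == "separate":
            children[key] = value  # moved out to a child table
        elif mode == "inline":
            main[key] = json.dumps(value)  # kept as a JSON string
        elif mode != "skip":
            raise ValueError(f"unknown mode: {mode}")
        # mode == "skip": the array is dropped
    return main, children


record = {"name": "Product", "tags": ["a", "b"]}
separate = handle_arrays(record, "separate")
inline = handle_arrays(record, "inline")
skipped = handle_arrays(record, "skip")
```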

**Data Options:**

- **preserve_types** (*bool*, default=False): Keep original data types instead of converting values to strings
- **skip_null** (*bool*, default=True): Exclude null values from output
- **skip_empty** (*bool*, default=True): Exclude empty strings and collections

**Error Handling:**

- **errors** (*Literal["raise", "skip", "warn"]*, default="raise"): Error handling strategy:
  - "raise": Stop processing and raise exception
  - "skip": Skip problematic records and continue
  - "warn": Log warnings but continue processing
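
A rough sketch of how the three strategies differ is shown below. `process_records` and `double_price` are hypothetical names used only for illustration. The sketch assumes that `"warn"` drops the failing record after logging it; this page does not specify what the library does with such records.

```python
import logging

logger = logging.getLogger("errors_sketch")


def process_records(records, transform, errors="raise"):
    """Hypothetical helper: apply `transform` under one of three strategies."""
    output = []
    for index, record in enumerate(records):
        try:
            output.append(transform(record))
        except Exception as exc:
            if errors == "raise":
                raise  # stop at the first failure
            if errors == "warn":
                logger.warning("record %d failed: %s", index, exc)
            # "skip" and "warn" both drop the record and continue (assumption)
    return output


def double_price(record):
    return {"price": record["price"] * 2}


records = [{"price": 1}, {"cost": 2}, {"price": 3}]  # second record lacks "price"
kept = process_records(records, double_price, errors="skip")
```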

**Performance:**

- **batch_size** (*int*, default=1000): Records to process in each batch
- **low_memory** (*bool*, default=False): Use memory-efficient processing (slower)
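
As a mental model for `batch_size`, the sketch below splits an iterable into consecutive chunks. The helper `iter_batches` is hypothetical; how Transmog batches records internally is not described here.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def iter_batches(records: Iterable[Any], batch_size: int = 1000) -> Iterator[list[Any]]:
    """Hypothetical helper: yield consecutive lists of at most `batch_size` items."""
    iterator = iter(records)
    # islice pulls up to batch_size items; an empty list ends the loop
    while batch := list(islice(iterator, batch_size)):
        yield batch


batch_lengths = [len(batch) for batch in iter_batches(range(2500), batch_size=1000)]
```

Only one batch is held in memory at a time, which is the usual reason to tune this value.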

**Returns:**

- **FlattenResult**: Object containing transformed tables and metadata

**Examples:**

```python
import transmog as tm

# Basic usage
data = {"name": "Product", "price": 99.99}
result = tm.flatten(data, name="products")

# Custom configuration
result = tm.flatten(
    data,
    name="products",
    separator=".",
    arrays="inline",
    preserve_types=True
)

# Using existing ID field
result = tm.flatten(data, id_field="product_id")

# Error handling
result = tm.flatten(data, errors="skip")
```

### flatten_file()

Process data directly from files.

```python
flatten_file(
    path: Union[str, Path],
    *,
    name: Optional[str] = None,
    file_format: Optional[str] = None,
    **options: Any,
) -> FlattenResult
```

**Parameters:**

- **path** (*Union[str, Path]*): Path to input file
- **name** (*Optional[str]*, default=None): Table name (defaults to filename without extension)
- **file_format** (*Optional[str]*, default=None): Input format (auto-detected from extension)
- **\*\*options**: Any option accepted by `flatten()`

**Supported Formats:**

- JSON (.json)
- CSV (.csv) - for files containing JSON in cells
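
The defaults for `name` and `file_format` can be approximated with `pathlib`. The helper `resolve_name_and_format` and the `EXTENSION_FORMATS` mapping are hypothetical, derived only from the formats listed above, and may not match the library's actual detection logic.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping, based only on the formats listed in this section
EXTENSION_FORMATS = {".json": "json", ".csv": "csv"}


def resolve_name_and_format(
    path, name: Optional[str] = None, file_format: Optional[str] = None
):
    """Hypothetical helper: fill in the defaults described for flatten_file()."""
    path = Path(path)
    # Default table name: filename without its extension
    resolved_name = name if name is not None else path.stem
    resolved_format = file_format or EXTENSION_FORMATS.get(path.suffix.lower())
    if resolved_format is None:
        raise ValueError(f"cannot detect format for {path.name}")
    return resolved_name, resolved_format


defaults = resolve_name_and_format("exports/products.json")
explicit = resolve_name_and_format("data.csv", name="orders")
```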

**Returns:**

- **FlattenResult**: Object containing transformed tables

**Examples:**

```python
# Process JSON file
result = tm.flatten_file("data.json", name="products")

# Auto-detect name from filename
result = tm.flatten_file("products.json")  # name="products"

# Pass additional options
result = tm.flatten_file("data.json", arrays="inline", errors="skip")
```

### flatten_stream()

Stream large datasets directly to files without holding the full result in memory.

```python
flatten_stream(
    data: Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes],
    output_path: Union[str, Path],
    *,
    name: str = "data",
    output_format: str = "json",
    # All options from flatten()
    separator: str = "_",
    nested_threshold: int = 4,
    id_field: Union[str, dict[str, str], None] = None,
    parent_id_field: str = "_parent_id",
    add_timestamp: bool = False,
    arrays: Literal["separate", "inline", "skip"] = "separate",
    preserve_types: bool = False,
    skip_null: bool = True,
    skip_empty: bool = True,
    errors: Literal["raise", "skip", "warn"] = "raise",
    batch_size: int = 1000,
    **format_options: Any,
) -> None
```

**Parameters:**

- **data**: Input data (same as `flatten()`)
- **output_path** (*Union[str, Path]*): Directory or file path for output
- **name** (*str*, default="data"): Base name for output files
- **output_format** (*str*, default="json"): Output format ("json", "csv", "parquet")
- **\*\*format_options**: Format-specific options

**Output Formats:**

- **"json"**: JSON Lines format for efficient streaming
- **"csv"**: CSV files with proper escaping
- **"parquet"**: Columnar format for analytics (requires pyarrow)
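
For readers unfamiliar with JSON Lines, the sketch below writes one JSON object per line from a generator, so records never need to be collected into a single list. `write_json_lines` and the `.jsonl` filename are assumptions made for illustration; the file names Transmog produces are not specified on this page.

```python
import json
import tempfile
from pathlib import Path


def write_json_lines(records, path: Path) -> int:
    """Hypothetical helper: write one JSON object per line; return the count."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in records:  # any iterable, including a generator
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "products.jsonl"
    written = write_json_lines(({"id": i} for i in range(3)), target)
    lines = target.read_text(encoding="utf-8").splitlines()
```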

**Returns:**

- **None**: Data is written directly to files

**Examples:**

```python
# Stream to JSON files
tm.flatten_stream(large_data, "output/", name="products", output_format="json")

# Stream to Parquet for analytics
tm.flatten_stream(data, "output/", output_format="parquet", batch_size=5000)

# Stream with custom options
tm.flatten_stream(
    data,
    "output/",
    name="logs",
    output_format="csv",
    arrays="skip",
    errors="warn"
)
```

## Classes

### FlattenResult

Container for flattened data with convenience methods for access and export.

#### Properties

**main** (*list[dict[str, Any]]*): Main flattened table

```python
result = tm.flatten(data)
main_table = result.main
```

**tables** (*dict[str, list[dict[str, Any]]]*): Child tables dictionary

```python
child_tables = result.tables
reviews = result.tables["products_reviews"]
```

**all_tables** (*dict[str, list[dict[str, Any]]]*): All tables including main

```python
all_data = result.all_tables
```

#### Methods

##### save(path, output_format=None)

Save all tables to files.

```python
save(
    path: Union[str, Path],
    output_format: Optional[str] = None
) -> Union[list[str], dict[str, str]]
```

**Parameters:**

- **path**: Output path (file or directory)
- **output_format**: Output format ("json", "csv", or "parquet"); auto-detected from the file extension when omitted

**Returns:**

- **Union[list[str], dict[str, str]]**: Created file paths

**Examples:**

```python
# Save as JSON files in directory
paths = result.save("output/")

# Save as CSV with explicit format
paths = result.save("output/", output_format="csv")

# Save single table as JSON file
paths = result.save("data.json")
```

##### table_info()

Get metadata about all tables.

```python
table_info() -> dict[str, dict[str, Any]]
```

**Returns:**

- **dict[str, dict[str, Any]]**: Metadata for each table, including record count, field names, and whether it is the main table

**Example:**

```python
info = result.table_info()
# {
#     "products": {
#         "records": 100,
#         "fields": ["name", "price", "_id"],
#         "is_main": True
#     },
#     "products_reviews": {
#         "records": 250,
#         "fields": ["rating", "comment", "_parent_id"],
#         "is_main": False
#     }
# }
```
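
Metadata of this shape can be derived from plain table data, as the sketch below shows. `summarize_tables` is a hypothetical helper that mirrors the structure in the example above; it is not the library's implementation, and its field ordering (first seen across rows) is an assumption.

```python
from typing import Any


def summarize_tables(
    tables: dict[str, list[dict[str, Any]]], main_name: str
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: build metadata shaped like the example above."""
    info = {}
    for name, rows in tables.items():
        # Collect field names in first-seen order across all rows
        fields = list(dict.fromkeys(key for row in rows for key in row))
        info[name] = {
            "records": len(rows),
            "fields": fields,
            "is_main": name == main_name,
        }
    return info


tables = {
    "products": [{"name": "A", "_id": "1"}, {"name": "B", "price": 5, "_id": "2"}],
    "products_reviews": [{"rating": 4, "_parent_id": "1"}],
}
info = summarize_tables(tables, main_name="products")
```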

#### Container Methods

FlattenResult supports standard container operations:

```python
# Length (main table record count)
count = len(result)

# Iteration (over main table)
for record in result:
    print(record)

# Key access
reviews = result["products_reviews"]
main = result["main"]  # or result[entity_name]

# Membership testing
if "products_tags" in result:
    print("Has tags table")

# Keys, values, items
table_names = list(result.keys())
table_data = list(result.values())
table_pairs = list(result.items())

# Safe access with default
tags = result.get_table("products_tags", default=[])
```
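
To make these operations concrete, the sketch below implements the same protocol on a tiny stand-in class. `MiniResult` is hypothetical and exists only to show which special methods back each operation; it is not Transmog's `FlattenResult`, and the way it keys the main table is an assumption.

```python
from typing import Any, Iterator

Table = list[dict[str, Any]]


class MiniResult:
    """Hypothetical stand-in, not Transmog's FlattenResult."""

    def __init__(self, main: Table, tables: dict[str, Table], name: str) -> None:
        self.main, self.tables, self.name = main, tables, name

    @property
    def all_tables(self) -> dict[str, Table]:
        return {self.name: self.main, **self.tables}

    def __len__(self) -> int:
        return len(self.main)  # main table record count

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(self.main)  # iterate over main table records

    def __getitem__(self, key: str) -> Table:
        return self.main if key == "main" else self.all_tables[key]

    def __contains__(self, key: object) -> bool:
        return key in self.all_tables

    def get_table(self, key: str, default: Any = None) -> Any:
        return self.all_tables.get(key, default)


result = MiniResult([{"name": "A"}], {"products_reviews": [{"rating": 5}]}, "products")
```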

## Error Classes

### TransmogError

Base exception class for all Transmog errors.

```python
class TransmogError(Exception):
    """Base exception for Transmog operations."""
```

### ValidationError

Raised when input data or configuration is invalid.

```python
class ValidationError(TransmogError):
    """Raised for validation failures."""
```

**Common Causes:**

- Invalid configuration parameters
- Malformed input data
- Unsupported data types
- File format issues

**Example:**

```python
try:
    result = tm.flatten(data, arrays="invalid_option")
except tm.ValidationError as e:
    print(f"Configuration error: {e}")
```
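
Because `ValidationError` derives from `TransmogError`, catching the base class handles both. The sketch below redefines the two classes locally so it runs without Transmog installed; `check_arrays_option` is a hypothetical validator, not a library function.

```python
class TransmogError(Exception):
    """Local stand-in mirroring the base class shown above."""


class ValidationError(TransmogError):
    """Local stand-in mirroring the subclass shown above."""


def check_arrays_option(value: str) -> str:
    """Hypothetical validator for the documented `arrays` values."""
    if value not in ("separate", "inline", "skip"):
        raise ValidationError(f"invalid arrays option: {value!r}")
    return value


try:
    check_arrays_option("invalid_option")
except TransmogError as exc:  # the base class also catches ValidationError
    caught = exc
```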

## Type Aliases

### DataInput

Type alias for supported input data formats.

```python
DataInput = Union[dict[str, Any], list[dict[str, Any]], str, Path, bytes]
```

### ArrayHandling

Type alias for array processing options.

```python
ArrayHandling = Literal["separate", "inline", "skip"]
```

### ErrorHandling

Type alias for error handling strategies.

```python
ErrorHandling = Literal["raise", "skip", "warn"]
```

### IdSource

Type alias for ID field specifications.

```python
IdSource = Union[str, dict[str, str], None]
```
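
`Literal` aliases like these can also be inspected at runtime with `typing.get_args`, which is handy when validating user input in wrapper code. The sketch below redefines two of the aliases locally so it runs on its own; `is_valid_choice` is a hypothetical helper.

```python
from typing import Literal, get_args

# Redefined locally so the sketch runs without Transmog installed
ArrayHandling = Literal["separate", "inline", "skip"]
ErrorHandling = Literal["raise", "skip", "warn"]


def is_valid_choice(value: str, alias) -> bool:
    """Hypothetical helper: check a string against a Literal alias."""
    return value in get_args(alias)


array_choices = get_args(ArrayHandling)  # tuple of the allowed strings
```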

## Module Information

**Version**: Access current version

```python
import transmog
print(transmog.__version__)  # "1.1.0"
```

**Available Names**: List everything the package exports

```python
print(transmog.__all__)
# ['flatten', 'flatten_file', 'flatten_stream', 'FlattenResult',
#  'TransmogError', 'ValidationError', '__version__']
```

## Advanced Usage

Advanced features such as custom processors and configuration objects can be imported directly from their submodules:

```python
# Import advanced components directly
from transmog.process import Processor
from transmog.config import TransmogConfig

# Create custom processor
processor = Processor()

# Use configuration objects
config = TransmogConfig(
    separator=".",
    array_handling="inline",
    preserve_types=True
)
```

See the [Developer Guide](../developer_guide/extending.md) for more advanced usage patterns.