Getting Started¶
This guide provides everything needed to get up and running with Transmog quickly.
What is Transmog?¶
Transmog transforms complex nested data structures into flat, tabular formats while preserving relationships between parent and child records. Perfect for:
Converting JSON data for database storage
Preparing API responses for analytics
Normalizing document data for SQL queries
ETL pipeline data transformation
Installation¶
Install Transmog using pip:
pip install transmog
Verify the installation:
import transmog as tm
print(tm.__version__) # Should print "1.1.0"
10 Minutes to Transmog¶
Basic Data Transformation¶
Transform nested data with a single function call:
import transmog as tm
# Sample nested data
data = {
"company": "TechCorp",
"location": {
"city": "San Francisco",
"country": "USA"
},
"employees": [
{"name": "Alice", "role": "Engineer", "salary": 95000},
{"name": "Bob", "role": "Designer", "salary": 75000}
]
}
# Transform the data
result = tm.flatten(data, name="companies")
# Explore the results
print("Main table:")
print(result.main)
print("\nEmployee table:")
print(result.tables["companies_employees"])
Output:
Main table:
[{
'company': 'TechCorp',
'location_city': 'San Francisco',
'location_country': 'USA',
'_id': 'auto_generated_id'
}]
Employee table:
[
{
'name': 'Alice',
'role': 'Engineer',
'salary': '95000',
'_parent_id': 'auto_generated_id'
},
{
'name': 'Bob',
'role': 'Designer',
'salary': '75000',
'_parent_id': 'auto_generated_id'
}
]
How It Works¶
The transformation process:
Flattens nested objects - location.city becomes location_city
Extracts arrays - the employees array becomes a separate table
Preserves relationships - links parent and child records with IDs (see the sketch below)
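As a rough illustration, the parent and child records shown in the output above can be re-joined in plain Python using the generated _id and _parent_id fields. This is a minimal sketch based on that output, not part of the library itself:
# Re-join employees to their company using the generated IDs
companies_by_id = {record["_id"]: record for record in result.main}
for employee in result.tables["companies_employees"]:
    parent = companies_by_id[employee["_parent_id"]]
    print(f"{employee['name']} works at {parent['company']}")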
Working with Files¶
Process files directly:
# Process a JSON file
result = tm.flatten_file("data.json", name="products")
# Save results as CSV
result.save("output", output_format="csv")
# Save results as JSON
result.save("output", output_format="json")
Streaming Large Data¶
For large datasets that don’t fit in memory:
# Stream process directly to files
tm.flatten_stream(
large_data,
output_path="output/",
name="large_dataset",
output_format="parquet"
)
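If the source records arrive incrementally (for example, one JSON object per line), a generator can keep memory usage flat. This sketch assumes flatten_stream accepts any iterable of records, which is worth confirming in the API reference; the file name events.jsonl is only an example:
import json
import transmog as tm

def read_records(path):
    # Yield one record at a time instead of loading the whole file into memory
    with open(path) as handle:
        for line in handle:
            yield json.loads(line)

# Assumes flatten_stream accepts any iterable of records
tm.flatten_stream(
    read_records("events.jsonl"),
    output_path="output/",
    name="events",
    output_format="parquet"
)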
Core Functions¶
Transmog provides three main functions:
Function | Purpose | Use When
---|---|---
tm.flatten | Transform data in memory | Data fits in memory
tm.flatten_file | Process files directly | Working with files
tm.flatten_stream | Stream to files | Large datasets
Configuration Basics¶
Array Handling¶
Control how arrays are processed:
# Default: arrays become separate tables
result = tm.flatten(data, arrays="separate")
# Keep arrays as JSON strings in main table
result = tm.flatten(data, arrays="inline")
# Skip arrays entirely
result = tm.flatten(data, arrays="skip")
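Using the company data from earlier, the three modes should differ roughly as follows. This is a sketch based on the behavior described above; the "expected" comments are assumptions, and the exact inline/skip output may vary:
# "separate" (default): the employees array becomes a child table
separate = tm.flatten(data, name="companies", arrays="separate")
print(list(separate.tables))   # expected: ['companies_employees']

# "inline": no child table; the array stays on the main record as a JSON string
inline = tm.flatten(data, name="companies", arrays="inline")
print(list(inline.tables))     # expected: []

# "skip": the array is dropped entirely, so again no child table
skipped = tm.flatten(data, name="companies", arrays="skip")
print(list(skipped.tables))    # expected: []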
Field Naming¶
Customize how nested fields are named:
# Use dots instead of underscores
result = tm.flatten(data, separator=".")
# Simplify deeply nested paths
result = tm.flatten(data, nested_threshold=2)
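For example, with the company record from earlier, the separator changes how the nested location fields are spelled. A sketch, assuming key names follow the flattening rule described in How It Works:
# Default separator produces keys like location_city
default_keys = tm.flatten(data, name="companies").main[0].keys()
print(sorted(default_keys))

# With separator=".", the same fields should appear as location.city
dotted_keys = tm.flatten(data, name="companies", separator=".").main[0].keys()
print(sorted(dotted_keys))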
ID Management¶
Control identifier fields:
# Use existing field as ID
result = tm.flatten(data, id_field="product_id")
# Custom parent ID field name
result = tm.flatten(data, parent_id_field="parent_ref")
# Add timestamp metadata
result = tm.flatten(data, add_timestamp=True)
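For instance, when records already carry a natural key, pointing id_field at it makes child rows reference that value instead of a generated ID. The product data below is a hypothetical example, and the exact linking field names may differ:
products = [
    {
        "product_id": "P-100",
        "name": "Widget",
        "reviews": [{"rating": 5}, {"rating": 3}]
    }
]

# Child review rows should reference "P-100" instead of an auto-generated ID
result = tm.flatten(products, name="products", id_field="product_id")
print(result.tables["products_reviews"])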
Understanding the Results¶
The FlattenResult object provides easy access to transformed data:
result = tm.flatten(data, name="products")
# Access main table
main_data = result.main
# Access specific child table
reviews = result.tables["products_reviews"]
# Get all tables including main
all_tables = result.all_tables
# Table information
info = result.table_info()
print(f"Tables: {list(result.keys())}")
print(f"Main table records: {len(result)}")
# Iterate over main table
for record in result:
print(record)
# Check if table exists
if "products_tags" in result:
print(result["products_tags"])
Error Handling¶
Configure how errors are handled using the unified error handling system:
# Raise errors (default) - stops on first error
result = tm.flatten(data, errors="raise")
# Skip problematic records - continues processing
result = tm.flatten(data, errors="skip")
# Warn about issues but continue - logs warnings
result = tm.flatten(data, errors="warn")
The error handling system provides consistent error messages with standardized templates and context information across all processing modules.
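A common pattern is to start strict and fall back to a more permissive mode when a batch contains a few bad records. This sketch only uses the errors parameter shown above; the specific exception classes raised are not specified here, so it catches Exception broadly:
try:
    result = tm.flatten(data, name="records", errors="raise")
except Exception as exc:
    # The specific exception types are an assumption; consult the API
    # reference for narrower classes. Retry while skipping bad records.
    print(f"Strict run failed ({exc}); retrying with errors='skip'")
    result = tm.flatten(data, name="records", errors="skip")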
Common Patterns¶
JSON API Response Processing¶
# API response with nested user data
api_response = {
"users": [
{
"id": 1,
"profile": {"name": "Alice", "email": "alice@example.com"},
"preferences": {"theme": "dark", "notifications": True},
"posts": [
{"title": "Hello World", "likes": 10},
{"title": "Python Tips", "likes": 25}
]
}
]
}
result = tm.flatten(api_response["users"], name="users")
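Following the parent_child naming convention shown earlier (companies_employees, products_reviews), the nested posts array should land in a users_posts table. A sketch, assuming that convention holds here:
# Flattened profile and preference fields live in the main users table
print(result.main)

# Nested posts should land in a child table named users_posts
print(result.tables["users_posts"])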
Log File Processing¶
# Process log entries
log_data = [
{
"timestamp": "2024-01-01T10:00:00Z",
"level": "INFO",
"source": {"service": "api", "version": "1.2.0"},
"metadata": {"request_id": "abc123", "user_id": "user456"}
}
]
result = tm.flatten(log_data, name="logs")
Configuration Data Normalization¶
# Application configuration
config = {
"database": {
"host": "localhost",
"port": 5432,
"credentials": {"username": "admin", "password": "secret"}
},
"features": {
"feature_flags": ["new_ui", "beta_api"],
"limits": {"max_users": 1000, "max_requests": 10000}
}
}
result = tm.flatten(config, name="config")
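Given the underscore-joining rule, the nested credentials and limits should flatten into keys such as database_credentials_username and features_limits_max_users; scalar arrays like feature_flags follow the arrays setting described earlier. The "expected" values below are assumptions in this sketch:
flat_config = result.main[0]
print(flat_config.get("database_host"))                  # expected: "localhost"
print(flat_config.get("database_credentials_username"))  # expected: "admin"
print(flat_config.get("features_limits_max_users"))      # expected: 1000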
Next Steps¶
Understanding the basics:
User Guide - Comprehensive task-oriented guides
API Reference - Complete function documentation
Developer Guide - Advanced usage and customization
Quick Reference¶
import transmog as tm
# Basic usage
result = tm.flatten(data, name="table_name")
# File processing
result = tm.flatten_file("input.json", name="table_name")
# Streaming
tm.flatten_stream(data, "output/", name="table_name", output_format="parquet")
# Save results
result.save("output", output_format="csv")
result.save("output.json") # Single file for simple data
# Access data
main_table = result.main
child_tables = result.tables
all_tables = result.all_tables