Array Handling
Arrays are processed according to the array_mode configuration parameter.
Array Modes
SMART Mode (Default)
Processes arrays based on content type:
import transmog as tm
data = {
    "product": {
        "name": "Laptop",
        "tags": ["electronics", "computers"],  # Simple array - kept as native
        "reviews": [  # Complex array - extracted to child table
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"},
        ],
    }
}
result = tm.flatten(data, name="products")
print(result.main)
# [
#     {
#         'product_name': 'Laptop',
#         'product_tags': ['electronics', 'computers'],  # Native array
#         '_id': '...',
#         '_timestamp': '...'
#     }
# ]
print(result.tables["products_reviews"])
# [
#     {'rating': 5, 'comment': 'Excellent', '_parent_id': '...', '_id': '...'},
#     {'rating': 4, 'comment': 'Good value', '_parent_id': '...', '_id': '...'}
# ]
Simple arrays contain only primitive values (strings, numbers, booleans, null). Complex arrays contain objects or nested structures.
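The distinction between simple and complex arrays can be sketched in plain Python. This is only an illustration of the rule stated above, not Transmog's actual implementation; the helper name `is_simple_array` is hypothetical, and how the library treats edge cases such as empty arrays is not covered here.

```python
# Illustrative only: a hypothetical helper, not part of the transmog API.
PRIMITIVES = (str, int, float, bool, type(None))


def is_simple_array(values):
    """Return True when every item is a primitive (string, number, boolean, null)."""
    return all(isinstance(item, PRIMITIVES) for item in values)


tags = ["electronics", "computers"]
reviews = [{"rating": 5, "comment": "Excellent"}]

print(is_simple_array(tags))     # True
print(is_simple_array(reviews))  # False
```

Under this rule a list that mixes primitives with a nested list or object counts as complex, because a single non-primitive item is enough to fail the check.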
Tip
When to use SMART mode
SMART is the default and suits most use cases. It keeps simple lists inline and extracts complex nested data into child tables, balancing normalization with simplicity.
SEPARATE Mode
Extract all arrays into child tables:
config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
result = tm.flatten(data, name="products", config=config)
# All arrays become separate tables
print(list(result.tables.keys()))
# ['products_tags', 'products_reviews']
Tip
When to use SEPARATE mode
Choose SEPARATE when:
- Child records need to be queried independently
- You are building a fully normalized relational schema
- Array items have their own identity or lifecycle
- Analytics aggregate across array items
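As a rough illustration of aggregating across array items, the sketch below averages ratings per parent record using plain Python. The rows are hand-written to mirror the shape of the `products_reviews` output shown earlier; the identifier values (`p1`, `r1`, `r2`) are made-up placeholders, not values the library would generate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child-table rows shaped like the products_reviews output;
# all '_id' and '_parent_id' values are placeholders.
reviews = [
    {"rating": 5, "comment": "Excellent", "_parent_id": "p1", "_id": "r1"},
    {"rating": 4, "comment": "Good value", "_parent_id": "p1", "_id": "r2"},
]

# Group ratings by the parent record they belong to.
ratings_by_parent = defaultdict(list)
for row in reviews:
    ratings_by_parent[row["_parent_id"]].append(row["rating"])

average_rating = {pid: mean(vals) for pid, vals in ratings_by_parent.items()}
print(average_rating)  # {'p1': 4.5}
```

Because each array item is its own row, this kind of grouping needs no JSON parsing and translates directly to a SQL `GROUP BY` on `_parent_id`.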
INLINE Mode
Keep arrays as JSON strings:
config = tm.TransmogConfig(array_mode=tm.ArrayMode.INLINE)
result = tm.flatten(data, name="products", config=config)
print(result.main)
# [
#     {
#         'product_name': 'Laptop',
#         'product_tags': '["electronics", "computers"]',
#         'product_reviews': '[{"rating": 5, ...}]',
#         '_id': '...'
#     }
# ]
Tip
When to use INLINE mode
Choose INLINE when:
- Arrays are treated as opaque blobs
- Downstream systems parse JSON natively
- Preserving exact array structure is important
- Minimizing table count is a priority
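When a consumer does need the array contents, the JSON string can be decoded with the standard library. The sketch below uses a hand-written row that mirrors the INLINE output above; the `'_id'` value is a placeholder.

```python
import json

# Row shaped like the INLINE output above; '_id' is a placeholder value.
row = {
    "product_name": "Laptop",
    "product_tags": '["electronics", "computers"]',
    "_id": "...",
}

# Decode the serialized array back into a Python list.
tags = json.loads(row["product_tags"])
print(tags)       # ['electronics', 'computers']
print(len(tags))  # 2
```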
SKIP Mode
Ignore arrays entirely:
config = tm.TransmogConfig(array_mode=tm.ArrayMode.SKIP)
result = tm.flatten(data, name="products", config=config)
# Only scalar fields are included
print(result.main)
# [{'product_name': 'Laptop', '_id': '...'}]
Tip
When to use SKIP mode
Choose SKIP when:
- Arrays are not relevant to the analysis
- Only scalar fields need to be extracted
- Output size should be reduced by excluding nested data
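The effect of skipping arrays can be approximated in a few lines of plain Python. This is a simplified sketch for illustration, not the library's implementation: the function `flatten_scalars` is hypothetical, it assumes the `_` field separator seen in the outputs above, and it omits the metadata fields (such as `_id`) that real results carry.

```python
def flatten_scalars(record, prefix=""):
    """Flatten nested dicts with '_' separators, dropping every list value."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, list):
            continue  # arrays are ignored entirely
        if isinstance(value, dict):
            # Recurse into nested objects, extending the field-name prefix.
            flat.update(flatten_scalars(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


data = {
    "product": {
        "name": "Laptop",
        "tags": ["electronics", "computers"],
        "reviews": [{"rating": 5, "comment": "Excellent"}],
    }
}
print(flatten_scalars(data))  # {'product_name': 'Laptop'}
```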
Nested Arrays
Arrays can contain objects with nested arrays, creating multi-level hierarchies:
data = {
    "company": "TechCorp",
    "departments": [
        {
            "name": "Engineering",
            "teams": [
                {"name": "Frontend", "size": 5},
                {"name": "Backend", "size": 8},
            ],
        }
    ],
}
config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
result = tm.flatten(data, name="company", config=config)
# Creates multi-level hierarchy
print(list(result.all_tables.keys()))
# ['company', 'company_departments', 'company_departments_teams']
Each level maintains parent-child relationships through _parent_id fields.
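To show how the `_parent_id` links can be followed, the sketch below walks from each team up to its department and company. The rows are hypothetical: they assume each child row's `_parent_id` matches its parent's `_id`, every identifier is a made-up placeholder, and real rows may carry additional metadata fields.

```python
# Hypothetical rows for the three tables; all '_id' values are placeholders.
company = [{"company": "TechCorp", "_id": "c1"}]
departments = [{"name": "Engineering", "_parent_id": "c1", "_id": "d1"}]
teams = [
    {"name": "Frontend", "size": 5, "_parent_id": "d1", "_id": "t1"},
    {"name": "Backend", "size": 8, "_parent_id": "d1", "_id": "t2"},
]

# Index parent tables by '_id' so each child can look up its parent directly.
companies_by_id = {row["_id"]: row for row in company}
departments_by_id = {row["_id"]: row for row in departments}

paths = []
for team in teams:
    department = departments_by_id[team["_parent_id"]]
    parent_company = companies_by_id[department["_parent_id"]]
    paths.append((parent_company["company"], department["name"], team["name"]))

print(paths)
# [('TechCorp', 'Engineering', 'Frontend'), ('TechCorp', 'Engineering', 'Backend')]
```

The same lookups correspond to two joins on `_parent_id = _id` once the tables are loaded into a relational database.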