Array Handling

The array_mode configuration parameter controls how arrays are processed. Four modes are available: SMART (the default), SEPARATE, INLINE, and SKIP.

Array Modes

SMART Mode (Default)

Processes arrays based on content type:

import transmog as tm

data = {
    "product": {
        "name": "Laptop",
        "tags": ["electronics", "computers"],  # Simple array - kept as native
        "reviews": [  # Complex array - extracted to child table
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

result = tm.flatten(data, name="products")

print(result.main)
# [
#   {
#     'product_name': 'Laptop',
#     'product_tags': ['electronics', 'computers'],  # Native array
#     '_id': '...',
#     '_timestamp': '...'
#   }
# ]

print(result.tables["products_reviews"])
# [
#   {'rating': 5, 'comment': 'Excellent', '_parent_id': '...', '_id': '...'},
#   {'rating': 4, 'comment': 'Good value', '_parent_id': '...', '_id': '...'}
# ]

Simple arrays contain only primitive values (strings, numbers, booleans, null). Complex arrays contain objects or nested structures.
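
The simple/complex distinction above can be sketched as a small predicate. This is only an illustration: `is_simple_array` and `PRIMITIVES` are hypothetical names, not part of the transmog API, and the library's own classification logic may differ (for example, in how it treats empty arrays).

```python
# Hypothetical helper, not part of the transmog API: it mirrors the
# simple/complex distinction described in the text.
PRIMITIVES = (str, int, float, bool, type(None))


def is_simple_array(values):
    """Return True if every item is a primitive (str, number, bool, None)."""
    return all(isinstance(v, PRIMITIVES) for v in values)


tags = ["electronics", "computers"]
reviews = [{"rating": 5, "comment": "Excellent"}]

print(is_simple_array(tags))     # True
print(is_simple_array(reviews))  # False
```

In this sketch an empty list counts as simple, because `all()` over no items is true; that choice is an assumption of the sketch.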

Tip

When to use SMART mode

SMART is the default and suits most use cases. It balances normalization with simplicity: simple lists stay inline in the parent record, while arrays of objects are extracted into child tables.

SEPARATE Mode

Extracts every array, simple or complex, into a child table:

config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
result = tm.flatten(data, name="products", config=config)

# All arrays become separate tables
print(list(result.tables.keys()))
# ['products_tags', 'products_reviews']

Tip

When to use SEPARATE mode

Choose SEPARATE when:

  • Child records need to be queried independently

  • Building a fully normalized relational schema

  • Array items have their own identity or lifecycle

  • Performing analytics that aggregate across array items
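
As a rough illustration of aggregating across array items, the sketch below averages ratings per parent record using plain Python. The records are hypothetical, shaped like the products_reviews rows shown earlier; the ID values and the third review are made up for the example.

```python
from collections import defaultdict

# Hypothetical child-table records shaped like the products_reviews rows;
# the ID values are invented for illustration.
reviews = [
    {"rating": 5, "comment": "Excellent", "_parent_id": "p1", "_id": "r1"},
    {"rating": 4, "comment": "Good value", "_parent_id": "p1", "_id": "r2"},
    {"rating": 2, "comment": "Too loud", "_parent_id": "p2", "_id": "r3"},
]

# Group ratings by the parent record they belong to.
ratings_by_parent = defaultdict(list)
for row in reviews:
    ratings_by_parent[row["_parent_id"]].append(row["rating"])

# Average rating per parent record.
average_rating = {
    parent_id: sum(ratings) / len(ratings)
    for parent_id, ratings in ratings_by_parent.items()
}

print(average_rating)  # {'p1': 4.5, 'p2': 2.0}
```

Because each child row carries its own _parent_id, the aggregation needs no access to the original nested document.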

INLINE Mode

Serializes arrays to JSON strings and keeps them in the parent record:

config = tm.TransmogConfig(array_mode=tm.ArrayMode.INLINE)
result = tm.flatten(data, name="products", config=config)

print(result.main)
# [
#   {
#     'product_name': 'Laptop',
#     'product_tags': '["electronics", "computers"]',
#     'product_reviews': '[{"rating": 5, ...}]',
#     '_id': '...'
#   }
# ]

Tip

When to use INLINE mode

Choose INLINE when:

  • Arrays are treated as opaque blobs

  • Downstream systems parse JSON natively

  • Preserving exact array structure is important

  • Minimizing table count is a priority
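
Since INLINE mode stores arrays as JSON strings, a downstream consumer can recover the original structure with a standard JSON parser. The sketch below uses Python's json module on a hypothetical row shaped like the INLINE output; the field values are made up for the example.

```python
import json

# Hypothetical main-table row shaped like the INLINE output;
# the values are invented for illustration.
row = {
    "product_name": "Laptop",
    "product_tags": '["electronics", "computers"]',
    "product_reviews": '[{"rating": 5, "comment": "Excellent"}]',
}

# Decode the JSON strings back into Python lists.
tags = json.loads(row["product_tags"])
reviews = json.loads(row["product_reviews"])

print(tags[0])               # electronics
print(reviews[0]["rating"])  # 5
```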

SKIP Mode

Drops arrays from the output entirely:

config = tm.TransmogConfig(array_mode=tm.ArrayMode.SKIP)
result = tm.flatten(data, name="products", config=config)

# Only scalar fields are included
print(result.main)
# [{'product_name': 'Laptop', '_id': '...'}]

Tip

When to use SKIP mode

Choose SKIP when:

  • Arrays are not relevant to the analysis

  • Extracting only top-level scalar fields

  • Reducing output size by excluding nested data

Nested Arrays

Arrays can contain objects with nested arrays, creating multi-level hierarchies:

data = {
    "company": "TechCorp",
    "departments": [
        {
            "name": "Engineering",
            "teams": [
                {"name": "Frontend", "size": 5},
                {"name": "Backend", "size": 8}
            ]
        }
    ]
}

config = tm.TransmogConfig(array_mode=tm.ArrayMode.SEPARATE)
result = tm.flatten(data, name="company", config=config)

# Creates multi-level hierarchy
print(list(result.all_tables.keys()))
# ['company', 'company_departments', 'company_departments_teams']

Each level maintains parent-child relationships through _parent_id fields.
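
To show how the _parent_id links can be followed, the sketch below walks from each team up to its department and company. The records are hypothetical: they are shaped like the three tables above, but the ID values and the non-ID field names are assumptions made for the example.

```python
# Hypothetical records shaped like the three tables; the ID values and
# the field names other than _id and _parent_id are assumptions.
company = [{"_id": "c1", "company": "TechCorp"}]
departments = [{"_id": "d1", "_parent_id": "c1", "name": "Engineering"}]
teams = [
    {"_id": "t1", "_parent_id": "d1", "name": "Frontend", "size": 5},
    {"_id": "t2", "_parent_id": "d1", "name": "Backend", "size": 8},
]

# Index each parent table by _id so that child rows can be resolved quickly.
company_by_id = {c["_id"]: c for c in company}
dept_by_id = {d["_id"]: d for d in departments}

# Follow _parent_id upward: team -> department -> company.
paths = []
for team in teams:
    dept = dept_by_id[team["_parent_id"]]
    comp = company_by_id[dept["_parent_id"]]
    paths.append((comp["company"], dept["name"], team["name"]))

for path in paths:
    print(" > ".join(path))
# TechCorp > Engineering > Frontend
# TechCorp > Engineering > Backend
```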