Schema and Data Types

This page reviews how Gantry handles schema and data types.

What is a data type?

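Consider the following call to log_record:

import gantry
import numpy as np

gantry.log_record(
    application="my_app",
    inputs={
        "numerical_feature_1": 1.1,              # Python float
        "numerical_feature_2": 87,               # Python int
        "numerical_feature_3": np.float32(1.7),  # NumPy float32
        "categorical_feature_1": 1,              # int, meant as a category
        "categorical_feature_2": "cat_A",        # str, meant as a category
        "text_feature_1": "cat_A is one of our categories",  # str, meant as text
        "str_array": ["this is", "an array", "of strings"],
        "embedding": [1.2, 3.4, 5.6, 7.8, 9.0]
    },
    ...
)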

In this example, there are a few ambiguities in how the data will be handled:

  • All three of the numerical_feature_* inputs are numbers, but each has a different Python data type.

  • The inputs numerical_feature_2 and categorical_feature_1 have the same Python data type, but the former is meant to be interpreted as a number and the latter as a category.

  • The inputs categorical_feature_2 and text_feature_1 are both represented as strings in Python, but the former is meant to be interpreted as a category and the latter as text.

Data types in Gantry encode the way you want your data to be interpreted, regardless of what format it is in when it is logged. For example, all of the numerical_feature_* inputs in the call to log_record above would be represented in Gantry as a Number.

How are data types used in Gantry?

Data types are stored in the application’s Schema. The Schema maps fields to Gantry data types. For example, the schema for the example application above would look like:

Field                    Type
numerical_feature_1      Number
numerical_feature_2      Number
numerical_feature_3      Number
categorical_feature_1    Category
categorical_feature_2    Category
text_feature_1           Text
str_array                Array
embedding                Array

When new data arrives in Gantry, Gantry attempts to coerce each incoming value to the data type in the schema. If the type of the incoming data is incompatible with the schema (e.g., a string is logged to a Number field), then Gantry will store that data point as “wrong type”.
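
The coercion step can be pictured with a minimal sketch. This is not Gantry's implementation: the coerce helper, the WRONG_TYPE marker, and the specific conversion rules are assumptions made purely for illustration.

```python
# Illustrative sketch only; not Gantry's actual coercion logic.
WRONG_TYPE = "wrong type"  # hypothetical marker for incompatible values


def coerce(value, field_type):
    """Try to coerce a logged value to the schema type of its field."""
    if field_type == "Number":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool):
            return WRONG_TYPE
        if isinstance(value, (int, float)):
            return float(value)
        return WRONG_TYPE
    if field_type == "Category":
        # Assumption: categories are stored by their string label.
        if isinstance(value, (str, int)) and not isinstance(value, bool):
            return str(value)
        return WRONG_TYPE
    if field_type == "Text":
        return value if isinstance(value, str) else WRONG_TYPE
    return WRONG_TYPE


schema = {"numerical_feature_2": "Number", "categorical_feature_1": "Category"}
record = {"numerical_feature_2": "eighty-seven", "categorical_feature_1": 1}
stored = {name: coerce(value, schema[name]) for name, value in record.items()}
```

Here the string logged to the Number field ends up marked as the wrong type, while the integer logged to the Category field is accepted as a category label.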

The data type for each field determines how that field can be used in Gantry. For example:

  • How the field can be summarized as a chart

  • What statistics, metrics, and projections can be computed on the field

  • How drift is detected

Schema inference

Gantry infers the schema for your application from the first batch of data that you log. As you log more data for the application, new fields that appear are automatically added to the schema. The types of existing fields are not changed automatically; to change the type of a field, you’ll need to edit the schema.
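
The update rule described above (infer on the first batch, add new fields as they appear, leave existing fields alone) might be sketched as follows. The guess_type heuristic is a placeholder assumption: this page does not describe how Gantry actually decides between, say, Category and Text, and update_schema is a hypothetical helper rather than an SDK function.

```python
# Illustrative sketch only; `guess_type` is a placeholder heuristic,
# not the rule Gantry uses to infer types.
def guess_type(value):
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text"
    if isinstance(value, list):
        return "Array"
    return "Object"


def update_schema(schema, batch):
    """Add fields seen for the first time; keep existing types unchanged."""
    for record in batch:
        for name, value in record.items():
            if name not in schema:
                schema[name] = guess_type(value)
    return schema


# The first batch establishes the schema.
schema = update_schema({}, [{"age": 42, "bio": "hello"}])
# A later batch adds a new field and logs a conflicting value for "age".
schema = update_schema(schema, [{"age": "forty-two", "tags": ["a", "b"]}])
```

In this sketch the later batch adds tags to the schema, but age stays a Number even though a string was logged to it.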

Nested Schema

Gantry supports nested schema inference. The "." character is reserved to represent a nested field, so any "." characters in a field name are converted to "_".

Here is an example of data with nested fields.

{
  "numeric": {
    "feature.1": 1,
    "feature.2": 2,
    "feature_3": 3
  },
  "categorical": {
    "feature_1": "category.1",
    "feature.2": "category.2"
  },
  "text.feature.1": "Hello, World"
}

Gantry infers the above schema as follows:

Field                    Type
numeric.feature_1        Number
numeric.feature_2        Number
numeric.feature_3        Number
categorical.feature_1    Category
categorical.feature_2    Category
text_feature_1           Text
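
The naming rule can be reproduced with a short sketch. flatten_fields is a hypothetical helper written for this illustration, not part of the Gantry SDK; it only shows how field names are derived and does not infer types.

```python
# Illustrative sketch only; `flatten_fields` is a hypothetical helper.
def flatten_fields(record, prefix=""):
    """Flatten nested dicts into dotted field names.

    Literal "." characters inside a key are replaced with "_" so that "."
    only ever marks nesting.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key.replace(".", "_")
        if isinstance(value, dict):
            flat.update(flatten_fields(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


record = {
    "numeric": {"feature.1": 1, "feature.2": 2, "feature_3": 3},
    "categorical": {"feature_1": "category.1", "feature.2": "category.2"},
    "text.feature.1": "Hello, World",
}
fields = flatten_fields(record)
```

Only field names are rewritten. Values such as "category.1" keep their "." characters.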

Editing the schema

Sometimes the schema will be inferred incorrectly, or the data type of a field will change.

First, use the sidebar to navigate to the ingestion page.

Then click the three dots on the right side of the page to change the data type.

Changing the data type only impacts data logged in the future.

Available data types

The following types are available in Gantry:

Type         Example statistics and projections
Number       mean, pdf
Category     entropy, top_category
Boolean      frac_true
Text         average_len
Array        vector.length, vector.norm.L1, vector.max
Timestamp    min
UUID         count
Object       count

Object is a catch-all type for any data you log to Gantry that doesn’t fit one of our existing types.

Array types

Gantry has built-in support for one-dimensional Array types, which are logged as Python lists of native Python types: str, float, int, bool. Arrays are inherently high-dimensional, like Text, so we often use Projections to better understand their content. For example, length is a property that can be monitored for all arrays using the vector.length Projection. For numerical arrays, vector.<min, max, norm> Projections can be computed in order to observe other scalar properties such as statistics (mean), extrema (min, max), and norms (L1, L2).

Filtering on Arrays is also possible using the “Array contains” filter, which will return any Record whose Array field contains an exact or partial match for at least one element. For example, an “Array contains” filter for the substring "is" would return any Record with an Array<str> field with any element containing "is", such as ["this is", "an array", "of strings"]. This can be useful for filtering multi-label lists or tag attributes that might be part of your Application.
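
To make the Projections and the “Array contains” filter concrete, here is a minimal sketch in plain Python. The function names mirror the Projection names mentioned above, but they are hypothetical helpers that only mimic the described behavior; they are not part of the Gantry SDK.

```python
# Illustrative sketch only; these helpers mimic the described behavior
# and are not part of the Gantry SDK.
def vector_length(array):
    """Number of elements; applies to arrays of any element type."""
    return len(array)


def vector_norm_l1(array):
    """Sum of absolute values; applies to numerical arrays."""
    return sum(abs(x) for x in array)


def vector_max(array):
    """Largest element; applies to numerical arrays."""
    return max(array)


def array_contains(array, substring):
    """True if any string element contains `substring` (exact or partial)."""
    return any(substring in element for element in array)


str_array = ["this is", "an array", "of strings"]
embedding = [1.2, 3.4, 5.6, 7.8, 9.0]

length = vector_length(str_array)
l1 = vector_norm_l1(embedding)
largest = vector_max(embedding)
matches = array_contains(str_array, "is")
```

With the str_array from the log_record example, the filter for "is" matches because the element "this is" contains that substring.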