Understand data types in Gantry

What is a data type?

Gantry is designed to ingest your data in any format. For example, you might log the following data:

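import gantry
import numpy as np

# Log one record whose inputs mix several Python types
gantry.log_record(
    application="my_app",
    inputs={
        "numerical_feature_1": 1.1,               # Python float
        "numerical_feature_2": 87,                # Python int
        "numerical_feature_3": np.float32(1.7),   # NumPy float32
        "categorical_feature_1": 1,               # int used as a category
        "categorical_feature_2": "cat_A",         # str used as a category
        "text_feature_1": "cat_A is one of our categories",  # free text
    },
    ...
)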

In this example, there are a few ambiguities in how the data will be handled:

  • All three of the inputs numerical_feature_* are numbers, but they each have different Python data types.

  • The inputs numerical_feature_2 and categorical_feature_1 have the same Python data type, but the former is meant to be interpreted as a number and the latter as a category.

  • The inputs categorical_feature_2 and text_feature_1 are both represented as strings in Python, but the former is meant to be interpreted as a category and the latter as text.

Data types in Gantry encode the way you want your data to be interpreted, regardless of what format it is in when it is logged. For example, all of the numerical_feature_* inputs in the call to log_record above would be represented in Gantry as a Number.
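To make the ambiguity concrete, the following minimal sketch compares the Python type of each value with the interpretation described above. It uses only the standard library, so the np.float32 input is left out, and the `intended` mapping is written by hand for illustration; it is not produced by Gantry.

```python
# Values from the log_record example (np.float32 omitted to avoid a
# NumPy dependency in this sketch).
inputs = {
    "numerical_feature_1": 1.1,
    "numerical_feature_2": 87,
    "categorical_feature_1": 1,
    "categorical_feature_2": "cat_A",
    "text_feature_1": "cat_A is one of our categories",
}

# Intended interpretation, written by hand from the description above.
intended = {
    "numerical_feature_1": "Number",
    "numerical_feature_2": "Number",
    "categorical_feature_1": "Category",
    "categorical_feature_2": "Category",
    "text_feature_1": "Text",
}

# Python's own view of each value.
python_types = {name: type(value).__name__ for name, value in inputs.items()}

for name in inputs:
    print(f"{name}: python={python_types[name]}, gantry={intended[name]}")
```

The Python type alone cannot separate numerical_feature_2 from categorical_feature_1 (both are int), nor categorical_feature_2 from text_feature_1 (both are str).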

How are data types used in Gantry?

Data types are stored in the application’s Schema. The Schema maps fields to Gantry data types. For example, the schema for the example application above would look like:

Field                    Type
-----------------------  --------
numerical_feature_1      Number
numerical_feature_2      Number
numerical_feature_3      Number
categorical_feature_1    Category
categorical_feature_2    Category
text_feature_1           Text

When new data arrives, Gantry attempts to coerce each value to the data type recorded in the schema. If the incoming value is incompatible with the schema (e.g., a string is logged to a Number field), Gantry stores that data point as “wrong type”.
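The coercion step can be pictured with a small sketch. Everything here is an assumption made for illustration: the `coerce` function, the `WRONG_TYPE` sentinel, and the specific compatibility rules are hypothetical and do not describe Gantry's internal implementation.

```python
# Hypothetical sentinel for values that cannot be coerced.
WRONG_TYPE = "wrong type"


def coerce(value, gantry_type):
    """Return value coerced to gantry_type, or WRONG_TYPE if incompatible.

    The rules below are illustrative assumptions, not Gantry's actual rules.
    """
    if gantry_type == "Number":
        # bool is a subclass of int in Python; this sketch rejects it.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return WRONG_TYPE
        return float(value)
    if gantry_type == "Category":
        if isinstance(value, (str, int)) and not isinstance(value, bool):
            return str(value)
        return WRONG_TYPE
    if gantry_type == "Text":
        return value if isinstance(value, str) else WRONG_TYPE
    raise ValueError(f"type not covered by this sketch: {gantry_type}")


schema = {"numerical_feature_2": "Number", "categorical_feature_1": "Category"}
record = {"numerical_feature_2": "eighty-seven", "categorical_feature_1": 1}

# Coerce each logged value to the type its field has in the schema.
stored = {field: coerce(value, schema[field]) for field, value in record.items()}
print(stored)
```

In this sketch the string logged to the Number field is stored as the sentinel, while the integer logged to the Category field is accepted and kept as a category label.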

The data type for each field determines how that field can be used in Gantry. For example:

  • How the field can be summarized as a chart

  • What statistics, metrics, and projections can be computed on the field

  • How drift is detected

Schema inference

Gantry infers the schema for your application from the first batch of data that you log. As you log more data for the application, new fields that appear are automatically added to the schema. To change the type of an existing field, you’ll need to edit the schema.
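The behavior described here (infer once, add new fields automatically, never retype an existing field) can be sketched as follows. The `guess_type` heuristic and the `update_schema` helper are hypothetical; Gantry's actual inference rules are not described on this page.

```python
def guess_type(value):
    """Guess a data type from one value (hypothetical heuristic)."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        # Assumption for this sketch: strings with spaces are free text.
        return "Text" if " " in value else "Category"
    return "Object"


def update_schema(schema, batch):
    """Add fields not yet in the schema; leave existing field types alone."""
    for record in batch:
        for field, value in record.items():
            schema.setdefault(field, guess_type(value))
    return schema


# The first batch establishes the schema.
schema = update_schema({}, [{"price": 1.1, "label": "cat_A"}])

# A later batch adds the new "comment" field, but "price" keeps its type
# even though a string was logged to it.
update_schema(schema, [{"price": "n/a", "comment": "cat_A is one of our categories"}])
print(schema)
```

A value-based heuristic like this one would label an integer-coded category as a Number, which is the kind of case where the inferred type has to be corrected by hand.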

Editing the schema

Sometimes the schema will be inferred incorrectly, or the data type of a field will change. In either case, you can edit the schema to set the correct type.

First, use the sidebar to navigate to the ingestion page:

[Screenshot: Ingestion Page]

Click the three dots on the right side of the page to change the data type:

[Screenshot: Change Data Type]

Changing the data type affects only data logged after the change.

Available data types

The following types are available in Gantry:

Type         Example statistics
-----------  ----------------------
Number       mean, pdf
Category     entropy, top_category
Boolean      frac_true
Text         average_len
Timestamp    min
UUID         count
Object       count

Object is a catch-all type for any data you log to Gantry that doesn’t fit one of our existing types.
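To give a feel for the example statistics listed above, here is a minimal sketch of how a few of them could be computed in plain Python. The statistic names come from the table; the definitions are assumptions (for instance, entropy is taken as Shannon entropy in bits, and average_len as the mean character count) and may differ from how Gantry computes them.

```python
import math
from collections import Counter


def mean(values):
    """Arithmetic mean of a Number field."""
    return sum(values) / len(values)


def entropy(values):
    """Shannon entropy (bits) of the observed category frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def top_category(values):
    """Most frequent value of a Category field."""
    return Counter(values).most_common(1)[0][0]


def frac_true(values):
    """Fraction of True values in a Boolean field."""
    return sum(1 for v in values if v) / len(values)


def average_len(values):
    """Mean length in characters of a Text field (assumed definition)."""
    return sum(len(v) for v in values) / len(values)


print(mean([1.0, 2.0, 3.0]))
print(entropy(["cat_A", "cat_A", "cat_B", "cat_B"]))
print(top_category(["cat_A", "cat_B", "cat_A"]))
print(frac_true([True, False, True, True]))
print(average_len(["ab", "abcd"]))
```

Each function only makes sense for one kind of field, which is why a field's data type determines which statistics can be computed on it.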