Python SDK

Analyze your data in a more flexible and familiar notebook environment.

Programmatically access data in Gantry to:

  1. Build off the SDK to create new visualizations and analyses.
  2. Take custom actions, such as triggering retraining if performance dips below a threshold.
  3. Analyze data in Gantry in complex ways.
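
The custom action in item 2 can be sketched in plain Python. This is a minimal illustration, not part of the SDK: `check_and_trigger` is a hypothetical helper, and the accuracy values and the 0.90 threshold are made up for the example. In practice the accuracies would come from a query and the callback would start your own retraining job.

```python
from typing import Callable, Sequence


def check_and_trigger(
    accuracies: Sequence[float],
    threshold: float,
    on_breach: Callable[[float], None],
) -> bool:
    """Call on_breach when the most recent accuracy is below threshold."""
    if not accuracies:
        return False  # nothing to evaluate yet
    latest = accuracies[-1]
    if latest < threshold:
        on_breach(latest)
        return True
    return False


# Example: record the breach instead of starting a real retraining job.
triggered = []
fired = check_and_trigger(
    [0.93, 0.91, 0.84], threshold=0.90, on_breach=triggered.append
)
```

Passing the action as a callback keeps the threshold check independent of how retraining is actually launched.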

Install the SDK with pip install gantry.

The first step when using the SDK is always to initialize Gantry with an API key. The Gantry module is initialized globally, once per Python process, so all logging calls in a process must share the same API key.

import gantry

gantry.init(api_key="YOUR_API_KEY")
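
To keep the key out of notebooks and source control, it can be read from the environment before calling gantry.init. A minimal sketch, assuming the key is stored in an environment variable named GANTRY_API_KEY; that name is chosen here for illustration and is not something the SDK is documented to read, and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "GANTRY_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset or empty."""
    api_key = os.environ.get(env_var, "")
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key


# Typical use, once per process (gantry.init is the call shown above):
#   import gantry
#   gantry.init(api_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error from a later call.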

Queries are explained in detail on the analyzing model performance page. At a high level, they're a specific way to filter your data, accessible in both the SDK and workspaces.

import datetime
import gantry
from gantry.query.time_window import RelativeTimeWindow

gantry.init(api_key="YOUR_API_KEY")

# Get your application by name
app = gantry.get_application("YOUR_APP_NAME")

# Create a window for the last 30 minutes of data.
time_window = RelativeTimeWindow(window_length=datetime.timedelta(minutes=30))

# Query the data
query = app.query(time_window)

# Fetch the data specified by this query
query.fetch()

# If you have a saved query, it can be imported
query = app.get_query("demo_query")
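
As a rough illustration of what a 30-minute relative window covers, the sketch below computes the bounds by hand with the standard library only. It assumes the window ends at the current time and that both ends are inclusive; neither detail is confirmed by the text above. `relative_window_bounds` and `in_window` are hypothetical helpers, not SDK functions.

```python
import datetime


def relative_window_bounds(window_length, now=None):
    """Return (start, end) for a window of window_length ending at now (UTC)."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - window_length, now


def in_window(timestamp, start, end):
    """True when timestamp falls inside [start, end] (inclusive, an assumption)."""
    return start <= timestamp <= end


# Fixed "now" so the example is reproducible.
now = datetime.datetime(2022, 1, 12, 22, 0, tzinfo=datetime.timezone.utc)
start, end = relative_window_bounds(datetime.timedelta(minutes=30), now=now)

recent = datetime.datetime(2022, 1, 12, 21, 54, 25, tzinfo=datetime.timezone.utc)
older = datetime.datetime(2022, 1, 12, 21, 0, tzinfo=datetime.timezone.utc)
```

With these values, `recent` falls inside the window and `older` falls outside it.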

The Gantry dataframe object supports many of the pandas dataframe operations. Once you've fetched your data locally, you can perform in-depth analysis on it:

# See the first 5 rows of your data: inputs, outputs & labels
>>> query.head(5)
# This is a pandas dataframe
inputs.feature_1      timestamp
0                      Wed, 12 Jan 2022 21:54:25 GMT
1                      Wed, 12 Jan 2022 21:54:30 GMT
2                      Wed, 12 Jan 2022 21:54:35 GMT
3                      Wed, 12 Jan 2022 21:54:40 GMT
4                      Wed, 12 Jan 2022 21:54:45 GMT


# Compute the mean value of a column.
>>> query["inputs.feature_1"].mean()
10

# Compute the [0.1, 0.5, 0.9] quantiles for all columns.
>>> query.quantile([0.1, 0.5, 0.9])

# Get a filtered query
>>> filtered_query = query[query["inputs.feature_1"] > 100]
# Combine filters
>>> filtered_query = query[(query["inputs.feature_2"] < 100) & (query["inputs.feature_1"] > 100)]

# Get a stat from that query
>>> filtered_query["inputs.feature_1"].mean()
101

Computing statistics using group_by:

# Compute the mean value of a column.
>>> query["inputs.feature_1"].mean()
10
# Compute the mean of a column grouped by another column.
>>> query["inputs.feature_1"].mean(group_by="inputs.feature_2")
    inputs.feature_2   mean
0             value1    8.0
1             value2    9.0
2             value3   10.0
3             value4   11.0
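
For intuition, a grouped mean buckets the rows by the value of one column and averages another column within each bucket. The plain-Python sketch below shows that computation on made-up rows; it is not Gantry's implementation, and `grouped_mean` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def grouped_mean(rows, value_key, group_key):
    """Average rows[value_key] separately for each distinct rows[group_key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Illustrative rows only; real data would come from a fetched query.
rows = [
    {"inputs.feature_1": 7.0, "inputs.feature_2": "value1"},
    {"inputs.feature_1": 9.0, "inputs.feature_2": "value1"},
    {"inputs.feature_1": 9.0, "inputs.feature_2": "value2"},
]
result = grouped_mean(rows, "inputs.feature_1", "inputs.feature_2")
# result == {"value1": 8.0, "value2": 9.0}
```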

Compute metrics on predictions and feedback

Gantry supports a range of metrics that can be run on any Gantry dataframe object. The full list of supported metrics is in the metrics reference section of the SDK docs.

import gantry.query as gquery

# For categorical models, get the confusion matrix.
gquery.metric.confusion_matrix(query["outputs"], query["feedback.label"])
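
A confusion matrix counts how often each true label was predicted as each class. The sketch below computes those counts in plain Python on made-up data, to show what the metric summarizes; it says nothing about the return type or argument order of the SDK call above. `confusion_counts` is a hypothetical helper.

```python
from collections import Counter


def confusion_counts(predictions, labels):
    """Count (true_label, predicted_label) pairs."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return Counter(zip(labels, predictions))


# Illustrative data only.
preds = ["cat", "dog", "dog", "cat"]
truth = ["cat", "dog", "cat", "cat"]
counts = confusion_counts(preds, truth)
# counts[("cat", "cat")] == 2   true cat, predicted cat
# counts[("cat", "dog")] == 1   true cat, predicted dog
# counts[("dog", "dog")] == 1   true dog, predicted dog
```

Off-diagonal keys, where the two labels differ, are the misclassifications.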

Compute distribution distances

Distribution distances measure how similar the distributions of a feature are between two windows of data. The full list of supported distances is in the distance reference section of the SDK docs.

import gantry.query as gquery

# Compute the d1 distance between two features.
gquery.distance.d1(query["inputs.feature_1"], query["inputs.feature_2"])

# Compute the Kolmogorov-Smirnov distance.
gquery.distance.ks(query["inputs.feature_1"], query["inputs.feature_2"])

# Compute the Kullback-Leibler divergence.
gquery.distance.kl(query["inputs.feature_1"], query["inputs.feature_2"])
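
For reference, the textbook definitions behind two of these distances can be sketched in plain Python. This is an illustration of the standard formulas, not Gantry's implementation, which may bin or normalize data differently; `ks_statistic` and `kl_divergence` are hypothetical helpers. The KL sketch assumes both inputs are discrete distributions over the same bins and that q is nonzero wherever p is nonzero.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


same = ks_statistic([1, 2, 3], [1, 2, 3])      # identical samples
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])  # no overlap at all
kl = kl_divergence([0.5, 0.5], [0.25, 0.75])
```

The KS statistic ranges from 0 for identical samples to 1 for samples with no overlap. KL divergence is 0 for identical distributions and is not symmetric in its arguments.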