Query your data using the Gantry SDK#

Gantry provides two ways to interact with your application’s data: the dashboard and the SDK query module.

To start using the SDK you’ll need an API key.

Using the Gantry SDK#

The goal of the Gantry SDK is to provide an elegant way to:

  • Programmatically access your data in Gantry, without going through a dashboard

  • Build on the SDK to create new visualizations and analyses (and let us know what you’re making!)

  • Take custom actions based on Gantry data, such as triggering retraining if performance dips below a threshold
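To make the last point concrete, here is a minimal sketch of a threshold check. It does not call Gantry at all: `maybe_trigger_retraining` and its `on_breach` callback are hypothetical names used for illustration, and the metric value would in practice come from a query like the ones later on this page.

```python
from typing import Callable


def maybe_trigger_retraining(
    metric_value: float,
    threshold: float,
    on_breach: Callable[[float], None],
) -> bool:
    """Call on_breach when metric_value falls below threshold.

    Returns True if the callback fired, False otherwise.
    """
    if metric_value < threshold:
        on_breach(metric_value)
        return True
    return False


# Example: record the breach in a list instead of starting a real job.
# The 0.82 stands in for an accuracy value computed elsewhere.
events = []
fired = maybe_trigger_retraining(0.82, threshold=0.90, on_breach=events.append)
```

Passing the action in as a callback keeps the check independent of how retraining is actually started (a job queue, a webhook, a notebook cell).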

Connect to Gantry#

Use your API key to initialize the query module:

import gantry.query as gquery

gquery.init(api_key="<your-api-key>")
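If you would rather not paste the key into your code, one option is to read it from an environment variable. This is a sketch only: the variable name `GANTRY_API_KEY` and the helper `load_api_key` are assumptions made for this example, not something the SDK defines.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "GANTRY_API_KEY",  # hypothetical variable name
) -> str:
    """Return the API key stored in var_name, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before initializing the query module")
    return key


# Usage with the init call shown above:
# gquery.init(api_key=load_api_key())
```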

Create a query#

Let’s start querying some data:

import datetime

# Get all available applications
applications = gquery.list_applications()

# Create a window for the last 30 minutes of data.
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=30)
window = gquery.query(application="my-awesome-app", start_time=start, end_time=end, version="1.2.3")

# If you have a saved view, you can create a window with it as well
window = gquery.query(application="my-awesome-app", view="my-saved-view")

The output of .query() is a lightweight object that holds metadata about your query. Because the computation happens server-side, you don’t use unnecessary memory on your own machine.
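If you build "last N minutes" windows often, a small helper keeps the start and end times consistent. This is a sketch: `last_minutes` is a hypothetical helper, and it returns naive UTC datetimes only to mirror the `utcnow()` call in the example above. Whether the SDK also accepts timezone-aware datetimes is not covered here.

```python
import datetime
from typing import Optional, Tuple


def last_minutes(
    minutes: int,
    now: Optional[datetime.datetime] = None,
) -> Tuple[datetime.datetime, datetime.datetime]:
    """Return (start, end) covering the last `minutes` minutes, as naive UTC."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if now is None:
        # Current UTC time with tzinfo dropped, equivalent to utcnow().
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return now - datetime.timedelta(minutes=minutes), now


# Usage with the query call shown above:
# start, end = last_minutes(30)
# window = gquery.query(application="my-awesome-app", start_time=start, end_time=end)
```

Accepting `now` as a parameter makes the helper easy to test with a fixed timestamp.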

Viewing data and stats#

Our custom dataframe object supports many of the pandas dataframe operations.

# See the first 5 rows of your data: inputs, outputs & labels
>>> window.head()
# This is a pandas dataframe
inputs.feature_1      timestamp
0                      Wed, 12 Jan 2022 21:54:25 GMT
1                      Wed, 12 Jan 2022 21:54:30 GMT
2                      Wed, 12 Jan 2022 21:54:35 GMT
3                      Wed, 12 Jan 2022 21:54:40 GMT
4                      Wed, 12 Jan 2022 21:54:45 GMT


# Compute the mean value of a column.
>>> window["inputs.feature_1"].mean()
10

# Compute the [0.1, 0.5, 0.9] quantiles for all columns.
>>> window.quantile([0.1, 0.5, 0.9])

# Get a filtered window
>>> filtered_window = window[window["inputs.feature_1"] > 100]
# You can add filters together as well
>>> filtered_window = window[(window["inputs.feature_2"] < 100) & (window["inputs.feature_1"] > 100)]

# Get a stat from that window
>>> filtered_window["inputs.feature_1"].mean()
101

Check out more of the available stats here.
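To see what these operations mean on a small sample, here is a local sketch in plain Python. It runs entirely on your machine and does not use the SDK. The sample rows are made up for illustration, and the `quantile` helper uses linear interpolation between neighbouring values, which is an assumption: Gantry's server-side quantile method may differ.

```python
import math
from statistics import fmean
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th quantile of values using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Illustrative rows (not real Gantry data).
rows = [
    {"feature_1": 90.0, "feature_2": 40.0},
    {"feature_1": 101.0, "feature_2": 50.0},
    {"feature_1": 120.0, "feature_2": 150.0},
    {"feature_1": 150.0, "feature_2": 99.0},
]

# Same combined filter as above: feature_2 < 100 and feature_1 > 100.
filtered = [r for r in rows if r["feature_2"] < 100 and r["feature_1"] > 100]

# Mean and median of feature_1 over the filtered rows.
mean_f1 = fmean([r["feature_1"] for r in filtered])
median_f1 = quantile([r["feature_1"] for r in filtered], 0.5)
```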

Computing metrics#

You can also compute metrics on your predictions and feedback using the SDK.

# For categorical models:

# Compute the model's accuracy.
gquery.metric.accuracy(window["outputs"], window["feedback.label"])

# Or get the confusion matrix.
gquery.metric.confusion_matrix(window["outputs"], window["feedback.label"])

# For regression models:

# Compute the model's mean squared error.
gquery.metric.mean_squared_error(window["outputs"], window["feedback.label"])

# Or get the max error in the window.
gquery.metric.max_error(window["outputs"], window["feedback.label"])

Check out more of the available metrics here.
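If you want to sanity-check a metric on a handful of rows, the standard definitions are easy to reproduce locally. The functions below are a plain-Python sketch of those textbook definitions, with predictions first and labels second to mirror the calls above. They are not the SDK's implementation, and details such as how the SDK handles missing feedback are not covered.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check_lengths(preds: Sequence, labels: Sequence) -> None:
    if len(preds) != len(labels) or not preds:
        raise ValueError("preds and labels must be non-empty and equal in length")


def accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that match their label."""
    _check_lengths(preds, labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


def confusion_matrix(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> Counter:
    """Counts keyed by (true label, predicted label)."""
    _check_lengths(preds, labels)
    return Counter(zip(labels, preds))


def mean_squared_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Average of the squared differences between prediction and label."""
    _check_lengths(preds, labels)
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


def max_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Largest absolute difference between prediction and label."""
    _check_lengths(preds, labels)
    return max(abs(p - y) for p, y in zip(preds, labels))
```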

Computing distribution distances#

Distribution distances measure how similar the distributions of a feature are between two windows of data.

# Compute the d1 distance between two features.
gquery.distance.d1(window["inputs.feature_1"], window["inputs.feature_2"])

# Compute the Kolmogorov-Smirnov distance.
gquery.distance.ks(window["inputs.feature_1"], window["inputs.feature_2"])

# Compute the Kullback-Leibler divergence.
gquery.distance.kl(window["inputs.feature_1"], window["inputs.feature_2"])

Check out more of the available distance metrics here.
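For intuition about two of these distances, here is a local sketch of the textbook definitions in plain Python. It is not the SDK's implementation. The Kolmogorov-Smirnov function works on raw samples, while the Kullback-Leibler function assumes you have already turned each sample into probabilities over the same bins; how Gantry bins or estimates the distributions server-side is not assumed here.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The gap can only change at observed values, so check each one.
    points = set(a_sorted) | set(b_sorted)
    return max(
        abs(
            bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted)
        )
        for x in points
    )


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a zero-probability bin in p contributes nothing
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

Note that the KL divergence is not symmetric: swapping the two arguments generally gives a different value, so it matters which window you treat as the reference.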