Python SDK
Analyze your data in a more flexible and familiar notebook environment.
The SDK lets you access your data in Gantry programmatically, without going through the dashboard. This makes it possible to:
- Build on the SDK to create new visualizations and analyses.
- Take custom actions, such as triggering retraining when performance drops below a threshold.
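The second use case can be as simple as comparing a metric against a threshold. The sketch below assumes you supply two callables of your own: one that returns the current accuracy (for example, computed with the SDK's metric functions) and one that starts a retraining job. Both callables and `maybe_retrain` are hypothetical names, not part of the Gantry SDK.

```python
# Hypothetical sketch: `fetch_accuracy` and `trigger_retraining` are placeholders
# for your own code; neither is part of the Gantry SDK.
from typing import Callable


def maybe_retrain(
    fetch_accuracy: Callable[[], float],
    trigger_retraining: Callable[[], None],
    threshold: float = 0.9,
) -> bool:
    """Call trigger_retraining() if the current accuracy is below threshold.

    Returns True if retraining was triggered, False otherwise.
    """
    accuracy = fetch_accuracy()
    if accuracy < threshold:
        trigger_retraining()
        return True
    return False


if __name__ == "__main__":
    # Stand-ins for a real metric query and a real retraining job.
    triggered = maybe_retrain(lambda: 0.82, lambda: print("retraining started"))
    print(triggered)
```

Passing the two actions in as callables keeps the threshold logic easy to test without a live Gantry connection.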
Note that Gantry is global
The Gantry module is initialized globally, once per Python process. This means all logging calls in a process share the same API key, while the remaining parameters are supplied at each logging call site.
The first step when using the SDK is always to initialize Gantry with an API key.
import gantry
gantry.init(
    api_key="YOUR_API_KEY",  # replace with your Gantry API key
)
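The sketch below illustrates the general pattern of process-wide initialization described in the note above: a value stored once in module state and read by every later call. It is not Gantry's implementation; `init`, `log_record`, and `_API_KEY` here are hypothetical names chosen for illustration.

```python
# Illustrative sketch of process-wide initialization; this is NOT Gantry's code.
# `init`, `log_record`, and `_API_KEY` are hypothetical names.
from typing import Any, Dict, Optional

# Module-level state: one value shared by every caller in this Python process.
_API_KEY: Optional[str] = None


def init(api_key: str) -> None:
    """Store the API key once for the whole process."""
    global _API_KEY
    _API_KEY = api_key


def log_record(application: str, **fields: Any) -> Dict[str, Any]:
    """Build a payload: the key comes from global state, the rest from the call site."""
    if _API_KEY is None:
        raise RuntimeError("call init() before logging")
    return {"api_key": _API_KEY, "application": application, "fields": fields}


init("YOUR_API_KEY")
record = log_record("my-awesome-app", feature_1=3)
```

Because the key lives in module state, calling `init` again replaces it for every subsequent call in the process, which is why one process cannot log under two different keys at once.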
Creating a query
import datetime
import gantry.query as gquery
# Get all available applications
applications = gquery.list_applications()
# Create a window for the last 30 minutes of data.
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=30)
window = gquery.query(
    application="my-awesome-app",
    start_time=start,
    end_time=end,
    version="1.2.3",
)
# If you have a saved view, you can create a window with it as well.
window = gquery.query(
    application="my-awesome-app",
    view="my-saved-view",
)
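If you query recent data repeatedly, a small helper keeps the window arithmetic in one place. `last_n_minutes` is a hypothetical helper, not part of the SDK. It returns naive UTC datetimes, matching what `datetime.datetime.utcnow()` returns in the example above, and accepts an explicit `now` so results are reproducible.

```python
import datetime
from typing import Optional, Tuple


def last_n_minutes(
    minutes: int, now: Optional[datetime.datetime] = None
) -> Tuple[datetime.datetime, datetime.datetime]:
    """Return (start, end) spanning the `minutes` minutes that end at `now`.

    If `now` is omitted, the current UTC time is used as a naive datetime,
    which matches what datetime.datetime.utcnow() returns.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    start = now - datetime.timedelta(minutes=minutes)
    return start, now


start, end = last_n_minutes(30)
```

The resulting `start` and `end` can then be passed as `start_time` and `end_time` to `gquery.query`.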
Viewing data and statistics
The window returned by a query behaves like a dataframe and supports many pandas DataFrame operations.
# See the first 5 rows of your data: inputs, outputs & labels
>>> window.head()
# This is a pandas dataframe
inputs.feature_1 timestamp
0 Wed, 12 Jan 2022 21:54:25 GMT
1 Wed, 12 Jan 2022 21:54:30 GMT
2 Wed, 12 Jan 2022 21:54:35 GMT
3 Wed, 12 Jan 2022 21:54:40 GMT
4 Wed, 12 Jan 2022 21:54:45 GMT
# Compute the mean value of a column.
>>> window["inputs.feature_1"].mean()
10
# Compute the [0.1, 0.5, 0.9] quantiles for all columns.
>>> window.quantile([0.1, 0.5, 0.9])
# Get a filtered window.
>>> filtered_window = window[window["inputs.feature_1"] > 100]
# Combine filters with & (wrap each condition in parentheses).
>>> filtered_window = window[(window["inputs.feature_2"] < 100) & (window["inputs.feature_1"] > 100)]
# Compute a statistic on the filtered window.
>>> filtered_window["inputs.feature_1"].mean()
101
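The parentheses around each condition in the combined filter matter: in Python, `&` binds more tightly than comparison operators, so without them the expression is parsed differently. The sketch below uses plain integers (not Gantry windows) to show the parse; with pandas-style columns, the unparenthesized form typically raises a `ValueError` about an ambiguous truth value instead of filtering.

```python
# Plain-integer demonstration of why each filter condition needs parentheses.
feature_1, feature_2 = 150, 50

# Intended: (feature_2 < 100) AND (feature_1 > 100)
with_parens = (feature_2 < 100) & (feature_1 > 100)

# Without parentheses, `&` is applied first, so this parses as the chained
# comparison feature_2 < (100 & feature_1) > 100, i.e. 50 < 4 > 100.
without_parens = feature_2 < 100 & feature_1 > 100

print(with_parens)     # True
print(without_parens)  # False
```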
Computing statistics using group_by
# Compute the mean value of a column.
>>> window["inputs.feature_1"].mean()
10
# Compute the mean of a column grouped by another column.
>>> window["inputs.feature_1"].mean(group_by="inputs.feature_2")
inputs.feature_2 mean
0 value1 8.0
1 value2 9.0
2 value3 10.0
3 value4 11.0
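Conceptually, a grouped mean partitions the rows by the value of the grouping column and averages the target column within each partition. The plain-Python sketch below shows that computation on made-up data; it is not Gantry's implementation, and `grouped_mean` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def grouped_mean(rows: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Mean of the value for each distinct key, in first-seen key order."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up (inputs.feature_2, inputs.feature_1) pairs for illustration only.
rows = [("value1", 7.0), ("value1", 9.0), ("value2", 9.0), ("value3", 10.0)]
print(grouped_mean(rows))  # {'value1': 8.0, 'value2': 9.0, 'value3': 10.0}
```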
Compute metrics on predictions and feedback
# For categorical models:
# Compute the model's accuracy.
gquery.metric.accuracy(window["outputs"], window["feedback.label"])
# Get the confusion matrix.
gquery.metric.confusion_matrix(window["outputs"], window["feedback.label"])
# For regression models:
# Compute the model's mean squared error.
gquery.metric.mean_squared_error(window["outputs"], window["feedback.label"])
# Or get the max error in the window.
gquery.metric.max_error(window["outputs"], window["feedback.label"])
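For reference, the sketch below spells out what these metrics compute, in plain Python on small in-memory lists. It is not the SDK's implementation; the function names only mirror the SDK calls for readability, and representing the confusion matrix as a dict of `(label, prediction)` counts is an assumption made to keep the example short.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def _check_lengths(preds: Sequence, labels: Sequence) -> None:
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and of equal length")


def accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that equal the label."""
    _check_lengths(preds, labels)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def confusion_counts(
    preds: Sequence[Hashable], labels: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Number of rows for each observed (label, prediction) pair."""
    _check_lengths(preds, labels)
    return dict(Counter(zip(labels, preds)))


def mean_squared_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Average of the squared residuals."""
    _check_lengths(preds, labels)
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


def max_error(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Largest absolute residual."""
    _check_lengths(preds, labels)
    return max(abs(p - y) for p, y in zip(preds, labels))
```

For example, `mean_squared_error([1.0, 2.0, 4.0, 3.0], [1.0, 3.0, 2.0, 3.0])` averages the squared residuals 0, 1, 4, and 0 to give 1.25, while `max_error` on the same inputs returns 2.0.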
Compute distribution distances
Distribution distances measure how similar two distributions are, for example the distribution of a feature in two different windows of data.
# Compute the d1 distance between two features.
gquery.distance.d1(window["inputs.feature_1"], window["inputs.feature_2"])
# Compute the Kolmogorov-Smirnov distance.
gquery.distance.ks(window["inputs.feature_1"], window["inputs.feature_2"])
# Compute the Kullback-Leibler divergence.
gquery.distance.kl(window["inputs.feature_1"], window["inputs.feature_2"])
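To make the KS and KL measures concrete, the sketch below implements their textbook definitions in plain Python. It is not Gantry's implementation: Gantry may bin, smooth, or normalize values differently, and `d1` is left out because its exact definition is not given here. The KL function assumes both inputs are already normalized histograms over the same bins; note that KL divergence is asymmetric, so swapping the arguments generally changes the result.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so checking every sample point finds the maximum gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) in nats for two discrete distributions over the same bins."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf  # P puts mass where Q has none
            total += p_i * math.log(p_i / q_i)
    return total
```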