- Build off the SDK to create new visualizations and analyses.
- Take custom actions, such as triggering retraining if performance dips below a threshold.
- Analyze data in Gantry in complex ways.
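As a minimal sketch of the second use case, the check below compares a metric value against a threshold and calls a retraining hook when the value falls below it. The names `maybe_trigger_retraining` and `retrain` are hypothetical placeholders. In practice the metric value would come from an SDK query, and the hook would call your own training pipeline.

```python
from typing import Callable


def maybe_trigger_retraining(
    metric_value: float,
    threshold: float,
    retrain: Callable[[], None],
) -> bool:
    """Call `retrain` if `metric_value` is below `threshold`.

    Hypothetical helper: `metric_value` would typically be computed from a
    Gantry query (e.g. accuracy over a recent time window), and `retrain`
    would start your own training job.
    Returns True if retraining was triggered, False otherwise.
    """
    if metric_value < threshold:
        retrain()
        return True
    return False


if __name__ == "__main__":
    triggered = []
    # Accuracy of 0.82 against a 0.9 threshold triggers the hook.
    fired = maybe_trigger_retraining(0.82, 0.9, lambda: triggered.append("retrain"))
    print(fired, triggered)  # True ['retrain']
```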
Install the SDK with:

```shell
pip install gantry
```
The first step when using the SDK is always to initialize Gantry with an API key. The Gantry module is initialized globally, once per Python process, so all logging calls in a process must share the same API key.
```python
import gantry

gantry.init(api_key="YOUR_API_KEY")
```
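Because the module is initialized once per process with a single key, one common pattern is to read the key from the environment instead of hard-coding it. The sketch below assumes a hypothetical environment variable named `GANTRY_API_KEY`; the helper name `load_api_key` is also illustrative.

```python
import os


def load_api_key(env_var: str = "GANTRY_API_KEY") -> str:
    """Return the API key stored in `env_var` (hypothetical variable name).

    Fails early with a clear message instead of passing an empty or
    missing key to initialization.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Gantry API key.")
    return key
```

With this helper, initialization becomes `gantry.init(api_key=load_api_key())`, called once at process start-up.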
Queries are explained in detail on the analyzing model performance page. At a high level, they're a specific way to filter your data, accessible in both the SDK and workspaces.
```python
import datetime

import gantry
from gantry.query.time_window import RelativeTimeWindow

GANTRY_API_KEY = "YOUR_API_KEY"
GANTRY_APP_NAME = "YOUR_APP_NAME"

gantry.init(api_key=GANTRY_API_KEY)

# Get your application
app = gantry.get_application(GANTRY_APP_NAME)

# Create a window covering the last 30 minutes of data
time_window = RelativeTimeWindow(window_length=datetime.timedelta(minutes=30))

# Query the data
query = app.query(time_window)

# Fetch the data specified by this query
query.fetch()

# If you have a saved query, you can load it by name
query = app.get_query("demo_query")
```
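Conceptually, a relative window such as the 30-minute one above covers the span that ends now and starts `window_length` earlier. The sketch below computes those bounds with the standard library only. It assumes the window ends at the current UTC time, which is an assumption about `RelativeTimeWindow`'s semantics rather than a description of its implementation, and `relative_window_bounds` is a hypothetical name.

```python
import datetime
from typing import Optional, Tuple


def relative_window_bounds(
    window_length: datetime.timedelta,
    now: Optional[datetime.datetime] = None,
) -> Tuple[datetime.datetime, datetime.datetime]:
    """Return (start, end) for a window of `window_length` ending at `now`.

    `now` defaults to the current UTC time; passing it explicitly makes
    the result reproducible.
    """
    end = now if now is not None else datetime.datetime.now(datetime.timezone.utc)
    return end - window_length, end


if __name__ == "__main__":
    fixed_now = datetime.datetime(2022, 1, 12, 22, 0, tzinfo=datetime.timezone.utc)
    start, end = relative_window_bounds(datetime.timedelta(minutes=30), now=fixed_now)
    print(start.isoformat(), end.isoformat())
    # 2022-01-12T21:30:00+00:00 2022-01-12T22:00:00+00:00
```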
The Gantry dataframe object supports many pandas DataFrame operations. Once you've queried your data locally, you can perform in-depth analysis on it:
```python
# See the first 5 rows of your data: inputs, outputs & labels
>>> query.head(5)  # This is a pandas dataframe
  inputs.feature_1                      timestamp
0                    Wed, 12 Jan 2022 21:54:25 GMT
1                    Wed, 12 Jan 2022 21:54:30 GMT
2                    Wed, 12 Jan 2022 21:54:35 GMT
3                    Wed, 12 Jan 2022 21:54:40 GMT
4                    Wed, 12 Jan 2022 21:54:45 GMT

# Compute the mean value of a column.
>>> query["inputs.feature_1"].mean()
10

# Compute the [0.1, 0.5, 0.9] quantiles for all columns.
>>> query.quantile([0.1, 0.5, 0.9])

# Get a filtered query
>>> filtered_query = query[query["inputs.feature_1"] > 100]

# Combine filters with & (wrap each condition in parentheses)
>>> filtered_query = query[(query["inputs.feature_2"] < 100) & (query["inputs.feature_1"] > 100)]

# Get a stat from that query
>>> filtered_query["inputs.feature_1"].mean()
101
```
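Since these operations mirror pandas, the same filter expressions can be tried on an ordinary `pandas.DataFrame`. The toy values below are made up for illustration and are not Gantry data. Each condition must be wrapped in parentheses when combined with `&`, because `&` binds more tightly than comparison operators in Python.

```python
import pandas as pd

# Made-up data using the same column names as the examples above.
df = pd.DataFrame(
    {
        "inputs.feature_1": [50, 101, 150, 99, 200],
        "inputs.feature_2": [10, 20, 200, 30, 40],
    }
)

# Boolean mask: keep rows where feature_1 > 100.
over_100 = df[df["inputs.feature_1"] > 100]

# Combine masks with & (and) or | (or); the parentheses are required.
both = df[(df["inputs.feature_2"] < 100) & (df["inputs.feature_1"] > 100)]

print(len(over_100))  # 3
print(both["inputs.feature_1"].tolist())  # [101, 200]
print(both["inputs.feature_1"].mean())  # 150.5
```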
Computing statistics using
```python
# Compute the mean value of a column.
>>> query["inputs.feature_1"].mean()
10

# Compute the mean of a column grouped by another column.
>>> query["inputs.feature_1"].mean(group_by="inputs.feature_2")
  inputs.feature_2  mean
0           value1   8.0
1           value2   9.0
2           value3  10.0
3           value4  11.0
```
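The grouped mean is analogous to pandas' `groupby`. The sketch below reproduces the shape of that output on a plain DataFrame with made-up values; it illustrates the operation, not how Gantry computes it.

```python
import pandas as pd

# Made-up data: feature_2 is categorical, feature_1 is numeric.
df = pd.DataFrame(
    {
        "inputs.feature_1": [7.0, 9.0, 8.0, 10.0, 10.0],
        "inputs.feature_2": ["value1", "value1", "value2", "value2", "value3"],
    }
)

# Mean of feature_1 within each distinct value of feature_2,
# returned as a two-column frame like the SDK output above.
grouped = (
    df.groupby("inputs.feature_2")["inputs.feature_1"]
    .mean()
    .reset_index(name="mean")
)
print(grouped)
```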
Gantry supports many metrics that can be run on any Gantry dataframe object. The full list is in the metrics reference section of the SDK docs.
```python
import gantry.query as gquery

# For categorical models:
# Get the confusion matrix.
gquery.metric.confusion_matrix(query["outputs"], query["feedback.label"])
```
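For intuition about what a confusion matrix contains, here is a pure-Python sketch that counts (label, prediction) pairs. It illustrates the metric and is not Gantry's implementation. Rows are true labels and columns are predictions, which is one common convention; the orientation Gantry uses should be checked in the SDK reference.

```python
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(
    predictions: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Return (classes, matrix) where matrix[i][j] counts examples whose
    true label is classes[i] and whose prediction is classes[j].

    Class values must be mutually comparable (e.g. all strings) so they
    can be sorted into a stable order.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    classes = sorted(set(predictions) | set(labels))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for pred, label in zip(predictions, labels):
        matrix[index[label]][index[pred]] += 1
    return classes, matrix
```

For example, `confusion_matrix(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"])` returns `(["cat", "dog"], [[2, 1], [0, 1]])`: one "cat" example was predicted as "dog".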
Distribution distances measure how similar two distributions are, for example the distribution of a feature across two windows of data. The full list of supported distances is in the distance reference section of the SDK docs.
```python
import gantry.query as gquery

# Compute the d1 distance between two features.
gquery.distance.d1(query["inputs.feature_1"], query["inputs.feature_2"])

# Compute the Kolmogorov-Smirnov distance.
gquery.distance.ks(query["inputs.feature_1"], query["inputs.feature_2"])

# Compute the Kullback-Leibler divergence.
gquery.distance.kl(query["inputs.feature_1"], query["inputs.feature_2"])
```
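To make two of these measures concrete, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) and the Kullback-Leibler divergence between two discrete distributions, using only the standard library. These are textbook definitions; Gantry's implementations may differ in details such as binning or smoothing, and the function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, Hashable, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over observed x.

    Both samples must be non-empty. The empirical CDFs are step functions
    that only jump at observed values, so checking those points suffices.
    """
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def kl_divergence(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """KL(p || q) in nats for discrete distributions given as {value: prob}.

    Terms with p(x) == 0 contribute nothing; if q(x) == 0 where p(x) > 0,
    the divergence is infinite.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log(px / qx)
    return total
```

Note that KL divergence is asymmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, while the KS statistic is symmetric in its two samples.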