Learn how to use Gantry to monitor and improve an ML-powered product.

This tutorial is a great place to start if you've never used Gantry before. By the time you complete this tutorial, you will have an elementary understanding of the following concepts:

  • How to use Gantry workspaces to find areas where your model is underperforming
  • How to query data of interest from Gantry into a DataFrame
  • How to easily set up a job using Datasets and Curators that pulls specified data at regular intervals, to help you continuously train new models

In this tutorial, we'll walk through how to use Gantry to monitor and improve an ML-powered grammar correction application. The application consists of a Gradio-powered UI, which is simply a text box, and a HuggingFace-powered ML backend that corrects the grammar of the input text when the user clicks "Submit".

The user can provide feedback on the suggested corrections by accepting or rejecting them. We'll start by logging the input text, suggested correction, and feedback to Gantry using the Python SDK, as illustrated in the diagram below:

The color-coding highlights the relationship between the different kinds of data an application generates about predictions; together, that data forms a row in Gantry.
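To make that concrete, the row described above can be pictured as a plain dictionary. This is an illustrative sketch only, not the actual Gantry schema; the field names and join key are hypothetical:

```python
# Illustrative only: the shape of one logged "row", not the actual Gantry schema.
record = {
    "inputs": {"text": "me and him goes to the store"},       # what the user typed
    "outputs": {"correction": "He and I go to the store"},    # the model's suggestion
    "feedback": {"accepted": True},                           # did the user accept it?
    "tags": {"username": "example_user"},                     # metadata logged alongside
}

# Inputs, outputs, and feedback may arrive at different times; a shared
# join key is what lets them be stitched into the same row.
join_key = "prediction-0001"
```

In the real application, each of these pieces is sent to Gantry via the Python SDK as it becomes available.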

Let's get started!


You'll need an API key to run the tutorial. Learn how to get one here, or navigate to the settings page and create one. You'll need to set this key as the environment variable GANTRY_API_KEY.

If you don't have a Gantry account, contact us to request access to our Beta.

You'll also need to clone the sample code:

git clone https://github.com/gantry-ml/gantry-demos.git \
	&& cd gantry-demos/grammar-error-corrector

Due to its dependencies, this tutorial requires a version of Python < 3.11. We recommend creating a virtual environment to make sure this tutorial does not interfere with your native or default Python environment:

python -m venv venv
source venv/bin/activate

Now, install the Python dependencies:

pip install -r requirements.txt

Finally, load the example so we have something to investigate:

python backfill.py --load-data --create-queries

Now that we are set up, let's jump into the product!

Monitoring and Observability

From the overview page, we'll see that we've ingested some data. To dive into the data, create a workspace by clicking the "Go to workspaces" button. The workspace will default to the last day of data, so we'll be dropped into April 2022, where the action takes place. The page should look like this:

What we see above are some default charts and the data panel, which contains a table of inputs, predictions, and metadata tags, such as username, that were provided with those predictions. There is also feedback, where users indicated whether or not they accepted our model's corrections.

To assess model performance, we need a metric. We can use the percent_true aggregation function to transform that feedback into an "acceptance rate" for our model's predictions:
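Conceptually, percent_true over boolean feedback is just the fraction of observed values that are true. A minimal sketch of the acceptance-rate computation (plain Python, not the Gantry implementation):

```python
def percent_true(values):
    """Fraction of non-null boolean feedback values that are True (the acceptance rate)."""
    observed = [v for v in values if v is not None]  # ignore records with no feedback yet
    return sum(observed) / len(observed) if observed else 0.0

feedback = [True, True, False, True, None]
print(percent_true(feedback))  # 3 accepted out of 4 observed -> 0.75
```

Gantry computes this aggregation over the production data stream for us, and can plot it over time.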

Gantry allows users to save queries on their production data stream. These can also be created programmatically; the setup script that we ran earlier actually pre-populated queries for this demo. Notice that we are also able to save views created in the web app.

Let's jump into the this-week view, which covers the third week of April 2022, where our story unfolds!

In the query builder, we can see the date range and filters that form this view:

Looking at the percent_true metric we built earlier, we can see a clear dip after which performance becomes volatile:

This definitely feels like something worth investigating. Let's head back to the raw data and drill in on the predictions that were not accepted by our users:

On a hunch, let's create a chart that shows the age of accounts:

What really stands out here is that the records with non-accepted corrections seem heavily concentrated in newer accounts. We discovered this by observing the distribution of the account_age_days field shifting as we toggled the filter for corrections being accepted.

So, a reasonable hypothesis is that our newer users are presenting data that is confusing our model. This is a simple hypothesis that we would investigate more rigorously in a real-world setting; for the purposes of this demo, let's say we want to test it out.

One way of testing this out is to grab some data and train a model. Gantry makes it easy to do that straight from the UI by grabbing an SDK query:

The code snippet provided replicates the filters and will return the data in a pandas.DataFrame object. We can now perform whatever pre-processing is needed to train a model, including sending the data to a labeling service.
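As a sketch of the kind of pre-processing that might follow, suppose the query returned rows like the ones below (the field names are illustrative). Selecting the rejected corrections from new accounts is then a one-liner:

```python
# Hypothetical rows, standing in for the DataFrame returned by the SDK query
rows = [
    {"text": "me and him goes",  "accepted": False, "account_age_days": 2},
    {"text": "she walk home",    "accepted": False, "account_age_days": 310},
    {"text": "they is here",     "accepted": True,  "account_age_days": 5},
]

# Candidate records for labeling/retraining: rejected corrections from accounts <= 7 days old
candidates = [r for r in rows if not r["accepted"] and r["account_age_days"] <= 7]
print(len(candidates))  # 1
```

In practice you would apply the same filter to the pandas DataFrame, then hand the result to your labeling service or training pipeline.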

In this section we used Gantry's monitoring and observability features to formulate a hypothesis for how to create a better model: gather data from newer users. We then learned how to easily pull data off our production data stream using an SDK query.

Continual Learning

In this section we will explore the question: what if we want to pull data on an ongoing basis?


Gantry Curators are essentially a way to tell Gantry what data you want and on what cadence. A curator is a scheduled job that produces versioned output completely managed by Gantry.

Let's dive right into creating a curator that will replicate the SDK query we saw in the previous section. You can find the code in the tutorial repository. This tutorial will finish up in that notebook.

import datetime, random

import gantry
from gantry.automations.curators import BoundedRangeCurator
from gantry.automations.triggers import IntervalTrigger
from gantry.automations import Automation

# GantryConfig and DataStorageConfig come from the tutorial repo's configuration module
application = gantry.get_application(GantryConfig.GANTRY_APP_NAME)
new_accounts_curator_name = f"{GantryConfig.GANTRY_APP_NAME}-new-account-curator-{random.randrange(0, 10000001)}"
interval_trigger = IntervalTrigger(start_on=DataStorageConfig.MIN_DATE, interval=datetime.timedelta(days=1))

# Curate up to 1000 records per day where 0 <= inputs.account_age_days <= 7
new_accounts_curator = BoundedRangeCurator(
    name=new_accounts_curator_name,
    application_name=GantryConfig.GANTRY_APP_NAME,
    bound_field="inputs.account_age_days",
    lower_bound=0.0,
    upper_bound=7.0,
    limit=1000,
)
curator_automation = Automation(name="curator-automation", trigger=interval_trigger, action=new_accounts_curator)

Executing the third cell in the notebook should produce the following output:

Curator(name='gec-demo-app-7346948', curated_dataset_name='gec-demo-app-new-account-curator-7346948', application_name='gec-demo-app', start_on=2022-03-30 00:00:00+00:00, curation_interval=1 day, 0:00:00, curate_past_intervals=True, selectors=[Selector(method=OrderedSampler(sample=<SamplingMethod.ordered: 'ordered'>, field='inputs.account_age_days', sort=<DruidDimensionOrderingDirections.ASCENDING: 'ascending'>), limit=1000, filters=[BoundsFilter(field='inputs.account_age_days', upper=7.0, inclusive_upper=None, lower=0.0, inclusive_lower=None)], tags=[])])

A few things to note here:

  • We provide the application name to tie this curator to the associated application
  • We use GantryConfig and DataStorageConfig to grab some configuration values, such as the earliest timestamp of the data; providing that timestamp triggers Gantry to backfill all of the intervals between that date and now
  • We use the BoundedRangeCurator because it makes it easy to specify a curator according to bounds on a field, namely account_age_days
  • We choose the (somewhat arbitrary) limit of 1000 records to represent our daily labeling budget

Being able to easily define jobs that Gantry takes care of executing reduces the cost of the data engineering required to turn production data into training and evaluation datasets on an ongoing basis. Gantry just marshals your intervals of data into versions in a Dataset.

The following code will do two things for us:

  1. Grab the dataset that our curator is populating.
  2. List the versions in the dataset. Each version represents a historical interval (in this case day) of data.
dataset = new_accounts_curator.get_curated_dataset()
versions = dataset.list_versions()

Listing the versions makes the relationship between curators and datasets explicit: as curator jobs run, one for each interval, they create versions of a dataset. The versions contain metadata about what fraction of the specified limit is being met, as you can see below:

[{'version_id': 'f4251481-2c9e-4bfc-b061-9a45d97d8799',
  'dataset': 'gec_demo_app_new_accounts_curator',
  'message': 'Added Gantry data from model gec-demo-app from start time 2022-04-04T00:00:00 to end time 2022-04-05T00:00:00 - 120 records added from interval of size 193. Commit automatically created by curator.',
  'created_at': 'Wed, 01 Feb 2023 05:13:40 GMT',
  'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
  'is_latest_version': True},
 {'version_id': 'd94387eb-d6b5-4745-9912-ded44538e4f8',
  'dataset': 'gec_demo_app_new_accounts_curator',
  'message': 'Added Gantry data from model gec-demo-app from start time 2022-04-03T00:00:00 to end time 2022-04-04T00:00:00 - 110 records added from interval of size 174. Commit automatically created by curator.',
  'created_at': 'Wed, 01 Feb 2023 05:13:39 GMT',
  'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
  'is_latest_version': False},
 {'version_id': '061aa10e-2f85-4173-b419-286d2ac5a44f',
  'dataset': 'gec_demo_app_new_accounts_curator',
  'message': 'Added Gantry data from model gec-demo-app from start time 2022-04-02T00:00:00 to end time 2022-04-03T00:00:00 - 76 records added from interval of size 120. Commit automatically created by curator.',
  'created_at': 'Wed, 01 Feb 2023 05:13:38 GMT',
  'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
  'is_latest_version': False},
 {'version_id': '62842450-38a4-4c8e-b7b8-af38665c6af9',
  'dataset': 'gec_demo_app_new_accounts_curator',
  'message': 'Added Gantry data from model gec-demo-app from start time 2022-04-01T00:00:00 to end time 2022-04-02T00:00:00 - 51 records added from interval of size 96. Commit automatically created by curator.',
  'created_at': 'Wed, 01 Feb 2023 05:13:38 GMT',
  'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
  'is_latest_version': False},
 {'version_id': '6af2a754-738f-4e1c-907d-180d6266c63b',
  'dataset': 'gec_demo_app_new_accounts_curator',
  'message': 'initial dataset commit',
  'created_at': 'Wed, 01 Feb 2023 05:13:06 GMT',
  'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
  'is_latest_version': False}]
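The commit messages above encode how full each interval was relative to the curator's limit. A quick illustrative helper (not part of the SDK) that parses those fractions out:

```python
import re

def interval_fill(message):
    """Extract (records_added, interval_size) from a curator commit message, or None."""
    m = re.search(r"(\d+) records added from interval of size (\d+)", message)
    return (int(m.group(1)), int(m.group(2))) if m else None

msg = ("Added Gantry data from model gec-demo-app from start time "
       "2022-04-04T00:00:00 to end time 2022-04-05T00:00:00 - 120 records "
       "added from interval of size 193. Commit automatically created by curator.")
added, size = interval_fill(msg)
print(f"{added}/{size} = {added / size:.2f}")  # 120/193 = 0.62
```

Here every record in each interval fit under the 1000-record budget, so the "fraction of the limit met" is just records added over interval size.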

Now that we've seen how curators let you tell Gantry what data you want and have Gantry take care of gathering it, let's dive into datasets.


Gantry datasets are a lightweight container for your data with simple versioning semantics that aim to make it straightforward to build more robust ML pipelines. Versioning is at the file level, and centers on two operations: push and pull. Curators push data to Gantry Datasets. Users pull data from them to analyze, label, or train on. Users can also make local modifications to datasets, and push them back. All push operations create a new version, and each version is written to underlying S3 storage. You can read about datasets in more detail in the Datasets guide.
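To make those semantics concrete, here is a toy, in-memory model of the behavior described above (not the Gantry implementation): every push appends a new immutable version, and pull reads the latest one.

```python
class ToyDataset:
    """Toy stand-in for a versioned dataset: push appends a version, pull reads the latest."""

    def __init__(self):
        self.versions = []  # each push creates a new immutable version

    def push(self, files, message):
        self.versions.append({"id": len(self.versions), "files": dict(files), "message": message})
        return self.versions[-1]["id"]

    def pull(self):
        return self.versions[-1] if self.versions else None

ds = ToyDataset()
ds.push({"data.csv": "a,b\n1,2"}, "initial dataset commit")
ds.push({"data.csv": "a,b\n1,2\n3,4"}, "curator added one interval")
print(ds.pull()["message"])  # curator added one interval
```

In Gantry, the same pattern applies, except each version is persisted to underlying S3 storage and pushes can come from either curators or users.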

Let's pull the latest version of the dataset from our curator with dataset.pull():


Again, your local output will be slightly different. If it still says initial dataset commit and doesn't pull any data, wait a bit for the curator job to finish running, then try pulling again:

{'version_id': 'e4da77a1-de3e-43d9-95ff-4e1213c36be4',
 'dataset': 'gec_demo_app_new_accounts_curator',
 'message': 'Added Gantry data from model gec-demo-app from start time 2022-04-30T00:00:00 to end time 2022-05-01T00:00:00 - 11 records added from interval of size 1479. Commit automatically created by curator.',
 'created_at': 'Wed, 01 Feb 2023 05:13:59 GMT',
 'created_by': '54369935-4749-492e-961d-2fc596d2d51c',
 'is_latest_version': True}

Gantry datasets can be loaded directly into HuggingFace Datasets, which can in turn be converted into Pandas DataFrame objects. All of the type mappings are handled by the integration:

hfds = dataset.get_huggingface_dataset()
df = hfds.to_pandas()

This will produce output looking something like this, depending on how many intervals have backfilled:

We can now analyze, train on, or label this dataset. If we want to do manual data manipulation, we can modify it and push a new version back to Gantry.


After setting up a demo application with some dummy data, we:

  • Used Gantry's monitoring and observability capabilities to identify a hypothesis for why our model was exhibiting diminished performance
  • Saw how to query that data straight into a DataFrame
  • Showed how we could transform that query into a Curator that tells Gantry what data we want and on what interval (to help us continuously train new models)
  • Unpacked the versioned Gantry Dataset which holds the data from the curator
  • Loaded data from a Dataset to Pandas via HuggingFace

We hope this gave you an idea of what Gantry can do and what a general workflow through Gantry looks like. Gantry helps you turn the data that passes through your ML-powered products into better models, and ultimately better user experiences.