Ingest your data into Gantry#

Before getting started, make sure you have your API key.

Log a batch of data#

The easiest way to start using Gantry for your application is to log a batch of historical data using the SDK.

Choosing your data#

Start with a model that uses data types that are currently supported out-of-the-box in Gantry.

We recommend sampling 50,000–100,000 historical production datapoints to send to Gantry. If you don't have any handy, your training or validation data will work. Alternatively, pick a benchmark dataset you like working with.

Make sure your initial dataset can fit into memory. We’ll cover logging more data than can fit in memory later in the guide.

Prepare your data#

Prepare the inputs, predictions, and labels for your model by loading them into three Pandas DataFrames:

inputs: pd.DataFrame = ... # Contains the inputs to your model
outputs: pd.DataFrame = ... # Contains your model's predictions
feedback: pd.DataFrame = ... # Contains your labels
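As a minimal sketch, the three DataFrames just need to be row-aligned; the column names below (feature_1, feature_2, pred, label) are made up for illustration, so substitute your own schema:

```python
import pandas as pd

# Hypothetical column names for illustration only.
inputs = pd.DataFrame({"feature_1": [0.2, 0.7, 0.5], "feature_2": [1, 0, 1]})
outputs = pd.DataFrame({"pred": [0, 1, 1]})
feedback = pd.DataFrame({"label": [0, 1, 0]})

# Assuming rows are matched positionally, all three frames
# must have the same number of rows.
assert len(inputs) == len(outputs) == len(feedback)
```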

Log your data#

First, initialize Gantry with your API key. Choose a name APP_NAME for your new application. Log the inputs and outputs you loaded in the previous step by calling gantry.log_records().

import pandas as pd
import gantry

gantry.init(api_key="<my-api-key>")

APP_NAME = "<my-app-name>"

inputs: pd.DataFrame = ... # Contains the inputs to your model
outputs: pd.DataFrame = ... # Contains your model's predictions
feedback: pd.DataFrame = ... # Contains your labels

gantry.log_records(APP_NAME, inputs=inputs, outputs=outputs, as_batch=True)

Setting as_batch here makes Gantry associate all the data we just sent with a unique identifier, which can be used to filter and create groups of data. You can track the progress of this batch in the dashboard at https://app.gantry.com/applications/<APP_NAME>/batches. After logging, the SDK prints the batch's unique identifier to the console.

Note: creating a batch is a blocking action, so it is not recommended in real-time production settings. Use it for historical data, training data, and production use cases where predictions are made in batches.

Stream predictions to Gantry#

Logging Records#

The simplest way to stream data into Gantry is using gantry.log_record() to send your predictions as you make them, like so:

import numpy as np

import gantry

gantry.init(api_key="<my-api-key>")
APP_NAME = "<my-app-name>"

my_model = load_model()  # load_model() and load_features() are placeholders for your own code

def load_and_predict(context):
    inputs: np.ndarray = load_features(context)
    prediction = my_model.predict(inputs)

    gantry.log_record(
        APP_NAME,
        inputs={"feature_1": inputs[0], "feature_2": inputs[1]},
        outputs={"pred": prediction}
    )

Log Feedback to Gantry#

Feedback events, like user behavior, ground truth, or labels, provide context on the quality of your models’ predictions.

You can send a single piece of feedback using gantry.log_record(), or a batch of feedback using gantry.log_records(). This is the same method we used to log predictions, but with different parameters.

The simplest way to log feedback is to provide it alongside your inputs and predictions. For example:

gantry.log_records(
    'my_app',
    inputs=inputs,
    outputs=outputs,
    feedback=feedback # A DataFrame with the same number of rows as inputs/outputs
)

Logging delayed feedback#

Feedback can occur long after the model made its original prediction.

Gantry takes care of tying your feedback back to the prediction it corresponds to. All you have to do is pick a unique identifier for each piece of data, and send feedback with the same unique identifier.

If you have a unique identifier handy, you can simply provide it alongside both the original prediction and its feedback to tie them together.

gantry.log_record(
  'my_app',
  inputs={'a': 1, 'b': 2},
  outputs={'output': False},
  feedback_id={'id': feedback_id}
)

Later, in another process, you can log feedback with the same id:

gantry.log_record(
  'my_app',
  feedback={'output': True},
  feedback_id={'id': feedback_id}
)

Logging delayed feedback without a unique identifier#

Sometimes it’s difficult to pass a unique identifier around your system to tie your feedback to the corresponding predictions. Gantry provides an alternative: tying feedback to predictions using the inputs to the model.

For example, if we log the following prediction event:

gantry.log_record(
    'my_app',
    inputs={'a': 1, 'b': 2},
    outputs={'output': False}
)

Then Gantry will hash the inputs to create a feedback identifier. The following feedback event would be matched with it, since it uses the same inputs:

gantry.log_record(
    'my_app',
    inputs={'a': 1, 'b': 2},
    feedback={'output': True}
)

If a uniquely identifies the prediction event from the perspective of providing feedback, we can avoid specifying b in the feedback event by using feedback_keys:

gantry.log_record(
  'my_app',
  inputs={'a': 1, 'b': 2},
  outputs={'output': False},
  feedback_keys=['a']
)

# No need to provide 'b' in the feedback event
gantry.log_record(
  'my_app',
  inputs={'a': 1},
  feedback={'output': True}
)

Logging data with the API#

Instead of using our Python SDK, you can log data with the same API key by calling our API directly.

The API Endpoint is as follows:

POST https://app.gantry.io/api/v1/ingest/raw (Ingest data into Gantry)

This endpoint receives the list of data for your Machine Learning model and processes it for use with Gantry.

There are three types of request bodies that are accepted with this API Endpoint. Each type will determine how the data is processed by Gantry.

Prediction Request Body

When sending Model Prediction data only.

{
  event_id: "(REQUIRED) UUID - A UUID string to map the ingested data to"

  log_timestamp: "ISO DateTime String - The Timestamp for this specific ingestion"

  timestamp: "ISO DateTime String - The Timestamp for the event data"

  metadata: "(REQUIRED) JSON Metadata Object - see below"
    func_name: "(REQUIRED) String – application name"
    version: "Integer or String – application version"
    environment: "String - logging environment (dev, prod etc)"
    feedback_keys: "List of String – prediction feedback keys"
    ignore_inputs: "List of String – list of input fields to ignore"
    provided_feedback_id: "String – id of prediction's feedback event"

  feedback_id: "(REQUIRED) String - The ID of the field to map for future feedback data"

  batch_id: "UUID - An optional UUID string to map to a specific ingestion batch"

  inputs: "JSON Object - An object of key/values intended to specify the inputs provided \
  to a Machine Learning Model"

  outputs: "(REQUIRED) ANY - A value(s) to denote the expected output from \
  a Machine Learning Model based on the provided inputs"

  tags: "JSON Object - An object of key/values to tag the data set with \
  (ie. `env` for a specific environment tag for this data set)"
}
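As a sketch, a client might assemble and validate a prediction event before sending it. The helper below is not part of any official SDK; it only checks the fields marked REQUIRED in the schema above, and the app name and values are placeholders:

```python
import uuid
from datetime import datetime, timezone

REQUIRED_PREDICTION_FIELDS = {"event_id", "metadata", "feedback_id", "outputs"}

def build_prediction_event(app_name, inputs, outputs, feedback_id):
    """Assemble a prediction event matching the schema above (illustrative)."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "metadata": {"func_name": app_name},  # func_name is the required metadata field
        "feedback_id": feedback_id,
        "inputs": inputs,
        "outputs": outputs,
    }
    missing = REQUIRED_PREDICTION_FIELDS - set(event)
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return event

event = build_prediction_event("my_app", {"a": 1, "b": 2}, {"pred": False}, "id-123")
```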

Feedback Request Body

When sending Model Feedback data against previously ingested predictions.

{
  event_id: "(REQUIRED) UUID - A UUID string to map the ingested data to"

  log_timestamp: "ISO DateTime String - The Timestamp for this specific ingestion"

  timestamp: "ISO DateTime String - The Timestamp for the event data"

  metadata: "JSON Metadata Object – See below"
    func_name: "(REQUIRED) String – application name"
    environment: "String - logging environment (dev, prod etc)"
    feedback_version: "Integer – version of feedback"

  feedback_id: "(REQUIRED) String - The ID of the field to map for future feedback data"

  batch_id: "UUID - An optional UUID string to map to a specific ingestion batch"

  feedback_id_inputs: "JSON Object - An object of key/values to map feedback \
  to specific inputs."

  feedback: "JSON Object - An object of key/values specifying the feedback \
  for the Machine Learning Model"
}

Record Request Body

When you want to send the prediction and feedback together.

{
  event_id: "(REQUIRED) UUID - A UUID string to map the ingested data to"

  log_timestamp: "ISO DateTime String - The Timestamp for this specific ingestion"

  timestamp: "ISO DateTime String - The Timestamp for the event data"

  metadata: "(REQUIRED) Metadata Object - A composite of Feedback and Prediction Metadata"
    func_name: "(REQUIRED) String – application name"
    environment: "String - logging environment (dev, prod etc)"
    feedback_version: "Integer – version of feedback"

  feedback_id: "(REQUIRED) String - The ID of the field to map for future feedback data"

  batch_id: "UUID - An optional UUID string to map to a specific ingestion batch"

  inputs: "JSON Object - An object of key/values intended to specify the inputs provided \
  to a Machine Learning Model"

  outputs: "(REQUIRED) ANY - A value(s) to denote the expected output from \
  a Machine Learning Model based on the provided inputs"

  tags: "JSON Object - An object of key/values to tag the data set with \
  (ie. `env` for a specific environment tag for this data set)"

  feedback_id_inputs: "JSON Object - An object of key/values to map feedback \
  to specific inputs."

  feedback: "JSON Object - An object of key/values specifying the feedback \
  for the Machine Learning Model"
}
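Putting it together, a raw HTTP call might look like the sketch below. Note the hedges: the authorization header name here is a placeholder (consult your API reference for the real one), and wrapping events in a list is inferred from the "list of data" description above.

```python
import json
import urllib.request

API_URL = "https://app.gantry.io/api/v1/ingest/raw"

def make_ingest_request(api_key: str, events: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for the ingest endpoint."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Header name is a placeholder; consult the Gantry API reference.
            "X-Gantry-Api-Key": api_key,
        },
        method="POST",
    )

req = make_ingest_request("<my-api-key>", [{"event_id": "..."}])
# Sending is left to the caller, e.g.: urllib.request.urlopen(req)
```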

Debugging#

Get Debug Logs#

There are two ways to specify debug logging levels with Gantry.

  1. Use gantry.setup_logger(level="<log level>").

  2. Use the logging_level parameter in gantry.init(api_key="<your_api_key>", logging_level="<log_level>").

The Gantry logger supports the Python logging library's levels, so level= and logging_level= must be one of the following values: DEBUG, INFO, WARNING, ERROR, CRITICAL.

The default Gantry logging level is INFO.
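These strings map one-to-one onto Python's standard logging levels. A small hypothetical helper (not part of the Gantry SDK) for validating a level string before handing it to gantry.init might look like:

```python
import logging

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")

def resolve_level(level: str) -> int:
    """Map a Gantry-accepted level string to its stdlib numeric value."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    return getattr(logging, level)
```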