Logging to Gantry

Sending data to Gantry to understand the behavior of your model.

Logging overview

This page introduces logging in Gantry. Logging means sending data to Gantry. Gantry accumulates records corresponding to predictions (inputs and outputs) and feedback (such as the ground truth output). Feedback helps assess the quality of predictions. Predictions can also be enriched with tags and projections (derived values) to provide a clearer picture of model behavior.

The terms introduced above will be described in detail in the sections that follow. Explanations will be in the context of the following record from Gantry:

[Image: an example record in Gantry]

In this example we have a text generation model with a single input and output. The table below explains each column:

Column | Description
record_key | A unique and stable identifier for this prediction. This key can be used to apply feedback at any point in time.
application_name | The name of this application within Gantry. Roughly speaking, each application has a corresponding "infinite DataFrame" consisting of rows like this.
tags.env | An example of how to use tags to indicate this record was captured in production.
tags.user_type | An example of enriching a prediction with data that might not be an input, but helps add context to how the model impacts users.
inputs.prompt | The prompt provided by the user, the model input.
outputs.generation | The output produced by text generation.
feedback.thumbs_up | An example of feedback that is not "ground truth", but merely the opinion of the user. This type of feedback is well suited for assessing how well users are receiving the model's predictions.
projections.word_count | An example of using projections to "project" a higher dimensional input, raw text, into a scalar. This helps understand the model's behavior more systematically.
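
To make the columns above concrete, the following sketch assembles one flattened record as a plain Python dictionary. It does not call Gantry. The helper word_count and all sample values (record key, tag values, feedback) are hypothetical and only mirror the column names in the table.

```python
def word_count(text: str) -> int:
    """Project raw text onto a scalar: the number of whitespace-separated words."""
    return len(text.split())


prompt = "I read the news today oh boy"
generation = "About a lucky man who made the grade"

# Hypothetical values; only the column names come from the table above.
record = {
    "record_key": "example-key-0001",
    "application_name": "my-awesome-app",
    "tags.env": "prod",
    "tags.user_type": "free_tier",
    "inputs.prompt": prompt,
    "outputs.generation": generation,
    "feedback.thumbs_up": True,
    "projections.word_count": word_count(prompt),
}

print(record["projections.word_count"])
```

The projection here is computed on the prompt, which turns free text into a number that can be compared and aggregated across records.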

There are three main ways to log data to Gantry: via stream, via batch, and via data connector. The sections below describe how to log the first few columns in the record displayed above.


Regardless of the logging type, Gantry needs to be initialized.

Note that Gantry is global
The Gantry module is initialized globally, once per Python process. That means all logging calls in a process need to share an API key, though the other parameters are left to each logging call site.

import gantry

gantry.init(
    api_key="YOUR_API_KEY",
)

📘

Logging Media Data

The process for logging image and audio data is different. See the "Logging Media Data" section at the end of this page.

Logging via Stream

inputs = {
  "prompt": "I read the news today oh boy",
}

outputs = {
  "generation": "About a lucky man who made the grade",
}

gantry.log_record(
  "my-awesome-app",
  inputs=inputs,
  outputs=outputs,
)
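
In a service, stream logging usually sits right next to the model call. The sketch below shows one way to wrap the two together. It takes the logging function as a parameter so that it runs without a Gantry account; predict_and_log, fake_model, and fake_log are hypothetical names, and only the call shape (application name, then inputs and outputs as keywords) is taken from the example above.

```python
from typing import Any, Callable, Dict


def predict_and_log(
    prompt: str,
    model: Callable[[str], str],
    log_fn: Callable[..., Any],
    application: str = "my-awesome-app",
) -> str:
    """Run the model on one prompt and log the prediction as a single record."""
    generation = model(prompt)
    inputs: Dict[str, str] = {"prompt": prompt}
    outputs: Dict[str, str] = {"generation": generation}
    # Same call shape as the gantry.log_record example above.
    log_fn(application, inputs=inputs, outputs=outputs)
    return generation


# Stand-ins so the sketch runs without Gantry installed.
logged = []


def fake_log(application, inputs, outputs):
    logged.append((application, inputs, outputs))


def fake_model(prompt: str) -> str:
    return prompt.upper()


result = predict_and_log("hello", fake_model, fake_log)
```

In a real application, gantry.log_record would be passed as log_fn after gantry.init has been called.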

Logging via Batch

import pandas as pd

inputs: pd.DataFrame = ...  # Contains the inputs to your model
outputs: pd.DataFrame = ... # Contains your model's predictions

gantry.log_records(
  "my-awesome-app",
  inputs=inputs,
  outputs=outputs,
  as_batch=True,
)
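
The batch example above passes DataFrames, while the media example later on this page passes lists of dictionaries to the same function. The sketch below builds that list form and checks that inputs and outputs line up. It assumes records are paired by position; to_batch is a hypothetical helper, not part of the Gantry SDK.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def to_batch(
    prompts: List[str], generations: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Build parallel inputs/outputs lists, one dictionary per prediction."""
    if len(prompts) != len(generations):
        raise ValueError(
            f"got {len(prompts)} prompts but {len(generations)} generations"
        )
    inputs = [{"prompt": p} for p in prompts]
    outputs = [{"generation": g} for g in generations]
    return inputs, outputs


inputs, outputs = to_batch(
    ["I read the news today oh boy", "Woke up, fell out of bed"],
    ["About a lucky man who made the grade", "Dragged a comb across my head"],
)
```

Failing early on a length mismatch is cheaper than discovering misaligned records after they have been ingested.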

Logging via Data Connector

Gantry supports logging records directly from your databases with data connectors.
Currently the only supported data connector is Snowflake.

🚧

Alpha Release

Note that the Gantry data connector is currently in Alpha.

Register a secret whose credentials have been granted privileges on the table or view in the source database:

$ export GANTRY_API_KEY="YOUR_API_KEY"

$ gantry-cli secret create \
    --name "MY_SNOWFLAKE_SECRET" \
    --secret-type="SNOWFLAKE_CONN_STR" \
    --secret-file="./credentials.json"

Your secret may look like this (warehouse_name is specific to Snowflake):

{
    "server_name": "SERVER_NAME",
    "user_name": "USER_NAME",
    "password": "PASSWORD",
    "warehouse_name": "DEV"
}
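
JSON does not allow comments, so the secret file has to contain only the key/value pairs. As a minimal sketch, the file can be generated from Python, which guarantees valid JSON. The key names follow the example secret above; treating all four as required, and the helper write_credentials itself, are assumptions of this sketch. The values are placeholders.

```python
import json
import os
import tempfile

# Key names follow the example secret above; requiring all four is an assumption.
REQUIRED_KEYS = ("server_name", "user_name", "password", "warehouse_name")


def write_credentials(path: str, credentials: dict) -> None:
    """Check that every expected key is present, then write the secret as JSON."""
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(credentials, handle, indent=4)


# Placeholder values only.
credentials = {
    "server_name": "SERVER_NAME",
    "user_name": "USER_NAME",
    "password": "PASSWORD",
    "warehouse_name": "DEV",
}

# Written to a temporary directory here; the CLI example above reads ./credentials.json.
path = os.path.join(tempfile.mkdtemp(), "credentials.json")
write_credentials(path, credentials)

with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)
```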

Register a data connector:

$ gantry-cli data-connector create \
    --name "my-snowflake-connector" \
    --connection-type="SNOWFLAKE" \
    --database-name="MY_DB" \
    --secret-name="MY_SNOWFLAKE_SECRET" \
    --description="Data connector to log records from my snowflake database" \
    --options='{"schema_name": "MY_SCHEMA","table_name": "MY_TABLE"}'

Submit the logging request to Gantry via the SDK:

from typing import List

# Names of the source columns to use as inputs, outputs, and record timestamp
inputs: List[str] = ["prompt"]
outputs: List[str] = ["generation"]
timestamp: str = "updated_at"

gantry.log_records_from_data_connector(
    application="my-awesome-app",
    source_data_connector_name="my-snowflake-connector",
    timestamp=timestamp,
    inputs=inputs,
    outputs=outputs,
)

Logging from data connectors also supports scheduled requests. The example below shows a request that triggers every 8 hours, starting from the indicated start date. The ingested data is filtered on the column named in watermark_key. The delay_time defined in ScheduleOptions ensures that late-arriving data in the source table or view is still included, up to the specified number of seconds.

from gantry.logger.types import Schedule, ScheduleFrequency, ScheduleType, ScheduleOptions

# inputs, outputs, and timestamp are the column names defined in the previous example.
# Example tag values, chosen here for illustration only.
tags = {"env": "prod"}

gantry.log_from_data_connector(
    application="my-awesome-app",
    source_data_connector_name="my-snowflake-connector",
    timestamp=timestamp,
    inputs=inputs,
    outputs=outputs,
    global_tags=tags,
    schedule=Schedule(
        start_on="2023-01-14T08:00:00.000000",
        frequency=ScheduleFrequency.EVERY_8_HOURS,
        type=ScheduleType.INCREMENTAL_APPEND,
        options=ScheduleOptions(watermark_key=timestamp, delay_time=300),
    )
)
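
To see what an 8-hour schedule means in practice, the sketch below lists the first few trigger times for the start date used above. It is plain date arithmetic, not Gantry's scheduler: it assumes the first run happens at start_on, and it does not model watermark_key or delay_time. The helper trigger_times is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def trigger_times(start_on: str, every_hours: int, count: int) -> List[datetime]:
    """Return the first `count` trigger times of a fixed-interval schedule."""
    start = datetime.fromisoformat(start_on)
    return [start + timedelta(hours=every_hours * i) for i in range(count)]


# Assumption: the first run happens exactly at start_on.
times = trigger_times("2023-01-14T08:00:00.000000", every_hours=8, count=4)
for t in times:
    print(t.isoformat())
```

Under these assumptions the schedule cycles through 08:00, 16:00, and 00:00 each day.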

Logging Media Data

Prerequisites

Image and audio data must be stored in a GCS or S3 bucket. That bucket needs to be registered with Gantry if it is private.
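
Because media is passed by reference, a malformed URI is an easy mistake to make. The sketch below is a simple client-side check that could run before logging. The accepted schemes (s3, gs, and https for public or presigned URLs) are taken from this page; the helper is_supported_media_uri is hypothetical and is not part of the Gantry SDK. It does not check that the object exists or that the bucket is registered.

```python
from urllib.parse import urlparse

# Schemes mentioned on this page: S3, GCS, and https for public or presigned URLs.
SUPPORTED_SCHEMES = {"s3", "gs", "https"}


def is_supported_media_uri(uri: str) -> bool:
    """Return True if the URI has a supported scheme and names a bucket or host."""
    parsed = urlparse(uri)
    return parsed.scheme in SUPPORTED_SCHEMES and bool(parsed.netloc)


candidates = [
    "s3://my-bucket/images/kitty.jpeg",
    "gs://my-bucket/audio_0.wav",
    "/local/path/kitty.jpeg",
]
valid = [uri for uri in candidates if is_supported_media_uri(uri)]
```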

Logging images and audio to Gantry

import datetime
import gantry

inputs = [
    {
        # Note: Gantry also supports presigned URLs or the normal "https://" url for public objects
        "s3_image": "s3://{bucket_name}/images/kitty.jpeg",
        "gcs_image": "gs://{bucket_name}/kitty.jpeg",
        "gcs_audio": "gs://{bucket_name}/audio_0.wav",
        "s3_audio": "s3://{bucket_name}/audio/audio_0.wav",
    },
    {
        "s3_image": "s3://{bucket_name}/images/kitty.jpeg",
        "gcs_image": "gs://{bucket_name}/kitty.jpeg",
        "gcs_audio": "gs://{bucket_name}/audio_0.wav",
        "s3_audio": "s3://{bucket_name}/audio/audio_0.wav",
    },
]
outputs = [
    {
        "predict": "demo output",
    },
    {
        "predict": "demo output",
    },
]
join_keys = [
    "4741fb17-5942-4dba-9057-6ddf43237e0a",
    "69bd0afc-115a-4394-8740-cf173fb7be3b",
]
timestamps = [
    datetime.datetime(2023, 3, 7, 8, 5, 3, 125524),
    datetime.datetime(2023, 3, 7, 8, 5, 4, 661582),
]

gantry.init()
gantry.log_records(
    application="image_demo",
    inputs=inputs,
    outputs=outputs,
    join_keys=join_keys,
    timestamps=timestamps,
    as_batch=True,
)

Once the data is ingested into Gantry, it can be reviewed from the dashboard.