Logging LLM Data

The logging API is designed to log every request and response made to OpenAI.

📘 This page describes a Gantry workflow specific to OpenAI Chat Completion data. If you're logging other kinds of data, follow the custom model logging instructions. Support for other LLM model providers and additional OpenAI endpoints is coming soon.

Logging OpenAI requests to Gantry

Let's say we're developing a chatbot called CoolLLMApp, we want to use the OpenAI chat endpoint to power it, and we want to leverage Gantry for our model analysis. Log the OpenAI inputs and outputs to Gantry as follows:

Make a request to the OpenAI Chat API. The request body should look something like this:

// POST https://api.openai.com/v1/chat/completions

{
  // OpenAI required parameters
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}
  ],
  // Some optional parameters
  "temperature": 0.7,
  "n": 1
}

Given this request, an example OpenAI Chat API response would be:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "It was played at Globe Life Field.",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
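
For reference, here is a minimal Python sketch of this round trip using the requests library; the OPENAI_API_KEY environment variable is an assumption of this example, and the official openai client works just as well. The important part is keeping both the request body and the parsed response JSON around so they can be logged in the next step:

import os
import requests

# The same request body as above, kept as a dict so it can be logged later.
request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ],
    "temperature": 0.7,
    "n": 1
}

# Call the OpenAI Chat API and parse the response JSON.
openai_response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json=request_body,
).json()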

Both of these can be logged to Gantry via a POST request to https://app.gantry.io/api/v1/chat/completions. The request and response fields are dumped directly from the OpenAI request and response. Only three fields must be provided manually: application, request, and response. If there is additional information to associate with the request or response, it can be sent via request_attributes and response_attributes. An example:

// POST https://app.gantry.io/api/v1/chat/completions
{
    // These are the required parameters in the request
    "application": "CoolLLMApp",
    // We just dump the request body directly into this field
    "request": {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who won the world series in 2020?"
            },
            {
                "role": "assistant",
                "content": "The Los Angeles Dodgers won the World Series in 2020."
            },
            {
                "role": "user",
                "content": "Where was it played?"
            }
        ],
        "temperature": 0.7,
        "n": 1
    },
    // Any information we want to associate with the request
    "request_attributes": {
        "user_id": 123
    },
    // We dump the response JSON directly into this field
    "response": {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": 1677652288,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "It was played at Globe Life Field."
                },
                "finish_reason": "stop"
            }
        ],
        "usage": {
            "prompt_tokens": 9,
            "completion_tokens": 12,
            "total_tokens": 21
        }
    },
    // Any information we want to associate with the response
    "response_attributes": {
        "latency_ms": 100
    },
    // Any tags we might want to associate with the call
    "tags": {
        // environment metadata
        "env": "prod"
    }
}
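
As a sketch in Python, the logging call might look like the following. It reuses the request_body and openai_response variables from the earlier sketch and assumes a GANTRY_API_KEY environment variable; authentication via the X-Gantry-Api-Key header is described in the API Reference below:

import os
import requests

# Assemble the Gantry logging record: the OpenAI request and response are
# dumped in unchanged, with optional attributes and tags alongside.
log_record = {
    "application": "CoolLLMApp",
    "request": request_body,
    "request_attributes": {"user_id": 123},
    "response": openai_response,
    "response_attributes": {"latency_ms": 100},
    "tags": {"env": "prod"},
}

requests.post(
    "https://app.gantry.io/api/v1/chat/completions",
    headers={"X-Gantry-Api-Key": os.environ["GANTRY_API_KEY"]},
    json=log_record,
)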

In Gantry, the data schema for the CoolLLMApp application reflects these request and response fields.

Gantry always stores all optional parameters the OpenAI API accepts. If a parameter is not explicitly set in the request, Gantry uses the default value from the OpenAI chat API documentation.

Requesting multiple answers from OpenAI

OpenAI supports requesting multiple answers for a single prompt via the endpoint's n parameter. For example, if we set "n": 3 in the example POST request body to the OpenAI API, the choices field in the response might look something like this:

{
    ...,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "It was played at Globe Life Field."
            },
            "finish_reason": "stop"
        },
        {
            "index": 1,
            "message": {
                "role": "assistant",
                "content": "The 2020 World Series was played at Globe Life Field."
            },
            "finish_reason": "stop"
        },
        {
            "index": 2,
            "message": {
                "role": "assistant",
                "content": "Globe Life Field."
            },
            "finish_reason": "stop"
        }
    ]
}

To choose option 1 (index = 1) as the best choice to present to the user and log it to Gantry, set the selected_choice_index field in the logging request body. Note that if OpenAI returns multiple choices and the selected_choice_index field is not manually set, it will default to index 0.

{
    "application": "CoolLLMApp",
    "request": ...,
    "response": {
        ...
        "choices": [...]
    },
    "selected_choice_index": 1
}

The logged selected_choice_message, selected_choice_index, and choices fields will reflect this selection.
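
As an illustrative sketch, reusing the variables from the earlier sketches, selection and logging could look like the following; the ranking heuristic here (longest answer wins) is purely an example, and a real application might rank by a score or by user selection:

# Pick one of the returned choices; "longest answer" is an arbitrary example heuristic.
choices = openai_response["choices"]
best = max(choices, key=lambda c: len(c["message"]["content"]))

log_record = {
    "application": "CoolLLMApp",
    "request": request_body,
    "response": openai_response,
    # Tell Gantry which choice was shown to the user.
    "selected_choice_index": best["index"],
}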

Streaming Data

Since generating full completions takes time, a common way to use the OpenAI Chat API is in streaming mode. In streaming mode, partial messages are sent back as server-sent events rather than as a single response JSON. To log these to Gantry, store the events in an array and pass it in the response field of the POST request body.

Here is an example of the OpenAI Chat API response in streaming mode:

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"The"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" "},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"202"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"0"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" World"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Series"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" was"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" played"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" at"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Globe"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Life"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Field"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" in"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Arlington"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":","},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Texas"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"."},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]

To send this to the Gantry logging API, set the response field to one of:

  • A single string containing all the data events
  • A list of strings, each containing one data event
  • A list of dictionaries, each being a parsed data message

This is an example of passing it as a list of event strings:

{
    "application": "CoolLLMApp",
    "request": ...,
    "response": [
        'data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}',
        'data: {"id":"chatcmpl-70d5QwJAPWrAcr9WxnG0YJD4FReGj","object":"chat.completion.chunk","created":1680384604,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"The"},"index":0,"finish_reason":null}]}',
        ...,
        'data: [DONE]'
    ]
}

Gantry handles extracting the response from the data strings. This also works when requesting multiple choices from the OpenAI API: just set the selected_choice_index field in the response to tell Gantry which choice was selected.
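
Putting it together, here is a sketch in Python that streams over raw HTTP with requests, collects the event lines, and logs them; it assumes the request_body variable and the OPENAI_API_KEY / GANTRY_API_KEY environment variables from the earlier sketches:

import os
import requests

events = []
# Stream the completion; each non-empty line is one "data: ..." event.
with requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={**request_body, "stream": True},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:  # skip the blank separator lines between events
            events.append(line.decode("utf-8"))

# Log the collected events as a list of event strings.
requests.post(
    "https://app.gantry.io/api/v1/chat/completions",
    headers={"X-Gantry-Api-Key": os.environ["GANTRY_API_KEY"]},
    json={"application": "CoolLLMApp", "request": request_body, "response": events},
)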

Session tracking

Gantry offers basic support for logging sessions. The chat logging API accepts a session_id parameter in the request body. This could be useful for assigning each OpenAI API call to a particular chat session for later analysis:

{
    "application": "CoolLLMApp",
    "request": {...},
    "response": {...},
    "session_id": "session_123"
}

The session ID is generated by your application logic. If you log it, Gantry can provide additional features such as calculating session length and grouping related calls together.
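
For example, a sketch of what this might look like, assuming one UUID is generated when a chat session starts and reused for every OpenAI call made within that session:

import uuid

session_id = str(uuid.uuid4())  # generated once, when the chat session starts

log_record = {
    "application": "CoolLLMApp",
    "request": request_body,
    "response": openai_response,
    # Every OpenAI call made during this session logs the same session_id.
    "session_id": session_id,
}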

Feedback

Gantry offers support for logging user feedback on a per-message/per-OpenAI-API-call basis. Set the feedback field when logging the OpenAI API call:

{
    "application": "CoolLLMApp",
    "request": {...},
    "response": {...},
    "feedback": {"user_feedback": "Helpful message!"} // a dict with any fields
}
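
In Python terms, reusing the variables from the earlier sketches, this is just one more field on the log record:

log_record = {
    "application": "CoolLLMApp",
    "request": request_body,
    "response": openai_response,
    # Arbitrary feedback dict, attached at logging time.
    "feedback": {"user_feedback": "Helpful message!"},
}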

To log feedback asynchronously, Gantry also offers a smaller, dedicated endpoint: POST /v1/chat/completions/feedback. To use it, keep track of the id field returned in the OpenAI API response. Let's say we have the following OpenAI response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  ...
}

After logging this to Gantry, the user submits the feedback "Helpful message!" for this particular message. To log it to Gantry, make a POST request to /v1/chat/completions/feedback with the following body:

{
    "application": "CoolLLMApp",
    "chat_id": "chatcmpl-123",
    "feedback": {"user_feedback": "Helpful message!"}
}

This feedback will be joined together with the previously logged OpenAI call.
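
A sketch of the asynchronous feedback call, assuming the feedback endpoint lives under the same base URL as the logging endpoint and that the OpenAI response id was saved when the call was first logged:

import os
import requests

chat_id = openai_response["id"]  # e.g. "chatcmpl-123", saved at logging time

requests.post(
    "https://app.gantry.io/api/v1/chat/completions/feedback",
    headers={"X-Gantry-Api-Key": os.environ["GANTRY_API_KEY"]},
    json={
        "application": "CoolLLMApp",
        "chat_id": chat_id,
        "feedback": {"user_feedback": "Helpful message!"},
    },
)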

API Reference

Authentication

To authenticate the following requests, pass your Gantry API key in the X-Gantry-Api-Key header.

API Docs

POST /v1/chat/completions

Request:

{
  // Required
  "application": // <string> Gantry application for logging
  "request": {}, // the OpenAI request; schema from the OpenAI API documentation
  "response": {}, // the OpenAI API response; schema from the OpenAI API documentation
  // Optional
  "request_attributes": {}, // any additional info you want to associate with the request
  "response_attributes": {}, // any additional info you want to associate with the response
  "feedback": {}, // <dict> a dict representing the feedback on the message
  "selected_choice_index": // <int> which choice was selected from the OpenAI API response; defaults to 0
  "session_id": // <string> an id to track each user session
  "tags": {} // any tags you'd like to apply to the record, e.g. {"region": "us-west-2"}
}

Response HTTP status codes:

  • 200: Response OK
  • 400: Malformed Request

POST /v1/chat/completions/feedback

{
  "application": // <string> Gantry application for logging
  "chat_id": // <string> the OpenAI response id, e.g. "chatcmpl-123"
  "feedback": // <dict> a dict representing the user feedback on the message
}

Response HTTP status codes:

  • 200: Response OK
  • 400: Malformed Request