Projections
Projections are computed columns used to help interpret unstructured data.
Overview
Projections are functions that take one or more columns from a record as inputs, apply a function to them, and append the result as a new column on the record. They are mainly used for projecting (hence the name) higher-dimensional inputs and outputs to values that can be monitored more tractably. A few projections are defined by default, but if these are not sufficient, you can create your own custom projections. The example below demonstrates using the built-in word_count projection to understand how many words are in the input to the model.
Projections can act as proxies for possible model quality issues by mapping fields to scalars. They don't slow down inference since they're computed on Gantry’s infrastructure, and they can be iterated on without model redeployment.
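To make the "new column on the record" idea concrete, here is a minimal local sketch. It is not Gantry's implementation: the field name inputs.text, the helper apply_projection, and the whitespace tokenization are all assumptions made for illustration, and the built-in word_count projection may count words differently.

```python
from typing import Any, Callable, Dict, List


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a 'word')."""
    return len(text.split())


def apply_projection(
    records: List[Dict[str, Any]],
    input_field: str,
    output_field: str,
    fn: Callable[[Any], Any],
) -> List[Dict[str, Any]]:
    """Return copies of the records with fn(record[input_field]) added as a new column."""
    return [{**record, output_field: fn(record[input_field])} for record in records]


# Hypothetical records; the field name is illustrative only.
records = [
    {"inputs.text": "My order never arrived"},
    {"inputs.text": "Thanks, all good now!"},
]
projected = apply_projection(records, "inputs.text", "word_count", word_count)
```

The original records are left unchanged, which mirrors the idea that a projection only appends a derived value.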
As an example, let's walk through the following scenario:
Suppose we are building a machine learning application to classify support tickets and want to detect changes to the text being passed in to the model. We can define summaries to monitor by adding Length and Sentiment projections. We can then use these projections to set alerts or slice our data. If our application suddenly starts receiving many long support tickets with unusually negative sentiment, the alert will make us aware.
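The alerting logic in this scenario can be sketched locally as follows. Gantry evaluates alerts itself, so this only illustrates the kind of condition you might configure. The column names length and sentiment, the thresholds, and the assumption that sentiment is scored in [-1, 1] are all hypothetical.

```python
from typing import Dict, List

# Hypothetical thresholds; tune them against your own data.
MAX_LENGTH = 200            # words
MIN_SENTIMENT = -0.5        # assumes sentiment is scored in [-1, 1]
MAX_FLAGGED_FRACTION = 0.2  # alert when more than 20% of tickets are flagged


def is_long_and_negative(record: Dict[str, float]) -> bool:
    """True when a ticket is both unusually long and unusually negative."""
    return record["length"] > MAX_LENGTH and record["sentiment"] < MIN_SENTIMENT


def should_alert(records: List[Dict[str, float]]) -> bool:
    """True when the share of flagged tickets in the window exceeds the limit."""
    if not records:
        return False
    flagged = [r for r in records if is_long_and_negative(r)]
    return len(flagged) / len(records) > MAX_FLAGGED_FRACTION


# A hypothetical window of projected values for four tickets.
window = [
    {"length": 35, "sentiment": 0.4},
    {"length": 420, "sentiment": -0.8},
    {"length": 310, "sentiment": -0.7},
    {"length": 50, "sentiment": -0.1},
]
alert = should_alert(window)
```

Because both projections are scalars, the same predicate can be reused to slice the data down to the flagged tickets.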
New projections will only be processed for new data. This behavior can be modified by triggering a backfill (click "backfill all projections" in the UI) to run the projection on existing historical data. Records can only be backfilled within 90 days of their initial creation.
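The 90-day limit can be expressed as a simple date check. This is a sketch under assumptions: the function name is hypothetical, and treating exactly 90 days as still eligible is a guess, since the boundary behavior is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKFILL_WINDOW = timedelta(days=90)


def is_backfill_eligible(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a record created at `created_at` is still inside the window.

    Assumes timezone-aware datetimes and an inclusive 90-day boundary.
    """
    now = now or datetime.now(timezone.utc)
    return now - created_at <= BACKFILL_WINDOW


reference = datetime(2024, 4, 1, tzinfo=timezone.utc)
recent = datetime(2024, 3, 1, tzinfo=timezone.utc)   # 31 days before reference
stale = datetime(2023, 12, 1, tzinfo=timezone.utc)   # 122 days before reference
```

A check like this can help estimate how much historical data a backfill would cover before triggering it.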
Built-in Projections
Projections can be added in the projection sidebar. Click the + button, select the projection to add, and define it with the field(s) on which it should be computed.
Custom Projections
If the projections that are built in to Gantry are not sufficient for your use case, you can define your own. Custom projections are global within an organization. This means that if a custom projection has been defined for one application, you can also use it in a different application.
There are two parts to a custom projection:
- A projection function defined in Python or Go that takes one or more fields from a logged record and returns a single output of any supported type.
- A YAML config file that contains some basic information about how to package and run the projection function.
Are you interested in defining custom projections in other languages? Let us know!
Python
Step 1: Prerequisites
Create a directory for your projection, for example:
mkdir my-custom-projection
Make sure you have the Python SDK & CLI installed.
Step 2: Write the Projection
The example below, saved as custom_projections.py in your projection directory, uses spaCy to detect whether an input contains a proper noun:
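import spacy

# Load the model once at import time so it is reused across calls.
nlp = spacy.load('en_core_web_sm')


def contains_proper_noun(text):
    # Collect the coarse part-of-speech tag of every token in the text.
    pos = [token.pos_ for token in nlp(text)]
    if 'PROPN' in pos:
        return 1.0
    return 0.0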
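Because a projection function can take more than one field, a second, dependency-free sketch may help. The function length_ratio and its parameter names are hypothetical, and the assumption that fields are passed positionally in the order they are listed under inputs in the config is not confirmed by this page.

```python
def length_ratio(prompt: str, completion: str) -> float:
    """Return the completion's word count divided by the prompt's word count.

    Returns 0.0 for an empty prompt so the projection always yields a Float.
    """
    prompt_words = len(prompt.split())
    if prompt_words == 0:
        return 0.0
    return len(completion.split()) / prompt_words


# Quick local smoke test before packaging the projection.
example = length_ratio("Where is my order", "It shipped yesterday and should arrive by Friday")
```

Calling the function directly like this is a cheap way to catch type or edge-case errors before going through a build.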
Step 3: Configure the Projection
Create a YAML file config.yaml in your projection directory. For supported data types, see the Schema and Data Types page.
version: 1
function_definition:
  projection_name: proper_noun_detection
  description: Detect proper noun in provided text
  entrypoint: custom_projections.py
  function_name: contains_proper_noun
  output:
    type: Float
  inputs:
    - type: Text
  requirements:
    - spacy # optionally define the version, e.g. spacy==3.5.0
    - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.4.1/en_core_web_sm-3.4.1.tar.gz
lambda_definition:
  runtime_lang: python
  runtime_version: "3.9"
  memory_size: 256
The function_definition section tells Gantry to create a custom projection called proper_noun_detection using the function contains_proper_noun found in the custom_projections.py Python file defined in step 2.
The projection takes one input of type Text and outputs a scalar of type Float.
Additional Python libraries required to build and execute this function are specified in requirements.
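Before submitting, it can be useful to check that the config and the Python file agree. The sketch below is a hypothetical local helper, not part of the Gantry SDK or CLI. It takes the function_definition section as a plain dict (for example, loaded from config.yaml with a YAML parser) and assumes the function should accept one parameter per entry in inputs.

```python
import importlib.util
import inspect
from pathlib import Path
from typing import Any, Dict, List


def check_projection(projection_dir: str, function_definition: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    entrypoint = Path(projection_dir) / function_definition["entrypoint"]
    if not entrypoint.is_file():
        return [f"entrypoint not found: {entrypoint}"]

    # Import the entrypoint file as a throwaway module. Note that this runs
    # its top-level code, so its dependencies must be installed locally.
    spec = importlib.util.spec_from_file_location("projection_under_test", entrypoint)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    name = function_definition["function_name"]
    fn = getattr(module, name, None)
    if not callable(fn):
        return [f"function not found: {name}"]

    # Assumption: one function parameter per configured input.
    n_params = len(inspect.signature(fn).parameters)
    n_inputs = len(function_definition["inputs"])
    if n_params != n_inputs:
        return [f"{name} takes {n_params} parameter(s) but config lists {n_inputs} input(s)"]
    return []
```

For the example above, calling check_projection("my-custom-projection", config["function_definition"]) would return an empty list when the file, function name, and input count line up.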
Note that the lambda_definition section is optional. If it is not provided in the config file, Gantry will automatically assign the following defaults:
lambda_definition:
  runtime_lang: python # currently only python and go are supported
  runtime_version: "3.9"
  memory_size: 256 # memory limit in MB
  timeout: 5 # execution timeout in seconds
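To preview the settings a projection would run with, the defaults can be merged into a parsed config locally. This is a sketch only: the helper name is hypothetical, and filling in missing keys one by one (rather than applying defaults only when the whole section is absent) is an assumption about how Gantry resolves them.

```python
from copy import deepcopy
from typing import Any, Dict

# Default values as listed in the documentation above.
LAMBDA_DEFAULTS: Dict[str, Any] = {
    "runtime_lang": "python",
    "runtime_version": "3.9",
    "memory_size": 256,  # MB
    "timeout": 5,        # seconds
}


def with_lambda_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config whose lambda_definition has every default filled in."""
    resolved = deepcopy(config)
    overrides = config.get("lambda_definition") or {}
    # Values set by the user win over the defaults (assumed per-key merge).
    resolved["lambda_definition"] = {**LAMBDA_DEFAULTS, **overrides}
    return resolved


resolved = with_lambda_defaults({"version": 1, "lambda_definition": {"memory_size": 512}})
```

Here resolved["lambda_definition"] keeps the user's memory_size of 512 and picks up the remaining defaults.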
Step 4: [Optional] Install private packages
If the custom projection relies on private packages that can only be accessed locally, they must be installed in the extra_deps directory before the custom projection is submitted:
cd my-custom-projection
pip install --target ./extra_deps requests --index-url https://your-private-pypi/
Step 5: Submit custom projection
Use the CLI to submit the custom projection definition to Gantry.
GANTRY_API_KEY=$MY_API_KEY gantry-cli projection create --projection-dir my-custom-projection/
The build process, including error messages, can be monitored from the console. Once the build succeeds, the custom projection function will appear in the Add projection list and can be added like any other projection.
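If you submit projections from a script or CI job, the CLI call above can be wrapped in Python. This is a minimal sketch: the helper names are hypothetical, and it only shells out to the same gantry-cli projection create command shown above, without using any SDK function.

```python
import os
import subprocess
from typing import List


def build_create_command(projection_dir: str) -> List[str]:
    """Build the argument list for the CLI command shown in this step."""
    return ["gantry-cli", "projection", "create", "--projection-dir", projection_dir]


def submit_projection(projection_dir: str, api_key: str) -> int:
    """Run the CLI with GANTRY_API_KEY set and return its exit code."""
    env = {**os.environ, "GANTRY_API_KEY": api_key}
    completed = subprocess.run(build_create_command(projection_dir), env=env)
    return completed.returncode


command = build_create_command("my-custom-projection/")
```

Passing the key through the environment of the child process keeps it out of the command line, matching the GANTRY_API_KEY=... form used above.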
Go
Step 1: Prerequisites
Create a directory for your projection, for example:
mkdir my-custom-projection
Initialize a new Go module:
go mod init example.com/my-custom-projection
Add the AWS Lambda library as a dependency of your new module:
go get github.com/aws/aws-lambda-go/lambda
Make sure you have the Python SDK & CLI installed. We will use the CLI to create the projection in Gantry.
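Step 2: Write the Projection
Save the following code as projection.go in your projection directory. This is the file that step 3 builds.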
package main

import (
	"context"
	"strings"

	"github.com/aws/aws-lambda-go/lambda"
)

type RequestEvent struct {
	// each element in Events is a list that represents the inputs to this
	// function from one record
	// e.g. if the projection is defined with inputs field1 and field2
	// then Events might look like
	// [
	//   ["record_1_field_1_value", "record_1_field_2_value"],
	//   ["record_2_field_1_value", "record_2_field_2_value"]
	// ]
	Events [][]string `json:"events"`
}

type ResultResponse struct {
	// Results can be an array of any supported type
	// in this example we are returning a bool value per record in RequestEvent.Events
	Results    []bool `json:"results"`
	TotalCount int    `json:"total_count"`
	ErrorCount int    `json:"error_count"`
}

// MyCustomFn returns true when value does NOT contain "puppy".
func MyCustomFn(value string) (bool, error) {
	containsPuppy := strings.Contains(value, "puppy")
	return !containsPuppy, nil
}

func HandleRequest(ctx context.Context, event RequestEvent) (ResultResponse, error) {
	res := make([]bool, len(event.Events))
	errorCount := 0
	for i, ev := range event.Events {
		// call your custom logic here
		// you can also batch process the events
		evRes, evErr := MyCustomFn(ev[0])
		if evErr == nil {
			res[i] = evRes
		} else {
			// res[i] keeps its zero value (false) for failed records
			errorCount++
		}
	}
	response := ResultResponse{
		Results:    res,
		TotalCount: len(event.Events),
		ErrorCount: errorCount,
	}
	return response, nil
}

func main() {
	lambda.Start(HandleRequest)
}
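The batch request and response shapes used by this handler can be easier to see as JSON. The Python sketch below loosely mirrors the Go handler for illustration only. It is not a deployable projection, the sample ticket texts are made up, and unlike the Go code it counts a record with no inputs as an error instead of panicking.

```python
import json
from typing import Any, Dict, List


def my_custom_fn(value: str) -> bool:
    """Mirror of MyCustomFn: True when the text does NOT contain "puppy"."""
    return "puppy" not in value


def handle_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Produce one result per record, plus total and error counts."""
    results: List[bool] = []
    error_count = 0
    for inputs in event["events"]:
        try:
            # Each record is a list of input values; this projection uses the first.
            results.append(my_custom_fn(inputs[0]))
        except Exception:
            results.append(False)  # placeholder so results stay aligned with events
            error_count += 1
    return {
        "results": results,
        "total_count": len(event["events"]),
        "error_count": error_count,
    }


# One inner list per record, one element per configured input.
request = json.loads('{"events": [["I love my puppy"], ["Where is my refund?"]]}')
response = handle_request(request)
print(json.dumps(response))
# prints {"results": [false, true], "total_count": 2, "error_count": 0}
```

Keeping results the same length as events, even when a record fails, lets each result be matched back to its record by position.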
Step 3: Build the Go projection
GOOS=linux GOARCH=amd64 go build -o main projection.go
Step 4: Configure the Projection
Create a YAML file config.yaml in your projection directory:
version: 1
function_definition:
  projection_name: my_projection
  inputs:
    - type: Text
  output:
    type: Boolean
lambda_definition:
  runtime_lang: go
  runtime_version: "1"
  handler_file_name: main
  memory_size: 256 # optional. defaults to 256MB
  timeout: 5 # optional. defaults to 5s
The function_definition section tells Gantry to create a custom projection called my_projection, and handler_file_name in the lambda_definition section points to the binary called main that we just built in step 3.
The projection takes one input of type Text and outputs a value of type Boolean.
Step 5: Submit custom projection
Use the CLI to submit the custom projection definition to Gantry.
GANTRY_API_KEY=$MY_API_KEY gantry-cli projection create --projection-dir my-custom-projection/
The build process, including error messages, can be monitored from the console. Once the build succeeds, the custom projection function will appear in the Add projection list and can be added like any other projection.