Evaluators

Overview

Evaluators are automated quality assessment tools that score and validate AI-generated outputs against your quality standards. They can use exact matching, pattern matching, or AI-powered judging.

[Screenshot: Evaluator detail page with testing interface]

Evaluator Types

System Evaluators

  • Exact Match - Compares outputs character-for-character
  • Contains Match - Checks if expected text appears in output
  • Regex Match - Validates using regular expressions
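The three system evaluator types can be sketched roughly as follows (illustrative Python only, not the platform's actual implementation):

```python
import re

def exact_match(actual: str, expected: str) -> bool:
    """Character-for-character comparison."""
    return actual == expected

def contains_match(actual: str, expected: str) -> bool:
    """True if the expected text appears anywhere in the output."""
    return expected in actual

def regex_match(actual: str, pattern: str) -> bool:
    """True if the output matches the regular expression."""
    return re.search(pattern, actual) is not None

print(exact_match("42", "42"))                         # True
print(contains_match("The answer is 42.", "42"))       # True
print(regex_match("Order #1234 shipped", r"#\d{4}"))   # True
```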

Custom Evaluators

  • Judge LLM - Uses an AI model to assess quality based on custom criteria
  • Regex Match - Custom regex patterns for your specific validation needs

Creating a Judge LLM Evaluator

  1. Click New Evaluator
  2. Enter a descriptive name
  3. Select Judge LLM as the type
  4. Select an LLM Provider
  5. Choose a model (consider cost-effective models like gpt-4o-mini)
  6. Select an Evaluation Prompt (must have LLM Judge flag enabled)
  7. Click Create Evaluator
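A Judge LLM evaluation prompt typically states the criteria and asks the model for a structured score. A hypothetical sketch of assembling such a prompt and parsing the judge's reply (the prompt template, score scale, and JSON shape here are assumptions; your Evaluation Prompt defines the actual format):

```python
import json

# Hypothetical judge prompt template; {criteria}, {expected}, and
# {actual} are filled in per test case.
JUDGE_PROMPT = """You are an impartial judge. Rate the actual output
against the criteria. Reply with JSON: {{"score": <0.0-1.0>, "reasoning": "..."}}.

Criteria: {criteria}
Expected output: {expected}
Actual output: {actual}"""

def build_judge_prompt(criteria: str, expected: str, actual: str) -> str:
    return JUDGE_PROMPT.format(criteria=criteria, expected=expected, actual=actual)

def parse_judge_response(raw: str) -> tuple[float, str]:
    """Parse the judge model's JSON reply into (score, reasoning)."""
    data = json.loads(raw)
    return float(data["score"]), data["reasoning"]

score, reasoning = parse_judge_response(
    '{"score": 0.9, "reasoning": "Accurate and concise."}'
)
```

Keeping the criteria narrow and the reply format machine-parseable makes judge scores easier to aggregate across a test run.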

Testing an Evaluator

  1. Navigate to the evaluator's detail page
  2. Scroll to the Testing section
  3. Enter sample text in Actual Output
  4. Enter Expected Output (if required)
  5. Click Test
  6. Review the score and reasoning

Using Evaluators

In Test Runs

  1. Create a new test run
  2. Click Add Evaluator
  3. Select the evaluator
  4. Specify the output column from your dataset
  5. Add additional evaluators if needed

In API Inference

Include evaluator IDs in your API request:

{
  "messages": [...],
  "evaluators": ["evaluator-id-1", "evaluator-id-2"]
}
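For instance, the request body above can be built programmatically before sending it to your inference endpoint (the evaluator IDs and message content here are placeholders; consult your API reference for the full schema):

```python
import json

# Build the inference request body with evaluators attached.
payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "evaluators": ["evaluator-id-1", "evaluator-id-2"],
}
body = json.dumps(payload)
# POST `body` to the inference endpoint with your usual HTTP client.
```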

Best Practices

  • Start with system evaluators before creating custom ones
  • Use cost-effective models for Judge LLM evaluators
  • Test every evaluator before relying on it in production runs
  • Create focused judge prompts with clear, specific criteria
  • Combine multiple evaluators for comprehensive assessment
  • Use descriptive names so evaluators are easy to identify in test runs