Test Runs

Overview

Test Runs allow you to systematically evaluate your prompts against entire datasets to measure performance, quality, and consistency. Run automated tests to compare different prompt versions, models, and configurations before deploying to production.

[Screenshot: Test run detail page showing results and metrics]

Creating a Test Run

  1. Click + New Test
  2. Select LLM Provider and Model
  3. Select your Prompt
  4. Select your Dataset
  5. Click Add Evaluator to configure evaluation
  6. For each evaluator, choose an output column if needed
  7. Toggle Quick Run to test 20 items (optional)
  8. Click Advanced Settings to adjust temperature and max tokens (optional)
  9. Click Run Test
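The choices in the steps above can be captured as a configuration sketch. This is purely illustrative: the platform does not document an API schema, so every field name, provider, model, and evaluator name below is a hypothetical stand-in for what you pick in the UI.

```python
# Hypothetical test-run configuration mirroring the UI steps above.
# All field names and values are illustrative, not an actual API schema.
test_run_config = {
    "provider": "openai",             # step 2: LLM provider
    "model": "gpt-4o",                # step 2: model
    "prompt": "support-reply-v3",     # step 3: prompt to evaluate
    "dataset": "support-tickets",     # step 4: dataset to run against
    "evaluators": [                   # steps 5-6: evaluators + output columns
        {"name": "accuracy", "output_column": "expected_answer"},
    ],
    "quick_run": True,                # step 7: limit the run to 20 items
    "advanced": {                     # step 8: inference settings
        "temperature": 0.2,
        "max_tokens": 512,
    },
}
```

Keeping a record like this alongside each run makes it easy to reproduce a configuration later or diff two runs' settings.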

Monitoring Progress

  • Test runs start in Preparing status
  • Status changes to Running with live progress updates
  • Results update in real time
  • Click a running test to view individual results as they complete
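The status lifecycle above (Preparing, then Running, then a terminal state) can be sketched as a polling loop. The platform doesn't document a client API, so `get_status` below is a hypothetical function backed here by a simulated status sequence:

```python
import time

# Simulated status sequence standing in for a real status lookup;
# get_status is hypothetical, since no client API is documented.
STATUSES = iter(["Preparing", "Running", "Running", "Completed"])

def get_status(run_id: str) -> str:
    """Hypothetical status lookup for a test run."""
    return next(STATUSES)

def wait_for_completion(run_id: str, poll_interval: float = 0.01) -> str:
    """Poll until the run leaves the Preparing/Running states."""
    while True:
        status = get_status(run_id)
        if status not in ("Preparing", "Running"):
            return status
        time.sleep(poll_interval)

final_status = wait_for_completion("run-123")
print(final_status)  # Completed
```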

Viewing Results

  1. Click a completed test run
  2. Review the success rate and metrics
  3. Scroll through test items to see individual results
  4. Click any item to view full details, evaluation scores, and reasoning
  5. Use Previous and Next to navigate items
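The success rate shown on the detail page is presumably the share of items that completed successfully out of those that finished. A minimal sketch of that calculation, assuming per-item statuses like the ones the Filter dropdown exposes:

```python
# Illustrative: derive a success rate from per-item statuses.
# Status values are taken from the Filter dropdown; item shape is assumed.
items = [
    {"id": 1, "status": "Completed"},
    {"id": 2, "status": "Completed"},
    {"id": 3, "status": "Failed"},
    {"id": 4, "status": "Error"},
]

finished = [i for i in items if i["status"] in ("Completed", "Failed", "Error")]
success_rate = sum(i["status"] == "Completed" for i in finished) / len(finished)
print(f"{success_rate:.0%}")  # 50%
```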

Filtering Results

Use the Filter dropdown to show:

  • All - All test items
  • Completed - Successfully completed items
  • Failed - Items that failed evaluation
  • Error - Items with inference errors
  • Pending - Items not yet processed
  • Running - Items currently being processed
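The dropdown's behavior amounts to a simple status filter: "All" keeps everything, and any other choice keeps only items in that state. A sketch, with the item structure assumed:

```python
# Sketch of the Filter dropdown's logic over test items.
def filter_items(items, choice):
    """Return all items for "All", otherwise only items matching the status."""
    if choice == "All":
        return items
    return [item for item in items if item["status"] == choice]

items = [{"status": s} for s in
         ("Completed", "Failed", "Error", "Pending", "Running")]
print(len(filter_items(items, "All")))     # 5
print(len(filter_items(items, "Failed")))  # 1
```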

Exporting Results

  1. Find a completed test run
  2. Click the Download icon
  3. A CSV file containing all results, scores, and reasoning downloads automatically
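Once exported, the CSV can be analyzed with standard tooling. The column names below are hypothetical (the export's exact schema isn't documented here), but the approach is the same for any column layout:

```python
import csv
import io

# Hypothetical export snippet; real column names and values may differ.
exported = """item_id,status,score,reasoning
1,Completed,0.92,"Answer matched the reference"
2,Failed,0.40,"Missed key details from the ticket"
"""

rows = list(csv.DictReader(io.StringIO(exported)))
avg_score = sum(float(r["score"]) for r in rows) / len(rows)
print(f"average score: {avg_score:.2f}")  # average score: 0.66
```

In practice you would pass the downloaded file to `csv.DictReader` directly instead of an in-memory string.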

Best Practices

  • Start with Quick Run for rapid iteration
  • Use multiple evaluators for comprehensive assessment
  • Maintain representative datasets
  • Track success rates over time
  • Review failed items to understand patterns
  • Set realistic inference settings
  • Export results for deeper analysis
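Tracking success rates over time, as suggested above, can be as simple as comparing consecutive runs to spot regressions. The dates and rates below are made up for illustration:

```python
# Illustrative: flag runs whose success rate regressed vs the previous run.
history = [
    ("2024-05-01", 0.78),
    ("2024-05-08", 0.84),
    ("2024-05-15", 0.81),
]

regressions = [
    date
    for (_, prev_rate), (date, rate) in zip(history, history[1:])
    if rate < prev_rate
]
print(regressions)  # ['2024-05-15']
```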