# DSPy.rb

> Build LLM apps like you build software. Type-safe, modular, testable.

DSPy.rb brings software engineering best practices to LLM development. Instead of tweaking prompts, you define what you want with Ruby types and let DSPy handle the rest.

## Overview

DSPy.rb is a Ruby framework for building language model applications with programmatic prompts. It provides:

- **Type-safe signatures** - Define inputs/outputs with Sorbet types
- **Modular components** - Compose and reuse LLM logic
- **Automatic optimization** - Use data to improve prompts, not guesswork
- **Production-ready** - Built-in observability, testing, and error handling

## Core Concepts

### 1. Signatures
Define interfaces between your app and LLMs using Ruby types:

```ruby
class EmailClassifier < DSPy::Signature
  description "Classify customer support emails by category and priority"
  
  class Priority < T::Enum
    enums do
      Low = new('low')
      Medium = new('medium')
      High = new('high')
      Urgent = new('urgent')
    end
  end
  
  input do
    const :email_content, String
    const :sender, String
  end
  
  output do
    const :category, String
    const :priority, Priority  # Type-safe enum with defined values
    const :confidence, Float
  end
end
```

### 2. Modules
Build complex workflows from simple building blocks:

- **Predict** - Basic LLM calls with signatures
- **ChainOfThought** - Step-by-step reasoning
- **ReAct** - Tool-using agents
- **CodeAct** - Dynamic code generation agents (install the `dspy-code_act` gem)

#### Lifecycle callbacks
Rails-style lifecycle hooks ship with every `DSPy::Module`, so you can wrap `forward` without touching instrumentation:

- **`before`** - runs ahead of `forward` for setup (metrics, context loading)
- **`around`** - wraps `forward`, calls `yield`, and lets you pair setup/teardown logic
- **`after`** - fires after `forward` returns for cleanup or persistence

Callbacks target `forward` by default, so `around :manage_turn` works without passing `target:`. Execution order is deterministic: all `before` hooks → `around` (pre-yield) → `forward` → `around` (post-yield) → all `after` hooks. See the Module Runtime Context guide for full examples.
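
The ordering can be illustrated without DSPy at all. The following is a plain-Ruby sketch of the sequence only, not DSPy.rb's implementation: `HookedModule`, its class-level `before`/`around`/`after` macros, and `TurnManager` are hypothetical names invented for this example.

```ruby
# Plain-Ruby illustration of the hook ordering; NOT DSPy.rb's implementation.
# HookedModule and TurnManager are hypothetical names.
class HookedModule
  class << self
    # Each subclass keeps its own hook lists.
    def hooks
      @hooks ||= { before: [], around: [], after: [] }
    end

    def before(name)
      hooks[:before] << name
    end

    def around(name)
      hooks[:around] << name
    end

    def after(name)
      hooks[:after] << name
    end
  end

  # Entry point: runs the hooks around the subclass-defined #forward.
  def call(**inputs)
    self.class.hooks[:before].each { |hook| send(hook) }

    # Nest around hooks so the first one declared is the outermost wrapper.
    innermost = -> { forward(**inputs) }
    chain = self.class.hooks[:around].reverse.reduce(innermost) do |inner, hook|
      -> { send(hook, &inner) }
    end
    result = chain.call

    self.class.hooks[:after].each { |hook| send(hook) }
    result
  end
end

class TurnManager < HookedModule
  attr_reader :log

  before :load_context
  around :manage_turn
  after  :persist

  def initialize
    @log = []
  end

  def forward(question:)
    log << 'forward'
    "answer to #{question}"
  end

  private

  def load_context
    log << 'before'
  end

  # An around hook must return the value of `yield` to pass the result through.
  def manage_turn
    log << 'around (pre-yield)'
    result = yield
    log << 'around (post-yield)'
    result
  end

  def persist
    log << 'after'
  end
end

manager = TurnManager.new
manager.call(question: 'hi')
puts manager.log.join(' -> ')
```

In this sketch the around hook has to return the value of `yield`, otherwise the result of `forward` would be lost; whether DSPy.rb imposes the same requirement is not stated here.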

### 3. Tools & Toolsets
Create type-safe tools for agents with comprehensive Sorbet support:

```ruby
# Enum-based tool with automatic type conversion
class CalculatorTool < DSPy::Tools::Base
  tool_name 'calculator'
  tool_description 'Performs arithmetic operations with type-safe enum inputs'
  
  class Operation < T::Enum
    enums do
      Add = new('add')
      Subtract = new('subtract')
      Multiply = new('multiply')
      Divide = new('divide')
    end
  end
  
  sig { params(operation: Operation, num1: Float, num2: Float).returns(T.any(Float, String)) }
  def call(operation:, num1:, num2:)
    case operation
    when Operation::Add then num1 + num2
    when Operation::Subtract then num1 - num2
    when Operation::Multiply then num1 * num2
    when Operation::Divide
      return "Error: Division by zero" if num2 == 0
      num1 / num2
    end
  end
end

# Multi-tool toolset with rich types
class DataToolset < DSPy::Tools::Toolset
  toolset_name "data_processing"
  
  class Format < T::Enum
    enums do
      JSON = new('json')
      CSV = new('csv')
      XML = new('xml')
    end
  end
  
  class ProcessingConfig < T::Struct
    const :max_rows, Integer, default: 1000
    const :include_headers, T::Boolean, default: true
    const :encoding, String, default: 'utf-8'
  end
  
  tool :convert, description: "Convert data between formats"
  tool :validate, description: "Validate data structure"
  
  sig { params(data: String, from: Format, to: Format, config: T.nilable(ProcessingConfig)).returns(String) }
  def convert(data:, from:, to:, config: nil)
    config ||= ProcessingConfig.new
    "Converted from #{from.serialize} to #{to.serialize} with config: #{config.inspect}"
  end
  
  sig { params(data: String, format: Format).returns(T::Hash[Symbol, T.any(String, Integer, T::Boolean)]) }
  def validate(data:, format:)
    {
      valid: true,
      format: format.serialize,
      row_count: 42,
      message: "Data validation passed"
    }
  end
end
```
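
When an agent invokes a tool, the arguments arrive as JSON, so a string such as `"divide"` has to become an enum instance before `call` runs. The sketch below shows one way that conversion could work in plain Ruby. It is an illustration under stated assumptions, not DSPy.rb's converter: `Operation` here is a hand-rolled stand-in for a `T::Enum`, and `CALCULATOR_PARAMS` and `coerce_arguments` are hypothetical names.

```ruby
require 'json'

# Hand-rolled stand-in for a T::Enum so the sketch runs without sorbet-runtime.
class Operation
  attr_reader :value

  def initialize(value)
    @value = value
  end

  Add    = new('add')
  Divide = new('divide')
  VALUES = [Add, Divide].freeze

  # Look up the enum member for a serialized string, or raise.
  def self.deserialize(raw)
    VALUES.find { |member| member.value == raw } ||
      raise(ArgumentError, "unknown operation: #{raw.inspect}")
  end
end

# Hypothetical schema: parameter name => callable that converts the raw JSON value.
CALCULATOR_PARAMS = {
  operation: ->(raw) { Operation.deserialize(raw) },
  num1:      ->(raw) { Float(raw) },
  num2:      ->(raw) { Float(raw) }
}.freeze

# Parse a JSON argument payload and coerce each value to its declared type.
def coerce_arguments(json, params)
  raw = JSON.parse(json)
  params.to_h do |name, convert|
    raise ArgumentError, "missing argument: #{name}" unless raw.key?(name.to_s)

    [name, convert.call(raw[name.to_s])]
  end
end

args = coerce_arguments('{"operation": "divide", "num1": 10, "num2": 4}', CALCULATOR_PARAMS)
puts args[:operation].value
puts args[:num1] / args[:num2]
```

Rejecting unknown enum values at this boundary means the tool body only ever sees valid `Operation` members.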

### 4. Type System & Discriminators
DSPy.rb disambiguates structs and union types with a `_type` discriminator field:

- **Automatic `_type` field injection** - DSPy adds discriminator fields to structs for type safety
- **Union type support** - `T.any()` unions are automatically disambiguated by `_type`
- **Reserved field name** - Avoid defining your own `_type` fields in structs
- **Recursive filtering** - `_type` fields filtered during deserialization at all nesting levels
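
To make the discriminator idea concrete, here is a minimal plain-Ruby sketch. It is not DSPy.rb's implementation: the structs `SearchAction` and `AnswerAction`, the `ACTION_TYPES` registry, and the three helper methods are hypothetical, and plain `Struct` stands in for `T::Struct`.

```ruby
# Plain-Ruby illustration of `_type` discrimination; NOT DSPy.rb's implementation.
# SearchAction, AnswerAction, and the helper names are hypothetical.
SearchAction = Struct.new(:query, keyword_init: true)
AnswerAction = Struct.new(:text, keyword_init: true)

# Registry for the union "SearchAction or AnswerAction", keyed by discriminator value.
ACTION_TYPES = {
  'SearchAction' => SearchAction,
  'AnswerAction' => AnswerAction
}.freeze

# Serialize a struct to a string-keyed hash and inject the discriminator.
def serialize_with_type(struct)
  { '_type' => struct.class.name }.merge(struct.to_h.transform_keys(&:to_s))
end

# Remove `_type` keys at every nesting level.
def strip_type(value)
  case value
  when Hash  then value.reject { |key, _| key == '_type' }.transform_values { |v| strip_type(v) }
  when Array then value.map { |v| strip_type(v) }
  else value
  end
end

# Pick the struct class from `_type`, then build it from the remaining fields.
def deserialize_union(hash)
  klass = ACTION_TYPES.fetch(hash['_type']) do
    raise ArgumentError, "unknown _type: #{hash['_type'].inspect}"
  end
  klass.new(**strip_type(hash).transform_keys(&:to_sym))
end

payload = serialize_with_type(SearchAction.new(query: 'ruby gems'))
action  = deserialize_union(payload)
puts action.class
puts action.query
```

The sketch also shows why `_type` is best treated as reserved: a user-defined `_type` field would be overwritten on serialization and filtered out again on the way back in.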

### 5. Optimization
Improve accuracy with real data:

- **MIPROv2** - Advanced multi-prompt optimization with bootstrap sampling and Bayesian optimization
- **GEPA (Genetic-Pareto Reflective Prompt Evolution)** - Reflection-driven instruction rewrite loop with feedback maps, experiment tracking, and telemetry
- **Evaluation** - Comprehensive framework with built-in and custom metrics, error handling, and batch processing

> Install the optional `dspy-gepa` gem (and set `DSPY_WITH_GEPA=1` when working from this monorepo) before using the GEPA teleprompter.

```ruby
# Evolve instructions with GEPA
feedback_map = {
  'self' => ->(predictor_output:, module_inputs:, **) do
    DSPy::Prediction.new(score: 1.0, feedback: "Call out mistakes for #{module_inputs.input_values[:question]}")
  end
}

gepa = DSPy::Teleprompt::GEPA.new(metric: metric, feedback_map: feedback_map)
optimized = gepa.compile(program, trainset: train_examples, valset: val_examples)
```

## Quick Start

```ruby
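# Install: add to your Gemfile, then run `bundle install`
gem 'dspy'
gem 'dspy-openai'  # provider adapter used by the OpenAI and Ollama examples below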

# Configure
DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
  # or use Ollama for local models
  # c.lm = DSPy::LM.new('ollama/llama3.2')
end

# Define a task
class SentimentAnalysis < DSPy::Signature
  description "Analyze sentiment of text"
  
  input do
    const :text, String
  end
  
  output do
    const :sentiment, String  # positive, negative, neutral
    const :score, Float       # 0.0 to 1.0
  end
end

# Use it
analyzer = DSPy::Predict.new(SentimentAnalysis)
result = analyzer.call(text: "This product is amazing!")
puts result.sentiment  # => "positive"
puts result.score      # => 0.92
```

## Provider Adapter Gems

Add the adapter gems that match the providers you call so DSPy can load the right SDKs without bloating your bundle:

```ruby
# Gemfile
gem 'dspy'
gem 'dspy-openai'    # OpenAI, OpenRouter, Ollama
gem 'dspy-anthropic' # Claude
gem 'dspy-gemini'    # Gemini
```

Each adapter gem already pulls in the official SDK (`openai`, `anthropic`, `gemini-ai`), so you don't need to add those manually. DSPy auto-loads an adapter when its gem is present, so no extra `require` is needed. Adapter documentation lives alongside the code:

- [OpenAI / OpenRouter / Ollama adapters](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/openai/README.md)
- [Anthropic adapters](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/anthropic/README.md)
- [Gemini adapters](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/gemini/README.md)

## Evaluation & Metrics

Comprehensive testing and measurement framework:

```ruby
# Basic evaluation with built-in metrics
metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: false)
evaluator = DSPy::Evals.new(predictor, metric: metric)

# Type-safe examples using DSPy::Example
test_examples = [
  DSPy::Example.new(
    signature_class: YourSignature,
    input: { question: "What is 2+2?" }, 
    expected: { answer: "4" }
  )
]

result = evaluator.evaluate(test_examples, display_progress: true)
puts "Pass rate: #{result.pass_rate}"        # => 0.95
puts "Total: #{result.total_examples}"       # => 100
puts "Passed: #{result.passed_examples}"     # => 95

# Advanced metrics with detailed results
numeric_metric = DSPy::Metrics.numeric_difference(field: :score, tolerance: 0.1)

# Custom multi-factor metrics
quality_metric = ->(example, prediction) do
  return 0.0 unless prediction
  score = 0.0
  score += 0.5 if prediction.answer == example.expected[:answer]    # Accuracy
  score += 0.3 if (prediction.explanation&.length || 0) > 50        # Completeness
  score += 0.2 if (prediction.confidence || 0.0) > 0.8              # Confidence
  score
end

# Error-resilient batch evaluation
evaluator = DSPy::Evals.new(
  predictor,
  metric: quality_metric,
  max_errors: 3,              # Stop after 3 errors
  provide_traceback: true     # Include stack traces
)

batch_result = evaluator.evaluate(large_test_set)
error_count = batch_result.results.count { |r| r.metrics[:error] }

# Built-in metrics: exact_match, contains, numeric_difference, composite_and
```
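
To show what a case-insensitive exact-match metric and a pass rate compute, here is a small plain-Ruby sketch. It does not use or reproduce `DSPy::Metrics` or `DSPy::Evals`: `ToyExample`, `ToyPrediction`, `exact_match_metric`, and `pass_rate` are hypothetical names, and the whitespace stripping and the "errors count as failures" rule are assumptions of this sketch.

```ruby
# Plain-Ruby illustration; NOT the DSPy::Metrics or DSPy::Evals implementation.
ToyExample    = Struct.new(:input, :expected, keyword_init: true)
ToyPrediction = Struct.new(:answer, keyword_init: true)

# Build a metric lambda that compares one field, optionally ignoring case.
# Surrounding whitespace is always ignored (an assumption of this sketch).
def exact_match_metric(field:, case_sensitive: true)
  normalize = ->(value) { case_sensitive ? value.to_s.strip : value.to_s.strip.downcase }
  ->(example, prediction) do
    return false if prediction.nil?

    normalize.call(prediction[field]) == normalize.call(example.expected[field])
  end
end

# Run a predictor over the examples; a raised error counts as a failed example.
def pass_rate(predictor, examples, metric)
  passed = examples.count do |example|
    metric.call(example, predictor.call(**example.input))
  rescue StandardError
    false
  end
  examples.empty? ? 0.0 : passed.to_f / examples.size
end

# Stub predictor standing in for an LLM-backed module.
toy_predictor = ->(question:) do
  ToyPrediction.new(answer: question == 'What is 2+2?' ? ' 4 ' : 'Unknown')
end

toy_examples = [
  ToyExample.new(input: { question: 'What is 2+2?' }, expected: { answer: '4' }),
  ToyExample.new(input: { question: 'Capital of France?' }, expected: { answer: 'Paris' })
]

toy_metric = exact_match_metric(field: :answer, case_sensitive: false)
puts pass_rate(toy_predictor, toy_examples, toy_metric)
```

Because the metric is just a callable taking `(example, prediction)`, the same shape works for the custom multi-factor lambda shown above.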

## MIPROv2 Optimization

Advanced multi-prompt optimization with bootstrap sampling and Bayesian optimization:

```ruby
# Auto-configuration modes for different needs
light_optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: your_metric)      # 6 trials, greedy
medium_optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: your_metric)    # 12 trials, adaptive  
heavy_optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: your_metric)      # 18 trials, Bayesian

# Custom configuration with Bayesian optimization using dry-configurable
optimizer = DSPy::Teleprompt::MIPROv2.new(metric: custom_metric)
optimizer.configure do |config|
  config.optimization_strategy = :bayesian  # or :greedy, :adaptive
  config.num_trials = 15
  config.num_instruction_candidates = 6
end

# Run optimization
program = DSPy::ChainOfThought.new(YourSignature)  
result = optimizer.compile(program, trainset: training_examples, valset: validation_examples)

puts "Best score: #{result.best_score_value}"
optimized_program = result.optimized_program
```
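
As a rough intuition for what a "trial" is, the sketch below runs a toy greedy search: score each instruction candidate and keep the best one seen so far. This is deliberately much simpler than MIPROv2 (no bootstrapping, no Bayesian search) and is not taken from DSPy.rb. `greedy_instruction_search` is a hypothetical helper, and the keyword-counting block stands in for evaluating a program on a validation set.

```ruby
# Toy greedy search over instruction candidates; NOT the MIPROv2 algorithm.
# The block stands in for "run the program with this instruction on a validation set".
def greedy_instruction_search(candidates, num_trials:, &score_instruction)
  best = { instruction: nil, score: -Float::INFINITY }
  history = []

  # One trial = score one candidate; keep the best seen so far.
  candidates.first(num_trials).each do |instruction|
    score = score_instruction.call(instruction)
    history << [instruction, score]
    best = { instruction: instruction, score: score } if score > best[:score]
  end

  best.merge(trials: history.size)
end

instruction_candidates = [
  'Answer the question.',
  'Think step by step.',
  'Think step by step and state units.',
  'Be brief.'
]

# Stand-in scorer: fraction of desired keywords the instruction mentions.
KEYWORDS = %w[step units].freeze

search_result = greedy_instruction_search(instruction_candidates, num_trials: 3) do |instruction|
  KEYWORDS.count { |word| instruction.downcase.include?(word) } / KEYWORDS.size.to_f
end

puts search_result[:instruction]
puts search_result[:score]
```

Only the first `num_trials` candidates are scored here, which is why the trial budget matters when each trial involves real LLM calls.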

## Main Features

### Type Safety
- Sorbet integration for static and runtime type checks
- Automatic JSON schema generation
- Type discrimination with `_type` field handling for union types and structs
- Enum types for controlled outputs
- Struct types for complex data

### Composability
- Chain modules together
- Share signatures across modules
- Swap predictors without changing logic
- Build reusable components
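
The "swap predictors without changing logic" point comes down to duck typing: a pipeline only needs each step to respond to `call`. The following plain-Ruby sketch illustrates that idea with stub lambdas in place of LLM-backed modules. `SupportPipeline` and the lambdas are hypothetical and are not part of DSPy.rb; real DSPy modules return prediction objects rather than bare strings.

```ruby
# Plain-Ruby illustration of composing call-able steps; NOT DSPy.rb code.
# SupportPipeline and the stub lambdas are hypothetical.
class SupportPipeline
  def initialize(classifier:, responder:)
    @classifier = classifier
    @responder = responder
  end

  # Classify first, then feed the category into the responder.
  def call(email:)
    category = @classifier.call(email: email)
    reply = @responder.call(email: email, category: category)
    { category: category, reply: reply }
  end
end

# Stubs standing in for LLM-backed predictors.
keyword_classifier = ->(email:) { email.match?(/refund/i) ? 'billing' : 'general' }
template_responder = ->(email:, category:) { "Routing your message to the #{category} team." }

pipeline = SupportPipeline.new(classifier: keyword_classifier, responder: template_responder)
puts pipeline.call(email: 'I would like a refund')[:reply]

# Swapping the classifier leaves the pipeline logic untouched.
always_general = ->(email:) { 'general' }
swapped = SupportPipeline.new(classifier: always_general, responder: template_responder)
puts swapped.call(email: 'I would like a refund')[:category]
```

The same shape is convenient in unit tests, where a stub can replace a predictor so no LLM call is made.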

### Observability
- Langfuse integration available when `dspy-o11y` + `dspy-o11y-langfuse` gems are installed and env vars are set
- Structured logging with span tracking
- Token usage tracking
- Performance monitoring

> Install `dspy-o11y` plus `dspy-o11y-langfuse` (and set `DSPY_WITH_O11Y=1 DSPY_WITH_O11Y_LANGFUSE=1` inside this repo) to enable the optional observability stack.

### Testing
- RSpec integration
- VCR for recording LLM interactions
- Mock responses for unit tests
- Evaluation frameworks

## Documentation Structure

- **Getting Started** - Installation, quick start, first program
- **Core Concepts** - Signatures, modules, predictors, multimodal, examples
- **Advanced** - Complex types, memory systems, agents, RAG
- **Optimization** - Prompt tuning, evaluation, benchmarking
- **Production** - Observability, storage, troubleshooting
- **Blog** - Tutorials and deep dives

## Key URLs

- Homepage: https://oss.vicente.services/dspy.rb/
- GitHub: https://github.com/vicentereig/dspy.rb
- Documentation: https://oss.vicente.services/dspy.rb/getting-started/
- API Reference: https://oss.vicente.services/dspy.rb/core-concepts/

## More Examples in This Repo
- Workflow router: `examples/workflow_router.rb`
- Evaluator + optimizer loop: `examples/evaluator_loop.rb`
- GitHub assistant agent: `examples/github-assistant/`

## For LLMs

When helping users with DSPy.rb:

1. **Focus on signatures** - They define the contract with LLMs
2. **Use proper types** - T::Enum for categories, T::Struct for complex data
3. **Leverage automatic type conversion** - Tools and toolsets automatically convert JSON strings to proper Ruby types (enums, structs, arrays, hashes)
4. **Compose modules** - Chain predictors for complex workflows  
5. **Create type-safe tools** - Use Sorbet signatures for comprehensive tool parameter validation and conversion
6. **Test thoroughly** - Use RSpec and VCR for reliable tests
7. **Monitor production** - Enable Langfuse by installing the optional o11y gems and setting env vars

## Version

Current: 0.33.0