Model Context Protocol reference for evals.do - Evaluate performance of functions, workflows, and agents

Evals MCP

Evaluate performance of functions, workflows, and agents

Overview

The Model Context Protocol (MCP) provides AI models with direct access to evals.do through a standardized interface.

Installation

pnpm add @modelcontextprotocol/sdk

Configuration

Add to your MCP server configuration:

{
  "mcpServers": {
    "evals": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server"],
      "env": {
        "DO_API_KEY": "your-api-key"
      }
    }
  }
}

Tools

evals/invoke

Main tool for evals.do operations.

{
  "name": "evals/invoke",
  "description": "Evaluate performance of functions, workflows, and agents",
  "inputSchema": {
    "type": "object",
    "properties": {
      "operation": {
        "type": "string",
        "description": "Operation to perform"
      },
      "parameters": {
        "type": "object",
        "description": "Operation parameters"
      }
    },
    "required": ["operation"]
  }
}
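
The schema above requires only `operation` (a string) and allows an optional `parameters` object. As a minimal sketch, a caller could check arguments against those two rules before sending them. `validateInvokeArgs` is a hypothetical helper written for this page, not part of evals.do or the MCP SDK.

```javascript
// Hypothetical helper: checks arguments against the evals/invoke inputSchema shown above.
// Returns a list of human-readable problems; an empty list means the arguments are valid.
function validateInvokeArgs(args) {
  if (args === null || typeof args !== 'object' || Array.isArray(args)) {
    return ['arguments must be an object']
  }
  const errors = []
  // "operation" is the only required property and must be a string.
  if (!('operation' in args)) {
    errors.push('missing required property: operation')
  } else if (typeof args.operation !== 'string') {
    errors.push('operation must be a string')
  }
  // "parameters" is optional, but must be a plain object when present.
  if ('parameters' in args) {
    const p = args.parameters
    if (p === null || typeof p !== 'object' || Array.isArray(p)) {
      errors.push('parameters must be an object')
    }
  }
  return errors
}
```

Returning a list rather than throwing lets the caller report every problem at once, which tends to be more useful when the arguments were generated by a model.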

Usage in AI Models

Claude Desktop

// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "evals": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server", "--tool=evals"],
      "env": {
        "DO_API_KEY": "your-api-key"
      }
    }
  }
}

OpenAI GPTs

# Custom GPT configuration
tools:
  - type: mcp
    server: evals
    operations:
      - invoke
      - query
      - status

Custom Integration

import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@dotdo/mcp-server', '--tool=evals'],
})

const client = new Client(
  {
    name: 'evals-client',
    version: '1.0.0',
  },
  {
    capabilities: {},
  }
)

await client.connect(transport)

// Call tool
const result = await client.callTool({
  name: 'evals/invoke',
  arguments: {
    operation: 'evals',
    parameters: {},
  },
})

Tool Definitions

Available Tools

{
  "tools": [
    {
      "name": "evals/invoke",
      "description": "Invoke evals.do",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "evals/query",
      "description": "Query evals.do resources",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "evals/status",
      "description": "Check evals.do status",
      "inputSchema": {
        /* ... */
      }
    }
  ]
}
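
Because only the listed tool names exist, it can help to check a name against the list before calling it. The sketch below assumes the list has already been fetched (for example from a `tools/list` response) and hard-codes the three entries shown above. `requireTool` is a hypothetical helper.

```javascript
// Tool names and descriptions copied from the listing above.
// The input schemas are elided there, so they are omitted here as well.
const tools = [
  { name: 'evals/invoke', description: 'Invoke evals.do' },
  { name: 'evals/query', description: 'Query evals.do resources' },
  { name: 'evals/status', description: 'Check evals.do status' },
]

// Hypothetical helper: returns the tool definition, or throws an error
// that lists the names that are actually available.
function requireTool(toolList, name) {
  const tool = toolList.find((t) => t.name === name)
  if (!tool) {
    const known = toolList.map((t) => t.name).join(', ')
    throw new Error(`Unknown tool "${name}". Available tools: ${known}`)
  }
  return tool
}
```

Listing the available names in the error gives a model or a developer enough information to correct a misspelled tool name on the next attempt.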

Resources

Available Resources

{
  "resources": [
    {
      "uri": "evals://config",
      "name": "Evals Configuration",
      "mimeType": "application/json"
    },
    {
      "uri": "evals://docs",
      "name": "Evals Documentation",
      "mimeType": "text/markdown"
    }
  ]
}
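
Each resource advertises a URI and a MIME type, so a client can decide how to decode the content it reads. The following is a rough sketch under that assumption. `parseResourceUri` and `decodeResource` are hypothetical helpers, and the text passed to `decodeResource` stands in for whatever a real resource read would return.

```javascript
// Resource entries copied from the listing above.
const resources = [
  { uri: 'evals://config', name: 'Evals Configuration', mimeType: 'application/json' },
  { uri: 'evals://docs', name: 'Evals Documentation', mimeType: 'text/markdown' },
]

// Hypothetical helper: splits "scheme://rest" with a plain regular expression,
// so the result does not depend on URL parsing rules for custom schemes.
function parseResourceUri(uri) {
  const match = /^([a-z][a-z0-9+.-]*):\/\/(.+)$/i.exec(uri)
  if (!match) throw new Error(`Not a valid resource URI: ${uri}`)
  return { scheme: match[1], path: match[2] }
}

// Hypothetical helper: decodes resource text according to the advertised MIME type.
// JSON resources are parsed; everything else is returned as text.
function decodeResource(resource, text) {
  return resource.mimeType === 'application/json' ? JSON.parse(text) : text
}
```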

Prompts

Pre-configured Prompts

{
  "prompts": [
    {
      "name": "evals-quick-start",
      "description": "Quick start guide for evals.do",
      "arguments": []
    },
    {
      "name": "evals-best-practices",
      "description": "Best practices for evals.do",
      "arguments": []
    }
  ]
}

Examples

Basic Usage

// AI model calls the tool via MCP
mcp call evals/invoke

With Parameters

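// Call with parameters, reusing the `client` from Custom Integration
const result = await client.callTool(
  {
    name: 'evals/invoke',
    arguments: {
      operation: 'process',
      parameters: {
        // Operation-specific parameters
      },
    },
  },
  undefined, // use the default result schema
  { timeout: 30000 } // request timeout in milliseconds
)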

Error Handling

// Reuses the `client` from Custom Integration
async function invokeEvals() {
  try {
    const result = await client.callTool({
      name: 'evals/invoke',
      arguments: { operation: 'process' },
    })
    return result
  } catch (error) {
    if (error.code === 'TOOL_NOT_FOUND') {
      console.error('Evals tool not available')
    } else {
      throw error
    }
  }
}
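
Besides thrown errors, an MCP tool result can report a failure in-band through an `isError` flag on the result itself. The sketch below turns such results into exceptions so both failure paths can be handled in one `catch`. `callToolChecked` is a hypothetical wrapper, and `callTool` is assumed to be any async function that accepts `{ name, arguments }`, such as a bound client method.

```javascript
// Hypothetical wrapper: `callTool` is any async function taking { name, arguments }.
async function callToolChecked(callTool, name, args) {
  const result = await callTool({ name, arguments: args })
  // Tool-level failures come back as a normal result flagged with isError.
  if (result.isError) {
    const message = (result.content ?? [])
      .filter((item) => item.type === 'text')
      .map((item) => item.text)
      .join('\n')
    throw new Error(`${name} failed: ${message || 'no error text returned'}`)
  }
  return result
}
```

With a connected client, this could be used as `callToolChecked((req) => client.callTool(req), 'evals/invoke', { operation: 'process' })`.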

AI Integration Patterns

Agentic Workflows

// AI agent uses evals.do in workflow
const workflow = {
  steps: [
    {
      tool: 'evals/invoke',
      operation: 'analyze',
      input: 'user-data',
    },
    {
      tool: 'evals/invoke',
      operation: 'transform',
      input: 'analysis-result',
    },
  ],
}
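
One way to execute a workflow object of this shape is to run the steps in order and pass each result to the next step. This is a sketch, not how evals.do itself runs workflows. `runWorkflow` is hypothetical, `callTool` is an injected async function, and passing the previous output as `parameters.input` is an assumption made for illustration.

```javascript
// Hypothetical runner: executes steps in order and feeds each result to the next step.
async function runWorkflow(workflow, callTool, initialInput) {
  let input = initialInput
  const trace = []
  for (const step of workflow.steps) {
    const output = await callTool({
      name: step.tool,
      // Assumption: the previous output is passed as `parameters.input`.
      arguments: { operation: step.operation, parameters: { input } },
    })
    trace.push({ tool: step.tool, operation: step.operation, output })
    input = output // the next step consumes this step's result
  }
  return { result: input, trace }
}
```

Keeping a trace of every step makes it easier to see which tool call produced an unexpected intermediate value.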

Chain of Thought

AI models can reason about evals.do operations:

User: "I need to process this data"

AI: "I'll use the evals tool to:
1. Validate the data format
2. Process it through evals.do
3. Return the results

Let me start..."

[Calls: mcp call evals/invoke]

Server Implementation

Custom MCP Server

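import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'

const server = new Server(
  {
    name: 'evals-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
      resources: {},
      prompts: {},
    },
  }
)

// Register the tools/call handler; the SDK identifies the request type by its schema
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params
  if (name === 'evals/invoke') {
    // Handle the evals.do operation here; this placeholder echoes the request
    const result = { operation: args?.operation, parameters: args?.parameters ?? {} }
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(result),
        },
      ],
    }
  }
  // Fail loudly instead of returning undefined for unknown tools
  throw new Error(`Unknown tool: ${name}`)
})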

const transport = new StdioServerTransport()
await server.connect(transport)

Best Practices

Tool Design - Keep tools focused and single-purpose
Error Messages - Provide clear, actionable errors
Documentation - Include examples in tool descriptions
Rate Limiting - Implement appropriate limits
Security - Validate all inputs from AI models
Monitoring - Track tool usage and errors
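
As an illustration of the rate limiting point, the sketch below implements a simple fixed-window limiter keyed by tool name. It is one possible approach, not something evals.do prescribes. `createRateLimiter` and its parameters are hypothetical, and the clock is injectable so the behavior can be tested without waiting.

```javascript
// Hypothetical fixed-window limiter: at most `limit` calls per tool in each `windowMs` window.
function createRateLimiter(limit, windowMs, now = Date.now) {
  const windows = new Map() // tool name -> { start, count }
  return function allow(toolName) {
    const t = now()
    const w = windows.get(toolName)
    // Start a fresh window on first use or once the current window has elapsed.
    if (!w || t - w.start >= windowMs) {
      windows.set(toolName, { start: t, count: 1 })
      return true
    }
    if (w.count < limit) {
      w.count += 1
      return true
    }
    return false // over the limit for this window
  }
}
```

A server could call `allow(request.params.name)` at the top of its tool handler and return a clear error when it yields `false`, which also covers the advice on actionable error messages.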
