Evals MCP
Model Context Protocol reference for evals.do - Evaluate performance of functions, workflows, and agents
Overview
The Model Context Protocol (MCP) provides AI models with direct access to evals.do through a standardized interface.
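MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. As a rough sketch of what a client sends when it calls the `evals/invoke` tool documented further down (the `buildToolCall` helper and the `id` value are illustrative and not part of any SDK):

```typescript
// Shape of a JSON-RPC 2.0 request for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: {
    name: string
    arguments: Record<string, unknown>
  }
}

// Hypothetical helper: builds the request a client would send for a tool call.
function buildToolCall(id: number, toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: toolName, arguments: args } }
}

const request = buildToolCall(1, 'evals/invoke', { operation: 'evals', parameters: {} })
console.log(JSON.stringify(request, null, 2))
```

In practice the SDK client builds and sends this message for you; the sketch only shows what travels over the transport.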
Installation
pnpm add @modelcontextprotocol/sdk
Configuration
Add to your MCP server configuration:
{
  "mcpServers": {
    "evals": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server"],
      "env": {
        "DO_API_KEY": "your-api-key"
      }
    }
  }
}
Tools
evals/invoke
Main tool for evals.do operations.
{
  "name": "evals/invoke",
  "description": "Evaluate performance of functions, workflows, and agents",
  "inputSchema": {
    "type": "object",
    "properties": {
      "operation": {
        "type": "string",
        "description": "Operation to perform"
      },
      "parameters": {
        "type": "object",
        "description": "Operation parameters"
      }
    },
    "required": ["operation"]
  }
}
Usage in AI Models
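The clients in this section all take the same `mcpServers` entry shape shown under Configuration. A minimal sketch of building that entry in code, so the API key is supplied at runtime and a missing key is caught before it reaches a config file (`buildEvalsServerConfig` is a hypothetical helper; the command and arguments are copied from the examples on this page):

```typescript
interface McpServerEntry {
  command: string
  args: string[]
  env: Record<string, string>
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>
}

// Hypothetical helper: returns the config object for the evals server.
// Throws early if the key is missing or is the literal string "undefined",
// which is what an unset variable becomes when interpolated into a template.
function buildEvalsServerConfig(apiKey: string | undefined): McpConfig {
  if (!apiKey || apiKey === 'undefined') {
    throw new Error('DO_API_KEY is not set')
  }
  return {
    mcpServers: {
      evals: {
        command: 'npx',
        args: ['-y', '@dotdo/mcp-server', '--tool=evals'],
        env: { DO_API_KEY: apiKey },
      },
    },
  }
}

const config = buildEvalsServerConfig('your-api-key')
console.log(JSON.stringify(config, null, 2))
```

The printed JSON can be pasted into a client's configuration file as-is.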
Claude Desktop
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "evals": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server", "--tool=evals"],
      "env": {
        "DO_API_KEY": "your-api-key"
      }
    }
  }
}
OpenAI GPTs
# Custom GPT configuration
tools:
  - type: mcp
    server: evals
    operations:
      - invoke
      - query
      - execute
Custom Integration
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@dotdo/mcp-server', '--tool=evals'],
})

const client = new Client(
  {
    name: 'evals-client',
    version: '1.0.0',
  },
  {
    capabilities: {},
  }
)

await client.connect(transport)

// Call tool
const result = await client.callTool({
  name: 'evals/invoke',
  arguments: {
    operation: 'evals',
    parameters: {},
  },
})
Tool Definitions
Available Tools
{
  "tools": [
    {
      "name": "evals/invoke",
      "description": "Invoke evals.do",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "evals/query",
      "description": "Query evals.do resources",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "evals/status",
      "description": "Check evals.do status",
      "inputSchema": {
        /* ... */
      }
    }
  ]
}
Resources
Available Resources
{
  "resources": [
    {
      "uri": "evals://config",
      "name": "Evals Configuration",
      "mimeType": "application/json"
    },
    {
      "uri": "evals://docs",
      "name": "Evals Documentation",
      "mimeType": "text/markdown"
    }
  ]
}
Prompts
Pre-configured Prompts
{
  "prompts": [
    {
      "name": "evals-quick-start",
      "description": "Quick start guide for evals.do",
      "arguments": []
    },
    {
      "name": "evals-best-practices",
      "description": "Best practices for evals.do",
      "arguments": []
    }
  ]
}
Examples
Basic Usage
// AI model calls tool via MCP
mcp call evals/run
With Parameters
// Call with parameters
await mcp.callTool('evals/invoke', {
  operation: 'process',
  parameters: {
    // Operation-specific parameters
  },
  options: {
    timeout: 30000,
  },
})
Error Handling
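// Wrapped in a function so that `return` is valid; `mcp` is the same client
// object used in the previous example.
async function invokeEvals() {
  try {
    const result = await mcp.callTool('evals/invoke', {
      operation: 'process',
    })
    return result
  } catch (error) {
    if (error.code === 'TOOL_NOT_FOUND') {
      // The server does not expose this tool
      console.error('Evals tool not available')
    } else {
      // Let unexpected errors propagate to the caller
      throw error
    }
  }
}
AI Integration Patterns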
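A pattern shared by the integrations in this section is running tool calls in sequence and feeding each result into the next step. The following is a minimal sketch of such a runner, using the same step fields (`tool`, `operation`, `input`) as the Agentic Workflows example below. The `runWorkflow` function, the `CallTool` type, the `previous` parameter, and the stub client are all assumptions made for illustration; evals.do may chain steps differently.

```typescript
// One step of a workflow, using the fields from the Agentic Workflows example.
interface Step {
  tool: string
  operation: string
  input: string
}

// Whatever actually performs the tool call, e.g. a wrapper around an MCP client.
type CallTool = (
  tool: string,
  args: { operation: string; parameters: Record<string, unknown> },
) => Promise<unknown>

// Runs steps in order. Each call receives the previous step's result as
// `parameters.previous` (an assumed convention), and all results are
// collected in step order.
async function runWorkflow(steps: Step[], callTool: CallTool): Promise<unknown[]> {
  const results: unknown[] = []
  let previous: unknown = undefined
  for (const step of steps) {
    previous = await callTool(step.tool, {
      operation: step.operation,
      parameters: { input: step.input, previous },
    })
    results.push(previous)
  }
  return results
}

// Stub standing in for a real MCP client; it only echoes what it was asked to do.
const stubCallTool: CallTool = async (tool, args) => `${tool}:${args.operation}`

const workflowResults = runWorkflow(
  [
    { tool: 'evals/invoke', operation: 'analyze', input: 'user-data' },
    { tool: 'evals/process', operation: 'transform', input: 'analysis-result' },
  ],
  stubCallTool,
)
workflowResults.then((results) => console.log(results))
```

Passing the call function in as an argument keeps the runner independent of any particular client, which also makes it easy to exercise with a stub as shown.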
Agentic Workflows
// AI agent uses evals.do in workflow
const workflow = {
  steps: [
    {
      tool: 'evals/invoke',
      operation: 'analyze',
      input: 'user-data',
    },
    {
      tool: 'evals/process',
      operation: 'transform',
      input: 'analysis-result',
    },
  ],
}
Chain of Thought
AI models can reason about evals.do operations:
User: "I need to process this data"
AI: "I'll use the evals tool to:
1. Validate the data format
2. Process it through evals.do
3. Return the results
Let me start..."
[Calls: mcp call evals/run]
Server Implementation
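Independent of any SDK, the core of a server for `evals/invoke` is routing the `operation` string to a handler and wrapping the outcome in a tool result. A dependency-free sketch of that routing follows. The `echo` operation and the `dispatch` function are hypothetical, since this page does not list the operations evals.do actually supports; the `content` and `isError` fields follow the MCP tool result shape.

```typescript
type Handler = (parameters: Record<string, unknown>) => Promise<unknown>

// Hypothetical operation table. A Map avoids accidental hits on
// inherited object properties such as "constructor".
const handlers = new Map<string, Handler>([
  ['echo', async (parameters) => ({ echoed: parameters })],
])

// MCP tool results carry a list of content items; failures set isError.
interface ToolResult {
  content: { type: 'text'; text: string }[]
  isError?: boolean
}

// Looks up the handler for an operation and serialises its result as text.
async function dispatch(
  operation: string,
  parameters: Record<string, unknown> = {},
): Promise<ToolResult> {
  const handler = handlers.get(operation)
  if (!handler) {
    return {
      content: [{ type: 'text', text: `Unknown operation: ${operation}` }],
      isError: true,
    }
  }
  const result = await handler(parameters)
  return { content: [{ type: 'text', text: JSON.stringify(result) }] }
}

dispatch('echo', { a: 1 }).then((result) => console.log(result.content))
```

Returning an error result for an unknown operation, instead of throwing, lets the calling model read the message and choose a different operation.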
Custom MCP Server
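import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'

const server = new Server(
  {
    name: 'evals-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
      resources: {},
      prompts: {},
    },
  }
)

// Register the tool-call handler. setRequestHandler expects a request schema,
// not a method-name string.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'evals/invoke') {
    // Handle evals.do operation. Placeholder: echo the arguments back;
    // replace this with the real evals.do call.
    const result = request.params.arguments ?? {}
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(result),
        },
      ],
    }
  }
  // Reject tools this server does not implement
  throw new Error(`Unknown tool: ${request.params.name}`)
})

const transport = new StdioServerTransport()
await server.connect(transport)
Best Practices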
- Tool Design - Keep tools focused and single-purpose
- Error Messages - Provide clear, actionable errors
- Documentation - Include examples in tool descriptions
- Rate Limiting - Implement appropriate limits
- Security - Validate all inputs from AI models
- Monitoring - Track tool usage and errors
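Two of the practices above, input validation and rate limiting, can be sketched without any dependencies. The validator below checks arguments against the `evals/invoke` input schema from the Tools section (`operation` is a required string, `parameters` an optional object). The limiter is a simple fixed-window counter. Both `parseInvokeArgs` and `createRateLimiter` are hypothetical helpers, and the limits used are arbitrary.

```typescript
interface InvokeArgs {
  operation: string
  parameters?: Record<string, unknown>
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Validates untrusted arguments against the evals/invoke input schema.
function parseInvokeArgs(input: unknown): InvokeArgs {
  if (!isPlainObject(input)) {
    throw new Error('arguments must be an object')
  }
  const { operation, parameters } = input
  if (typeof operation !== 'string') {
    throw new Error('"operation" is required and must be a string')
  }
  if (parameters === undefined) {
    return { operation }
  }
  if (!isPlainObject(parameters)) {
    throw new Error('"parameters" must be an object when provided')
  }
  return { operation, parameters }
}

// Fixed-window limiter: allows at most `limit` calls per `windowMs`.
// The clock is injectable so the behaviour can be checked without waiting.
function createRateLimiter(limit: number, windowMs: number, now: () => number = Date.now) {
  let windowStart = now()
  let count = 0
  return function allow(): boolean {
    const current = now()
    if (current - windowStart >= windowMs) {
      // A new window has begun; reset the counter
      windowStart = current
      count = 0
    }
    if (count >= limit) return false
    count += 1
    return true
  }
}

const parsed = parseInvokeArgs({ operation: 'process', parameters: {} })
const allowCall = createRateLimiter(60, 60_000)
console.log(parsed.operation, allowCall())
```

A server would typically run both checks at the top of its tool-call handler, before doing any work on behalf of the model.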