Benchmarks MCP
Model Context Protocol reference for benchmarks.do - Benchmarks for evaluating models, workflows, or agents
Overview
The Model Context Protocol (MCP) provides AI models with direct access to benchmarks.do through a standardized interface.
Installation
pnpm add @modelcontextprotocol/sdk

Configuration
Add to your MCP server configuration:
{
"mcpServers": {
"benchmarks": {
"command": "npx",
"args": ["-y", "@dotdo/mcp-server"],
"env": {
"DO_API_KEY": "your-api-key"
}
}
}
}

Tools
benchmarks/invoke
Main tool for benchmarks.do operations.
{
"name": "benchmarks/invoke",
"description": "Benchmarks for evaluating models, workflows, or agents",
"inputSchema": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"description": "Operation to perform"
},
"parameters": {
"type": "object",
"description": "Operation parameters"
}
},
"required": ["operation"]
}
}

Usage in AI Models
Claude Desktop
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"benchmarks": {
"command": "npx",
"args": ["-y", "@dotdo/mcp-server", "--tool=benchmarks"],
"env": {
"DO_API_KEY": "your-api-key"
}
}
}
}

OpenAI GPTs
# Custom GPT configuration
tools:
- type: mcp
server: benchmarks
operations:
- invoke
- query
- execute

Custom Integration
import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'
const transport = new StdioClientTransport({
command: 'npx',
args: ['-y', '@dotdo/mcp-server', '--tool=benchmarks'],
})
const client = new Client(
{
name: 'benchmarks-client',
version: '1.0.0',
},
{
capabilities: {},
}
)
await client.connect(transport)
// Call tool
const result = await client.callTool({
name: 'benchmarks/invoke',
arguments: {
operation: 'benchmarks',
parameters: {},
},
})

Tool Definitions
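Clients do not need to hard-code these names: the definitions listed below can be discovered at runtime (`client.listTools()` in the SDK) and matched by name. A minimal selection sketch, assuming the response shape shown below; the `findTool` helper is illustrative, not part of the SDK:

```typescript
// Shape of a tools/list entry (see "Available Tools" below).
interface ToolDefinition {
  name: string
  description: string
  inputSchema: Record<string, unknown>
}

// Illustrative helper: pick a tool by name, or fail loudly so the
// caller can fall back (e.g. the TOOL_NOT_FOUND branch shown later).
function findTool(tools: ToolDefinition[], name: string): ToolDefinition {
  const tool = tools.find((t) => t.name === name)
  if (!tool) {
    throw new Error(`Tool not found: ${name}`)
  }
  return tool
}

const tools: ToolDefinition[] = [
  { name: 'benchmarks/invoke', description: 'Invoke benchmarks.do', inputSchema: {} },
  { name: 'benchmarks/query', description: 'Query benchmarks.do resources', inputSchema: {} },
]

const invoke = findTool(tools, 'benchmarks/invoke')
console.log(invoke.description) // Invoke benchmarks.do
```

Failing fast on an unknown name keeps the error close to its cause instead of surfacing later as a confusing server-side rejection.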
Available Tools
{
"tools": [
{
"name": "benchmarks/invoke",
"description": "Invoke benchmarks.do",
"inputSchema": {
/* ... */
}
},
{
"name": "benchmarks/query",
"description": "Query benchmarks.do resources",
"inputSchema": {
/* ... */
}
},
{
"name": "benchmarks/status",
"description": "Check benchmarks.do status",
"inputSchema": {
/* ... */
}
}
]
}

Resources
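After reading one of the resources listed below (`client.readResource({ uri })` in the SDK), the returned contents should be handled according to their `mimeType`. A dispatch sketch, assuming the two resource types listed below; `handleResource` is an illustrative helper, not an SDK API:

```typescript
// Minimal shape of one entry in a resources/read result.
interface ResourceContent {
  uri: string
  mimeType: string
  text: string
}

// Illustrative dispatch: parse JSON config, pass markdown docs through.
function handleResource(content: ResourceContent): unknown {
  switch (content.mimeType) {
    case 'application/json':
      return JSON.parse(content.text)
    case 'text/markdown':
      return content.text
    default:
      throw new Error(`Unsupported mimeType: ${content.mimeType}`)
  }
}

const config = handleResource({
  uri: 'benchmarks://config',
  mimeType: 'application/json',
  text: '{"timeout": 30000}',
})
console.log(config) // { timeout: 30000 }
```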
Available Resources
{
"resources": [
{
"uri": "benchmarks://config",
"name": "Benchmarks Configuration",
"mimeType": "application/json"
},
{
"uri": "benchmarks://docs",
"name": "Benchmarks Documentation",
"mimeType": "text/markdown"
}
]
}

Prompts
Pre-configured Prompts
{
"prompts": [
{
"name": "benchmarks-quick-start",
"description": "Quick start guide for benchmarks.do",
"arguments": []
},
{
"name": "benchmarks-best-practices",
"description": "Best practices for benchmarks.do",
"arguments": []
}
]
}

Examples
Basic Usage
// AI model calls tool via MCP
mcp call benchmarks/invoke

With Parameters
// Call with parameters
await mcp.callTool('benchmarks/invoke', {
operation: 'process',
parameters: {
// Operation-specific parameters
},
options: {
timeout: 30000,
},
})

Error Handling
try {
const result = await mcp.callTool('benchmarks/invoke', {
operation: 'process',
})
return result
} catch (error) {
if (error.code === 'TOOL_NOT_FOUND') {
console.error('Benchmarks tool not available')
} else {
throw error
}
}

AI Integration Patterns
Agentic Workflows
// AI agent uses benchmarks.do in workflow
const workflow = {
steps: [
{
tool: 'benchmarks/invoke',
operation: 'analyze',
input: 'user-data',
},
{
tool: 'benchmarks/process',
operation: 'transform',
input: 'analysis-result',
},
],
}

Chain of Thought
AI models can reason about benchmarks.do operations:
User: "I need to process this data"
AI: "I'll use the benchmarks tool to:
1. Validate the data format
2. Process it through benchmarks.do
3. Return the results
Let me start..."
[Calls: mcp call benchmarks/invoke]

Server Implementation
Custom MCP Server
import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js'
const server = new Server(
{
name: 'benchmarks-server',
version: '1.0.0',
},
{
capabilities: {
tools: {},
resources: {},
prompts: {},
},
}
)
// Register tool handler: the SDK routes tools/call requests matched
// against CallToolRequestSchema to this callback
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === 'benchmarks/invoke') {
// Handle the benchmarks.do operation (runBenchmarksOperation is a
// placeholder for your own logic)
const result = await runBenchmarksOperation(request.params.arguments)
return {
content: [
{
type: 'text',
text: JSON.stringify(result),
},
],
}
}
throw new Error(`Unknown tool: ${request.params.name}`)
})
const transport = new StdioServerTransport()
await server.connect(transport)

Best Practices
- Tool Design - Keep tools focused and single-purpose
- Error Messages - Provide clear, actionable errors
- Documentation - Include examples in tool descriptions
- Rate Limiting - Implement appropriate limits
- Security - Validate all inputs from AI models
- Monitoring - Track tool usage and errors
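Following the security guideline above, arguments arriving from a model should never be trusted as-is. A minimal validation sketch for the `benchmarks/invoke` input schema, hand-rolled here for illustration (a schema library such as zod would be the more common choice in practice):

```typescript
// Arguments expected by benchmarks/invoke (see its inputSchema above).
interface InvokeArgs {
  operation: string
  parameters?: Record<string, unknown>
}

// Illustrative validator: enforce the schema's `"required": ["operation"]`
// and basic types before touching any backend.
function validateInvokeArgs(input: unknown): InvokeArgs {
  if (typeof input !== 'object' || input === null) {
    throw new Error('Arguments must be an object')
  }
  const args = input as Record<string, unknown>
  if (typeof args.operation !== 'string' || args.operation.length === 0) {
    throw new Error('Missing required string field: operation')
  }
  if (
    args.parameters !== undefined &&
    (typeof args.parameters !== 'object' || args.parameters === null)
  ) {
    throw new Error('parameters must be an object when present')
  }
  return {
    operation: args.operation,
    parameters: args.parameters as Record<string, unknown> | undefined,
  }
}

validateInvokeArgs({ operation: 'process', parameters: {} }) // ok
// validateInvokeArgs({}) throws: Missing required string field: operation
```

Rejecting malformed input at the tool boundary also produces the clear, actionable errors recommended above, since the message names the exact field that failed.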