
Benchmarks MCP

Model Context Protocol reference for benchmarks.do - Benchmarks for evaluating models, workflows, or agents


Overview

The Model Context Protocol (MCP) provides AI models with direct access to benchmarks.do through a standardized interface.

Installation

pnpm add @modelcontextprotocol/sdk

Configuration

Add to your MCP server configuration:

{
  "mcpServers": {
    "benchmarks": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server"],
      "env": {
        "DO_API_KEY": "your-api-key"
      }
    }
  }
}

Tools

benchmarks/invoke

Main tool for benchmarks.do operations.

{
  "name": "benchmarks/invoke",
  "description": "Benchmarks for evaluating models, workflows, or agents",
  "inputSchema": {
    "type": "object",
    "properties": {
      "operation": {
        "type": "string",
        "description": "Operation to perform"
      },
      "parameters": {
        "type": "object",
        "description": "Operation parameters"
      }
    },
    "required": ["operation"]
  }
}
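A server handling this tool should validate incoming arguments against the schema before dispatching. The sketch below shows what those checks reduce to in TypeScript; the `validateInvokeInput` helper is illustrative, not part of the MCP SDK (a production server would typically use a JSON Schema validation library).

```typescript
// Illustrative validator for the benchmarks/invoke input schema above.
// Not part of the MCP SDK -- shown to make the schema's rules concrete.
interface InvokeInput {
  operation: string
  parameters?: Record<string, unknown>
}

function validateInvokeInput(
  input: unknown,
): { ok: true; value: InvokeInput } | { ok: false; error: string } {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, error: 'input must be an object' }
  }
  const candidate = input as Record<string, unknown>
  // "operation" is required and must be a string
  if (typeof candidate.operation !== 'string') {
    return { ok: false, error: '"operation" is required and must be a string' }
  }
  // "parameters" is optional, but must be an object when present
  if (
    candidate.parameters !== undefined &&
    (typeof candidate.parameters !== 'object' || candidate.parameters === null)
  ) {
    return { ok: false, error: '"parameters" must be an object' }
  }
  return { ok: true, value: candidate as unknown as InvokeInput }
}
```

Rejecting malformed input early, with a message naming the offending field, gives the calling model a clear signal it can act on.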

Usage in AI Models

Claude Desktop

// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "benchmarks": {
      "command": "npx",
      "args": ["-y", "@dotdo/mcp-server", "--tool=benchmarks"],
      "env": {
        "DO_API_KEY": "undefined"
      }
    }
  }
}

OpenAI GPTs

# Custom GPT configuration
tools:
  - type: mcp
    server: benchmarks
    operations:
      - invoke
      - query
      - status

Custom Integration

import { Client } from '@modelcontextprotocol/sdk/client/index.js'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@dotdo/mcp-server', '--tool=benchmarks'],
})

const client = new Client(
  {
    name: 'benchmarks-client',
    version: '1.0.0',
  },
  {
    capabilities: {},
  }
)

await client.connect(transport)

// Call tool
const result = await client.callTool({
  name: 'benchmarks/invoke',
  arguments: {
    operation: 'process',
    parameters: {},
  },
})

Tool Definitions

Available Tools

{
  "tools": [
    {
      "name": "benchmarks/invoke",
      "description": "Invoke benchmarks.do",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "benchmarks/query",
      "description": "Query benchmarks.do resources",
      "inputSchema": {
        /* ... */
      }
    },
    {
      "name": "benchmarks/status",
      "description": "Check benchmarks.do status",
      "inputSchema": {
        /* ... */
      }
    }
  ]
}

Resources

Available Resources

{
  "resources": [
    {
      "uri": "benchmarks://config",
      "name": "Benchmarks Configuration",
      "mimeType": "application/json"
    },
    {
      "uri": "benchmarks://docs",
      "name": "Benchmarks Documentation",
      "mimeType": "text/markdown"
    }
  ]
}
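A connected client can fetch these with the SDK's `readResource` method. The helper below is an illustrative sketch (not part of the SDK) that pulls the parsed JSON payloads out of a resource-read result:

```typescript
// Shape of a resources/read result item (simplified from the MCP spec).
interface ResourceContent {
  uri: string
  mimeType?: string
  text?: string
}

// Extract the parsed JSON payloads from a resource-read result.
function parseJsonContents(contents: ResourceContent[]): unknown[] {
  return contents
    .filter((c) => c.mimeType === 'application/json' && typeof c.text === 'string')
    .map((c) => JSON.parse(c.text as string))
}

// With a connected client (see Custom Integration above), usage would be:
//   const result = await client.readResource({ uri: 'benchmarks://config' })
//   const configs = parseJsonContents(result.contents)
```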

Prompts

Pre-configured Prompts

{
  "prompts": [
    {
      "name": "benchmarks-quick-start",
      "description": "Quick start guide for benchmarks.do",
      "arguments": []
    },
    {
      "name": "benchmarks-best-practices",
      "description": "Best practices for benchmarks.do",
      "arguments": []
    }
  ]
}

Examples

Basic Usage

// AI model calls tool via MCP
mcp call benchmarks/invoke

With Parameters

// Call with parameters
await mcp.callTool('benchmarks/invoke', {
  operation: 'process',
  parameters: {
    // Operation-specific parameters
  },
  options: {
    timeout: 30000,
  },
})

Error Handling

try {
  const result = await mcp.callTool('benchmarks/invoke', {
    operation: 'process',
  })
  return result
} catch (error) {
  if (error.code === 'TOOL_NOT_FOUND') {
    console.error('Benchmarks tool not available')
  } else {
    throw error
  }
}

AI Integration Patterns

Agentic Workflows

// AI agent uses benchmarks.do in workflow
const workflow = {
  steps: [
    {
      tool: 'benchmarks/invoke',
      operation: 'analyze',
      input: 'user-data',
    },
    {
      tool: 'benchmarks/invoke',
      operation: 'transform',
      input: 'analysis-result',
    },
  ],
}
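A minimal executor for a declarative workflow like the one above could run the steps in order, passing each result forward through a shared context. This is a sketch, not part of the SDK; the `callTool` function is injected so the loop stays independent of any particular MCP client, and the `\`${operation}-result\`` context key is an illustrative convention.

```typescript
interface WorkflowStep {
  tool: string
  operation: string
  input: string
}

// Injected tool caller -- in practice this wraps an MCP client's callTool.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>

// Run steps sequentially, storing each result in the context so that
// later steps can reference it by name.
async function runWorkflow(
  steps: WorkflowStep[],
  callTool: CallTool,
  context: Record<string, unknown>,
): Promise<Record<string, unknown>> {
  for (const step of steps) {
    const result = await callTool(step.tool, {
      operation: step.operation,
      parameters: { input: context[step.input] },
    })
    // Store under an operation-derived key, e.g. 'analyze-result'.
    context[`${step.operation}-result`] = result
  }
  return context
}
```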

Chain of Thought

AI models can reason about benchmarks.do operations:

User: "I need to process this data"

AI: "I'll use the benchmarks tool to:
1. Validate the data format
2. Process it through benchmarks.do
3. Return the results

Let me start..."

[Calls: mcp call benchmarks/invoke]

Server Implementation

Custom MCP Server

import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'

const server = new Server(
  {
    name: 'benchmarks-server',
    version: '1.0.0',
  },
  {
    capabilities: {
      tools: {},
      resources: {},
      prompts: {},
    },
  }
)

// Register tool handler
server.setRequestHandler('tools/call', async (request) => {
  if (request.params.name === 'benchmarks/invoke') {
    // Perform the benchmarks.do operation; handleBenchmarksOperation
    // stands in for your own implementation.
    const result = await handleBenchmarksOperation(request.params.arguments)
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(result),
        },
      ],
    }
  }
  throw new Error(`Unknown tool: ${request.params.name}`)
})

const transport = new StdioServerTransport()
await server.connect(transport)

Best Practices

  1. Tool Design - Keep tools focused and single-purpose
  2. Error Messages - Provide clear, actionable errors
  3. Documentation - Include examples in tool descriptions
  4. Rate Limiting - Implement appropriate limits
  5. Security - Validate all inputs from AI models
  6. Monitoring - Track tool usage and errors
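Practice 4 (rate limiting) can be sketched as a token bucket guarding the tools/call handler. The capacity and refill numbers below are illustrative, not benchmarks.do defaults:

```typescript
// Minimal token-bucket rate limiter for an MCP tool handler.
// Capacity and refill rate are illustrative values.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(
    private readonly capacity: number, // maximum burst size
    private readonly refillPerSecond: number, // sustained rate
  ) {
    this.tokens = capacity
    this.lastRefill = Date.now()
  }

  // Returns true if the call is allowed, false if it should be rejected.
  tryAcquire(): boolean {
    const now = Date.now()
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond)
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}

// In a tools/call handler, reject over-limit calls with a clear error
// (practice 2) so the model knows to back off:
const limiter = new TokenBucket(10, 2)
// if (!limiter.tryAcquire()) throw new Error('Rate limit exceeded; retry later')
```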