cc-relay

🥈Silver

CC-Relay is a fast LLM API gateway written in Go. It routes Claude Code requests to multiple Anthropic-compatible providers. Operations teams use it to optimize LLM API calls, reduce costs, and improve reliability. It connects to various LLM providers like Anthropic, OpenAI, and Mistral.

7770Updated 2mo ago

Intermediate30min to implementautomation

Saves ~180 min per use

Quick InstallView Source

git clone https://github.com/omarluq/cc-relay.git

Works with:

Claude

Overview

About This Skill

CC-Relay is a fast LLM API gateway written in Go that solves the single-provider limitation of Claude Code by routing requests across multiple Anthropic-compatible providers. It enables operations teams to pool rate limits across multiple API keys, automatically route simpler tasks to lighter and cheaper models, and implement failover to prevent service interruption. The gateway supports Anthropic, AWS Bedrock, Azure, Google Vertex, and Anthropic-compatible services like MiniMax, Z.AI, and Ollama. Teams use CC-Relay to reduce LLM API costs, improve reliability through automatic provider failover, and integrate both enterprise and personal API credentials in a single unified gateway.

How to Use

[{"step":"Install and configure cc-relay","description":"Download cc-relay from GitHub and set up provider credentials in the config file (e.g., `~/.cc-relay/config.yaml`). Add your Anthropic, OpenAI, and Mistral API keys with appropriate rate limits.","tips":"Use environment variables for sensitive keys. Test with `--dry-run` flag first to verify provider connectivity."},{"step":"Define your routing criteria","description":"Create a routing profile in cc-relay for your specific use case (e.g., `marketing-content`, `code-review`, or `customer-support`). Specify quality thresholds, cost limits, and latency tolerances.","tips":"Start with conservative thresholds (e.g., max latency 3s, min quality 8/10) and adjust based on initial results."},{"step":"Integrate with your workflow","description":"Replace direct Anthropic API calls in your scripts/tools with cc-relay endpoints (e.g., `http://localhost:8080/v1/messages`). Use the `--provider` flag to force a specific provider for testing.","tips":"Use cc-relay's `--stats` endpoint to monitor performance in real-time. Set up Grafana dashboards for long-term tracking."},{"step":"Optimize and scale","description":"Analyze cc-relay logs and provider metrics weekly. Adjust routing rules based on actual performance data. Consider adding new providers (e.g., Google Vertex AI) for redundancy.","tips":"Use cc-relay's `--benchmark` mode to compare providers under load. Implement circuit breakers for providers with >5% failure rates."}]

Use Cases

Pool rate limits across multiple Anthropic API keys to increase throughput

Route simple tasks to cheaper models while reserving Claude for complex requests

Implement automatic failover between providers to prevent downtime

Integrate AWS Bedrock, Azure, or Google Vertex alongside personal API keys

Setup & Installation

Quick Install

No install command available. Check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/omarluq/cc-relay

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Use cc-relay to route [TASK_DESCRIPTION] to the most cost-effective Anthropic-compatible provider. Compare response quality, latency, and cost across providers like Anthropic, OpenAI, and Mistral. Return a summary with provider recommendations and estimated savings compared to direct Anthropic API calls. Include the raw response from the selected provider.

Example Output

### CC-Relay Routing Report: Marketing Content Generation
**Task:** Generate 10 variations of a blog post introduction for a SaaS product targeting HR managers, with a focus on employee engagement metrics.

**Provider Comparison (Tested 50 requests):
1. **Anthropic Claude 3.5 Sonnet**
   - Avg Latency: 2.1s
   - Cost per 1000 tokens: $3.00
   - Quality Score (1-10): 9.2
   - Success Rate: 98%

2. **OpenAI o1-mini**
   - Avg Latency: 1.8s
   - Cost per 1000 tokens: $1.10
   - Quality Score: 8.7
   - Success Rate: 99%

3. **Mistral Large**
   - Avg Latency: 1.5s
   - Cost per 1000 tokens: $0.45
   - Quality Score: 8.5
   - Success Rate: 97%

**Recommendation:** Route all requests to **Mistral Large** for this task. While quality is slightly lower than Anthropic, the 85% cost reduction ($3.00 → $0.45 per 1K tokens) justifies the trade-off for marketing content generation. The 1.5s latency is acceptable for this use case.

**Sample Output (Mistral Large):**
```markdown
**Variation 1:**
"Unlock the power of employee engagement with [Product Name]'s AI-driven analytics. Our platform transforms raw HR data into actionable insights, helping you identify top performers and address retention risks before they escalate."

**Variation 2:**
"Boost team productivity by 34% with [Product Name]'s real-time engagement dashboards. Designed for HR leaders who need to move fast without sacrificing accuracy, our solution provides the clarity you need to make data-backed decisions."
```

**Estimated Monthly Savings:** $1,245 (assuming 500K tokens/month for this task)
**Next Steps:** Monitor Mistral's quality consistency over 1 week before full migration. Set up cc-relay alerts for latency spikes >2s or quality drops below 8.0.

Apply to these tools

Browse all tools

Mistral AI

Open-weight LLMs for enterprise AI deployment

Ollama

Run large language models locally on your machine

OpenAI

Pioneering accessible, high-performance AI models

Relay

Access business banking with customizable permissions, real-time notifications, and expense tracking.

Respell

Agentic AI Workflow platform

Notion

Connected workspace for docs, wikis, and projects

Compatible MCP servers

Browse all MCP servers

Find the right skills for your stack

Take a free 3-minute scan and get personalized AI skill recommendations.

Take free scan

Overview

About This Skill

How to Use

Use Cases

Pool rate limits across multiple Anthropic API keys to increase throughput

Route simple tasks to cheaper models while reserving Claude for complex requests

Implement automatic failover between providers to prevent downtime

Integrate AWS Bedrock, Azure, or Google Vertex alongside personal API keys

Quick Install

No install command available. Check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/omarluq/cc-relay

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Use cc-relay to route [TASK_DESCRIPTION] to the most cost-effective Anthropic-compatible provider. Compare response quality, latency, and cost across providers like Anthropic, OpenAI, and Mistral. Return a summary with provider recommendations and estimated savings compared to direct Anthropic API calls. Include the raw response from the selected provider.

Example Output

### CC-Relay Routing Report: Marketing Content Generation
**Task:** Generate 10 variations of a blog post introduction for a SaaS product targeting HR managers, with a focus on employee engagement metrics.

**Provider Comparison (Tested 50 requests):
1. **Anthropic Claude 3.5 Sonnet**
   - Avg Latency: 2.1s
   - Cost per 1000 tokens: $3.00
   - Quality Score (1-10): 9.2
   - Success Rate: 98%

2. **OpenAI o1-mini**
   - Avg Latency: 1.8s
   - Cost per 1000 tokens: $1.10
   - Quality Score: 8.7
   - Success Rate: 99%

3. **Mistral Large**
   - Avg Latency: 1.5s
   - Cost per 1000 tokens: $0.45
   - Quality Score: 8.5
   - Success Rate: 97%

**Recommendation:** Route all requests to **Mistral Large** for this task. While quality is slightly lower than Anthropic, the 85% cost reduction ($3.00 → $0.45 per 1K tokens) justifies the trade-off for marketing content generation. The 1.5s latency is acceptable for this use case.

**Sample Output (Mistral Large):**
```markdown
**Variation 1:**
"Unlock the power of employee engagement with [Product Name]'s AI-driven analytics. Our platform transforms raw HR data into actionable insights, helping you identify top performers and address retention risks before they escalate."

**Variation 2:**
"Boost team productivity by 34% with [Product Name]'s real-time engagement dashboards. Designed for HR leaders who need to move fast without sacrificing accuracy, our solution provides the clarity you need to make data-backed decisions."
```

**Estimated Monthly Savings:** $1,245 (assuming 500K tokens/month for this task)
**Next Steps:** Monitor Mistral's quality consistency over 1 week before full migration. Set up cc-relay alerts for latency spikes >2s or quality drops below 8.0.

cc-relay

Overview

About This Skill

How to Use

Use Cases

Tags

Setup & Installation

Quick Install

Alternative Install (Git Clone)

Requirements

Quick Start Guide

Install the Skill

Open Your AI Agent

Try It Out

Customize

Usage Examples

Prompt Template

Example Output

Apply to these tools

Mistral AI

Ollama

OpenAI

Relay

Respell

Notion

Compatible MCP servers

context sync

mcp notion server

src to kb

notion mcp

slime

notion

Find the right skills for your stack

cc-relay

Overview

About This Skill

How to Use

Use Cases

Tags

Setup & Installation

Quick Install

Alternative Install (Git Clone)

Requirements

Quick Start Guide

Install the Skill

Open Your AI Agent

Try It Out

Customize

Usage Examples

Prompt Template

Example Output

Apply to these tools

Mistral AI

Ollama

OpenAI

Relay

Respell

Notion

Compatible MCP servers

context sync

mcp notion server

src to kb

notion mcp

slime

notion

Find the right skills for your stack