agtrace

🥈Silver

agtrace provides visibility into AI agent actions, helping operations teams monitor and debug agent behavior. It connects to Claude agents and integrates with existing workflows to track agent activities, improving reliability and performance.

3720 | Updated 2mo ago

Intermediate | 30 min to implement | automation

Saves ~45 min per use

Quick Install

git clone https://github.com/lanegrid/agtrace.git

Works with:

Claude

Overview

About This Skill

agtrace is an observability tool that lets you see what your AI coding agent is actually doing between prompts. It displays context window usage, token consumption trends, and live activity traces as your agent executes tasks. The tool works with Claude Code, OpenAI Codex, and Google Gemini with zero configuration—it auto-discovers existing agent logs. Beyond monitoring, agtrace integrates with MCP to give agents access to their own execution history, enabling them to search past sessions, identify errors, and learn from previous runs. Operations teams use agtrace to make informed decisions about session management, detect agent loops, and optimize prompt scoping.
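As a rough illustration of what detecting an agent loop can mean, the sketch below flags a tool call that repeats several times in a row. It is not agtrace's detection logic or API: the `(tool name, arguments)` tuple shape, the `find_loop` helper, and the threshold of three repeats are all assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical shape for one traced tool call: (tool name, serialized arguments).
ToolCall = Tuple[str, str]


def find_loop(calls: Iterable[ToolCall], min_repeats: int = 3) -> Optional[ToolCall]:
    """Return the first call that occurs min_repeats times in a row, else None."""
    previous: Optional[ToolCall] = None
    run_length = 0
    for call in calls:
        if call == previous:
            run_length += 1
        else:
            # A different call starts a new run.
            previous = call
            run_length = 1
        if run_length >= min_repeats:
            return call
    return None


# Made-up trace: the agent reruns the same test command three times in a row.
trace = [
    ("read_file", "src/main.py"),
    ("run_tests", ""),
    ("run_tests", ""),
    ("run_tests", ""),
]
print(find_loop(trace))
```

A consecutive-repeat check is the simplest possible heuristic; a real trace may need to tolerate interleaved calls or near-identical arguments.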

How to Use

1. Set up agtrace monitoring for your AI agent by integrating it with your agent's runtime environment. Use the provided SDK or CLI to start tracing agent actions.
2. Define the scope of analysis by specifying the agent name, time range, and tasks/actions to monitor. Use the prompt template to customize your query.
3. Run the analysis and review the output. Focus on critical and high-severity issues first, as these have the most significant impact on performance.
4. Implement the suggested fixes or optimizations in your agent's codebase. Use agtrace's real-time monitoring to validate the changes.
5. Schedule regular reviews (e.g., weekly) to monitor agent behavior and identify new anomalies. Adjust the monitoring scope as your agent evolves.

Use Cases

Monitor context window consumption to know when to start a new agent session

Debug agent behavior by viewing token trends and live tool calls in real time

Search agent execution history across past sessions to identify recurring errors

Give agents access to their own logs via MCP for self-improvement
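The first use case comes down to a ratio of tokens used to window size. The sketch below shows that arithmetic under stated assumptions: the `ContextSnapshot` type, the 200,000-token window, and the 80% threshold are hypothetical values chosen for illustration, not figures reported by agtrace.

```python
from dataclasses import dataclass


@dataclass
class ContextSnapshot:
    """Hypothetical record of context usage at one point in a session."""
    tokens_used: int
    window_size: int

    @property
    def usage_ratio(self) -> float:
        # Fraction of the context window already consumed.
        return self.tokens_used / self.window_size


def should_start_new_session(snapshot: ContextSnapshot, threshold: float = 0.8) -> bool:
    """Return True once usage reaches the assumed threshold."""
    return snapshot.usage_ratio >= threshold


snapshot = ContextSnapshot(tokens_used=170_000, window_size=200_000)
print(f"{snapshot.usage_ratio:.0%} used; new session advised: "
      f"{should_start_new_session(snapshot)}")
# prints "85% used; new session advised: True"
```

The right threshold depends on the agent and the task; 80% is only a starting point for experimentation.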

Setup & Installation

Quick Install

No install command available. Check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/lanegrid/agtrace

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the git clone command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Use agtrace to analyze the execution logs of my AI agent [AGENT_NAME] from [START_DATE] to [END_DATE]. Identify any anomalies or failures in the agent's behavior, particularly around [SPECIFIC_TASKS_OR_ACTIONS]. Provide a summary of the root causes for each anomaly and suggest concrete fixes or optimizations. Format the output as a prioritized list with severity levels (Critical/High/Medium/Low).
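If you reuse the template often, the bracketed placeholders can be filled programmatically. The following is a minimal sketch rather than part of agtrace: the `fill_template` helper is hypothetical, and the sample values are taken from the example report on this page.

```python
import re

PROMPT_TEMPLATE = (
    "Use agtrace to analyze the execution logs of my AI agent [AGENT_NAME] "
    "from [START_DATE] to [END_DATE]. Identify any anomalies or failures in "
    "the agent's behavior, particularly around [SPECIFIC_TASKS_OR_ACTIONS]. "
    "Provide a summary of the root causes for each anomaly and suggest "
    "concrete fixes or optimizations. Format the output as a prioritized "
    "list with severity levels (Critical/High/Medium/Low)."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [UPPER_CASE] placeholder; fail loudly if one is missing."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for [{key}]")
        return values[key]

    # Only bracketed upper-case names match, so the parenthesized
    # severity list at the end of the template is left untouched.
    return re.sub(r"\[([A-Z_]+)\]", substitute, template)


prompt = fill_template(PROMPT_TEMPLATE, {
    "AGENT_NAME": "CustomerSupportBot",
    "START_DATE": "2024-06-01",
    "END_DATE": "2024-06-15",
    "SPECIFIC_TASKS_OR_ACTIONS": "order status inquiry handling",
})
print(prompt)
```

Raising on a missing key keeps a half-filled prompt from reaching the agent unnoticed.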

Example Output

## AI Agent Execution Analysis Report (June 1-15, 2024)

**Agent:** `CustomerSupportBot` (v2.3.1)
**Scope:** Order status inquiry handling
**Total Executions:** 1,247
**Critical Issues:** 3 | **High Issues:** 8 | **Medium Issues:** 15

### Critical Issues (Immediate Action Required)

1. **Order Status API Timeout**
   - **Severity:** Critical
   - **Occurrences:** 12 (0.96% failure rate)
   - **Root Cause:** External API `orders.status.v2` returned 503 errors in 68% of cases during peak hours (2-4 PM EST).
   - **Impact:** Customers received "Unable to retrieve order status" errors, leading to 23 escalations to human agents.
   - **Fix:** Implemented retry logic with exponential backoff (max 3 attempts). Added circuit breaker pattern to prevent cascading failures. Deployed on 2024-06-12. **Status:** Resolved (0 failures since deployment).

2. **Database Deadlock**
   - **Severity:** Critical
   - **Occurrences:** 8 (0.64% failure rate)
   - **Root Cause:** Concurrent writes to `user_orders` table during high-traffic periods caused deadlocks. Identified in `postgresql.log` at 2024-06-07 14:23:45.
   - **Impact:** Agent froze for 45 seconds during each deadlock event, causing 100+ users to abandon their sessions.
   - **Fix:** Added database index on `order_id` and implemented application-level locking. Deployed on 2024-06-08. **Status:** Resolved (0 deadlocks since deployment).

### High Issues (Schedule for Next Sprint)

1. **Slow Response Time**
   - **Severity:** High
   - **Occurrences:** 28 (2.25% of executions)
   - **Root Cause:** Agent spent 85% of execution time waiting for `payment.gateway.status` API (avg response time: 4.2s).
   - **Impact:** 95th percentile response time exceeded 6s, violating SLA (max 5s).
   - **Fix:** Cached payment status for 30 seconds to reduce API calls. Implemented parallel processing for status checks. Target deployment: 2024-06-20.

2. **Incorrect Order Status**
   - **Severity:** High
   - **Occurrences:** 15 (1.2% failure rate)
   - **Root Cause:** Race condition in `update_order_status` function caused stale data to be returned in 60% of cases.
   - **Impact:** 15 customers received incorrect status updates, leading to 5 complaints.
   - **Fix:** Added transactional consistency checks and implemented idempotency keys. Target deployment: 2024-06-18.

### Recommendations
- **Monitoring:** Add alerts for API response times >3s and database query times >1s.
- **Testing:** Implement load testing for 2x peak traffic to validate fixes.
- **Documentation:** Update agent runbook with troubleshooting steps for API timeouts and deadlocks.

**Next Steps:**
1. Deploy fixes for High Issues by June 20.
2. Schedule a post-mortem for Critical Issues to prevent recurrence.
3. Review agent architecture for potential improvements in scalability.
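The percentages in the sample report follow from dividing each occurrence count by the 1,247 total executions. The short check below reproduces them. The `rate` helper is introduced here for illustration only; the counts are copied from the report above.

```python
TOTAL_EXECUTIONS = 1247

# Occurrence counts copied from the sample report.
occurrences = {
    "Order Status API Timeout": 12,
    "Database Deadlock": 8,
    "Slow Response Time": 28,
    "Incorrect Order Status": 15,
}


def rate(count: int, total: int = TOTAL_EXECUTIONS) -> float:
    """Share of executions affected, as a fraction between 0 and 1."""
    return count / total


for issue, count in occurrences.items():
    print(f"{issue}: {rate(count):.2%}")
# Order Status API Timeout: 0.96%
# Database Deadlock: 0.64%
# Slow Response Time: 2.25%
# Incorrect Order Status: 1.20%
```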

Apply to these tools

One Codex

Metagenomic analysis for microbiome research

IronCalc

IronCalc is a spreadsheet engine and ecosystem

Notion

Connected workspace for docs, wikis, and projects

ServiceNow

Enterprise workflow automation and service management platform

GPT for work

Automate your spreadsheet tasks with AI power

Respell

Agentic AI Workflow platform


