dpm-finder is a tool for Grafana Professional Services that analyzes Prometheus metrics to identify which ones drive high Data Points per Minute (DPM). It provides a per-label breakdown to assist in optimizing Grafana Cloud costs.
dpm-finder is a Python script that retrieves all metrics from a Prometheus instance and calculates their data points per minute (DPM) rate using PromQL queries. It automatically filters out histogram/summary components, Grafana internal metrics, and metrics with aggregation rules, then identifies metrics exceeding a configurable DPM threshold. The tool operates in two modes: one-time execution that outputs results in CSV, JSON, text, or Prometheus exposition format, or a Prometheus exporter server that exposes live DPM metrics. It includes detailed logging and Docker and Docker Compose support, and is designed specifically for Grafana Professional Services to help teams diagnose and control high-volume metric ingestion.
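The filter-then-threshold flow described above can be sketched as follows. This is a minimal illustration, not dpm-finder's actual code: the `count_over_time(...) / window` query is one common way to express DPM in PromQL, and the suffix list, the `grafana_` prefix, and all function names are assumptions made for the example.

```python
# Hypothetical filter rules; the real tool's rules may differ.
COMPONENT_SUFFIXES = ("_bucket", "_sum", "_count")  # histogram/summary parts
INTERNAL_PREFIXES = ("grafana_",)                   # assumed internal prefix


def is_candidate(name: str) -> bool:
    """Return True if the metric should be included in the DPM analysis."""
    if name.endswith(COMPONENT_SUFFIXES):
        return False
    if name.startswith(INTERNAL_PREFIXES):
        return False
    return True


def dpm_query(name: str, window_minutes: int = 5) -> str:
    """Build a PromQL query: samples seen in the window, divided by minutes."""
    return (
        f'count_over_time({{__name__="{name}"}}[{window_minutes}m])'
        f" / {window_minutes}"
    )


def metrics_over_threshold(dpm_by_metric: dict, threshold: float = 1.0) -> list:
    """Keep candidate metrics whose DPM exceeds the threshold, highest first."""
    hits = [
        (name, dpm)
        for name, dpm in dpm_by_metric.items()
        if is_candidate(name) and dpm > threshold
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

In practice the query string would be sent to the Prometheus HTTP API and the results collected into the dictionary passed to `metrics_over_threshold`; that network step is left out here so the sketch stays self-contained.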
Install using the command: `$ npx skills add https://github.com/grafana/skills --skill dpm-finder`
Optimize costs in Grafana Cloud by analyzing DPM
Break down each metric's DPM by label
Identify critical metrics for operational efficiency
Copy the install command above and run it in your terminal. The source repository can also be cloned directly: `git clone https://github.com/grafana-ps/dpm-finder`
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Analyze the Prometheus metrics for [COMPANY]'s [INDUSTRY] workload in Grafana Cloud to identify which metrics are driving the highest Data Points per Minute (DPM). Provide a breakdown by label names and values, and suggest optimizations to reduce costs. Use the following query as a starting point: `sum by (job, __name__) (rate(prometheus_tsdb_head_samples_appended_total[5m]))`. Focus on metrics with DPM > 1000.
# High-DPM Metrics Analysis for Acme Corp's E-commerce Platform

## Top 5 Metrics by DPM

| Metric Name | DPM | Labels (Top 3) | Suggested Action |
|-------------|-----|----------------|------------------|
| `http_requests_total` | 12,450 | `path=/api/checkout`, `status=200`, `method=POST` | Review checkout endpoint logging frequency |
| `kafka_messages_consumed` | 8,200 | `topic=orders`, `partition=0`, `consumer_group=orders-service` | Reduce Kafka consumer logging granularity |
| `container_memory_usage_bytes` | 6,800 | `namespace=production`, `pod=checkout-v2-abc123`, `container=checkout` | Optimize memory metric collection interval |
| `http_request_duration_seconds_sum` | 5,300 | `path=/api/products`, `status=2xx`, `method=GET` | Consider sampling high-traffic endpoints |
| `prometheus_notifications_total` | 4,100 | `rule_group=alerting-rules`, `alertname=HighLatency` | Review alerting rule efficiency |

## Cost Impact Analysis

- Current monthly DPM: ~1.2B
- Estimated cost reduction potential: 35-40% by implementing suggested optimizations
- Top 20 metrics account for 85% of total DPM costs

## Recommended Next Steps

1. **Immediate**: Implement label filtering for `http_requests_total` to exclude `/health` and `/metrics` endpoints
2. **Short-term**: Reduce collection frequency for memory metrics from 15s to 30s during non-peak hours
3. **Long-term**: Evaluate metric cardinality for `kafka_messages_consumed` to identify redundant labels

Would you like me to generate specific Prometheus relabeling configurations for any of these metrics?
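The effect of the 15s-to-30s interval change in the sample output can be checked with simple arithmetic: a series scraped every N seconds produces 60/N data points per minute. The sketch below assumes that relationship holds exactly (no missed scrapes); the function names and the series count of 1,000 are hypothetical.

```python
def dpm_per_series(scrape_interval_seconds: float) -> float:
    """Data points per minute produced by one series at a given interval."""
    return 60.0 / scrape_interval_seconds


def total_dpm(series_count: int, scrape_interval_seconds: float) -> float:
    """Total DPM for a metric with `series_count` active series."""
    return series_count * dpm_per_series(scrape_interval_seconds)


# Hypothetical metric with 1,000 active series.
before = total_dpm(1000, 15)  # 4 DPM per series
after = total_dpm(1000, 30)   # 2 DPM per series
print(before, after)          # 4000.0 2000.0
```

Doubling the interval halves the DPM for the affected series, so the saving scales with how many series the change covers.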