Token Usage Dashboard
Track and analyze token consumption across your AI model deployments.
Overview
The Token Usage Dashboard provides comprehensive analytics on your AI model token consumption. Track prompt and completion tokens across time periods, monitor usage by model and endpoint, and export data for billing reconciliation or capacity planning.
Key Features
- Usage Tracking - Monitor prompt, completion, and total token counts
- Time-Based Analysis - View usage by hour, day, week, or month
- Model Breakdown - See which models consume the most tokens
- Cost Estimation - Compare estimated spend against pay-per-token API pricing
- Export Capabilities - Download reports as CSV or JSON
- Real-Time Updates - Live counters during active inference
Understanding Token Metrics
| Metric | Description |
|---|---|
| Prompt Tokens | Tokens in the input you send to the model |
| Completion Tokens | Tokens the model generates in response |
| Total Tokens | Prompt + Completion combined |
| Requests | Number of API calls made |
| Avg Tokens/Request | Average tokens per inference call |
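These metrics can also be tracked client-side. A minimal sketch, assuming an OpenAI-compatible response shape where each inference call returns a usage object (the sample values are hypothetical):

```python
# Sketch: accumulating the dashboard's metrics client-side, assuming an
# OpenAI-compatible response shape with a "usage" object per request.
usage_log = []

def record(response_usage):
    usage_log.append(response_usage)

# Hypothetical usage objects from two inference calls:
record({"prompt_tokens": 412, "completion_tokens": 988, "total_tokens": 1400})
record({"prompt_tokens": 305, "completion_tokens": 722, "total_tokens": 1027})

requests = len(usage_log)
total = sum(u["total_tokens"] for u in usage_log)
print(f"Requests: {requests}, Total: {total}, Avg/Request: {total / requests:.0f}")
```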
Dashboard Sections
Summary Cards
Quick overview showing:
- Total tokens used in period
- Prompt vs completion breakdown
- Number of requests
- Comparison to previous period
Time Series Chart
Interactive chart showing token usage over time:
- Total, prompt, and completion lines
- Hover for exact values
- Zoom and pan controls
- Multiple time granularities
Model Breakdown
Pie chart showing usage distribution by model:
- Percentage of total tokens per model
- Click to filter by specific model
- Identify high-usage models
Time Range Selection
Preset Ranges
- Today - Current day, hourly breakdown
- Yesterday - Previous calendar day
- Last 7 Days - Weekly view, daily breakdown
- Last 30 Days - Monthly view
- This Month - Current calendar month
- Custom Range - Select start and end dates
Granularity Options
| Range | Default Granularity | Options |
|---|---|---|
| Today | Hourly | 15min, Hourly |
| 7 Days | Daily | Hourly, Daily |
| 30 Days | Daily | Daily, Weekly |
| 90 Days | Weekly | Daily, Weekly, Monthly |
Cost Estimation
While Packet.ai charges by GPU-hour (not tokens), the dashboard provides comparative cost estimates to show your savings:
Comparison Example
```
Cost Comparison (if using pay-per-token APIs)
─────────────────────────────────────────────
OpenAI GPT-4 equivalent: $1,254.38
Claude 3.5 Sonnet equiv: $876.45
Packet.ai GPU-hour cost: $32.40
─────────────────────────────────────────────
Savings this period: $1,221.98 (97.4%)
```
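The estimate is simple arithmetic: multiply the period's token counts by a pay-per-token price list and compare against actual GPU-hour spend. A rough sketch (all rates and hours below are illustrative placeholders, not current pricing):

```python
# Rough sketch of the comparative cost estimate.
# Per-token rates and GPU hours are illustrative placeholders, not real pricing.
PER_TOKEN_RATES = {
    # (input $/1M tokens, output $/1M tokens) -- hypothetical values
    "gpt-4-equivalent": (30.00, 60.00),
    "claude-3.5-sonnet-equivalent": (3.00, 15.00),
}

prompt_tokens = 4_231_456
completion_tokens = 8_312_434
gpu_hours, gpu_hourly_rate = 18.0, 1.80   # hypothetical GPU usage

gpu_cost = gpu_hours * gpu_hourly_rate
for name, (in_rate, out_rate) in PER_TOKEN_RATES.items():
    api_cost = (prompt_tokens * in_rate + completion_tokens * out_rate) / 1e6
    savings = api_cost - gpu_cost
    print(f"{name}: ${api_cost:,.2f} vs ${gpu_cost:,.2f} "
          f"(save {savings / api_cost:.1%})")
```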
Token Counting
How Tokens Are Counted
| Content | Approximate Tokens |
|---|---|
| 1 word | ~1.3 tokens |
| 1 sentence | ~15-20 tokens |
| 1 paragraph | ~100-150 tokens |
| 1 page (500 words) | ~650 tokens |
| Code (1 function) | ~50-200 tokens |
Estimating Before Requests

```python
# Python example using tiktoken (pip install tiktoken)
import tiktoken

def estimate_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# For LLaMA-style models, cl100k_base gives a rough approximation
# (tiktoken ships OpenAI tokenizers, not LLaMA's own)
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode("Your prompt here"))
```
Data Export
Export Formats
- CSV - Spreadsheet-compatible
- JSON - API/programmatic use
CSV Format
```
timestamp,model,prompt_tokens,completion_tokens,total_tokens,requests
2025-01-17T00:00:00Z,meta-llama/Llama-3.1-70B-Instruct,45678,98765,144443,234
2025-01-17T01:00:00Z,meta-llama/Llama-3.1-70B-Instruct,52341,112456,164797,267
```
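Exported CSVs work with standard tooling. A minimal sketch using Python's built-in csv module (the filename is hypothetical):

```python
# Minimal sketch: summing token usage from an exported CSV.
# "usage_export.csv" is a hypothetical filename.
import csv

total = 0
with open("usage_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        total += int(row["total_tokens"])

print(f"Total tokens in export: {total:,}")
```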
JSON Format

```json
{
  "period": {
    "start": "2025-01-17T00:00:00Z",
    "end": "2025-01-17T23:59:59Z"
  },
  "summary": {
    "total_tokens": 12543890,
    "prompt_tokens": 4231456,
    "completion_tokens": 8312434,
    "requests": 45678
  },
  "by_model": [
    {
      "model": "meta-llama/Llama-3.1-70B-Instruct",
      "tokens": 5644750,
      "percentage": 45.0
    }
  ]
}
```
Usage Alerts
Set notifications for usage thresholds:
| Alert Type | Example | Use Case |
|---|---|---|
| Daily Limit | > 1M tokens/day | Budget control |
| Hourly Spike | > 200% of average | Detect anomalies |
| Low Usage | < 10K tokens/day | Monitor for issues |
| Per-Model | Model X > 500K/day | Track specific workloads |
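The same thresholds can also be checked client-side against exported data. A sketch of the Hourly Spike rule from the table above (the hourly counts are sample data):

```python
# Sketch of the "Hourly Spike" alert: flag hours above 200% of the mean.
# hourly_totals is hypothetical sample data.
hourly_totals = [140_000, 152_000, 148_000, 420_000, 145_000]

average = sum(hourly_totals) / len(hourly_totals)
for hour, tokens in enumerate(hourly_totals):
    if tokens > 2.0 * average:
        print(f"Hour {hour}: {tokens:,} tokens exceeds 200% of average ({average:,.0f})")
```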
Best Practices
Reducing Token Usage
- Concise Prompts - Remove unnecessary words
- Reuse System Prompts - Cache across conversations
- Implement Caching - Cache common responses
- Summarize History - Compress conversation history (see the sketch after this list)
- Set max_tokens - Limit response length appropriately
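A minimal sketch of compressing history by token budget, reusing the cl100k_base encoding from the estimation example (the budget value and message shape are assumptions):

```python
# Sketch: drop the oldest turns until the history fits a token budget.
# Assumes messages[0] is the system prompt; the budget is hypothetical.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def trim_history(messages, budget=2000):
    def count(msgs):
        return sum(len(encoding.encode(m["content"])) for m in msgs)
    system, turns = messages[:1], list(messages[1:])
    while turns and count(system + turns) > budget:
        turns.pop(0)  # discard the oldest turn first
    return system + turns
```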
Monitoring Efficiency
| Metric | Target | Meaning |
|---|---|---|
| Completion/Prompt Ratio | 1.5-3x | Normal response expansion |
| Avg Tokens/Request | < 1000 | Efficient requests |
| Empty Responses | < 1% | Healthy generation |
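These targets can be checked programmatically from the summary fields of the JSON export shown earlier:

```python
# Sketch: checking efficiency targets from the JSON export's summary block.
summary = {"total_tokens": 12543890, "prompt_tokens": 4231456,
           "completion_tokens": 8312434, "requests": 45678}

ratio = summary["completion_tokens"] / summary["prompt_tokens"]
avg = summary["total_tokens"] / summary["requests"]

print(f"Completion/Prompt ratio: {ratio:.2f} (target 1.5-3x)")
print(f"Avg tokens/request: {avg:.0f} (target < 1000)")
```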
API Reference
Get Usage Summary
```
GET /api/usage/summary?from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY
```

Get Hourly Breakdown

```
GET /api/usage/hourly?date=2025-01-17
Authorization: Bearer YOUR_API_KEY
```

Export Usage Data

```
GET /api/usage/export?format=csv&from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY
```
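For example, the summary endpoint can be called from Python with the requests library (the base URL is a placeholder, and the response is assumed to match the JSON export format above):

```python
# Sketch: fetching a usage summary via the API.
# The base URL is hypothetical; the response is assumed to match
# the JSON export format shown above.
import requests  # pip install requests

resp = requests.get(
    "https://api.packet.ai/api/usage/summary",
    params={"from": "2025-01-01", "to": "2025-01-17"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()
print(resp.json()["summary"]["total_tokens"])
```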
Need Help?
Contact us at support@packet.ai
