Token Usage Dashboard
Track and analyze token consumption across your AI model deployments.
Overview
The Token Usage Dashboard provides comprehensive analytics on your AI model token consumption. Track prompt and completion tokens across time periods, monitor usage by model and endpoint, and export data for billing reconciliation or capacity planning.
Key Features
- Usage Tracking - Monitor prompt, completion, and total token counts
- Time-Based Analysis - View usage by hour, day, week, or month
- Model Breakdown - See which models consume the most tokens
- Cost Estimation - Compare estimated spend against pay-per-token API pricing
- Export Capabilities - Download reports as CSV or JSON
- Real-Time Updates - Live counters during active inference
Understanding Token Metrics
| Metric | Description |
|---|---|
| Prompt Tokens | Tokens in the input you send to the model |
| Completion Tokens | Tokens the model generates in response |
| Total Tokens | Prompt + Completion combined |
| Requests | Number of API calls made |
| Avg Tokens/Request | Average tokens per inference call |
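These metrics can also be tracked client-side. A minimal sketch, assuming an OpenAI-compatible response shape where each inference call returns a usage object (the sample values are hypothetical):

```python
# Sketch: accumulating the dashboard's metrics client-side, assuming an
# OpenAI-compatible response shape with a "usage" object per request.
usage_log = []

def record(response_usage):
    usage_log.append(response_usage)

# Hypothetical usage objects from two inference calls:
record({"prompt_tokens": 412, "completion_tokens": 988, "total_tokens": 1400})
record({"prompt_tokens": 305, "completion_tokens": 722, "total_tokens": 1027})

requests = len(usage_log)
total = sum(u["total_tokens"] for u in usage_log)
print(f"Requests: {requests}, Total: {total}, Avg/Request: {total / requests:.0f}")
```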
Dashboard Sections
Summary Cards
Quick overview showing:
- Total tokens used in period
- Prompt vs completion breakdown
- Number of requests
- Comparison to previous period
Time Series Chart
Interactive chart showing token usage over time:
- Total, prompt, and completion lines
- Hover for exact values
- Zoom and pan controls
- Multiple time granularities
Model Breakdown
Pie chart showing usage distribution by model:
- Percentage of total tokens per model
- Click to filter by specific model
- Identify high-usage models
Time Range Selection
Preset Ranges
- Today - Current day, hourly breakdown
- Yesterday - Previous calendar day
- Last 7 Days - Weekly view, daily breakdown
- Last 30 Days - Monthly view
- This Month - Current calendar month
- Custom Range - Select start and end dates
Granularity Options
| Range | Default Granularity | Options |
|---|---|---|
| Today | Hourly | 15min, Hourly |
| 7 Days | Daily | Hourly, Daily |
| 30 Days | Daily | Daily, Weekly |
| 90 Days | Weekly | Daily, Weekly, Monthly |
Cost Estimation
While Packet.ai charges by GPU-hour (not tokens), the dashboard provides comparative cost estimates to show your savings:
Comparison Example
```
Cost Comparison (if using pay-per-token APIs)
─────────────────────────────────────────────
OpenAI GPT-4 equivalent: $1,254.38
Claude 3.5 Sonnet equiv: $876.45
Packet.ai GPU-hour cost: $32.40
─────────────────────────────────────────────
Savings this period: $1,221.98 (97.4%)
```
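The estimate is simple arithmetic: multiply the period's token counts by a pay-per-token price list and compare against actual GPU-hour spend. A rough sketch (all rates and hours below are illustrative placeholders, not current pricing):

```python
# Rough sketch of the comparative cost estimate.
# Per-token rates and GPU hours are illustrative placeholders, not real pricing.
PER_TOKEN_RATES = {
    # (input $/1M tokens, output $/1M tokens) -- hypothetical values
    "gpt-4-equivalent": (30.00, 60.00),
    "claude-3.5-sonnet-equivalent": (3.00, 15.00),
}

prompt_tokens = 4_231_456
completion_tokens = 8_312_434
gpu_hours, gpu_hourly_rate = 18.0, 1.80   # hypothetical GPU usage

gpu_cost = gpu_hours * gpu_hourly_rate
for name, (in_rate, out_rate) in PER_TOKEN_RATES.items():
    api_cost = (prompt_tokens * in_rate + completion_tokens * out_rate) / 1e6
    savings = api_cost - gpu_cost
    print(f"{name}: ${api_cost:,.2f} vs ${gpu_cost:,.2f} "
          f"(save {savings / api_cost:.1%})")
```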
Token Counting
How Tokens Are Counted
| Content | Approximate Tokens |
|---|---|
| 1 word | ~1.3 tokens |
| 1 sentence | ~15-20 tokens |
| 1 paragraph | ~100-150 tokens |
| 1 page (500 words) | ~650 tokens |
| Code (1 function) | ~50-200 tokens |
Estimating Before Requests

```python
# Python example using tiktoken (pip install tiktoken)
import tiktoken

def estimate_tokens(text, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# For LLaMA-style models, cl100k_base gives a rough approximation
# (tiktoken ships OpenAI tokenizers, not LLaMA's own)
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode("Your prompt here"))
```
Data Export
Export Formats
- CSV - Spreadsheet-compatible
- JSON - API/programmatic use
CSV Format
```
timestamp,model,prompt_tokens,completion_tokens,total_tokens,requests
2025-01-17T00:00:00Z,meta-llama/Llama-3.1-70B-Instruct,45678,98765,144443,234
2025-01-17T01:00:00Z,meta-llama/Llama-3.1-70B-Instruct,52341,112456,164797,267
```
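Exported CSVs work with standard tooling. A minimal sketch using Python's built-in csv module (the filename is hypothetical):

```python
# Minimal sketch: summing token usage from an exported CSV.
# "usage_export.csv" is a hypothetical filename.
import csv

total = 0
with open("usage_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        total += int(row["total_tokens"])

print(f"Total tokens in export: {total:,}")
```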
JSON Format

```json
{
  "period": {
    "start": "2025-01-17T00:00:00Z",
    "end": "2025-01-17T23:59:59Z"
  },
  "summary": {
    "total_tokens": 12543890,
    "prompt_tokens": 4231456,
    "completion_tokens": 8312434,
    "requests": 45678
  },
  "by_model": [
    {
      "model": "meta-llama/Llama-3.1-70B-Instruct",
      "tokens": 5644750,
      "percentage": 45.0
    }
  ]
}
```
Usage Alerts
Set notifications for usage thresholds:
| Alert Type | Example | Use Case |
|---|---|---|
| Daily Limit | > 1M tokens/day | Budget control |
| Hourly Spike | > 200% of average | Detect anomalies |
| Low Usage | < 10K tokens/day | Monitor for issues |
| Per-Model | Model X > 500K/day | Track specific workloads |
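The same thresholds can also be checked client-side against exported data. A sketch of the Hourly Spike rule from the table above (the hourly counts are sample data):

```python
# Sketch of the "Hourly Spike" alert: flag hours above 200% of the mean.
# hourly_totals is hypothetical sample data.
hourly_totals = [140_000, 152_000, 148_000, 420_000, 145_000]

average = sum(hourly_totals) / len(hourly_totals)
for hour, tokens in enumerate(hourly_totals):
    if tokens > 2.0 * average:
        print(f"Hour {hour}: {tokens:,} tokens exceeds 200% of average ({average:,.0f})")
```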
Best Practices
Reducing Token Usage
- Concise Prompts - Remove unnecessary words
- Reuse System Prompts - Cache across conversations
- Implement Caching - Cache common responses
- Summarize History - Compress conversation history (see the sketch after this list)
- Set max_tokens - Limit response length appropriately
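A minimal sketch of compressing history by token budget, reusing the cl100k_base encoding from the estimation example (the budget value and message shape are assumptions):

```python
# Sketch: drop the oldest turns until the history fits a token budget.
# Assumes messages[0] is the system prompt; the budget is hypothetical.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def trim_history(messages, budget=2000):
    def count(msgs):
        return sum(len(encoding.encode(m["content"])) for m in msgs)
    system, turns = messages[:1], list(messages[1:])
    while turns and count(system + turns) > budget:
        turns.pop(0)  # discard the oldest turn first
    return system + turns
```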
Monitoring Efficiency
| Metric | Target | Meaning |
|---|---|---|
| Completion/Prompt Ratio | 1.5-3x | Normal response expansion |
| Avg Tokens/Request | < 1000 | Efficient requests |
| Empty Responses | < 1% | Healthy generation |
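These targets can be checked programmatically from the summary fields of the JSON export shown earlier:

```python
# Sketch: checking efficiency targets from the JSON export's summary block.
summary = {"total_tokens": 12543890, "prompt_tokens": 4231456,
           "completion_tokens": 8312434, "requests": 45678}

ratio = summary["completion_tokens"] / summary["prompt_tokens"]
avg = summary["total_tokens"] / summary["requests"]

print(f"Completion/Prompt ratio: {ratio:.2f} (target 1.5-3x)")
print(f"Avg tokens/request: {avg:.0f} (target < 1000)")
```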
API Reference
Get Usage Summary
```
GET /api/usage/summary?from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY
```

Get Hourly Breakdown

```
GET /api/usage/hourly?date=2025-01-17
Authorization: Bearer YOUR_API_KEY
```

Export Usage Data

```
GET /api/usage/export?format=csv&from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY
```
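For example, the summary endpoint can be called from Python with the requests library (the base URL is a placeholder, and the response is assumed to match the JSON export format above):

```python
# Sketch: fetching a usage summary via the API.
# The base URL is hypothetical; the response is assumed to match
# the JSON export format shown above.
import requests  # pip install requests

resp = requests.get(
    "https://api.packet.ai/api/usage/summary",
    params={"from": "2025-01-01", "to": "2025-01-17"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()
print(resp.json()["summary"]["total_tokens"])
```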
Need Help?
Contact us at support@packet.ai
