
Token Usage Dashboard

Track and analyze token consumption across your AI model deployments.

Overview

The Token Usage Dashboard provides comprehensive analytics on your AI model token consumption. Track prompt and completion tokens across time periods, monitor usage by model and endpoint, and export data for billing reconciliation or capacity planning.

Key Features

  • Usage Tracking - Monitor prompt, completion, and total token counts
  • Time-Based Analysis - View usage by hour, day, week, or month
  • Model Breakdown - See which models consume the most tokens
  • Cost Estimation - Compare GPU-hour spend against pay-per-token API pricing
  • Export Capabilities - Download reports as CSV or JSON
  • Real-Time Updates - Live counters during active inference

Understanding Token Metrics

Metric               Description
Prompt Tokens        Tokens in your input messages (what you send)
Completion Tokens    Tokens generated by the model (what you receive)
Total Tokens         Prompt and completion tokens combined
Requests             Number of API calls made
Avg Tokens/Request   Average total tokens per inference call
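
The relationships among these metrics are simple arithmetic. A minimal sketch with illustrative numbers (the same values as the CSV sample later on this page):

# Illustrative only: how the summary metrics relate
prompt_tokens = 45678
completion_tokens = 98765
total_tokens = prompt_tokens + completion_tokens   # 144443
requests = 234
avg_tokens_per_request = total_tokens / requests   # ~617.3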

Dashboard Sections

Summary Cards

Quick overview showing:

  • Total tokens used in period
  • Prompt vs completion breakdown
  • Number of requests
  • Comparison to previous period

Time Series Chart

Interactive chart showing token usage over time:

  • Total, prompt, and completion lines
  • Hover for exact values
  • Zoom and pan controls
  • Multiple time granularities

Model Breakdown

Pie chart showing usage distribution by model:

  • Percentage of total tokens per model
  • Click to filter by specific model
  • Identify high-usage models

Time Range Selection

Preset Ranges

  • Today - Current day, hourly breakdown
  • Yesterday - Previous calendar day
  • Last 7 Days - Weekly view, daily breakdown
  • Last 30 Days - Monthly view
  • This Month - Current calendar month
  • Custom Range - Select start and end dates

Granularity Options

Range     Default   Options
Today     Hourly    15 min, Hourly
7 Days    Daily     Hourly, Daily
30 Days   Daily     Daily, Weekly
90 Days   Weekly    Daily, Weekly, Monthly

Cost Estimation

While Packet.ai charges by GPU-hour (not tokens), the dashboard provides comparative cost estimates to show your savings:

Comparison Example

Cost Comparison (if using pay-per-token APIs)
─────────────────────────────────────────────
OpenAI GPT-4 equivalent:    $1,254.38
Claude 3.5 Sonnet equiv:      $876.45
Packet.ai GPU-hour cost:       $32.40
─────────────────────────────────────────────
Savings this period:         $1,221.98 (97.4%)
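
The same comparison can be reproduced from raw token counts. A hedged sketch follows; the per-1M-token rates and the $1.35/GPU-hour figure are illustrative assumptions, not quoted Packet.ai or provider pricing:

# Sketch of a comparative cost estimate. All rates below are
# illustrative assumptions; check current provider price lists.
def pay_per_token_cost(prompt_tokens, completion_tokens,
                       input_per_1m, output_per_1m):
    """Dollar cost at pay-per-token rates quoted per 1M tokens."""
    return (prompt_tokens / 1_000_000 * input_per_1m
            + completion_tokens / 1_000_000 * output_per_1m)

# One day of usage (values from the JSON export sample below)
api_cost = pay_per_token_cost(4_231_456, 8_312_434, 30.00, 60.00)
gpu_cost = 24 * 1.35   # assumed $1.35/GPU-hour, 24 hours
print(f"Pay-per-token equivalent: ${api_cost:,.2f}")   # $625.69
print(f"GPU-hour cost:            ${gpu_cost:,.2f}")   # $32.40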

Token Counting

How Tokens Are Counted

Content              Approximate Tokens
1 word               ~1.3 tokens
1 sentence           ~15-20 tokens
1 paragraph          ~100-150 tokens
1 page (500 words)   ~650 tokens
Code (1 function)    ~50-200 tokens

Estimating Before Requests

# Python example using tiktoken
import tiktoken

def estimate_tokens(text, model="gpt-4"):
    """Count tokens in `text` using the encoding for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Open-weight models such as LLaMA ship their own tokenizers;
# cl100k_base only gives a rough approximation of their counts.
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode("Your prompt here"))

Data Export

Export Formats

  • CSV - Spreadsheet-compatible
  • JSON - API/programmatic use

CSV Format

timestamp,model,prompt_tokens,completion_tokens,total_tokens,requests
2025-01-17T00:00:00Z,meta-llama/Llama-3.1-70B-Instruct,45678,98765,144443,234
2025-01-17T01:00:00Z,meta-llama/Llama-3.1-70B-Instruct,52341,112456,164797,267
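
The CSV can be consumed with the standard library alone. A minimal sketch, assuming the export was saved as usage_export.csv with the header row shown above:

import csv

with open("usage_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

total = sum(int(row["total_tokens"]) for row in rows)
print(f"{len(rows)} intervals, {total:,} total tokens")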

JSON Format

{
  "period": {
    "start": "2025-01-17T00:00:00Z",
    "end": "2025-01-17T23:59:59Z"
  },
  "summary": {
    "total_tokens": 12543890,
    "prompt_tokens": 4231456,
    "completion_tokens": 8312434,
    "requests": 45678
  },
  "by_model": [
    {
      "model": "meta-llama/Llama-3.1-70B-Instruct",
      "tokens": 5644750,
      "percentage": 45.0
    }
  ]
}
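
A short sketch for working with the JSON export, assuming it was saved as usage_export.json; the keys match the structure shown above:

import json

with open("usage_export.json") as f:
    report = json.load(f)

summary = report["summary"]
ratio = summary["completion_tokens"] / summary["prompt_tokens"]
avg = summary["total_tokens"] / summary["requests"]
print(f"completion/prompt ratio: {ratio:.2f}")   # 1.96
print(f"avg tokens/request:      {avg:.0f}")     # 275

Both figures fall within the targets listed under Monitoring Efficiency below.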

Usage Alerts

Set notifications for usage thresholds:

Alert Type     Example              Use Case
Daily Limit    > 1M tokens/day      Budget control
Hourly Spike   > 200% of average    Detect anomalies
Low Usage      < 10K tokens/day     Monitor for issues
Per-Model      Model X > 500K/day   Track specific workloads
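
The Hourly Spike rule, for example, compares each hour against the period average. A minimal sketch of that logic; the list-of-totals input shape here is an assumption for illustration, not the alert API itself:

def spike_hours(hourly_totals, threshold=2.0):
    """Return indexes of hours exceeding `threshold` x the period average."""
    average = sum(hourly_totals) / len(hourly_totals)
    return [i for i, total in enumerate(hourly_totals)
            if total > threshold * average]

print(spike_hours([144_443, 164_797, 150_120, 610_000]))   # [3]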

Best Practices

Reducing Token Usage

  1. Concise Prompts - Remove unnecessary words
  2. Reuse System Prompts - Cache across conversations
  3. Implement Caching - Cache common responses
  4. Summarize History - Compress conversation history
  5. Set max_tokens - Limit response length appropriately (see the sketch below)
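
For item 5, the cap is set per request. A hedged sketch, assuming an OpenAI-compatible chat endpoint; the base URL, model name, and response shape here are assumptions, not confirmed Packet.ai API details:

import requests

response = requests.post(
    "https://api.packet.ai/v1/chat/completions",   # assumed endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": "Summarize in 3 bullets: ..."}],
        "max_tokens": 256,   # hard cap on completion tokens
    },
)
print(response.json()["usage"])   # assumed OpenAI-style usage block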

Monitoring Efficiency

Metric                    Target    Meaning
Completion/Prompt Ratio   1.5-3x    Normal response expansion
Avg Tokens/Request        < 1,000   Efficient requests
Empty Responses           < 1%      Healthy generation

API Reference

Get Usage Summary

GET /api/usage/summary?from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY

Get Hourly Breakdown

GET /api/usage/hourly?date=2025-01-17
Authorization: Bearer YOUR_API_KEY

Export Usage Data

GET /api/usage/export?format=csv&from=2025-01-01&to=2025-01-17
Authorization: Bearer YOUR_API_KEY
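
A minimal sketch of calling the summary endpoint from Python with the requests library; the path is as documented above, but the host name is an assumption, so substitute your deployment's base URL:

import requests

response = requests.get(
    "https://api.packet.ai/api/usage/summary",   # assumed host, documented path
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"from": "2025-01-01", "to": "2025-01-17"},
)
response.raise_for_status()
print(response.json())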

Need Help?

Contact us at support@packet.ai