Inference Playground

Interactive chat interface for testing and experimenting with your deployed AI models. No code required.

Overview

The Inference Playground provides a real-time chat interface with streaming responses, allowing you to quickly test model behavior, experiment with prompts, and demonstrate capabilities without writing any code.

Key Features

  • Real-time Streaming - See responses token-by-token as they generate
  • System Prompts - Configure model behavior with custom instructions
  • Parameter Controls - Fine-tune temperature, max tokens, and more
  • Conversation History - Maintain context across multiple turns
  • Export Conversations - Copy or download as JSON

Getting Started

Accessing the Playground

  1. Navigate to your Packet.ai Dashboard
  2. Select a GPU instance with a deployed model
  3. Click the "Playground" tab
  4. The playground loads automatically with your model

Your First Conversation

  1. Enter a message in the text input at the bottom
  2. Press Enter or click the send button
  3. Watch the response stream in real-time
  4. Continue the conversation with follow-up messages
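
The playground needs no code, but the same streaming flow can be scripted. Below is a minimal sketch in Python; it assumes your instance exposes an OpenAI-compatible Chat Completions endpoint, and the base URL and API key are placeholders, not documented Packet.ai values:

from openai import OpenAI

# Placeholder endpoint and key -- substitute your instance's actual values.
client = OpenAI(
    base_url="https://YOUR-INSTANCE.packet.ai/v1",  # hypothetical URL
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # deliver the response token-by-token
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk carries no content
        print(delta, end="", flush=True)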

Configuration Options

Temperature

Controls randomness in generation:

Value     Behavior                  Use Case
0.0       Deterministic, focused    Code generation, factual Q&A
0.3-0.5   Balanced                  General assistant tasks
0.7-0.9   Creative                  Writing, brainstorming
1.0+      Very random               Poetry, experimental content
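
When calling the model from a script, temperature maps to the request parameter of the same name. A short sketch contrasting the two extremes, under the same assumed OpenAI-compatible endpoint as above (placeholders throughout):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

# Same prompt at two temperatures: 0.0 should repeat the same answer
# across runs, while 0.9 will vary.
for temp in (0.0, 0.9):
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temp,
        max_tokens=30,
    )
    print(f"temperature={temp}: {resp.choices[0].message.content}")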

Max Tokens

Limits response length:

  • 50-200 - Quick answers, one-liners
  • 200-500 - Explanations, short articles
  • 500-2000 - Detailed analysis, stories
  • 2000+ - Full articles, comprehensive guides
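
A response that hits the max_tokens limit is cut off mid-thought, so it is worth checking for truncation when scripting. A sketch under the same assumed OpenAI-compatible endpoint:

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain how HTTPS works."}],
    max_tokens=200,  # caps the response at roughly a short explanation
)

choice = resp.choices[0]
if choice.finish_reason == "length":  # generation stopped at the token cap
    print("Response was truncated; consider raising max_tokens.")
print(choice.message.content)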

System Prompt

Defines the model's persona and behavior. Example prompts:

Technical Assistant:
"You are a senior software engineer. Provide detailed,
accurate technical explanations. Include code examples
when relevant. Use markdown formatting."

Creative Writer:
"You are a creative writing assistant. Write engaging,
vivid prose. Use metaphors and sensory details."

Data Analyst:
"You are a data analyst expert. Explain concepts
clearly with examples. Suggest visualizations."

Customer Support:
"You are a friendly customer support agent. Be
helpful and empathetic. Provide step-by-step
solutions."
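
In request terms, the system prompt is simply the first message of the conversation, sent with the role "system". A sketch using the Technical Assistant prompt above (same assumed OpenAI-compatible endpoint, placeholder credentials):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[
        # The system message defines persona and behavior for every turn.
        {"role": "system", "content": (
            "You are a senior software engineer. Provide detailed, "
            "accurate technical explanations. Include code examples "
            "when relevant. Use markdown formatting."
        )},
        {"role": "user", "content": "What does a mutex do?"},
    ],
)
print(resp.choices[0].message.content)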

Top P (Nucleus Sampling)

An alternative to temperature for controlling randomness:

  • 0.1 - Very focused, only most likely tokens
  • 0.5 - Moderately diverse
  • 0.9 - Quite diverse, occasional surprises
  • 1.0 - Consider all tokens

Tip: Use either temperature OR top_p, not both. Set the unused one to 1.0.
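
In a script this is the top_p parameter. Following the tip above, the sketch below leaves temperature at 1.0 and steers randomness with top_p alone (same assumed OpenAI-compatible endpoint, placeholders throughout):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Write a one-line tagline for a bakery."}],
    temperature=1.0,  # leave temperature neutral...
    top_p=0.5,        # ...and control randomness with nucleus sampling instead
)
print(resp.choices[0].message.content)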

Interface Components

Chat Area

  • User Messages - Your inputs (right-aligned)
  • Assistant Messages - Model responses (left-aligned)
  • Streaming Indicator - Pulsing cursor during generation
  • Timestamps - When each message was sent

Input Area

  • Message Input - Multi-line text field for prompts
  • Send Button - Submit the message
  • Stop Button - Interrupt generation in progress
  • Clear Button - Start a new conversation

Settings Panel

Setting         Range        Description
Model           Dropdown     Select from deployed models
System Prompt   Text area    Instructions for model behavior
Temperature     0.0 - 2.0    Creativity/randomness level
Max Tokens      1 - 32768    Maximum response length
Top P           0.0 - 1.0    Nucleus sampling threshold

Conversation Patterns

Single-Turn Q&A

Best for quick factual questions and simple tasks:

User: What is the capital of Japan?
Assistant: The capital of Japan is Tokyo.

Multi-Turn Conversations

Best for complex problem-solving and iterative refinement:

User: I'm building a web app. What tech stack should I use?
Assistant: For a modern web app, I'd recommend...

User: I have experience with Python. How does that change?
Assistant: Great! With Python experience, I'd suggest...

User: What about the database?
Assistant: For your Python-based stack, consider...
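
Multi-turn context works the same way over an API: the client keeps the full message history and resends it on every turn. A sketch of that loop (same assumed OpenAI-compatible endpoint, placeholder credentials):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

history = [{"role": "system", "content": "You are helpful."}]

for question in (
    "I'm building a web app. What tech stack should I use?",
    "I have experience with Python. How does that change?",
    "What about the database?",
):
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=history,  # the whole conversation so far, every turn
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(f"User: {question}\nAssistant: {reply}\n")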

Chain-of-Thought Prompting

For complex reasoning tasks, add this to your system prompt:

"Think through problems step by step before giving the final answer."

Exporting Conversations

You can export your conversations in JSON format:

{
  "model": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
  ],
  "parameters": {
    "temperature": 0.7,
    "max_tokens": 1000
  }
}
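
Because the export is close to a chat-completions request, a saved conversation can be replayed or continued from a script. A sketch, assuming the export was saved as conversation.json (the filename, the parameter unpacking, and the endpoint are all assumptions):

import json
from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

with open("conversation.json") as f:  # hypothetical export filename
    convo = json.load(f)

# Continue the conversation where the export left off.
convo["messages"].append({"role": "user", "content": "Can you elaborate?"})

resp = client.chat.completions.create(
    model=convo["model"],
    messages=convo["messages"],
    **convo["parameters"],  # temperature, max_tokens from the export
)
print(resp.choices[0].message.content)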

Tips & Best Practices

  • Be specific - Clear prompts get better responses
  • Use system prompts - Define consistent behavior
  • Iterate on prompts - Refine based on responses
  • Set appropriate max_tokens - Avoid unnecessarily long responses
  • Use stop sequences - Control where responses end
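
Stop sequences are not listed in the settings table above, but on an OpenAI-compatible endpoint they correspond to the stop parameter. A hedged sketch (same placeholder endpoint assumptions as earlier):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Q: What is 2 + 2?\nA:"}],
    stop=["\nQ:"],  # halt generation before the model invents a new question
    max_tokens=50,
)
print(resp.choices[0].message.content)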

Need Help?

Contact us at support@packet.ai