Inference Playground

Interactive chat interface for testing and experimenting with your deployed AI models. No code required.

Overview

The Inference Playground provides a real-time chat interface with streaming responses, allowing you to quickly test model behavior, experiment with prompts, and demonstrate capabilities without writing any code.

Key Features

  • Real-time Streaming - See responses token-by-token as they generate
  • System Prompts - Configure model behavior with custom instructions
  • Parameter Controls - Fine-tune temperature, max tokens, and more
  • Conversation History - Maintain context across multiple turns
  • Export Conversations - Copy or download as JSON

Getting Started

Accessing the Playground

  1. Navigate to your Packet.ai Dashboard
  2. Select a GPU instance with a deployed model
  3. Click the "Playground" tab
  4. The playground loads automatically with your model

Your First Conversation

  1. Enter a message in the text input at the bottom
  2. Press Enter or click the send button
  3. Watch the response stream in real-time
  4. Continue the conversation with follow-up messages
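
The playground needs no code, but the same streaming flow can be scripted. Below is a minimal sketch in Python; it assumes your instance exposes an OpenAI-compatible Chat Completions endpoint, and the base URL and API key are placeholders, not documented Packet.ai values:

from openai import OpenAI

# Placeholder endpoint and key -- substitute your instance's actual values.
client = OpenAI(
    base_url="https://YOUR-INSTANCE.packet.ai/v1",  # hypothetical URL
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,  # deliver the response token-by-token
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk carries no content
        print(delta, end="", flush=True)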

Configuration Options

Temperature

Controls randomness in generation:

Value     Behavior                  Use Case
0.0       Deterministic, focused    Code generation, factual Q&A
0.3-0.5   Balanced                  General assistant tasks
0.7-0.9   Creative                  Writing, brainstorming
1.0+      Very random               Poetry, experimental content
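
When calling the model from a script, temperature maps to the request parameter of the same name. A short sketch contrasting the two extremes, under the same assumed OpenAI-compatible endpoint as above (placeholders throughout):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

# Same prompt at two temperatures: 0.0 should repeat the same answer
# across runs, while 0.9 will vary.
for temp in (0.0, 0.9):
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
        temperature=temp,
        max_tokens=30,
    )
    print(f"temperature={temp}: {resp.choices[0].message.content}")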

Max Tokens

Limits response length:

  • 50-200 - Quick answers, one-liners
  • 200-500 - Explanations, short articles
  • 500-2000 - Detailed analysis, stories
  • 2000+ - Full articles, comprehensive guides
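
A response that hits the max_tokens limit is cut off mid-thought, so it is worth checking for truncation when scripting. A sketch under the same assumed OpenAI-compatible endpoint:

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain how HTTPS works."}],
    max_tokens=200,  # caps the response at roughly a short explanation
)

choice = resp.choices[0]
if choice.finish_reason == "length":  # generation stopped at the token cap
    print("Response was truncated; consider raising max_tokens.")
print(choice.message.content)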

System Prompt

Defines the model's persona and behavior. Example prompts:

Technical Assistant:
"You are a senior software engineer. Provide detailed,
accurate technical explanations. Include code examples
when relevant. Use markdown formatting."

Creative Writer:
"You are a creative writing assistant. Write engaging,
vivid prose. Use metaphors and sensory details."

Data Analyst:
"You are a data analyst expert. Explain concepts
clearly with examples. Suggest visualizations."

Customer Support:
"You are a friendly customer support agent. Be
helpful and empathetic. Provide step-by-step
solutions."
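
In request terms, the system prompt is simply the first message of the conversation, sent with the role "system". A sketch using the Technical Assistant prompt above (same assumed OpenAI-compatible endpoint, placeholder credentials):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[
        # The system message defines persona and behavior for every turn.
        {"role": "system", "content": (
            "You are a senior software engineer. Provide detailed, "
            "accurate technical explanations. Include code examples "
            "when relevant. Use markdown formatting."
        )},
        {"role": "user", "content": "What does a mutex do?"},
    ],
)
print(resp.choices[0].message.content)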

Top P (Nucleus Sampling)

An alternative to temperature for controlling randomness:

  • 0.1 - Very focused, only most likely tokens
  • 0.5 - Moderately diverse
  • 0.9 - Quite diverse, occasional surprises
  • 1.0 - Consider all tokens

Tip: Use either temperature OR top_p, not both. Set the unused one to 1.0.
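
In a script this is the top_p parameter. Following the tip above, the sketch below leaves temperature at 1.0 and steers randomness with top_p alone (same assumed OpenAI-compatible endpoint, placeholders throughout):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Write a one-line tagline for a bakery."}],
    temperature=1.0,  # leave temperature neutral...
    top_p=0.5,        # ...and control randomness with nucleus sampling instead
)
print(resp.choices[0].message.content)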

Interface Components

Chat Area

  • User Messages - Your inputs (right-aligned)
  • Assistant Messages - Model responses (left-aligned)
  • Streaming Indicator - Pulsing cursor during generation
  • Timestamps - When each message was sent

Input Area

  • Message Input - Multi-line text field for prompts
  • Send Button - Submit the message
  • Stop Button - Interrupt generation in progress
  • Clear Button - Start a new conversation

Settings Panel

Setting         Range        Description
Model           Dropdown     Select from deployed models
System Prompt   Text area    Instructions for model behavior
Temperature     0.0 - 2.0    Creativity/randomness level
Max Tokens      1 - 32768    Maximum response length
Top P           0.0 - 1.0    Nucleus sampling threshold

Conversation Patterns

Single-Turn Q&A

Best for quick factual questions and simple tasks:

User: What is the capital of Japan?
Assistant: The capital of Japan is Tokyo.

Multi-Turn Conversations

Best for complex problem-solving and iterative refinement:

User: I'm building a web app. What tech stack should I use?
Assistant: For a modern web app, I'd recommend...

User: I have experience with Python. How does that change?
Assistant: Great! With Python experience, I'd suggest...

User: What about the database?
Assistant: For your Python-based stack, consider...
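
Multi-turn context works the same way over an API: the client keeps the full message history and resends it on every turn. A sketch of that loop (same assumed OpenAI-compatible endpoint, placeholder credentials):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

history = [{"role": "system", "content": "You are helpful."}]

for question in (
    "I'm building a web app. What tech stack should I use?",
    "I have experience with Python. How does that change?",
    "What about the database?",
):
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=history,  # the whole conversation so far, every turn
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print(f"User: {question}\nAssistant: {reply}\n")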

Chain-of-Thought Prompting

For complex reasoning tasks, add this to your system prompt:

"Think through problems step by step before giving the final answer."

Exporting Conversations

You can export your conversations in JSON format:

{
  "model": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there! How can I help?"}
  ],
  "parameters": {
    "temperature": 0.7,
    "max_tokens": 1000
  }
}
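
Because the export is close to a chat-completions request, a saved conversation can be replayed or continued from a script. A sketch, assuming the export was saved as conversation.json (the filename, the parameter unpacking, and the endpoint are all assumptions):

import json
from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

with open("conversation.json") as f:  # hypothetical export filename
    convo = json.load(f)

# Continue the conversation where the export left off.
convo["messages"].append({"role": "user", "content": "Can you elaborate?"})

resp = client.chat.completions.create(
    model=convo["model"],
    messages=convo["messages"],
    **convo["parameters"],  # temperature, max_tokens from the export
)
print(resp.choices[0].message.content)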

Tips & Best Practices

  • Be specific - Clear prompts get better responses
  • Use system prompts - Define consistent behavior
  • Iterate on prompts - Refine based on responses
  • Set appropriate max_tokens - Avoid unnecessarily long responses
  • Use stop sequences - Control where responses end
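
Stop sequences are not listed in the settings table above, but on an OpenAI-compatible endpoint they correspond to the stop parameter. A hedged sketch (same placeholder endpoint assumptions as earlier):

from openai import OpenAI

client = OpenAI(base_url="https://YOUR-INSTANCE.packet.ai/v1", api_key="YOUR_API_KEY")  # placeholders

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Q: What is 2 + 2?\nA:"}],
    stop=["\nQ:"],  # halt generation before the model invents a new question
    max_tokens=50,
)
print(resp.choices[0].message.content)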

Need Help?

Contact us at support@packet.ai