Inference Playground
Interactive chat interface for testing and experimenting with your deployed AI models. No code required.
Overview
The Inference Playground provides a real-time chat interface with streaming responses, allowing you to quickly test model behavior, experiment with prompts, and demonstrate capabilities without writing any code.
Key Features
- Real-time Streaming - See responses token-by-token as they generate
- System Prompts - Configure model behavior with custom instructions
- Parameter Controls - Fine-tune temperature, max tokens, and more
- Conversation History - Maintain context across multiple turns
- Export Conversations - Copy or download as JSON
Getting Started
Accessing the Playground
- Navigate to your Packet.ai Dashboard
- Select a GPU instance with a deployed model
- Click the "Playground" tab
- The playground loads automatically with your model
Your First Conversation
- Enter a message in the text input at the bottom
- Press Enter or click the send button
- Watch the response stream in real-time
- Continue the conversation with follow-up messages
Configuration Options
Temperature
Controls randomness in generation:
| Value | Behavior | Use Case |
|---|---|---|
| 0.0 | Deterministic, focused | Code generation, factual Q&A |
| 0.3-0.5 | Balanced | General assistant tasks |
| 0.7-0.9 | Creative | Writing, brainstorming |
| 1.0+ | Very random | Poetry, experimental content |
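If you later drive the same model over an API instead of the playground, temperature travels as a plain request parameter. A minimal sketch in Python, assuming your instance exposes an OpenAI-compatible /v1/chat/completions endpoint (the base URL and API key below are placeholders):

import requests

BASE_URL = "https://your-instance.example.com/v1"  # placeholder endpoint
API_KEY = "your-api-key"                           # placeholder credential

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [{"role": "user", "content": "Explain recursion in one paragraph."}],
        "temperature": 0.2,  # low temperature: focused, near-deterministic output
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])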
Max Tokens
Limits response length:
- 50-200 - Quick answers, one-liners
- 200-500 - Explanations, short articles
- 500-2000 - Detailed analysis, stories
- 2000+ - Full articles, comprehensive guides
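Max tokens caps the response, not the prompt, but knowing your prompt's token count helps you budget the total context window. A rough estimate with the tiktoken library (its encodings target OpenAI models, so for other model families treat the count as approximate):

import tiktoken  # pip install tiktoken

# cl100k_base is an OpenAI encoding; counts for other model
# families will differ, so treat this as an estimate only.
encoding = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the plot of Hamlet in three sentences."
print(len(encoding.encode(prompt)), "prompt tokens")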
System Prompt
Defines the model's persona and behavior. Example prompts:
Technical Assistant:
"You are a senior software engineer. Provide detailed,
accurate technical explanations. Include code examples
when relevant. Use markdown formatting."
Creative Writer:
"You are a creative writing assistant. Write engaging,
vivid prose. Use metaphors and sensory details."
Data Analyst:
"You are a data analyst expert. Explain concepts
clearly with examples. Suggest visualizations."
Customer Support:
"You are a friendly customer support agent. Be
helpful and empathetic. Provide step-by-step
solutions."Top P (Nucleus Sampling)
Top P (Nucleus Sampling)
Alternative to temperature for controlling randomness:
- 0.1 - Very focused, only most likely tokens
- 0.5 - Moderately diverse
- 0.9 - Quite diverse, occasional surprises
- 1.0 - Consider all tokens
Tip: Use either temperature OR top_p, not both. Set the unused one to 1.0.
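Putting the tip into practice, a request that filters with top_p leaves temperature at its neutral value. A sketch of just the sampling fields (they slot into the same placeholder request shown in the earlier examples):

sampling = {
    "temperature": 1.0,  # neutral, so top_p alone shapes the sampling
    "top_p": 0.9,        # keep the smallest token set whose cumulative
                         # probability reaches 0.9, then sample from it
}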
Interface Components
Chat Area
- User Messages - Your inputs (right-aligned)
- Assistant Messages - Model responses (left-aligned)
- Streaming Indicator - Pulsing cursor during generation
- Timestamps - When each message was sent
Input Area
- Message Input - Multi-line text field for prompts
- Send Button - Submit the message
- Stop Button - Interrupt generation in progress
- Clear Button - Start a new conversation
Settings Panel
| Setting | Range | Description |
|---|---|---|
| Model | Dropdown | Select from deployed models |
| System Prompt | Text area | Instructions for model behavior |
| Temperature | 0.0 - 2.0 | Creativity/randomness level |
| Max Tokens | 1 - 32768 | Maximum response length |
| Top P | 0.0 - 1.0 | Nucleus sampling threshold |
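If you mirror these settings in your own tooling, validating against the same ranges avoids surprises at request time. A hypothetical helper (the ranges come from the table above; the clamping behavior is an assumption, not something the playground itself does):

def clamp(value, low, high):
    """Force a setting into the panel's allowed range."""
    return max(low, min(high, value))

settings = {
    "temperature": clamp(2.5, 0.0, 2.0),  # -> 2.0
    "max_tokens": clamp(0, 1, 32768),     # -> 1
    "top_p": clamp(0.9, 0.0, 1.0),        # -> 0.9
}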
Conversation Patterns
Single-Turn Q&A
Best for quick factual questions and simple tasks:
User: What is the capital of Japan?
Assistant: The capital of Japan is Tokyo.
Multi-Turn Conversations
Best for complex problem-solving and iterative refinement:
User: I'm building a web app. What tech stack should I use?
Assistant: For a modern web app, I'd recommend...
User: I have experience with Python. How does that change?
Assistant: Great! With Python experience, I'd suggest...
User: What about the database?
Assistant: For your Python-based stack, consider...
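The playground maintains this context for you; over an API you maintain it yourself by resending the full history on every turn. A minimal sketch (placeholder endpoint as before; ask() is a hypothetical helper, not part of any SDK):

import requests

messages = []  # the whole history is resent on every turn

def ask(user_text):
    """Append a user turn, call the model, append and return its reply."""
    messages.append({"role": "user", "content": user_text})
    response = requests.post(
        "https://your-instance.example.com/v1/chat/completions",  # placeholder
        headers={"Authorization": "Bearer your-api-key"},          # placeholder
        json={"model": "meta-llama/Llama-3.1-70B-Instruct", "messages": messages},
        timeout=60,
    )
    reply = response.json()["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("I'm building a web app. What tech stack should I use?"))
print(ask("I have experience with Python. How does that change?"))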
Chain-of-Thought Prompting
For complex reasoning tasks, add this to your system prompt:
"Think through problems step by step before giving the final answer."Exporting Conversations
You can export your conversations in JSON format:
{
"model": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{"role": "system", "content": "You are helpful."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi there! How can I help?"}
],
"parameters": {
"temperature": 0.7,
"max_tokens": 1000
}
}
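Because the export mirrors a chat-completions request body, you can replay a playground conversation programmatically. A sketch, assuming the export was saved as conversation.json and the same placeholder endpoint as in the earlier examples:

import json
import requests

with open("conversation.json") as f:  # the exported file
    export = json.load(f)

response = requests.post(
    "https://your-instance.example.com/v1/chat/completions",  # placeholder
    headers={"Authorization": "Bearer your-api-key"},          # placeholder
    json={
        "model": export["model"],
        "messages": export["messages"],
        **export["parameters"],  # temperature, max_tokens, ...
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])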
Tips & Best Practices
- Be specific - Clear prompts get better responses
- Use system prompts - Define consistent behavior
- Iterate on prompts - Refine based on responses
- Set appropriate max_tokens - Avoid unnecessarily long responses
- Use stop sequences - Control where responses end
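Stop sequences don't appear in the settings table above; OpenAI-compatible APIs commonly accept them as a stop parameter, though whether your deployment exposes it is an assumption. A sketch of the field:

extras = {
    # Generation halts before any of these strings would be emitted.
    "stop": ["\nUser:", "###"],
}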
Need Help?
Contact us at support@packet.ai
