Getting Started with Packet.ai

Get your first GPU running in minutes. This guide covers everything from account setup to running your first training job.

Platform Overview

Packet.ai is a cloud GPU platform designed for AI/ML workloads. Whether you're training models, running inference, or deploying LLMs, we provide the infrastructure you need.

Key Features

  • GPU Compute - NVIDIA A100, H100, RTX 4090, and more. Scale from 1-8 GPUs per instance.
  • HuggingFace Integration - One-click deployment of any HuggingFace model with the vLLM inference server.
  • Token Factory - OpenAI-compatible API for LLM inference with batch processing and LoRA support.
  • Persistent Storage - NFS-based storage that survives instance restarts. Perfect for datasets and checkpoints.
  • Service Exposure - Expose any port to the internet with a public URL. Run APIs, notebooks, or web apps.
  • Browser Terminal - Full shell access directly from your browser. No SSH setup required.

Prerequisites

Before you begin, make sure you have:

  • A Packet.ai account (Sign up here if you don't have one)
  • Credits in your account (prepaid balance or subscription)
  • A payment method on file

New User?

New accounts get $10 free credits to try the platform. Add funds via the Billing tab when you need more.

Step 1: Launch a GPU

  1. From your dashboard, click Launch GPU
  2. Select GPU Pool - Choose from available GPU types and regions. Popular options include:
    • RTX 4090 - Great for inference and smaller training jobs
    • A100 40GB - Ideal for training and large model inference
    • H100 - Maximum performance for demanding workloads
  3. Instance Type - Select CPU/RAM allocation for your container. More RAM is useful for data preprocessing.
  4. Storage (optional):
    • Ephemeral Storage - Fast local NVMe, cleared on restart (default)
    • Persistent Storage - NFS-based, survives restarts. Choose 50GB-1TB.
  5. GPU Count - Select 1-8 GPUs depending on your workload. Start with 1 for most tasks, scale up for distributed training.
  6. Click Launch GPU

Your GPU will begin provisioning. This typically takes 30-60 seconds.

Step 2: Connect to Your GPU

Once your GPU shows "Running" status, you have three options:

Option A: Browser Terminal (Easiest)

Click the Terminal icon on your GPU card to open a browser-based terminal directly in the dashboard. No setup required.

Option B: SSH with Key

  1. Go to Account Settings and add your SSH public key
  2. Copy the SSH command from your GPU card
  3. Connect from your terminal:
# Connect to your GPU instance
ssh -p <port> ubuntu@<host>

# Example
ssh -p 30123 ubuntu@35.190.160.152

Option C: SSH with Password

A password is shown on your GPU card. Click to reveal it:

# Connect with password
ssh -p <port> ubuntu@<host>
# Enter the password when prompted

SSH Key Recommended

For the best experience, add your SSH key in Account Settings. This enables passwordless authentication and VS Code Remote SSH integration.
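
If you use VS Code Remote SSH, adding a host entry to ~/.ssh/config makes connecting one click. A minimal sketch; the host alias and key path are illustrative, and the <port> and <host> values come from your GPU card:

# Example ~/.ssh/config entry
Host packet-gpu
    HostName <host>
    Port <port>
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519

After saving, ssh packet-gpu connects directly, and the same host shows up in VS Code's Remote SSH host list.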

Step 3: Start Working

Your GPU instance comes pre-configured with:

  • Operating System - Ubuntu 22.04 LTS
  • NVIDIA Drivers - Latest stable drivers
  • CUDA Toolkit - CUDA 12.x with cuDNN
  • Python - Python 3.10+ with pip
  • Package Managers - apt, pip, conda (miniconda available)

Quick Test: Verify GPU Access

# Check GPU is available and see memory/utilization
nvidia-smi

# Test CUDA with PyTorch
pip install torch
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"

# Test with TensorFlow
pip install tensorflow
python3 -c "import tensorflow as tf; print(f'GPUs: {tf.config.list_physical_devices("GPU")}')"

# Check CUDA version
nvcc --version

Common Workflows

Training a Model

# Clone your repository
git clone https://github.com/your/repo.git
cd repo

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start training
python train.py --epochs 100 --batch-size 32

# Monitor GPU usage in another terminal
watch -n 1 nvidia-smi

Using Persistent Storage

If you selected persistent storage, it's mounted at /mnt/<volume-name>:

# Check your mounted volumes
df -h | grep mnt

# Store datasets (persists across restarts)
cp -r ./data /mnt/your-volume/datasets/

# Store model checkpoints
mkdir -p /mnt/your-volume/checkpoints
cp model_checkpoint.pt /mnt/your-volume/checkpoints/

# Link to your project directory
ln -s /mnt/your-volume/datasets ./data

Running Jupyter Notebook

# Install Jupyter
pip install jupyter

# Start Jupyter (accessible via port forwarding or service exposure)
jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser

# Or use JupyterLab
pip install jupyterlab
jupyter lab --ip 0.0.0.0 --port 8888 --no-browser

Then either use SSH port forwarding (ssh -L 8888:localhost:8888 ...) or expose port 8888 using the Service Exposure feature.
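
For example, with SSH port forwarding (run from your local terminal, using the same <port> and <host> shown on your GPU card):

# Forward the notebook port to your local machine
ssh -p <port> -L 8888:localhost:8888 ubuntu@<host>

# Then open http://localhost:8888 locally and paste the token Jupyter printed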

Exposing a Service

To make a web service accessible from the internet:

  1. Start your service on a port (e.g., --host 0.0.0.0 --port 8000)
  2. Click Expose Port in the Exposed Services section of your GPU card
  3. Enter the port number and a service name
  4. Copy the external URL provided (e.g., https://abc123.packet.ai)
# Example: Expose a FastAPI server
pip install fastapi uvicorn
cat > app.py << 'EOF'
from fastapi import FastAPI
app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from Packet.ai GPU!"}
EOF

# Start on port 8000
uvicorn app:app --host 0.0.0.0 --port 8000

Deploying a HuggingFace Model

For quick model deployment, use the HuggingFace integration:

  1. Click HuggingFace in the sidebar
  2. Search for a model (e.g., "Llama 3.1", "Mistral", "Qwen")
  3. Select your GPU configuration
  4. Click Deploy

In 5-10 minutes, you'll have an OpenAI-compatible API endpoint running vLLM. See the HuggingFace Deployment docs for details.
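
Once the deployment is ready, you can query it like any OpenAI-compatible API. A minimal curl sketch; the endpoint URL, API key, and model name below are placeholders from your own deployment:

# Example: chat completion against your deployed model
curl https://<your-endpoint>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'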

Managing Your GPU

  • Stop - Pause the instance; state is preserved. Billing: GPU billing stops, storage continues.
  • Start - Resume a stopped instance. Billing: GPU billing resumes.
  • Restart - Reboot the container. Billing: no change.
  • Scale - Change the number of GPUs. Billing: adjusts to the new GPU count.
  • Terminate - Delete the instance permanently. Billing: all charges stop; data is deleted.

Warning: Terminate is Permanent

Terminating an instance deletes all data including ephemeral storage. Make sure to save important files to persistent storage or download them first.
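
For example, to pull results down to your local machine before terminating (run from your local terminal; ~/outputs is just an example path):

# Copy a results directory from the instance to your local machine
scp -P <port> -r ubuntu@<host>:~/outputs ./outputs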

Cost Management

  • GPUs are billed per hour while running (prorated by the minute)
  • Stopped instances don't incur GPU charges
  • Persistent storage is billed continuously while it exists
  • Check your balance and usage in the Billing section

Quick Tips to Save Money

  1. Stop when not using - GPU billing pauses immediately
  2. Right-size your GPU - Start small, scale up only if needed
  3. Use ephemeral storage - Persistent storage has ongoing costs
  4. Terminate when done - Delete instances you no longer need
  5. Monitor usage - Check the Billing tab regularly

Next Steps

Now that you're set up, explore these features:

  • Token Factory - Use our hosted LLM inference API with pay-per-token pricing (Token Factory Docs)
  • HuggingFace Deployment - One-click deployment of any HuggingFace model (HuggingFace Docs)
  • OpenAI Gateway - Use your models with OpenAI SDKs and tools (OpenAI Gateway Docs)
  • SSH Access - Advanced SSH configuration and VS Code Remote (SSH Docs)
  • Service Exposure - Make ports publicly accessible (Service Exposure Docs)

Troubleshooting

GPU Not Launching

  • Insufficient balance: Add funds in the Billing section
  • No availability: Try a different GPU pool or region
  • Stuck in "Pending": Wait 2-3 minutes, then try terminating and relaunching

Can't Connect via SSH

  • Connection refused: Wait 30 seconds after the instance shows "Running"
  • Permission denied: Verify your SSH key is added in Account Settings
  • Host key changed: Run ssh-keygen -R "[host]:port"

CUDA Not Working

# Check NVIDIA drivers
nvidia-smi

# If drivers not loaded, try:
sudo nvidia-smi

# Check CUDA installation
nvcc --version

# Test PyTorch CUDA
python3 -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA support:
pip install torch --index-url https://download.pytorch.org/whl/cu121

Out of GPU Memory

  • Reduce batch size: Lower --batch-size in your training script
  • Enable gradient checkpointing: Trade compute for memory
  • Use mixed precision: Add --fp16 or --bf16 flags if your script supports them (see the sketch after this list)
  • Scale up GPUs: Use the Scale feature to add more GPUs
  • Try a smaller model: Consider a quantized version
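
If your training script doesn't expose those flags, here is a minimal PyTorch sketch of mixed precision with torch.amp; the model, batch size, and data are stand-ins for your own training loop:

# Minimal mixed-precision training loop (reduces activation and gradient memory)
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):
    x = torch.randn(16, 1024, device=device)              # a smaller batch also cuts memory
    y = torch.randint(0, 10, (16,), device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = criterion(model(x), y)                      # forward pass in reduced precision
    scaler.scale(loss).backward()                          # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()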

Slow Performance

  • Check GPU utilization: Run nvidia-smi - utilization should be near 100% during training
  • Enable DataLoader workers: Add num_workers=4 to your DataLoader
  • Use persistent storage wisely: It's slower than local NVMe for random access
  • Pin memory: Add pin_memory=True to your DataLoader (see the sketch after this list)
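
A minimal sketch of a DataLoader tuned for GPU training; the synthetic dataset and sizes are placeholders for your own data pipeline:

# DataLoader with parallel workers and pinned memory to keep the GPU fed
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 1024), torch.randint(0, 10, (10_000,)))
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,            # parallel workers preprocess batches while the GPU trains
    pin_memory=True,          # pinned host memory speeds up host-to-GPU copies
    persistent_workers=True,  # keep workers alive between epochs
)

for x, y in loader:
    x = x.cuda(non_blocking=True)   # non_blocking pairs with pin_memory for async copies
    y = y.cuda(non_blocking=True)
    # ... run your training step here ...
    break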

Need Help?

Contact us at support@packet.ai or use the Support tab in your dashboard for faster response.