Getting Started with Packet.ai
Get your first GPU running in minutes. This guide covers everything from account setup to running your first training job.
Platform Overview
Packet.ai is a cloud GPU platform designed for AI/ML workloads. Whether you're training models, running inference, or deploying LLMs, we provide the infrastructure you need.
Key Features
| Feature | Description |
|---|---|
| GPU Compute | NVIDIA A100, H100, RTX 4090 and more. Scale from 1-8 GPUs per instance. |
| HuggingFace Integration | One-click deployment of any HuggingFace model with vLLM inference server. |
| Token Factory | OpenAI-compatible API for LLM inference with batch processing and LoRA support. |
| Persistent Storage | NFS-based storage that survives instance restarts. Perfect for datasets and checkpoints. |
| Service Exposure | Expose any port to the internet with a public URL. Run APIs, notebooks, or web apps. |
| Browser Terminal | Full shell access directly from your browser. No SSH setup required. |
Prerequisites
Before you begin, make sure you have:
- A Packet.ai account (Sign up here if you don't have one)
- Credits in your account (prepaid balance or subscription)
- A payment method on file
New User?
New accounts get $10 free credits to try the platform. Add funds via the Billing tab when you need more.
Step 1: Launch a GPU
- From your dashboard, click Launch GPU
- Select GPU Pool - Choose from available GPU types and regions. Popular options include:
  - RTX 4090 - Great for inference and smaller training jobs
  - A100 40GB - Ideal for training and large model inference
  - H100 - Maximum performance for demanding workloads
- Instance Type - Select CPU/RAM allocation for your container. More RAM is useful for data preprocessing.
- Storage (optional):
  - Ephemeral Storage - Fast local NVMe, cleared on restart (default)
  - Persistent Storage - NFS-based, survives restarts. Choose 50GB-1TB.
- GPU Count - Select 1-8 GPUs depending on your workload. Start with 1 for most tasks, scale up for distributed training.
- Click Launch GPU
Your GPU will begin provisioning. This typically takes 30-60 seconds.
Step 2: Connect to Your GPU
Once your GPU shows "Running" status, you have three options:
Option A: Browser Terminal (Easiest)
Click the Terminal icon on your GPU card to open a browser-based terminal directly in the dashboard. No setup required.
Option B: SSH with Key
- Go to Account Settings and add your SSH public key
- Copy the SSH command from your GPU card
- Connect from your terminal:
```bash
# Connect to your GPU instance
ssh -p <port> ubuntu@<host>

# Example
ssh -p 30123 ubuntu@35.190.160.152
```
Option C: SSH with Password
A password is shown on your GPU card. Click to reveal it:
```bash
# Connect with password
ssh -p <port> ubuntu@<host>
# Enter the password when prompted
```
SSH Key Recommended
For the best experience, add your SSH key in Account Settings. This enables passwordless authentication and VS Code Remote SSH integration.
Step 3: Start Working
Your GPU instance comes pre-configured with:
| Software | Details |
|---|---|
| Operating System | Ubuntu 22.04 LTS |
| NVIDIA Drivers | Latest stable drivers |
| CUDA Toolkit | CUDA 12.x with cuDNN |
| Python | Python 3.10+ with pip |
| Package Managers | apt, pip, conda (miniconda available) |
Quick Test: Verify GPU Access
```bash
# Check GPU is available and see memory/utilization
nvidia-smi

# Test CUDA with PyTorch
pip install torch
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"

# Test with TensorFlow
pip install tensorflow
python3 -c 'import tensorflow as tf; print("GPUs:", tf.config.list_physical_devices("GPU"))'

# Check CUDA version
nvcc --version
```
Common Workflows
Training a Model
```bash
# Clone your repository
git clone https://github.com/your/repo.git
cd repo

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start training
python train.py --epochs 100 --batch-size 32

# Monitor GPU usage in another terminal
watch -n 1 nvidia-smi
```
Using Persistent Storage
If you selected persistent storage, it's mounted at `/mnt/<volume-name>`:
```bash
# Check your mounted volumes
df -h | grep mnt

# Store datasets (persists across restarts)
cp -r ./data /mnt/your-volume/datasets/

# Store model checkpoints
mkdir -p /mnt/your-volume/checkpoints
cp model_checkpoint.pt /mnt/your-volume/checkpoints/

# Link to your project directory
ln -s /mnt/your-volume/datasets ./data
```
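You can also write checkpoints straight to the mount from your training code, so nothing is lost if the instance restarts. A minimal PyTorch sketch, assuming the same `/mnt/your-volume` path as above (adjust to your volume name):

```python
# checkpoint_to_volume.py - minimal sketch; the model here is a stand-in for your own
import os
import torch
import torch.nn as nn

CKPT_DIR = "/mnt/your-volume/checkpoints"  # persistent NFS mount (assumed volume name)
os.makedirs(CKPT_DIR, exist_ok=True)

model = nn.Linear(128, 10)                 # stand-in for your real model
optimizer = torch.optim.Adam(model.parameters())

# Save model + optimizer state after an epoch
torch.save(
    {"epoch": 1, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
    os.path.join(CKPT_DIR, "epoch_001.pt"),
)

# Later (or after a restart), resume from the saved checkpoint
ckpt = torch.load(os.path.join(CKPT_DIR, "epoch_001.pt"), map_location="cpu")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
print(f"Resumed from epoch {ckpt['epoch']}")
```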
Running Jupyter Notebook
```bash
# Install Jupyter
pip install jupyter

# Start Jupyter (accessible via port forwarding or service exposure)
jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser

# Or use JupyterLab
pip install jupyterlab
jupyter lab --ip 0.0.0.0 --port 8888 --no-browser
```
Then either use SSH port forwarding (`ssh -L 8888:localhost:8888 ...`) or expose port 8888 using the Service Exposure feature.
Exposing a Service
To make a web service accessible from the internet:
- Start your service on a port (e.g., `--host 0.0.0.0 --port 8000`)
- Click Expose Port in the Exposed Services section of your GPU card
- Enter the port number and a service name
- Copy the external URL provided (e.g., `https://abc123.packet.ai`)
```bash
# Example: Expose a FastAPI server
pip install fastapi uvicorn

cat > app.py << 'EOF'
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from Packet.ai GPU!"}
EOF

# Start on port 8000
uvicorn app:app --host 0.0.0.0 --port 8000
```
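Once the port is exposed, the service is reachable from any machine at the public URL. A quick sanity check, using the placeholder URL from step 4 above (substitute the URL shown on your dashboard):

```python
# test_exposed_service.py - minimal sketch; replace the URL with your own exposed URL
import requests

PUBLIC_URL = "https://abc123.packet.ai"  # placeholder external URL from the dashboard

resp = requests.get(f"{PUBLIC_URL}/")
resp.raise_for_status()
print(resp.json())  # expected: {'message': 'Hello from Packet.ai GPU!'}
```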
Deploying a HuggingFace Model
For quick model deployment, use the HuggingFace integration:
- Click HuggingFace in the sidebar
- Search for a model (e.g., "Llama 3.1", "Mistral", "Qwen")
- Select your GPU configuration
- Click Deploy
In 5-10 minutes, you'll have an OpenAI-compatible API endpoint running vLLM. See the HuggingFace Deployment docs for details.
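Because the endpoint is OpenAI-compatible, the official OpenAI Python client works against it. A minimal sketch; the base URL, API key, and model id below are placeholders - use the values shown for your deployment:

```python
# query_deployment.py - minimal sketch; base URL, key, and model id are placeholders
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-DEPLOYMENT-URL/v1",   # endpoint from your deployment's card
    api_key="YOUR_API_KEY",                      # key/token for your deployment
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",    # the model id you deployed
    messages=[{"role": "user", "content": "Say hello from my new GPU."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```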
Managing Your GPU
| Action | Description | Billing Impact |
|---|---|---|
| Stop | Pause the instance. State is preserved. | GPU billing stops. Storage continues. |
| Start | Resume a stopped instance. | GPU billing resumes. |
| Restart | Reboot the container. | No change. |
| Scale | Change the number of GPUs. | Billing adjusts to new GPU count. |
| Terminate | Delete the instance permanently. | All charges stop. Data is deleted. |
Warning: Terminate is Permanent
Terminating an instance deletes all data including ephemeral storage. Make sure to save important files to persistent storage or download them first.
Cost Management
- GPUs are billed per hour while running (prorated by minute)
- Stopped instances don't incur GPU charges
- Persistent storage is billed continuously while it exists
- Check your balance and usage in the Billing section
Quick Tips to Save Money
- Stop when not using - GPU billing pauses immediately
- Right-size your GPU - Start small, scale up only if needed
- Use ephemeral storage - Persistent storage has ongoing costs
- Terminate when done - Delete instances you no longer need
- Monitor usage - Check the Billing tab regularly
Next Steps
Now that you're set up, explore these features:
| Feature | Description | Documentation |
|---|---|---|
| Token Factory | Use our hosted LLM inference API with pay-per-token pricing | Token Factory Docs |
| HuggingFace Deployment | One-click deployment of any HuggingFace model | HuggingFace Docs |
| OpenAI Gateway | Use your models with OpenAI SDKs and tools | OpenAI Gateway Docs |
| SSH Access | Advanced SSH configuration and VS Code Remote | SSH Docs |
| Service Exposure | Make ports publicly accessible | Service Exposure Docs |
Troubleshooting
GPU Not Launching
- Insufficient balance: Add funds in the Billing section
- No availability: Try a different GPU pool or region
- Stuck in "Pending": Wait 2-3 minutes, then try terminating and relaunching
Can't Connect via SSH
- Connection refused: Wait 30 seconds after the instance shows "Running"
- Permission denied: Verify your SSH key is added in Account Settings
- Host key changed: Run `ssh-keygen -R "[host]:port"`
CUDA Not Working
```bash
# Check NVIDIA drivers
nvidia-smi

# If drivers not loaded, try:
sudo nvidia-smi

# Check CUDA installation
nvcc --version

# Test PyTorch CUDA
python3 -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA support:
pip install torch --index-url https://download.pytorch.org/whl/cu121
```
Out of GPU Memory
- Reduce batch size: Lower `--batch-size` in your training script
- Enable gradient checkpointing: Trade compute for memory
- Use mixed precision: Add `--fp16` or `--bf16` flags
- Scale up GPUs: Use the Scale feature to add more GPUs
- Try a smaller model: Consider a quantized version
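If your training script doesn't already expose flags for these options, here is a rough idea of what mixed precision and gradient checkpointing look like in plain PyTorch - a minimal sketch with a toy model, not a drop-in for your code:

```python
# oom_mitigation.py - minimal sketch of mixed precision + activation checkpointing
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

device = "cuda"

class Block(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        # Recompute activations during backward instead of storing them (saves memory)
        return checkpoint(self.net, x, use_reentrant=False)

model = nn.Sequential(*[Block() for _ in range(4)], nn.Linear(1024, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()   # scales the loss so fp16 gradients don't underflow

x = torch.randn(8, 1024, device=device)          # smaller batch = less memory
y = torch.randint(0, 10, (8,), device=device)

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast(dtype=torch.float16):  # run forward in fp16 where safe
    loss = nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(f"loss: {loss.item():.4f}")
```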
Slow Performance
- Check GPU utilization: Run `nvidia-smi` - utilization should be near 100%
- Enable DataLoader workers: Add `num_workers=4` to your DataLoader
- Use persistent storage wisely: It's slower than local NVMe for random access
- Pin memory: Add `pin_memory=True` to your DataLoader
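Both DataLoader settings above are plain constructor arguments; a minimal sketch with a synthetic dataset:

```python
# dataloader_tuning.py - minimal sketch of the DataLoader settings mentioned above
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,      # load batches in background workers so the GPU isn't waiting
    pin_memory=True,    # page-locked host memory speeds up CPU-to-GPU transfers
)

for x, y in loader:
    x = x.to("cuda", non_blocking=True)  # non_blocking pairs with pin_memory=True
    y = y.to("cuda", non_blocking=True)
    break  # one batch is enough for the demo
print(x.shape, y.shape)
```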
Need Help?
Contact us at support@packet.ai, or use the Support tab in your dashboard for a faster response.
