Getting Started with Packet.ai

Get your first GPU running in minutes. This guide covers everything from account setup to running your first training job.

Platform Overview

Packet.ai is a cloud GPU platform designed for AI/ML workloads. Whether you're training models, running inference, or deploying LLMs, we provide the infrastructure you need.

Key Features

  • GPU Compute - NVIDIA A100, H100, RTX 4090, and more. Scale from 1-8 GPUs per instance.
  • HuggingFace Integration - One-click deployment of any HuggingFace model with the vLLM inference server.
  • Token Factory - OpenAI-compatible API for LLM inference with batch processing and LoRA support.
  • Persistent Storage - NFS-based storage that survives instance restarts. Perfect for datasets and checkpoints.
  • Service Exposure - Expose any port to the internet with a public URL. Run APIs, notebooks, or web apps.
  • Browser Terminal - Full shell access directly from your browser. No SSH setup required.

Prerequisites

Before you begin, make sure you have:

  • A Packet.ai account (Sign up here if you don't have one)
  • Credits in your account (prepaid balance or subscription)
  • A payment method on file

New User?

New accounts get $10 free credits to try the platform. Add funds via the Billing tab when you need more.

Step 1: Launch a GPU

  1. From your dashboard, click Launch GPU
  2. Select GPU Pool - Choose from available GPU types and regions. Popular options include:
    • RTX 4090 - Great for inference and smaller training jobs
    • A100 40GB - Ideal for training and large model inference
    • H100 - Maximum performance for demanding workloads
  3. Instance Type - Select CPU/RAM allocation for your container. More RAM is useful for data preprocessing.
  4. Storage (optional):
    • Ephemeral Storage - Fast local NVMe, cleared on restart (default)
    • Persistent Storage - NFS-based, survives restarts. Choose 50GB-1TB.
  5. GPU Count - Select 1-8 GPUs depending on your workload. Start with 1 for most tasks, scale up for distributed training.
  6. Click Launch GPU

Your GPU will begin provisioning. This typically takes 30-60 seconds.

Step 2: Connect to Your GPU

Once your GPU shows "Running" status, you have three options:

Option A: Browser Terminal (Easiest)

Click the Terminal icon on your GPU card to open a browser-based terminal directly in the dashboard. No setup required.

Option B: SSH with Key

  1. Go to Account Settings and add your SSH public key
  2. Copy the SSH command from your GPU card
  3. Connect from your terminal:
# Connect to your GPU instance
ssh -p <port> ubuntu@<host>

# Example
ssh -p 30123 ubuntu@35.190.160.152

Option C: SSH with Password

A password is shown on your GPU card. Click to reveal it:

# Connect with password
ssh -p <port> ubuntu@<host>
# Enter the password when prompted

SSH Key Recommended

For the best experience, add your SSH key in Account Settings. This enables passwordless authentication and VS Code Remote SSH integration.
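
If you use VS Code Remote SSH, adding a host entry to ~/.ssh/config makes connecting one click. A minimal sketch; the host alias and key path are illustrative, and the <port> and <host> values come from your GPU card:

# Example ~/.ssh/config entry
Host packet-gpu
    HostName <host>
    Port <port>
    User ubuntu
    IdentityFile ~/.ssh/id_ed25519

After saving, ssh packet-gpu connects directly, and the same host shows up in VS Code's Remote SSH host list.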

Step 3: Start Working

Your GPU instance comes pre-configured with:

  • Operating System - Ubuntu 22.04 LTS
  • NVIDIA Drivers - Latest stable drivers
  • CUDA Toolkit - CUDA 12.x with cuDNN
  • Python - Python 3.10+ with pip
  • Package Managers - apt, pip, conda (miniconda available)

Quick Test: Verify GPU Access

# Check GPU is available and see memory/utilization
nvidia-smi

# Test CUDA with PyTorch
pip install torch
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}')"

# Test with TensorFlow
pip install tensorflow
python3 -c "import tensorflow as tf; print(f'GPUs: {tf.config.list_physical_devices("GPU")}')"

# Check CUDA version
nvcc --version

Common Workflows

Training a Model

# Clone your repository
git clone https://github.com/your/repo.git
cd repo

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start training
python train.py --epochs 100 --batch-size 32

# Monitor GPU usage in another terminal
watch -n 1 nvidia-smi

Using Persistent Storage

If you selected persistent storage, it's mounted at /mnt/<volume-name>:

# Check your mounted volumes
df -h | grep mnt

# Store datasets (persists across restarts)
cp -r ./data /mnt/your-volume/datasets/

# Store model checkpoints
mkdir -p /mnt/your-volume/checkpoints
cp model_checkpoint.pt /mnt/your-volume/checkpoints/

# Link to your project directory
ln -s /mnt/your-volume/datasets ./data

Running Jupyter Notebook

# Install Jupyter
pip install jupyter

# Start Jupyter (accessible via port forwarding or service exposure)
jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser

# Or use JupyterLab
pip install jupyterlab
jupyter lab --ip 0.0.0.0 --port 8888 --no-browser

Then either use SSH port forwarding (ssh -L 8888:localhost:8888 ...) or expose port 8888 using the Service Exposure feature.
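
For example, with SSH port forwarding (run from your local terminal, using the same <port> and <host> shown on your GPU card):

# Forward the notebook port to your local machine
ssh -p <port> -L 8888:localhost:8888 ubuntu@<host>

# Then open http://localhost:8888 locally and paste the token Jupyter printed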

Exposing a Service

To make a web service accessible from the internet:

  1. Start your service on a port (e.g., --host 0.0.0.0 --port 8000)
  2. Click Expose Port in the Exposed Services section of your GPU card
  3. Enter the port number and a service name
  4. Copy the external URL provided (e.g., https://abc123.packet.ai)
# Example: Expose a FastAPI server
pip install fastapi uvicorn
cat > app.py << 'EOF'
from fastapi import FastAPI
app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from Packet.ai GPU!"}
EOF

# Start on port 8000
uvicorn app:app --host 0.0.0.0 --port 8000

Deploying a HuggingFace Model

For quick model deployment, use the HuggingFace integration:

  1. Click HuggingFace in the sidebar
  2. Search for a model (e.g., "Llama 3.1", "Mistral", "Qwen")
  3. Select your GPU configuration
  4. Click Deploy

In 5-10 minutes, you'll have an OpenAI-compatible API endpoint running vLLM. See the HuggingFace Deployment docs for details.
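
Once the deployment is ready, you can query it like any OpenAI-compatible API. A minimal curl sketch; the endpoint URL, API key, and model name below are placeholders from your own deployment:

# Example: chat completion against your deployed model
curl https://<your-endpoint>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'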

Managing Your GPU

  • Stop - Pause the instance; state is preserved. Billing: GPU billing stops, storage continues.
  • Start - Resume a stopped instance. Billing: GPU billing resumes.
  • Restart - Reboot the container. Billing: no change.
  • Scale - Change the number of GPUs. Billing: adjusts to the new GPU count.
  • Terminate - Delete the instance permanently. Billing: all charges stop; data is deleted.

Warning: Terminate is Permanent

Terminating an instance deletes all data including ephemeral storage. Make sure to save important files to persistent storage or download them first.
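
For example, to pull results down to your local machine before terminating (run from your local terminal; ~/outputs is just an example path):

# Copy a results directory from the instance to your local machine
scp -P <port> -r ubuntu@<host>:~/outputs ./outputs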

Cost Management

  • GPUs are billed per hour while running (prorated by the minute)
  • Stopped instances don't incur GPU charges
  • Persistent storage is billed continuously while it exists
  • Check your balance and usage in the Billing section

Quick Tips to Save Money

  1. Stop when not using - GPU billing pauses immediately
  2. Right-size your GPU - Start small, scale up only if needed
  3. Use ephemeral storage - Persistent storage has ongoing costs
  4. Terminate when done - Delete instances you no longer need
  5. Monitor usage - Check the Billing tab regularly

Next Steps

Now that you're set up, explore these features:

  • Token Factory - Use our hosted LLM inference API with pay-per-token pricing (Token Factory Docs)
  • HuggingFace Deployment - One-click deployment of any HuggingFace model (HuggingFace Docs)
  • OpenAI Gateway - Use your models with OpenAI SDKs and tools (OpenAI Gateway Docs)
  • SSH Access - Advanced SSH configuration and VS Code Remote (SSH Docs)
  • Service Exposure - Make ports publicly accessible (Service Exposure Docs)

Troubleshooting

GPU Not Launching

  • Insufficient balance: Add funds in the Billing section
  • No availability: Try a different GPU pool or region
  • Stuck in "Pending": Wait 2-3 minutes, then try terminating and relaunching

Can't Connect via SSH

  • Connection refused: Wait 30 seconds after the instance shows "Running"
  • Permission denied: Verify your SSH key is added in Account Settings
  • Host key changed: Run ssh-keygen -R "[host]:port"

CUDA Not Working

# Check NVIDIA drivers
nvidia-smi

# If drivers not loaded, try:
sudo nvidia-smi

# Check CUDA installation
nvcc --version

# Test PyTorch CUDA
python3 -c "import torch; print(torch.cuda.is_available())"

# If False, reinstall PyTorch with CUDA support:
pip install torch --index-url https://download.pytorch.org/whl/cu121

Out of GPU Memory

  • Reduce batch size: Lower --batch-size in your training script
  • Enable gradient checkpointing: Trade compute for memory
  • Use mixed precision: Add --fp16 or --bf16 flags if your script supports them (see the sketch after this list)
  • Scale up GPUs: Use the Scale feature to add more GPUs
  • Try a smaller model: Consider a quantized version
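
If your training script doesn't expose those flags, here is a minimal PyTorch sketch of mixed precision with torch.amp; the model, batch size, and data are stand-ins for your own training loop:

# Minimal mixed-precision training loop (reduces activation and gradient memory)
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):
    x = torch.randn(16, 1024, device=device)              # a smaller batch also cuts memory
    y = torch.randint(0, 10, (16,), device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = criterion(model(x), y)                      # forward pass in reduced precision
    scaler.scale(loss).backward()                          # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()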

Slow Performance

  • Check GPU utilization: Run nvidia-smi - utilization should be near 100% during training
  • Enable DataLoader workers: Add num_workers=4 to your DataLoader
  • Use persistent storage wisely: It's slower than local NVMe for random access
  • Pin memory: Add pin_memory=True to your DataLoader (see the sketch after this list)
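
A minimal sketch of a DataLoader tuned for GPU training; the synthetic dataset and sizes are placeholders for your own data pipeline:

# DataLoader with parallel workers and pinned memory to keep the GPU fed
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 1024), torch.randint(0, 10, (10_000,)))
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,            # parallel workers preprocess batches while the GPU trains
    pin_memory=True,          # pinned host memory speeds up host-to-GPU copies
    persistent_workers=True,  # keep workers alive between epochs
)

for x, y in loader:
    x = x.cuda(non_blocking=True)   # non_blocking pairs with pin_memory for async copies
    y = y.cuda(non_blocking=True)
    # ... run your training step here ...
    break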

Need Help?

Contact us at support@packet.ai or use the Support tab in your dashboard for faster response.