```shell
# Run chat completion
$ curl -X POST https://infer.prim.sh/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "X-402-Payment: $TOKEN" \
    -d '{"model": "llama-3-70b", "messages": [{"role": "user", "content": "Hello"}]}'
# → 402 → pay → 200 OK
{
  "choices": [{ "message": { "content": "Hello! How can I help?" } }],
  "usage": { "total_tokens": 42, "cost": 0.000042 }
}
```
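The `402 → pay → 200` handshake can be sketched as a small retry loop. A minimal sketch, assuming injected `post` and `pay` callables (hypothetical names, not part of the API) so the flow can be exercised without a live endpoint:

```python
# Sketch of the x402 flow: POST once; on HTTP 402, settle the
# payment and retry with the resulting token.
# `post(body, token)` and `pay(challenge)` are hypothetical callables.

def call_with_payment(post, pay, body, token=None):
    status, payload = post(body, token)
    if status == 402:
        token = pay(payload)               # settle the invoice, get a token
        status, payload = post(body, token)
    return status, payload

# Stubs standing in for the network, to exercise the flow.
def fake_post(body, token):
    if token is None:
        return 402, {"invoice": "stub-invoice"}
    return 200, {"choices": [{"message": {"content": "Hello!"}}]}

def fake_pay(challenge):
    return "paid-token"

status, payload = call_with_payment(fake_post, fake_pay, {"model": "llama-3-70b"})
```

The real `post` would be an HTTPS request and `pay` an x402 settlement; injecting them keeps the retry logic trivially testable.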
```shell
# Embeddings
$ curl -X POST https://infer.prim.sh/v1/embeddings \
    -H "Content-Type: application/json" \
    -H "X-402-Payment: $TOKEN" \
    -d '{"model": "text-embedding-3-small", "input": "hello world"}'
```
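Once vectors come back (assuming the response follows the usual OpenAI-style embeddings shape), comparing them is plain arithmetic. A sketch with toy vectors standing in for real embedding output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy vectors; real ones come from the /v1/embeddings response.
sim = cosine([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```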
- Keep your orchestration on cheap CPUs; call infer.sh only when you need expensive reasoning.
- Route easy tasks to small models and hard ones to frontier models: same API, different cost envelopes.
- Use proprietary models without creating human-owned accounts: you pay x402, infer.sh handles fiat.
- Text, images, and audio through a single interface; agents treat perception as just another endpoint.
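The routing idea can be sketched as a tiny policy function. The difficulty heuristic, thresholds, and model IDs below are illustrative assumptions, not part of infer.sh:

```python
# Hypothetical router: cheap model for short/simple prompts,
# frontier model otherwise. Thresholds and IDs are illustrative.
CHEAP, FRONTIER = "llama-3-70b", "gpt-4o"

def pick_model(prompt, hard_keywords=("prove", "derive", "refactor")):
    hard = len(prompt) > 2000 or any(k in prompt.lower() for k in hard_keywords)
    return FRONTIER if hard else CHEAP

model = pick_model("Summarize this paragraph in one line.")
```

Because both models sit behind the same API, the router only changes the `model` field in the request body.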
```
POST /v1/chat/completions   # Chat (OpenAI spec)
POST /v1/completions        # Text completion
POST /v1/embeddings         # Vector embeddings
GET  /v1/models             # List available models
```
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Llama 3 70B | $0.70 | $0.90 |
| GPT-4o | $5.00 | $15.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Mistral Large | $2.00 | $6.00 |
Pass-through pricing plus a thin margin for x402 handling.
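At these rates, a per-call estimate is simple arithmetic. A sketch using the table above (USD per million tokens, before the x402 margin):

```python
# (input, output) USD per 1M tokens, from the pricing table above.
PRICES = {
    "Llama 3 70B":       (0.70, 0.90),
    "GPT-4o":            (5.00, 15.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Mistral Large":     (2.00, 6.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimated pass-through cost in USD for one call."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 1,000 prompt tokens + 500 completion tokens on Llama 3 70B.
cost = estimate_cost("Llama 3 70B", 1_000, 500)
```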