Intelligence for agents.
On-demand LLM inference. GPT-4, Claude, Llama, and more behind a single x402-gated API.
# Run chat completion
$ curl -X POST https://infer.prim.sh/v1/chat/completions \
    -H "X-402-Payment: $TOKEN" \
    -d '{"model": "llama-3-70b", "messages": [{"role": "user", "content": "Hello"}]}'

# → 402 → pay → 200 OK
{
  "choices": [{ "message": { "content": "Hello! How can I help?" } }],
  "usage": { "total_tokens": 42, "cost": 0.000042 }
}

# Embeddings
$ curl -X POST https://infer.prim.sh/v1/embeddings \
    -H "X-402-Payment: $TOKEN" \
    -d '{"model": "text-embedding-3-small", "input": "hello world"}'
x402 auth
OpenAI compatible
Multi-provider
Token metering
Part of agentstack

What agents use it for

Brain rental

Keep your orchestration on cheap CPUs. Call infer.sh only when you need expensive reasoning.

Model routing

Route easy tasks to small models and hard ones to frontier models. Same API, different cost envelopes.
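A routing policy can be as simple as a heuristic in front of the API. A minimal sketch, assuming a word-count threshold and keyword list that are purely illustrative (model ids taken from the pricing table below):

```python
# Route a prompt to a cheap or frontier model by a rough complexity
# heuristic. Threshold and keyword list are illustrative assumptions,
# not part of the infer.sh API.
CHEAP_MODEL = "llama-3-70b"
FRONTIER_MODEL = "gpt-4o"

HARD_KEYWORDS = {"prove", "analyze", "refactor", "plan", "debug"}

def pick_model(prompt: str) -> str:
    """Return a small model for easy prompts, a frontier model for hard ones."""
    words = prompt.lower().split()
    looks_hard = len(words) > 200 or any(w in HARD_KEYWORDS for w in words)
    return FRONTIER_MODEL if looks_hard else CHEAP_MODEL

print(pick_model("Hello"))                       # easy → llama-3-70b
print(pick_model("Prove this invariant holds"))  # hard → gpt-4o
```

Because the API surface is the same for every model, swapping the heuristic for a learned router later changes nothing downstream.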

Anonymous access

Use proprietary models without creating human-owned accounts. You pay x402, infer.sh handles fiat.

Multimodal I/O

Text, images, and audio through a single interface. Agents treat perception as just another endpoint.

API reference

POST   /v1/chat/completions     # Chat (OpenAI spec)
POST   /v1/completions          # Text completion
POST   /v1/embeddings           # Vector embeddings
GET    /v1/models               # List available models
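The request lifecycle is: call, receive 402 with payment details, pay, retry with the payment token. A minimal sketch of that loop, with the transport and payment functions injected so it runs offline; the exact header and response field names here are assumptions, not the documented wire format:

```python
# Sketch of the x402 request flow. `post` and `pay` are injected
# callables (hypothetical signatures) so the flow can run without a
# network; real field names may differ.
def chat(post, pay, model, messages):
    body = {"model": model, "messages": messages}
    status, resp = post("/v1/chat/completions", body, headers={})
    if status == 402:
        token = pay(resp)  # settle the invoice, obtain a payment token
        status, resp = post("/v1/chat/completions", body,
                            headers={"X-402-Payment": token})
    if status != 200:
        raise RuntimeError(f"inference failed: {status}")
    return resp

# Offline demo with a fake transport and a fake payer:
def fake_post(path, body, headers):
    if "X-402-Payment" not in headers:
        return 402, {"invoice": "inv_123"}
    return 200, {"choices": [{"message": {"content": "Hello!"}}]}

out = chat(fake_post, lambda invoice: "tok_abc", "llama-3-70b",
           [{"role": "user", "content": "Hello"}])
print(out["choices"][0]["message"]["content"])  # → Hello!
```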
  

Pricing

Model               Input / 1M tokens   Output / 1M tokens
Llama 3 70B         $0.70               $0.90
GPT-4o              $5.00               $15.00
Claude 3.5 Sonnet   $3.00               $15.00
Mistral Large       $2.00               $6.00

Pass-through pricing plus a thin margin for x402 handling.
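For budgeting, per-request cost follows directly from the table. A minimal sketch (pass-through rates only; the x402 margin is not modeled, and the model-id strings are assumptions):

```python
# Estimate request cost from the pricing table: USD per 1M tokens,
# (input, output). Model-id strings are assumed, not documented.
PRICES = {
    "llama-3-70b":       (0.70, 0.90),
    "gpt-4o":            (5.00, 15.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "mistral-large":     (2.00, 6.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, excluding the x402 handling margin."""
    inp, outp = PRICES[model]
    return (input_tokens * inp + output_tokens * outp) / 1_000_000

# 1,000 prompt tokens + 500 completion tokens on Llama 3 70B:
print(f"{estimate_cost('llama-3-70b', 1000, 500):.6f}")  # → 0.001150
```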

Intelligence is an API call.

Agents don't have brains. They have endpoints.

Read the docs →