Get started with Avian

See details on API, Subscription and Finetune pricing.

Avian API Model Pricing

Model Name	Est. Speed	Context Length (tokens)	Tool Calling	Input Price (per million tokens)	Output Price (per million tokens)
Meta Llama 3.1 405B Instruct (Enterprise)	~130 tok/s	131,072	✓	$1.50	$1.50
Meta Llama 3.3 70B Instruct (Professional)	~200 tok/s	131,072	✓	$0.45	$0.45
Meta Llama 3.1 8B Instruct (Starter)	~450 tok/s	131,072	✓	$0.10	$0.10
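Because every listed model charges per million tokens, the cost of a request is simple arithmetic on the table. The sketch below copies the prices from the table and assumes pure per-token billing with no minimums or extra fees; `estimate_cost` and `PRICES_PER_MILLION` are hypothetical names, not part of any Avian SDK.

```python
# Per-million-token prices (USD) copied from the pricing table.
# Each tuple is (input price, output price); they happen to be equal here.
PRICES_PER_MILLION = {
    "Meta Llama 3.1 405B Instruct": (1.50, 1.50),
    "Meta Llama 3.3 70B Instruct": (0.45, 0.45),
    "Meta Llama 3.1 8B Instruct": (0.10, 0.10),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, assuming pure per-token billing."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply on the 70B model.
    cost = estimate_cost("Meta Llama 3.3 70B Instruct", 2_000, 500)
    print(f"${cost:.6f}")  # prints $0.001125
```

Keeping input and output prices as separate fields means the same function still works if a model is ever priced asymmetrically.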


Enterprise-Grade Performance

The Avian API offers competitive pricing on all models and some of the highest speeds on the market, which it achieves by using speculative decoding and running on the latest NVIDIA H200 SXM GPUs. We maintain production-grade capacity for every model we serve, so you can use the API with no rate limits as you scale.
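Avian does not describe its implementation, so the following is only a generic illustration of the technique, not Avian's code. In greedy speculative decoding, a small draft model proposes several tokens and the large target model verifies them, keeping the longest agreeing prefix plus one token of its own. The output is identical to decoding with the target alone; what changes is how many sequential target passes are needed. `toy_target` and `toy_draft` are hypothetical stand-ins for real models.

```python
from typing import Callable, List, Tuple

# A "model" is any function that maps a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one sequential target-model step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding.

    Returns the generated sequence (identical to greedy_decode(target, ...))
    and the number of verification rounds; on real hardware each round is a
    single batched forward pass of the target model.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each proposed position (looped here for clarity).
        #    The target's own token is always the one kept, so at the first
        #    mismatch it supplies the correction and the rest is discarded.
        all_accepted = True
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)
            if expected != tok:
                all_accepted = False
                break
        # 3. If every draft token matched, the same pass yields one bonus token.
        if all_accepted and len(seq) < goal:
            seq.append(target(seq))
    return seq, rounds


def toy_target(seq: List[int]) -> int:
    # Hypothetical stand-in for a large model: a fixed deterministic rule.
    return (seq[-1] * 3 + len(seq)) % 10


def toy_draft(seq: List[int]) -> int:
    # Hypothetical stand-in for a small model: agrees with the target except
    # at every fifth position, where it guesses wrong.
    guess = toy_target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 10


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, rounds = speculative_decode(toy_target, toy_draft, prompt, n_new=20)
    assert out == greedy_decode(toy_target, prompt, 20)
    print(f"20 tokens in {rounds} verification rounds instead of 20 sequential steps")
```

The speedup depends on how often the draft agrees with the target. Production systems typically sample rather than decode greedily, which replaces the exact-match check with a rejection-sampling acceptance rule.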

Dedicated Deployments

GPU Type	Starting Price (USD per second, billed by the second)	Memory
H200 SXM (Latest Generation)	$0.00208	141GB HBM3e
H100 SXM (Enterprise)	$0.00139	80GB HBM3
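Per-second rates are easier to compare once converted to hourly figures. The sketch below uses the starting rates from the table and assumes billing scales linearly with time and with GPU count; the table itself lists only single-GPU rates, and `gpu_cost` is a hypothetical helper.

```python
# Starting per-second rates (USD) from the GPU table.
GPU_RATE_PER_SECOND = {
    "H200 SXM": 0.00208,
    "H100 SXM": 0.00139,
}

SECONDS_PER_HOUR = 3600


def gpu_cost(gpu: str, seconds: float, count: int = 1) -> float:
    """Cost of running `count` GPUs for `seconds`, assuming linear per-second billing."""
    return GPU_RATE_PER_SECOND[gpu] * seconds * count


if __name__ == "__main__":
    for gpu in GPU_RATE_PER_SECOND:
        one = gpu_cost(gpu, SECONDS_PER_HOUR)
        eight = gpu_cost(gpu, SECONDS_PER_HOUR, count=8)
        print(f"{gpu}: ${one:.2f}/hour for 1 GPU, ${eight:.2f}/hour for 8 GPUs")
```

At these rates one H200 works out to about $7.49 per hour and one H100 to about $5.00 per hour.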


Deploy Custom Models

Get dedicated GPU instances to deploy and run any Hugging Face model on our optimized infrastructure. The prices shown are for reserved instances; contact us for on-demand rates. This option is ideal for high-throughput production workloads that require dedicated resources.
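One way to judge what "high-throughput" means here is to find the sustained token rate at which a dedicated GPU costs the same as per-token API billing. The sketch below is a rough, hypothetical comparison: it pairs the H100 starting rate with the Llama 3.3 70B API price, treats input and output tokens alike (they are priced the same), and ignores how many GPUs a given model actually needs and what throughput it reaches on them.

```python
def break_even_tokens_per_second(gpu_rate_per_second: float,
                                 price_per_million_tokens: float) -> float:
    """Sustained throughput at which a dedicated GPU costs the same as per-token billing.

    Above this rate the dedicated instance is cheaper per token; below it,
    the per-token API is cheaper.
    """
    price_per_token = price_per_million_tokens / 1_000_000
    return gpu_rate_per_second / price_per_token


if __name__ == "__main__":
    # Hypothetical pairing: H100 SXM at $0.00139/s vs. Llama 3.3 70B at $0.45 per million tokens.
    tps = break_even_tokens_per_second(0.00139, 0.45)
    print(f"break-even: ~{tps:,.0f} tokens/s sustained")
```

For this pairing the break-even is roughly 3,089 tokens per second, sustained around the clock; workloads well below that are likely cheaper on the per-token API.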
