Fireworks AI

Fireworks AI specializes in fast, low-latency inference for open-source models. Known for optimized serving infrastructure, Fireworks delivers some of the fastest response times available for models like Llama, DeepSeek, and Qwen. All models use an OpenAI-compatible API.

Getting an API Key

  1. Visit fireworks.ai/account/api-keys
  2. Sign in or create a Fireworks AI account
  3. Generate a new API key (starts with fw_...)
  4. Paste the key into AISCouncil under Settings > AI Model > Fireworks AI
Warning

Fireworks AI does not offer a free tier. Pricing is competitive and pay-as-you-go. See fireworks.ai/pricing for current rates.

API keys are stored locally in your browser (localStorage) and are never included in shared bot URLs.

Supported Models

Model | Context | Max Output | Input Price | Output Price | Capabilities
Llama 3.3 70B | 128K | 16K | $0.90/MTok | $0.90/MTok | Tools, code, streaming
DeepSeek R1 | 128K | 16K | $3.00/MTok | $8.00/MTok | Reasoning, code, streaming
Qwen 2.5 72B | 128K | 16K | $0.90/MTok | $0.90/MTok | Tools, streaming
Llama 4 Scout | 1M | 16K | $0.15/MTok | $0.40/MTok | Vision, tools, code, streaming
Llama 4 Maverick | 1M | 16K | $0.24/MTok | $0.77/MTok | Vision, tools, code, streaming

Prices are per million tokens (MTok). See fireworks.ai/pricing for the latest rates.
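Per-MTok rates translate directly into per-request dollar costs. A minimal sketch of that arithmetic, using prices from the table above (which may change):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Estimate the dollar cost of one request from per-million-token rates."""
    return (input_tokens * input_price_per_mtok +
            output_tokens * output_price_per_mtok) / 1_000_000

# Example: a Llama 3.3 70B call with 8,000 input and 1,000 output tokens
# at $0.90/MTok each: (8000 + 1000) * 0.90 / 1e6 = $0.0081
cost = request_cost(8_000, 1_000, 0.90, 0.90)
```

The same function applied to DeepSeek R1's asymmetric pricing ($3.00 in / $8.00 out) shows why reasoning models with long outputs cost noticeably more per call.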

Choosing a Model
  • Llama 4 Scout and Maverick offer 1M context with vision at low prices -- Fireworks serves these with notably fast time-to-first-token.
  • Llama 3.3 70B is a reliable all-purpose model with 16K max output (higher than most providers offer for this model).
  • DeepSeek R1 provides reasoning capabilities with Fireworks' low-latency serving.

Low-Latency Inference

Fireworks AI is known for optimized model serving that minimizes latency. This makes it a strong choice when response speed matters -- interactive chat, real-time applications, or high-throughput workflows where every millisecond counts.

All models on Fireworks support up to 16K max output tokens, which is higher than many other inference providers offer for the same models.
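To benefit from fast time-to-first-token, requests should opt into streaming. A sketch of a streaming request in the standard OpenAI chat-completions shape, using only the standard library; the model ID in the usage comment is an assumption (verify the exact ID in the Fireworks model catalog):

```python
import json

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_stream_request(api_key: str, model: str, prompt: str) -> tuple[dict, bytes]:
    """Build headers and a JSON body for a streaming chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer auth, per the Configuration section
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,        # ask for incremental server-sent-event chunks
        "max_tokens": 16_384,  # up to the 16K output cap noted above
    }).encode()
    return headers, body

# Usage (hypothetical model ID -- check the Fireworks catalog):
# import urllib.request
# headers, body = build_stream_request(
#     "fw_...", "accounts/fireworks/models/llama-v3p3-70b-instruct", "Hello!")
# req = urllib.request.Request(API_URL, data=body, headers=headers)
# with urllib.request.urlopen(req) as resp:
#     for raw in resp:  # each line is a "data: {...}" SSE event
#         line = raw.decode().strip()
#         if line.startswith("data: ") and line != "data: [DONE]":
#             chunk = json.loads(line[6:])
#             print(chunk["choices"][0]["delta"].get("content", ""), end="")
```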

Configuration

Select Fireworks AI as the provider when creating a bot profile. The app connects to api.fireworks.ai/inference/v1 using the OpenAI-compatible format with Bearer authentication.
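Since the endpoint is OpenAI-compatible, any OpenAI-style client can be pointed at it. A minimal sketch of the connection settings, assuming the key format described above; the model ID in the usage comment is hypothetical:

```python
# Connection settings for the OpenAI-compatible endpoint.
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"

def fireworks_headers(api_key: str) -> dict[str, str]:
    """Headers every request needs: Bearer auth plus a JSON content type."""
    if not api_key.startswith("fw_"):
        raise ValueError("Fireworks API keys start with 'fw_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Equivalent setup with the official `openai` package, if installed:
# from openai import OpenAI
# client = OpenAI(base_url=FIREWORKS_BASE_URL, api_key="fw_...")
# resp = client.chat.completions.create(
#     model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # hypothetical ID
#     messages=[{"role": "user", "content": "Hello"}],
# )
```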

Tips for Best Results

  • Choose Fireworks when latency matters most. If you need fast time-to-first-token and quick streaming, Fireworks' optimized infrastructure delivers.
  • Take advantage of the higher max output. At 16K max output tokens, you get longer responses than the same models on other providers.
  • Use Llama 4 Scout for cost-effective long-context work. Same pricing as Together AI but with Fireworks' speed advantage.