Fireworks AI

Fireworks AI specializes in fast, low-latency inference for open-source models. Known for optimized serving infrastructure, Fireworks delivers some of the fastest response times available for models like Llama, DeepSeek, and Qwen. All models use an OpenAI-compatible API.

Getting an API Key

  1. Visit fireworks.ai/account/api-keys
  2. Sign in or create a Fireworks AI account
  3. Generate a new API key (starts with fw_...)
  4. Paste the key into AISCouncil under Settings > AI Model > Fireworks AI
Warning

Fireworks AI does not offer a free tier. Pricing is competitive and pay-as-you-go. See fireworks.ai/pricing for current rates.

API keys are stored locally in your browser (localStorage) and are never included in shared bot URLs.

Supported Models

Model | Context | Max Output | Input Price | Output Price | Capabilities
Llama 3.3 70B | 128K | 16K | $0.90/MTok | $0.90/MTok | Tools, code, streaming
DeepSeek R1 | 128K | 16K | $3.00/MTok | $8.00/MTok | Reasoning, code, streaming
Qwen 2.5 72B | 128K | 16K | $0.90/MTok | $0.90/MTok | Tools, streaming
Llama 4 Scout | 1M | 16K | $0.15/MTok | $0.40/MTok | Vision, tools, code, streaming
Llama 4 Maverick | 1M | 16K | $0.24/MTok | $0.77/MTok | Vision, tools, code, streaming

Prices are per million tokens (MTok). See fireworks.ai/pricing for the latest rates.
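Per-MTok rates translate directly into per-request dollar costs. A minimal sketch of that arithmetic, using prices from the table above (which may change):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Estimate the dollar cost of one request from per-million-token rates."""
    return (input_tokens * input_price_per_mtok +
            output_tokens * output_price_per_mtok) / 1_000_000

# Example: a Llama 3.3 70B call with 8,000 input and 1,000 output tokens
# at $0.90/MTok each: (8000 + 1000) * 0.90 / 1e6 = $0.0081
cost = request_cost(8_000, 1_000, 0.90, 0.90)
```

The same function applied to DeepSeek R1's asymmetric pricing ($3.00 in / $8.00 out) shows why reasoning models with long outputs cost noticeably more per call.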

Choosing a Model
  • Llama 4 Scout and Maverick offer 1M context with vision at low prices -- Fireworks serves these with notably fast time-to-first-token.
  • Llama 3.3 70B is a reliable all-purpose model with 16K max output (higher than most providers offer for this model).
  • DeepSeek R1 provides reasoning capabilities with Fireworks' low-latency serving.

Low-Latency Inference

Fireworks AI is known for optimized model serving that minimizes latency. This makes it a strong choice when response speed matters -- interactive chat, real-time applications, or high-throughput workflows where every millisecond counts.

All models on Fireworks support up to 16K max output tokens, which is higher than many other inference providers offer for the same models.
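To benefit from fast time-to-first-token, requests should opt into streaming. A sketch of a streaming request in the standard OpenAI chat-completions shape, using only the standard library; the model ID in the usage comment is an assumption (verify the exact ID in the Fireworks model catalog):

```python
import json

API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_stream_request(api_key: str, model: str, prompt: str) -> tuple[dict, bytes]:
    """Build headers and a JSON body for a streaming chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer auth, per the Configuration section
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,        # ask for incremental server-sent-event chunks
        "max_tokens": 16_384,  # up to the 16K output cap noted above
    }).encode()
    return headers, body

# Usage (hypothetical model ID -- check the Fireworks catalog):
# import urllib.request
# headers, body = build_stream_request(
#     "fw_...", "accounts/fireworks/models/llama-v3p3-70b-instruct", "Hello!")
# req = urllib.request.Request(API_URL, data=body, headers=headers)
# with urllib.request.urlopen(req) as resp:
#     for raw in resp:  # each line is a "data: {...}" SSE event
#         line = raw.decode().strip()
#         if line.startswith("data: ") and line != "data: [DONE]":
#             chunk = json.loads(line[6:])
#             print(chunk["choices"][0]["delta"].get("content", ""), end="")
```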

Configuration

Select Fireworks AI as the provider when creating a bot profile. The app connects to api.fireworks.ai/inference/v1 using the OpenAI-compatible format with Bearer authentication.
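Since the endpoint is OpenAI-compatible, any OpenAI-style client can be pointed at it. A minimal sketch of the connection settings, assuming the key format described above; the model ID in the usage comment is hypothetical:

```python
# Connection settings for the OpenAI-compatible endpoint.
FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"

def fireworks_headers(api_key: str) -> dict[str, str]:
    """Headers every request needs: Bearer auth plus a JSON content type."""
    if not api_key.startswith("fw_"):
        raise ValueError("Fireworks API keys start with 'fw_'")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Equivalent setup with the official `openai` package, if installed:
# from openai import OpenAI
# client = OpenAI(base_url=FIREWORKS_BASE_URL, api_key="fw_...")
# resp = client.chat.completions.create(
#     model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # hypothetical ID
#     messages=[{"role": "user", "content": "Hello"}],
# )
```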

Tips for Best Results

  • Choose Fireworks when latency matters most. If you need fast time-to-first-token and quick streaming, Fireworks' optimized infrastructure delivers.
  • Take advantage of the higher max output. At 16K max output tokens, you get longer responses than the same models on other providers.
  • Use Llama 4 Scout for cost-effective long-context work. Same pricing as Together AI but with Fireworks' speed advantage.