W&B Inference provides access to several open-source foundation models. Each model has different strengths and use cases.

Model catalog

| Model | Model ID (for API usage) | Type | Context Window (tokens) | Parameters | Description |
|---|---|---|---|---|---|
| DeepSeek R1-0528 | deepseek-ai/DeepSeek-R1-0528 | Text | 161K | 37B-680B (Active-Total) | Optimized for precise reasoning tasks including complex coding, math, and structured document analysis |
| DeepSeek V3-0324 | deepseek-ai/DeepSeek-V3-0324 | Text | 161K | 37B-680B (Active-Total) | Robust Mixture-of-Experts model tailored for high-complexity language processing and comprehensive document analysis |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | Text | 128K | 37B-671B (Active-Total) | Large hybrid model that supports both thinking and non-thinking modes via prompt templates |
| Meta Llama 3.1 8B | meta-llama/Llama-3.1-8B-Instruct | Text | 128K | 8B (Total) | Efficient conversational model optimized for responsive multilingual chatbot interactions |
| Meta Llama 3.3 70B | meta-llama/Llama-3.3-70B-Instruct | Text | 128K | 70B (Total) | Multilingual model excelling in conversational tasks, detailed instruction-following, and coding |
| Meta Llama 4 Scout | meta-llama/Llama-4-Scout-17B-16E-Instruct | Text, Vision | 64K | 17B-109B (Active-Total) | Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis |
| Microsoft Phi 4 Mini 3.8B | microsoft/Phi-4-mini-instruct | Text | 128K | 3.8B (Active-Total) | Compact, efficient model ideal for fast responses in resource-constrained environments |
| MoonshotAI Kimi K2 | moonshotai/Kimi-K2-Instruct | Text | 128K | 32B-1T (Active-Total) | Mixture-of-Experts model optimized for complex tool use, reasoning, and code synthesis |
| OpenAI GPT OSS 20B | openai/gpt-oss-20b | Text | 131K | 3.6B-20B (Active-Total) | Lower-latency Mixture-of-Experts model with reasoning capabilities, trained on OpenAI’s Harmony response format |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | Text | 131K | 5.1B-117B (Active-Total) | Efficient Mixture-of-Experts model designed for high-reasoning, agentic, and general-purpose use cases |
| OpenPipe Qwen3 14B Instruct | OpenPipe/Qwen3-14B-Instruct | Text | 32.8K | 14.8B (Active-Total) | Efficient multilingual, dense, instruction-tuned model optimized by OpenPipe for building agents with fine-tuning |
| Qwen2.5 14B Instruct | Qwen/Qwen2.5-14B-Instruct | Text | 32.8K | 14.7B-14.7B (Active-Total) | Dense multilingual instruction-tuned model with tool-use and structured-output support |
| Qwen3 235B A22B Thinking-2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | Text | 262K | 22B-235B (Active-Total) | High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation |
| Qwen3 235B A22B-2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | Text | 262K | 22B-235B (Active-Total) | Efficient multilingual Mixture-of-Experts instruction-tuned model optimized for logical reasoning |
| Qwen3 Coder 480B A35B | Qwen/Qwen3-Coder-480B-A35B-Instruct | Text | 262K | 35B-480B (Active-Total) | Mixture-of-Experts model optimized for coding tasks such as function calling, tool use, and long-context reasoning |
| Z.AI GLM 4.5 | zai-org/GLM-4.5 | Text | 131K | 32B-355B (Active-Total) | Mixture-of-Experts model with user-controllable thinking/non-thinking modes for reasoning, code, and agents |
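
Because W&B Inference exposes an OpenAI-compatible API, you may also be able to retrieve the current catalog programmatically. The sketch below is illustrative rather than official: it assumes the standard models-listing route is available and that the base URL and API-key placeholder match your account setup. If the route is not supported in your environment, rely on the table above.

import openai

# Placeholder connection details; assumes the OpenAI-compatible /v1/models listing route is exposed
client = openai.OpenAI(base_url="https://api.inference.wandb.ai/v1", api_key="<your-api-key>")

for model in client.models.list():
    print(model.id)  # for example, "meta-llama/Llama-3.1-8B-Instruct"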

Using model IDs

When calling the API, pass the model ID from the table above as the model parameter. For example, with the OpenAI-compatible Python client (the base URL, API-key placeholder, and sample message below are illustrative; see the quickstart for your account's exact connection details):

import openai

# Placeholder endpoint and key; substitute the values for your W&B account
client = openai.OpenAI(base_url="https://api.inference.wandb.ai/v1", api_key="<your-api-key>")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
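
Because every model in the catalog sits behind the same chat-completions interface, switching models only requires changing the ID. A minimal sketch, reusing the client constructed above (the prompt and the two model IDs chosen here are purely illustrative):

prompt = "Summarize the trade-offs of Mixture-of-Experts models in two sentences."

# Send the same prompt to two catalog entries and compare the replies
for model_id in ["meta-llama/Llama-3.1-8B-Instruct", "Qwen/Qwen2.5-14B-Instruct"]:
    response = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{model_id}:\n{response.choices[0].message.content}\n")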

Next steps
