Qualcomm® AI Hub

GPT-OSS-20B

State‑of‑the‑art Mixture of Experts large language model with extended context length for text generation tasks.

GPT-OSS-20B is a 20.9B-parameter Mixture of Experts (MoE) language model with 32 experts, 4 of which are active per token. It supports an extended 131K-token context length through YaRN RoPE scaling and uses a GPT-4o-compatible tokenizer. The model is quantized to MXFP4 for efficient on-device deployment.
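
To illustrate what "32 experts, 4 active per token" means in practice, here is a minimal sketch of top-k expert routing in plain Python. It is not the model's actual router. The function name `route_token`, the random logits, and the choice to apply softmax over only the selected experts are assumptions made for illustration. Only the two constants come from this page.

```python
import math
import random

NUM_EXPERTS = 32     # from the model description
ACTIVE_EXPERTS = 4   # from the model description


def route_token(router_logits, k=ACTIVE_EXPERTS):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Hypothetical helper: the weights are a softmax over the selected logits
    only, which is one common convention and an assumption here.
    """
    # Indices of the k largest router logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for a single token (random, for illustration only).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

for expert, weight in selected:
    print(f"expert {expert:2d} weight {weight:.3f}")
```

Only the 4 selected experts run for a given token, so the compute per token is far lower than the 20.9B total parameter count would suggest, even though all experts must still be held in memory.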

Not supported

This model is currently not supported on any Compute chipset.

Technical Details

Number of parameters: 20.91B
Model architecture: Mixture of Experts (MoE)
Number of experts: 32
Active experts per token: 4
Tokenizer: BPE (GPT-2 style with GPT-4o preprocessing)
Supported languages: English
TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt.
Response Rate: Rate of response generation after the first response token.
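
The two metrics defined above combine into a rough end-to-end latency estimate: the first token arrives after TTFT, and each remaining token arrives at the response rate. The sketch below shows that arithmetic. The function name `estimate_response_time` and all numeric values are hypothetical and are not measurements for this model.

```python
def estimate_response_time(ttft_s, tokens_per_s, num_tokens):
    """Estimate seconds to produce num_tokens response tokens.

    Hypothetical helper: the first token costs ttft_s, and the remaining
    (num_tokens - 1) tokens are generated at tokens_per_s.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + (num_tokens - 1) / tokens_per_s


# Illustrative values only: a TTFT range of 0.5 s to 2.0 s (short vs. long
# prompt) and a response rate of 20 tokens per second.
low, high = (estimate_response_time(t, 20.0, 201) for t in (0.5, 2.0))
print(f"201 tokens: between {low:.1f} s and {high:.1f} s")
```

Because TTFT is reported as a range, the estimate is a range as well. For long responses the response rate dominates, while for short replies to long prompts TTFT does.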

Applicable Scenarios

  • Dialogue
  • Content Generation
  • Long Context Tasks

License

Tags

  • llm
  • generative-ai

Supported Compute Devices

  • Snapdragon X Elite CRD
  • Snapdragon X2 Elite CRD

Supported Compute Chipsets

  • Snapdragon® X Elite
  • Snapdragon® X2 Elite
