Qwen3-VL-2B-Instruct
Multimodal 2B vision‑language model capable of understanding text and images.
Qwen3‑VL is a vision‑language model from Alibaba Cloud capable of understanding both text and images for multimodal reasoning tasks such as visual question answering and image captioning.
Technical Details
Model architecture: Transformer with ViT Vision Encoder, Grouped Query Attention (GQA), and SwiGLU activation.
Supported languages: 100+ languages and dialects
TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt.
Response Rate: Rate of response generation after the first response token.
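The two metrics above can be illustrated with a small calculation. The sketch below assumes you have recorded a request start time and one arrival timestamp per generated token (in seconds). The function name latency_metrics and its parameters are hypothetical and are not part of any Qualcomm or Qwen API.

```python
from typing import Sequence, Tuple


def latency_metrics(request_start: float,
                    token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (ttft_seconds, response_rate_tokens_per_second).

    Hypothetical helper: request_start and token_times are timestamps
    in seconds that the caller has measured.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    # TTFT: delay between sending the request and the first response token.
    ttft = token_times[0] - request_start

    # Response rate: tokens generated after the first one, divided by the
    # time elapsed since the first token arrived.
    later_tokens = len(token_times) - 1
    elapsed = token_times[-1] - token_times[0]
    rate = later_tokens / elapsed if elapsed > 0 else 0.0

    return ttft, rate


# Example: first token at 0.5 s, then one token every 0.25 s.
ttft, rate = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(ttft, rate)
```

Because TTFT depends on prompt length, running this over prompts of different sizes would give a range of TTFT values rather than a single number.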
Applicable Scenarios
- Dialogue
- Content Generation
Supported Form Factors
- Phone
- Tablet
License
Model: APACHE-2.0
Terms of Use: Qualcomm® Generative AI usage and limitations
Tags
- llm
- generative-ai
Supported Devices
- Dragonwing IQ-9075 EVK
- Snapdragon 8 Elite QRD
- Snapdragon X Elite CRD
- Snapdragon X2 Elite CRD
Supported Chipsets
- Qualcomm® QCS9075
- Snapdragon® 8 Elite Mobile
- Snapdragon® X Elite
- Snapdragon® X2 Elite