Qualcomm® AI Hub
Stable-Diffusion-v2.1


State-of-the-art generative AI model used to generate detailed images conditioned on text descriptions.

Generates high-resolution images from text prompts using a latent diffusion model. The model uses OpenCLIP ViT-H/14 as its text encoder, a U-Net that denoises in latent space, and a VAE decoder that turns the final latent into the output image.
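The three-stage flow can be sketched at the level of tensor shapes. This is a minimal illustration, not the deployed model: `text_encoder`, `unet`, `vae_decoder`, and `generate` are hypothetical stubs that only propagate shapes. The 4-channel latent, the factor-of-8 VAE downsampling, and the 77-token context follow the standard Stable Diffusion and CLIP-tokenizer design; `embed_dim` and the step count are left as parameters because the real values depend on the exported model.

```python
# Shape-level sketch of latent diffusion. The "networks" below are
# hypothetical stubs that only return tensor shapes.

Shape = tuple[int, ...]

MAX_TOKENS = 77       # context length of CLIP-style tokenizers
LATENT_CHANNELS = 4   # channels in the Stable Diffusion VAE latent
VAE_FACTOR = 8        # the VAE downsamples height and width by 8


def text_encoder(prompt: str, embed_dim: int) -> Shape:
    # Hypothetical stub: the prompt is padded or truncated to MAX_TOKENS
    # tokens, and each token becomes one embed_dim-wide vector.
    return (MAX_TOKENS, embed_dim)


def unet(latent: Shape, text_emb: Shape) -> Shape:
    # Hypothetical stub: the U-Net predicts noise with the same shape as the
    # latent, conditioned on the text embeddings through cross-attention.
    return latent


def vae_decoder(latent: Shape) -> Shape:
    # Hypothetical stub: decode the latent back to a 3-channel RGB image.
    _, h, w = latent
    return (3, h * VAE_FACTOR, w * VAE_FACTOR)


def generate(prompt: str, height: int, width: int,
             steps: int, embed_dim: int) -> Shape:
    if height % VAE_FACTOR or width % VAE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    text_emb = text_encoder(prompt, embed_dim)   # runs once, before denoising
    latent = (LATENT_CHANNELS, height // VAE_FACTOR, width // VAE_FACTOR)
    for _ in range(steps):                       # iterative denoising in latent space
        noise_pred = unet(latent, text_emb)
        assert noise_pred == latent              # the scheduler update keeps the shape
    return vae_decoder(latent)                   # runs once, at the end
```

For a 512x512 request the U-Net operates on a 4x64x64 latent rather than on 3x512x512 pixels, which is what makes latent diffusion cheaper than pixel-space diffusion. The text encoder and VAE decoder each run once per image, while the U-Net runs at least once per denoising step, so it typically dominates inference time.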

Target device: Snapdragon® X Elite (TorchScript to Qualcomm® AI Engine Direct)

Technical Details

Input: Text prompt describing the image to generate
Text encoder number of parameters: 340M
U-Net number of parameters: 865M
VAE decoder number of parameters: 83M
Model size: 1 GB
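The parameter counts above give a rough estimate of raw weight storage at common precisions. This sketch only multiplies the listed counts by bytes per parameter, assuming M = 10^6 and GB = 10^9 bytes; it ignores quantization scales, metadata, and file-format overhead, and assumes nothing about which precision each component uses in the exported model. `PARAMS_M` and `weight_bytes` are illustrative names.

```python
# Parameter counts from the Technical Details list, in millions (M = 10**6).
PARAMS_M = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}


def weight_bytes(params_m: dict[str, int], bits: int) -> float:
    """Raw weight storage in bytes; ignores scales, metadata, and padding."""
    total_params = sum(params_m.values()) * 1_000_000
    return total_params * bits / 8


print(f"total parameters: {sum(PARAMS_M.values())}M")  # total parameters: 1288M
for bits in (32, 16, 8):
    gb = weight_bytes(PARAMS_M, bits) / 1e9  # decimal GB (10**9 bytes)
    print(f"{bits:>2}-bit weights: {gb:.2f} GB")
# 32-bit weights: 5.15 GB
# 16-bit weights: 2.58 GB
#  8-bit weights: 1.29 GB
```

The listed 1 GB model size is far below the 16-bit estimate, which is consistent with the model's quantized tag.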

Applicable Scenarios

  • Image Generation
  • Image Editing
  • Content Creation


  • generative-ai
    Models capable of generating text, images, or other data, often in response to prompts.
  • quantized
    A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency.
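To make "low precision" concrete, here is a minimal sketch of per-tensor affine (asymmetric) 8-bit quantization in plain Python. It is a generic textbook scheme chosen for illustration, not necessarily the one used for this model; production toolchains often quantize per channel, calibrate ranges on sample data, and keep some layers at higher precision (mixed precision). `quantize` and `dequantize` are illustrative names, not an AI Hub API.

```python
def quantize(values: list[float], num_bits: int = 8) -> tuple[list[int], float, int]:
    """Map floats to unsigned integer codes in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)     # integer code that represents 0.0
    codes = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: list[int], scale: float, zero_point: int) -> list[float]:
    """Approximately recover the original floats from the integer codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.9, -0.1, 0.0, 0.35, 1.2]
codes, scale, zp = quantize(weights)
restored = dequantize(codes, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"scale={scale:.5f}", f"max_err={max_err:.5f}")
```

Each 8-bit code needs a quarter of the storage of a 32-bit float, and for values inside the calibrated range the reconstruction error is at most about half of `scale`.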

Supported Compute Chipsets

  • Snapdragon® X Elite