Stable-Diffusion-v2.1


State-of-the-art generative AI model used to generate detailed images conditioned on text descriptions.

Generates high-resolution images from text prompts using a latent diffusion model. The model uses an OpenCLIP ViT-H/14 text encoder, U-Net-based latent denoising, and a VAE-based decoder to generate the final image.
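
The three stages named above (text encoder, U-Net latent denoiser, VAE decoder) can be sketched as a data flow. This is only a toy illustration, not the actual model or the AI Hub API: every function is a hypothetical stand-in, the dimensions are made up to keep the example small, and the update rule is a simplified placeholder for a real diffusion scheduler. The one borrowed detail is the 8x spatial upscaling in the decoder, which mirrors the factor commonly used by Stable Diffusion VAEs.

```python
import random

# Hypothetical toy dimensions, chosen only to keep the sketch small.
LATENT_SIZE = 4   # the latent is a LATENT_SIZE x LATENT_SIZE grid
UPSCALE = 8       # each latent cell becomes an 8x8 pixel patch
EMBED_DIM = 8     # length of the stand-in text embedding
NUM_STEPS = 10    # number of denoising iterations


def encode_text(prompt):
    """Stand-in text encoder: maps a prompt to a fixed-length vector."""
    rng = random.Random(prompt)  # deterministic for a given prompt
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def predict_noise(latent, embedding):
    """Stand-in U-Net: returns a 'noise' estimate conditioned on the text."""
    target = sum(embedding) / len(embedding)
    return [[value - target for value in row] for row in latent]


def decode_latent(latent):
    """Stand-in VAE decoder: nearest-neighbour upsampling to pixel space."""
    image = []
    for row in latent:
        pixel_row = [value for value in row for _ in range(UPSCALE)]
        image.extend(list(pixel_row) for _ in range(UPSCALE))
    return image


def generate(prompt, seed=0):
    """Run the three stages: encode text, denoise the latent, decode."""
    rng = random.Random(seed)
    embedding = encode_text(prompt)
    # Start from Gaussian noise in latent space.
    latent = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]
              for _ in range(LATENT_SIZE)]
    for _ in range(NUM_STEPS):
        noise = predict_noise(latent, embedding)
        # Simplified update: remove half of the estimated noise each step.
        latent = [[v - 0.5 * n for v, n in zip(lat_row, noise_row)]
                  for lat_row, noise_row in zip(latent, noise)]
    return decode_latent(latent)


image = generate("a red bicycle")
print(len(image), len(image[0]))
```

The point of the sketch is that the iterative work happens in the small latent grid, and only the final decode step produces the full-resolution output.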

Not supported

This model is currently not supported on any IoT chipset.


Technical Details

Input: Text prompt to generate image
Text Encoder number of parameters: 340M
UNet number of parameters: 865M
VAE Decoder number of parameters: 83M
Model size: 1 GB
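
As a rough sanity check on these figures, the parameter counts can be turned into an approximate weight-storage estimate at different precisions. This is a back-of-the-envelope sketch: the bit widths are assumptions for illustration, the page does not state which precision each component uses, and the listed model size is presumably rounded.

```python
# Parameter counts as listed in the technical details.
PARAMS = {
    "text_encoder": 340_000_000,
    "unet": 865_000_000,
    "vae_decoder": 83_000_000,
}


def weight_size_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


total = sum(PARAMS.values())
print(f"total parameters: {total / 1e6:.0f}M")
for bits in (32, 16, 8):  # assumed precisions, for comparison only
    print(f"{bits}-bit weights: {weight_size_gb(total, bits):.2f} GB")
```

Of the three assumed precisions, the 8-bit estimate (about 1.29 GB) is the closest to the listed model size.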

Applicable Scenarios

  • Image Generation
  • Image Editing
  • Content Creation


  • generative-ai
    Models capable of generating text, images, or other data using generative models, often in response to prompts.
  • quantized
    A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency.
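
To make "low precision" concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a short list of weights. It illustrates the general idea only and is not the scheme used for this model, which the page does not describe; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each restored value differs from the original by at most half a scale step.
print(quantized, restored)
```

Each weight is stored in one byte instead of four, at the cost of a small rounding error per value.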

Supported IoT Devices

  • QCS8550 (Proxy)

Supported IoT Chipsets

  • Qualcomm® QCS8550