Stable-Diffusion

State-of-the-art generative AI model used to generate detailed images conditioned on text descriptions.

Generates high-resolution images from text prompts using a latent diffusion model. The model uses CLIP ViT-L/14 as the text encoder, a U-Net to denoise in latent space, and a VAE decoder to produce the final image.
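The three components described above hand data to each other in a fixed order. The sketch below traces that flow at the level of tensor shapes only, with no real weights. Its constants (77 token slots, 768-dimensional text embeddings, 4 latent channels, 8× VAE downsampling) are assumptions taken from the Stable Diffusion v1.x configuration rather than from this card. The `Tensor` stand-in, the `generate` helper, and the 20-step default are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed Stable Diffusion v1.x values; this model card does not state them.
MAX_TOKENS = 77        # CLIP tokenizer context length
TEXT_EMBED_DIM = 768   # CLIP ViT-L/14 text hidden size
LATENT_CHANNELS = 4    # channels in the VAE latent space
VAE_SCALE = 8          # spatial downsampling factor of the VAE

CALLS = Counter()      # how many times each component runs


@dataclass(frozen=True)
class Tensor:
    """Hypothetical stand-in that tracks only a tensor's shape, not its values."""
    shape: tuple


def text_encoder(prompt: str) -> Tensor:
    # CLIP ViT-L/14 turns the tokenized prompt into one embedding per token slot.
    CALLS["text_encoder"] += 1
    return Tensor((1, MAX_TOKENS, TEXT_EMBED_DIM))


def unet(latent: Tensor, timestep: int, context: Tensor) -> Tensor:
    # Predicts the noise in the latent, conditioned on the text embeddings;
    # the prediction has the same shape as the latent.
    CALLS["unet"] += 1
    assert context.shape == (1, MAX_TOKENS, TEXT_EMBED_DIM)
    return Tensor(latent.shape)


def vae_decoder(latent: Tensor) -> Tensor:
    # Upsamples the denoised latent by VAE_SCALE into an RGB image.
    CALLS["vae_decoder"] += 1
    n, c, h, w = latent.shape
    assert c == LATENT_CHANNELS
    return Tensor((n, 3, h * VAE_SCALE, w * VAE_SCALE))


def generate(prompt: str, height: int = 512, width: int = 512, steps: int = 20) -> Tensor:
    context = text_encoder(prompt)  # runs once per prompt
    # Denoising starts from random noise at 1/VAE_SCALE of the image resolution.
    latent = Tensor((1, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE))
    for t in reversed(range(steps)):
        noise_pred = unet(latent, t, context)  # runs once per step
        # A real scheduler would update the latent from noise_pred here;
        # the shape stays the same, which is all this sketch tracks.
        latent = Tensor(noise_pred.shape)
    return vae_decoder(latent)  # runs once at the end


print(generate("a photo of an astronaut").shape)  # (1, 3, 512, 512)
print(dict(CALLS))  # {'text_encoder': 1, 'unet': 20, 'vae_decoder': 1}
```

The call counts show why the U-Net usually dominates end-to-end latency. It runs once per denoising step, while the text encoder and VAE decoder each run only once per image.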

TorchScript → Qualcomm® AI Engine Direct

Inference Time: 8.08 ms

Memory Usage: 0-137 MB

Layers: 570 (NPU)



Technical Details

Input: Text prompt to generate an image

QNN SDK: 2.19

Text encoder parameters: 340M

UNet parameters: 865M

VAE decoder parameters: 83M

Model size: 1 GB
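The component parameter counts listed above can be added up and converted into the storage size they would imply at a few uniform precisions. The card does not say which precision the deployable model uses, and the listed model size is rounded, so treat these figures as rough estimates, not as a reconstruction of the published size. The helper names below are hypothetical.

```python
# Parameter counts from the Technical Details above, in millions.
PARAMS_M = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}

# Assumed bytes per parameter if every weight used the same precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def total_params_m(params: dict) -> int:
    return sum(params.values())


def estimated_size_gb(params: dict, precision: str) -> float:
    # Decimal units: 1M parameters = 1e6, 1 GB = 1e9 bytes. Ignores file metadata.
    return total_params_m(params) * 1e6 * BYTES_PER_PARAM[precision] / 1e9


print(total_params_m(PARAMS_M))  # 1288
for p in BYTES_PER_PARAM:
    print(p, round(estimated_size_gb(PARAMS_M, p), 2))
```

With 1,288M parameters in total, the uniform-precision estimates come to about 5.15 GB at fp32, 2.58 GB at fp16, and 1.29 GB at int8.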

Applicable Scenarios

Image Generation
Image Editing
Content Creation

Supported Form Factors

Phone
Tablet

Licenses

Source Model: CREATIVEML-OPENRAIL-M

Deployable Model: CREATIVEML-OPENRAIL-M

Supported Devices

Samsung Galaxy S23
Samsung Galaxy S23 Ultra
Samsung Galaxy S23+
Samsung Galaxy S24
Samsung Galaxy S24 Ultra

Supported Chipsets

Snapdragon® 8 Gen 2 Mobile
Snapdragon® 8 Gen 3 Mobile
