Generating visual art from a text prompt and a guiding input image.

On-device, high-resolution image synthesis from text and image prompts. ControlNet guides Stable Diffusion with the provided input image to generate accurate images from the given text prompt.
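The Technical Details list Canny-Edge as the conditioning input, which means the guiding image is reduced to an edge map before it steers generation. The sketch below gives a rough idea of what such a map contains. It is a simplified stand-in, not Canny itself and not this model's actual preprocessing: it uses only the Sobel gradient magnitude and a single threshold, and skips the smoothing, non-maximum suppression, and hysteresis steps of a full Canny detector. The function name `edge_map` and the threshold value are hypothetical.

```python
def edge_map(gray, threshold=100.0):
    """Return a binary edge map (0 or 255) for a 2D list of grayscale values.

    Simplified stand-in for Canny: Sobel gradient magnitude plus one
    threshold. Border pixels are left at 0.
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal Sobel response (right column minus left column).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical Sobel response (bottom row minus top row).
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Toy 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edges = edge_map(image)
for row in edges:
    print(row)
```

On the toy image, only the two columns on either side of the dark-to-bright boundary are marked as edges. That outline is the kind of structural hint a conditioning image provides.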

TorchScript to Qualcomm® AI Engine Direct

Technical Details

Input: Text prompt and input image as a reference
Conditioning Input: Canny-Edge
Text Encoder Number of parameters: 340M
UNet Number of parameters: 865M
VAE Decoder Number of parameters: 83M
ControlNet Number of parameters: 361M
Model size: 1.4GB
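As a back-of-the-envelope check on the figures above, the four component parameter counts can be summed and converted to raw weight storage at a few common precisions. This is only a sketch: the helper name `estimated_size_gb` is hypothetical, the estimate ignores metadata and runtime overhead, and the listed model size need not match any of these numbers, since the bit widths and packaging used for the shipped model are not stated here.

```python
# Parameter counts in millions, as listed in the technical details.
PARAMS_M = {
    "text_encoder": 340,
    "unet": 865,
    "vae_decoder": 83,
    "controlnet": 361,
}


def estimated_size_gb(params_millions, bytes_per_param):
    """Raw weight storage in GB (10^9 bytes), ignoring any overhead."""
    return params_millions * 1e6 * bytes_per_param / 1e9


total_m = sum(PARAMS_M.values())  # 340 + 865 + 83 + 361 = 1649
print(f"total parameters: {total_m}M")
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: {estimated_size_gb(total_m, nbytes):.2f} GB")
```

The UNet accounts for more than half of the 1649M total, so its precision has the largest effect on the overall footprint.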

Applicable Scenarios

  • Image Generation
  • Image Editing
  • Content Creation

Supported Form Factors

  • Phone
  • Tablet



  • generative-ai
    Models capable of generating text, images, or other data using generative models, often in response to prompts.
  • quantized
    A “quantized” model can run in low or mixed precision, which can substantially reduce inference latency.
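
To make the "quantized" tag concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a handful of weights. It is a generic illustration under assumed settings (one scale per tensor, integer range [-127, 127]) and not a description of the scheme used for this model. The function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each weight is stored as a small integer plus one shared scale, so the restored values differ from the originals by at most half a quantization step. That trade of a little accuracy for smaller, cheaper arithmetic is what allows lower inference latency.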

Supported Devices

  • Samsung Galaxy S23
  • Samsung Galaxy S23 Ultra
  • Samsung Galaxy S23+
  • Samsung Galaxy S24
  • Samsung Galaxy S24 Ultra

Supported Chipsets

  • Snapdragon® 8 Gen 2 Mobile
  • Snapdragon® 8 Gen 3 Mobile