Qualcomm® AI Hub

Riffusion

State‑of‑the‑art generative AI model used to generate spectrogram images given any text input. These spectrograms can be converted into audio clips.
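To make the spectrogram-to-audio step concrete, the following is a minimal sketch that turns a small magnitude grid into a waveform by summing one sinusoid per frequency row. This is plain additive resynthesis, swapped in for illustration; it is not necessarily the conversion Riffusion itself uses. The function name `spectrogram_to_audio`, the linear frequency spacing, and all default values are assumptions chosen for the example.

```python
import math

def spectrogram_to_audio(spec, sample_rate=8000, samples_per_frame=400,
                         f_min=100.0, f_max=2000.0):
    """Additive resynthesis of a magnitude spectrogram (illustrative only).

    spec[row][col] is a magnitude in [0, 1]; row 0 is the highest
    frequency and each column is one time frame.
    """
    n_rows, n_cols = len(spec), len(spec[0])
    # Assumption: rows are linearly spaced in frequency (top row = f_max).
    if n_rows > 1:
        freqs = [f_max - (f_max - f_min) * r / (n_rows - 1) for r in range(n_rows)]
    else:
        freqs = [f_min]
    audio = []
    for col in range(n_cols):
        for i in range(samples_per_frame):
            t = (col * samples_per_frame + i) / sample_rate
            # Sum one sinusoid per row, weighted by that row's magnitude.
            audio.append(sum(spec[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                             for r in range(n_rows)))
    # Normalise to [-1, 1]; keep silence as silence.
    peak = max(abs(x) for x in audio) or 1.0
    return [x / peak for x in audio]

# Usage: a high tone in the first frame, then a low tone in the second.
clip = spectrogram_to_audio([[1.0, 0.0], [0.0, 1.0]])
print(len(clip), "samples")
```

The phase of each sinusoid is fixed here for simplicity; a practical converter would need to estimate phase, since a spectrogram image stores magnitudes only.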

Generates high-resolution spectrogram images from text prompts using a latent diffusion model. The model uses CLIP ViT‑L/14 as the text encoder, a U‑Net-based latent denoiser, and a VAE-based decoder to generate the final image.
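The data flow through the three components can be sketched as simple shape bookkeeping. The defaults below (512×512 output, 77 tokens, 768-dimensional embeddings, 4 latent channels, 8× VAE scaling) follow common Stable Diffusion v1 conventions and are assumptions; this page does not state them. The names `PipelineShapes` and `latent_diffusion_shapes` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineShapes:
    text_embedding: tuple  # (tokens, embedding_dim) from the text encoder
    latent: tuple          # (channels, height, width) denoised by the U-Net
    image: tuple           # (channels, height, width) from the VAE decoder

def latent_diffusion_shapes(image_size=512, max_tokens=77, embedding_dim=768,
                            latent_channels=4, vae_scale=8):
    """Return the tensor shapes at each stage (all defaults are assumptions)."""
    if image_size % vae_scale != 0:
        raise ValueError("image_size must be divisible by vae_scale")
    latent_side = image_size // vae_scale
    return PipelineShapes(
        text_embedding=(max_tokens, embedding_dim),
        latent=(latent_channels, latent_side, latent_side),
        image=(3, image_size, image_size),
    )

shapes = latent_diffusion_shapes()
print(shapes)
```

The point of the sketch is that denoising happens on a latent much smaller than the final image, which is why the decoder is needed as a separate last stage.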

Technical Details

Input: Text prompt to generate spectrogram image
Text Encoder Number of parameters: 340M
UNet Number of parameters: 865M
VAE Decoder Number of parameters: 83M
Model size: 1GB
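As a rough sanity check on these figures, the sketch below sums the three parameter counts and estimates weight storage at a few precisions. The bytes-per-parameter values and the convention 1 GB = 10^9 bytes are assumptions; the page does not state the stored precision.

```python
# Parameter counts in millions, as listed above.
PARAMS_MILLIONS = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}

def total_params_millions(params=PARAMS_MILLIONS):
    """Sum the per-component parameter counts (in millions)."""
    return sum(params.values())

def weight_size_gb(params_millions, bytes_per_param):
    """Approximate weight storage, assuming 1 GB = 1e9 bytes."""
    return params_millions * 1e6 * bytes_per_param / 1e9

total = total_params_millions()
print(f"total parameters: {total}M")
for label, nbytes in (("8-bit", 1), ("16-bit", 2), ("32-bit", 4)):
    print(f"{label}: {weight_size_gb(total, nbytes):.2f} GB")
```

Under these assumptions, only the 8-bit estimate is of the same order as the listed 1GB, which would be consistent with quantized weights.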

Applicable Scenarios

  • Music Generation
  • Music Editing
  • Content Creation

Supported Form Factors

  • Phone
  • Tablet

Licenses

Tags

  • generative-ai
  • quantized

Supported Devices

  • Google Pixel 3
  • Google Pixel 3a
  • Google Pixel 3a XL
  • Google Pixel 4
  • Google Pixel 4a
  • Google Pixel 5a 5G
  • QCS8550 (Proxy)
  • Samsung Galaxy S21
  • Samsung Galaxy S21 Ultra
  • Samsung Galaxy S21+
  • Samsung Galaxy S22 5G
  • Samsung Galaxy S22 Ultra 5G
  • Samsung Galaxy S22+ 5G
  • Samsung Galaxy S23
  • Samsung Galaxy S23 Ultra
  • Samsung Galaxy S23+
  • Samsung Galaxy S24
  • Samsung Galaxy S24 Ultra
  • Samsung Galaxy S24+
  • Samsung Galaxy Tab S8
  • Snapdragon X Elite CRD
  • Xiaomi 12
  • Xiaomi 12 Pro

Supported Chipsets

  • Qualcomm® QCS8550 (Proxy)
  • Snapdragon® 8 Gen 1 Mobile
  • Snapdragon® 8 Gen 2 Mobile
  • Snapdragon® 8 Gen 3 Mobile
  • Snapdragon® 888 Mobile
  • Snapdragon® X Elite
