Qualcomm® AI Hub

Riffusion

State‑of‑the‑art generative AI model that produces spectrogram images of music from a text prompt. These spectrograms can then be converted into audio clips.
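To make the spectrogram-to-audio step concrete, here is a minimal sketch that turns a tiny magnitude spectrogram into audio samples using plain additive synthesis (one sine wave per frequency bin). This is a simplified stand-in chosen for illustration, not the reconstruction method the model itself uses; the function name `spectrogram_to_audio`, the linear bin-to-frequency mapping, and all parameter values are assumptions.

```python
import math

def spectrogram_to_audio(spec, sample_rate=8000, frame_len=256, max_freq=4000.0):
    """Turn a magnitude spectrogram into audio samples by additive synthesis.

    spec[t][k] is the magnitude of frequency bin k in time frame t.
    Bin k is mapped linearly to a frequency in (0, max_freq] (an assumption).
    """
    n_bins = len(spec[0])
    samples = []
    for t, frame in enumerate(spec):
        for n in range(frame_len):
            # Global time keeps each sine wave phase-continuous across frames.
            time = (t * frame_len + n) / sample_rate
            value = 0.0
            for k, mag in enumerate(frame):
                freq = (k + 1) * max_freq / n_bins
                value += mag * math.sin(2 * math.pi * freq * time)
            samples.append(value)
    # Normalize to the range [-1, 1] so the result could be written to a WAV file.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0:
        samples = [s / peak for s in samples]
    return samples

# Two time frames, four frequency bins: a low tone followed by a higher tone.
toy_spec = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
audio = spectrogram_to_audio(toy_spec)
print(len(audio))
```

A spectrogram image stores magnitudes only, so any conversion has to supply phase somehow. The sketch sidesteps this by giving every sine wave a fixed phase, which is adequate for a toy but would sound artificial on real music.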

Generates high‑resolution spectrogram images of music from text prompts using a latent diffusion model. The model uses CLIP ViT‑L/14 as the text encoder, a U‑Net for latent denoising, and a VAE‑based decoder to generate the final image.
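The data flow through the three components can be sketched as follows. Every function here is a hypothetical toy stand-in that only mimics the role of the real component (the names `encode_text`, `predict_noise`, `decode_latent`, and `generate`, the vector sizes, and the update rule are all assumptions); the point is the order of operations, not the math of an actual diffusion sampler.

```python
import random

def encode_text(prompt, dim=8):
    """Stand-in for the CLIP ViT-L/14 text encoder: prompt -> fixed-size vector."""
    rng = random.Random(prompt)  # deterministic for a given prompt
    return [rng.uniform(-1, 1) for _ in range(dim)]

def predict_noise(latent, text_embedding, step):
    """Stand-in for the U-Net: estimates the noise present in the latent.

    A real U-Net also conditions on the step; this toy version ignores it.
    """
    target = sum(text_embedding) / len(text_embedding)
    # Treat everything that differs from the text-conditioned target as noise.
    return [x - target for x in latent]

def decode_latent(latent, upscale=8):
    """Stand-in for the VAE decoder: expands the small latent into a larger output."""
    return [x for x in latent for _ in range(upscale)]

def generate(prompt, latent_size=16, steps=20, seed=0):
    rng = random.Random(seed)
    text_embedding = encode_text(prompt)                    # 1. encode the prompt
    latent = [rng.gauss(0, 1) for _ in range(latent_size)]  # 2. start from pure noise
    for step in range(steps):                               # 3. iteratively denoise
        noise = predict_noise(latent, text_embedding, step)
        latent = [x - n / (steps - step) for x, n in zip(latent, noise)]
    return decode_latent(latent)                            # 4. decode to image space

image = generate("funky synth solo")
print(len(image))
```

The denoising loop runs entirely on the small latent, and only the final result is expanded by the decoder. That is the main reason latent diffusion is cheaper than denoising at full image resolution.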

Technical Details

Input: Text prompt to generate spectrogram image
Text Encoder Number of parameters: 340M
UNet Number of parameters: 865M
VAE Decoder Number of parameters: 83M
Model size: 1 GB
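As a rough sanity check on the figures above, the short calculation below sums the listed parameter counts and estimates the on-disk size at a few common weight precisions. The page does not state which precision the deployed model uses, so the precisions shown are assumptions, and the estimate ignores file-format overhead.

```python
# Parameter counts as listed above, in millions.
PARAMS_MILLIONS = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}

# Assumed storage cost per weight; the page does not specify the precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

total_params = sum(PARAMS_MILLIONS.values()) * 1_000_000

def size_gb(num_params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

estimates = {name: size_gb(total_params, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name}: {gb:.2f} GB")
```

The total comes to about 1.29 billion parameters. Of the three estimates, only the 8-bit one lands near the listed model size, which may indicate quantized weights, though that is an inference rather than something the page states.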

Applicable Scenarios

  • Music Generation
  • Music Editing
  • Content Creation

Licenses

Tags

  • generative-ai

Supported Compute Devices

  • Snapdragon X Elite CRD

Supported Compute Chipsets

  • Snapdragon® X Elite
  • Snapdragon® X Plus 8-Core
