Riffusion

State‑of‑the‑art generative AI model used to generate spectrogram images of music given a text prompt. These spectrograms can be converted into audio clips.

Generates high-resolution spectrogram images of music from text prompts using a latent diffusion model. The model uses CLIP ViT‑L/14 as the text encoder, a U‑Net for latent denoising, and a VAE decoder to produce the final image.
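
The spectrogram-to-audio conversion mentioned above can be illustrated with a deliberately simplified sketch. This is not Riffusion's own reconstruction method: a spectrogram image stores magnitudes only, and practical pipelines have to estimate the missing phase (the Griffin-Lim algorithm is a common choice). Here each frame is assumed to be a list of magnitudes, one per frequency bin, and audio is rebuilt by additive synthesis with every phase set to zero. The function name `synthesize` and the parameters `bin_hz`, `frame_seconds`, and `sample_rate` are hypothetical, chosen only for illustration.

```python
import math


def synthesize(spectrogram, bin_hz, frame_seconds, sample_rate=8000):
    """Rebuild a waveform from a magnitude spectrogram by additive synthesis.

    spectrogram: list of frames; each frame is a list of magnitudes,
                 where index k corresponds to the frequency k * bin_hz.
    Phases are assumed to be zero (an illustrative simplification).
    """
    samples_per_frame = round(frame_seconds * sample_rate)
    audio = []
    for frame_index, frame in enumerate(spectrogram):
        for n in range(samples_per_frame):
            # Use a global time axis so each sinusoid stays phase-continuous
            # from one frame to the next.
            t = (frame_index * samples_per_frame + n) / sample_rate
            value = sum(
                magnitude * math.sin(2 * math.pi * (k * bin_hz) * t)
                for k, magnitude in enumerate(frame)
            )
            audio.append(value)
    # Normalize to the range [-1, 1]; fall back to 1.0 for silent input.
    peak = max((abs(v) for v in audio), default=0.0) or 1.0
    return [v / peak for v in audio]


# Two frames: energy in bin 1 (220 Hz), then in bin 2 (440 Hz).
toy_spectrogram = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
audio = synthesize(toy_spectrogram, bin_hz=220.0, frame_seconds=0.1)
print(len(audio))  # 1600 samples: 2 frames * 0.1 s * 8000 Hz
```

The resulting list of floats could be written to a WAV file with the standard `wave` module after scaling to 16-bit integers. A real spectrogram image would first need its pixel values mapped back to magnitudes and frequencies, which this sketch does not attempt.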

Technical Details

Input: Text prompt to generate spectrogram image

Text Encoder number of parameters: 340M

UNet number of parameters: 865M

VAE Decoder number of parameters: 83M

Model size: 1 GB
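
As a rough cross-check of the figures above, the following sketch adds up the three component parameter counts and estimates the weight storage at several numeric precisions. The listing does not state the precision of the deployed weights, so the bytes-per-parameter values are assumptions, and the helper name `estimated_size_gb` is hypothetical.

```python
# Parameter counts in millions, taken from the technical details above.
PARAMS_MILLIONS = {
    "text_encoder": 340,
    "unet": 865,
    "vae_decoder": 83,
}

# Assumed storage widths; the listing does not say which one applies.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def estimated_size_gb(params_millions, bytes_per_param):
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params_millions * 1e6 * bytes_per_param / 1e9


total = sum(PARAMS_MILLIONS.values())
print(f"Total parameters: {total}M")
for precision, width in BYTES_PER_PARAM.items():
    size = estimated_size_gb(total, width)
    print(f"{precision}: about {size:.3f} GB")
```

Under these assumptions the total is 1288M parameters, or roughly 5.2 GB, 2.6 GB, and 1.3 GB at 4, 2, and 1 bytes per parameter respectively. This counts weights only and ignores file-format overhead.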

Applicable Scenarios

Music Generation
Music Editing
Content Creation

Supported Form Factors

Phone
Tablet

Licenses

Source Model: CREATIVEML-OPENRAIL-M

Deployable Model: CREATIVEML-OPENRAIL-M

Supported Devices

QCS8550 (Proxy)
Samsung Galaxy S23
Samsung Galaxy S23 Ultra
Samsung Galaxy S23+
Samsung Galaxy S24
Samsung Galaxy S24 Ultra
Samsung Galaxy S24+
Snapdragon X Elite CRD

Supported Chipsets

Qualcomm® QCS8550 (Proxy)
Snapdragon® 8 Gen 2 Mobile
Snapdragon® 8 Gen 3 Mobile
Snapdragon® X Elite
Snapdragon® X Plus 8-Core

Related Models

Stable-Diffusion-v1.5

State-of-the-art generative AI model used to generate detailed images conditioned on text descriptions.