Riffusion

Generative AI model that produces spectrogram images from a text prompt. These spectrograms can then be converted into audio clips.
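To make the spectrogram-to-audio step concrete, here is a minimal sketch that turns a small magnitude spectrogram into a mono WAV byte string. It uses plain additive (sum-of-sinusoids) resynthesis with linearly spaced frequency bins, which is a simplifying assumption and not necessarily the conversion this model's tooling performs. The function names, the toy spectrogram, and every default (sample rate, hop size, frequency range) are hypothetical.

```python
import io
import math
import struct
import wave


def spectrogram_to_samples(spec, sample_rate=8000, hop=200, f_min=100.0, f_max=2000.0):
    """Resynthesize audio from a magnitude spectrogram by summing sinusoids.

    spec[t][k] is the magnitude of frequency bin k in time frame t.
    All parameter names and defaults are illustrative assumptions.
    """
    n_bins = len(spec[0])
    # Assumption: bins are spaced linearly between f_min and f_max.
    step = (f_max - f_min) / max(n_bins - 1, 1)
    freqs = [f_min + k * step for k in range(n_bins)]

    samples = []
    for t, frame in enumerate(spec):
        for n in range(hop):
            # A global sample index keeps each sinusoid phase-continuous across frames.
            seconds = (t * hop + n) / sample_rate
            samples.append(sum(m * math.sin(2 * math.pi * f * seconds)
                               for m, f in zip(frame, freqs)))

    # Normalize to [-1, 1] so the signal fits 16-bit PCM without clipping.
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else samples


def samples_to_wav_bytes(samples, sample_rate=8000):
    """Encode float samples in [-1, 1] as a mono 16-bit PCM WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()


# Toy spectrogram: 4 frames x 3 bins (two frames of the middle bin, silence, then the low bin).
toy_spec = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
audio = spectrogram_to_samples(toy_spec)
wav_bytes = samples_to_wav_bytes(audio)
```

Writing `wav_bytes` to a `.wav` file gives a short playable clip. A real spectrogram image would first need its pixel intensities mapped back to magnitudes, and phase-reconstruction methods generally sound better than this direct summation.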

Generates high-resolution spectrogram images from text prompts using a latent diffusion model. The model uses CLIP ViT-L/14 as the text encoder, a U-Net for latent denoising, and a VAE decoder to produce the final image.
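The three stages can be pictured as a chain of tensor shapes. The sketch below only traces those shapes and runs no model. The defaults (77 tokens of 768 dimensions, a 4-channel latent, an 8x VAE scale factor, a 512x512 RGB output) are assumptions taken from the common Stable Diffusion v1 layout, not values stated on this page, and `StageShapes` and `trace_shapes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageShapes:
    text_embedding: tuple  # output of the CLIP text encoder: (tokens, embedding dim)
    latent: tuple          # tensor the U-Net denoises step by step: (channels, h, w)
    image: tuple           # output of the VAE decoder: (channels, height, width)


def trace_shapes(image_size=512, max_tokens=77, embed_dim=768,
                 latent_channels=4, vae_scale=8):
    """Return the tensor shape at each stage of a latent diffusion pipeline.

    Every default is an assumed value, not a figure confirmed for this model.
    """
    if image_size % vae_scale != 0:
        raise ValueError("image_size must be a multiple of vae_scale")
    side = image_size // vae_scale  # the VAE decoder upsamples the latent by vae_scale
    return StageShapes(
        text_embedding=(max_tokens, embed_dim),
        latent=(latent_channels, side, side),
        image=(3, image_size, image_size),
    )


shapes = trace_shapes()
print(shapes)
```

Under these assumptions the U-Net works on a latent with 64 times fewer spatial positions than the final image, which is the main reason latent diffusion is cheaper than denoising in pixel space.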


Technical Details

Input: Text prompt to generate spectrogram image

Text Encoder number of parameters: 340M

UNet number of parameters: 865M

VAE Decoder number of parameters: 83M

Model size: 1GB
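As a rough sanity check on the figures above, the component parameter counts can be summed and converted into a weight-storage estimate at a few numeric precisions. The page does not state which precision the deployed model uses, so the byte widths below are illustrative assumptions and the helper names are hypothetical.

```python
# Parameter counts as listed above, in millions.
PARAMS_MILLIONS = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}


def total_parameters(params_millions):
    """Sum per-component counts (given in millions) into an absolute count."""
    return sum(params_millions.values()) * 1_000_000


def weights_size_gb(n_params, bytes_per_param):
    """Storage needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


total = total_parameters(PARAMS_MILLIONS)

# Assumed precisions; the deployed format is not given in the source.
sizes = {
    name: weights_size_gb(total, width)
    for name, width in [("float32", 4), ("float16", 2), ("int8", 1)]
}

for name, gb in sizes.items():
    print(f"{name}: {gb:.3f} GB")
```

The three components add up to about 1.29 billion parameters, so the weights alone would occupy roughly 1.3 GB at one byte per parameter and proportionally more at wider precisions.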

Applicable Scenarios

Music Generation
Music Editing
Content Creation

Licenses

Source Model: CREATIVEML-OPENRAIL-M

Deployable Model: CREATIVEML-OPENRAIL-M

Supported IoT Devices

QCS8550 (Proxy)

Supported IoT Chipsets

Qualcomm® QCS8550 (Proxy)

Related Models


Stable-Diffusion-v1.5

State-of-the-art generative AI model used to generate detailed images conditioned on text descriptions.
