Riffusion
State‑of‑the‑art generative AI model used to generate spectrogram images of music given a text prompt. These spectrograms can be converted into audio clips.
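To make the spectrogram-to-audio step concrete, here is a minimal sketch using toy additive synthesis: each image row drives one sine oscillator, and pixel brightness sets its loudness. The function name, the linear frequency spacing, and all default values are assumptions for illustration, not part of this model. Real converters typically use a mel-scaled frequency axis and reconstruct phase with an algorithm such as Griffin-Lim, both of which this sketch ignores.

```python
import math

def spectrogram_to_audio(image, sample_rate=8000, frame_len=80,
                         f_min=100.0, f_max=2000.0):
    """Toy additive synthesis (hypothetical helper, not the model's converter).

    image[row][col] is a pixel brightness in 0..255; row 0 is the highest
    frequency and each column is one time frame of frame_len samples.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    # One sine oscillator per image row, frequencies spaced linearly (assumption).
    if n_rows > 1:
        freqs = [f_max - r * (f_max - f_min) / (n_rows - 1) for r in range(n_rows)]
    else:
        freqs = [f_min]
    samples = []
    for col in range(n_cols):
        # Brightness of each pixel in this column becomes an amplitude in 0..1.
        amps = [image[r][col] / 255.0 for r in range(n_rows)]
        for i in range(frame_len):
            t = (col * frame_len + i) / sample_rate
            value = sum(a * math.sin(2 * math.pi * f * t)
                        for a, f in zip(amps, freqs))
            samples.append(value / n_rows)  # dividing by row count keeps |value| <= 1
    return samples

# Two frames: a low tone first, then a high tone.
audio = spectrogram_to_audio([[0, 255], [255, 0]])
```

The returned list holds floating-point samples in the range -1 to 1, which could then be scaled to 16-bit integers and written out with Python's standard `wave` module.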
Generates high-resolution spectrogram images of music from text prompts using a latent diffusion model. The model uses CLIP ViT‑L/14 as the text encoder, a U‑Net for latent denoising, and a VAE decoder to generate the final image.
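The order in which the three components run can be sketched schematically. Every function below is a hypothetical toy stand-in, not the real network: the "text encoder" derives a small vector from the prompt, the "U-Net step" just moves the latent toward that vector (a real U-Net predicts noise that a scheduler then removes), and the "decoder" simply upsamples. Only the data flow mirrors the description above: prompt to embedding, noise to denoised latent, latent to image.

```python
import random

def encode_text(prompt, dim=4):
    """Toy stand-in for the text encoder: maps a prompt to a fixed-size vector."""
    rng = random.Random(prompt)  # seeded by the prompt, so the result is deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def denoise_step(latent, text_embedding, strength=0.5):
    """Toy stand-in for one U-Net step: nudge the latent toward the conditioning."""
    return [z + strength * (c - z) for z, c in zip(latent, text_embedding)]

def decode_latent(latent, scale=2):
    """Toy stand-in for the VAE decoder: upsample the latent into a small 2D image."""
    row = [v for v in latent for _ in range(scale)]
    return [list(row) for _ in range(scale)]

def generate(prompt, steps=10, seed=0):
    text_embedding = encode_text(prompt)                    # 1. encode the prompt
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in text_embedding]  # 2. start from pure noise
    for _ in range(steps):                                  # 3. iterative denoising
        latent = denoise_step(latent, text_embedding)
    return decode_latent(latent)                            # 4. decode to an image

image = generate("funk bass")  # a 2 x 8 grid of floats in this toy setup
```

The denoising loop runs entirely in the small latent space, and the decoder is applied once at the end. This is the design idea that distinguishes latent diffusion from denoising full-resolution pixels directly.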
Technical Details
Input: Text prompt to generate spectrogram image
Text Encoder Number of parameters: 340M
UNet Number of parameters: 865M
VAE Decoder Number of parameters: 83M
Model size: 1 GB
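As a quick sanity check on the figures above, the following sketch sums the three listed parameter counts and estimates how much storage the weights alone would need at a few bit widths. The bit widths are assumptions chosen for comparison; the page does not state the precision of the deployed weights.

```python
# Parameter counts in millions, as listed in the technical details.
PARAMS_MILLIONS = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}

total_params = sum(PARAMS_MILLIONS.values()) * 1_000_000  # 1,288,000,000

def size_gb(num_params, bits_per_weight):
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Candidate weight precisions (assumed, for comparison only).
for bits in (8, 16, 32):
    print(f"{bits}-bit weights: {size_gb(total_params, bits):.2f} GB")
```

The total comes to roughly 1.29 billion parameters, or about 1.29 GB at 8 bits per weight. The listed model size is closest to that 8-bit figure, but this is an inference from the arithmetic, not a precision the page confirms.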
Applicable Scenarios
- Music Generation
- Music Editing
- Content Creation
Supported Form Factors
- Phone
- Tablet
Licenses
Source Model: CREATIVEML-OPENRAIL-M
Deployable Model: CREATIVEML-OPENRAIL-M
Terms of Use: Qualcomm® Generative AI usage and limitations
Tags
- generative-ai
Supported Devices
- QCS8550 (Proxy)
- Samsung Galaxy S23
- Samsung Galaxy S23 Ultra
- Samsung Galaxy S23+
- Samsung Galaxy S24
- Samsung Galaxy S24 Ultra
- Samsung Galaxy S24+
- Snapdragon X Elite CRD
Supported Chipsets
- Qualcomm® QCS8550 (Proxy)
- Snapdragon® 8 Gen 2 Mobile
- Snapdragon® 8 Gen 3 Mobile
- Snapdragon® X Elite
- Snapdragon® X Plus 8-Core