Riffusion
State‑of‑the‑art generative AI model used to generate spectrogram images given any text input. These spectrograms can be converted into audio clips.
Generates high-resolution spectrogram images from text prompts using a latent diffusion model. The model uses CLIP ViT‑L/14 as the text encoder, a U‑Net for latent denoising, and a VAE decoder to produce the final image.
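The three components above can be sketched as a chain of tensor shapes. This is only an illustrative trace, not the deployed implementation: it assumes the Stable Diffusion v1 layout that Riffusion was fine-tuned from (77 tokens of 768 dimensions from the text encoder, a 4-channel latent, and a VAE that upscales by 8 to a 512x512 image). The names `Stage` and `trace_pipeline` are hypothetical, and the deployed model may use different sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline component and the shape of the tensor it outputs."""
    name: str
    output_shape: tuple


def trace_pipeline(image_size=512, max_tokens=77, embed_dim=768,
                   latent_channels=4, vae_scale=8):
    """Return the stages of a text-to-spectrogram latent diffusion pipeline.

    All default values are assumptions taken from the Stable Diffusion v1
    layout; they are not stated on this page.
    """
    if image_size % vae_scale != 0:
        raise ValueError("image_size must be a multiple of vae_scale")
    latent_size = image_size // vae_scale
    return [
        # The prompt is tokenized and embedded once per generation.
        Stage("text encoder (CLIP ViT-L/14)", (max_tokens, embed_dim)),
        # The U-Net repeatedly denoises a small latent, conditioned on the text.
        Stage("U-Net latent denoising",
              (latent_channels, latent_size, latent_size)),
        # The VAE decoder expands the final latent into an RGB spectrogram image.
        Stage("VAE decoder", (3, image_size, image_size)),
    ]


if __name__ == "__main__":
    for stage in trace_pipeline():
        print(f"{stage.name}: {stage.output_shape}")
```

The point of the sketch is that denoising happens on a latent far smaller than the output image, which is why the U-Net can run many steps before a single VAE decode.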
Technical Details
Input: Text prompt to generate spectrogram image
Text Encoder Number of parameters: 340M
UNet Number of parameters: 865M
VAE Decoder Number of parameters: 83M
Model size: 1GB
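As a rough cross-check of the figures above, the parameter counts can be summed and converted into an approximate weight footprint. The bit widths passed in are assumptions for illustration only; the page does not state the precision of the deployed weights, so the estimate is not expected to reproduce the listed model size exactly. The function names are hypothetical.

```python
# Parameter counts in millions, as listed under Technical Details.
PARAMS_MILLIONS = {"text_encoder": 340, "unet": 865, "vae_decoder": 83}


def total_params_millions(params=PARAMS_MILLIONS):
    """Sum the per-component parameter counts (in millions)."""
    return sum(params.values())


def estimated_size_gb(params_millions, bits_per_weight):
    """Estimate weight storage in decimal GB for a given precision.

    Counts weights only; ignores file metadata, activations, and any
    mixed-precision layout. bits_per_weight is an assumed value.
    """
    total_bytes = params_millions * 1_000_000 * bits_per_weight / 8
    return total_bytes / 1_000_000_000


if __name__ == "__main__":
    total = total_params_millions()
    print(f"total parameters: {total}M")
    for bits in (8, 16):  # assumed precisions, for comparison only
        print(f"{bits}-bit weights: ~{estimated_size_gb(total, bits):.2f} GB")
```

Under these assumptions, the three components total 1288M parameters, which comes to about 1.29 GB at 8 bits per weight and about 2.58 GB at 16 bits per weight.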
Applicable Scenarios
- Music Generation
- Music Editing
- Content Creation
Licenses
Source Model: CREATIVEML-OPENRAIL-M
Deployable Model: CREATIVEML-OPENRAIL-M
Terms of Use: Qualcomm® Generative AI usage and limitations
Tags
- generative-ai
- quantized
Supported IoT Devices
- QCS8550 (Proxy)
Supported IoT Chipsets
- Qualcomm® QCS8550 (Proxy)