Stable Audio is a new text-to-music generator that can generate original high-quality music and audio clips at a 44.1 kHz sample rate of specified length just from a text prompt (more examples later).
Stable Audio is built by Stability AI, the same company that created the famous AI image generation model Stable Diffusion, and just recently announced the release of StableCode – a code generation LLM that was trained on 560B tokens of code.
Why Stable Audio is different?
Just like some other audio generation models, Stable Audio is a diffusion model.
But unlike other diffusion AI models for music, Stable Audio was trained on 800,000 audio files containing music, sound effects, and single-instrument stems with additional metadata and timing conditioning.
AudioSparx Global, a music and sound effects library, provided up to over 19,500 hours of audio for training with corresponding text metadata.
As a result, this allows for more granular control over genres, content, and length of the generated samples.
Is Stable Audio Open Source?
No, not at the moment. As of September 2023, Stability AI has not yet released the model or the code to train a model similar to Stable Audio.
However, Harmonai, a research lab under Stability AI, promised to release an open-source model based on Stable Audio architecture and different training data, as well as the training code that will allow you to generate your own music generation models.
What Kind of Music Can You Create with Stable Audio
With Stable Audio you can generate stereo audio at a 44.1kHz sample rate for all sorts of music genres and sound effects.
The maximum length for free users is 45 seconds, and 90 seconds for Pro users.
Music is generated by simple text prompt input where you specify the “tags” of a music sample like instruments that you want to hear, the mood, sound effects, and the tempo in BPM.
For example, a great prompt could look like this:
Trance, Ibiza, Beach, Sun, 4 AM, Progressive, Synthesizer, 909, Dramatic Chords, Choir, Euphoric, Nostalgic, Dynamic, Flowing
And it would sound like this:
Although Stable Audio is kind of the first of its kind diffusion model, text to music AI generators aren’t a new thing.
A month prior to the release of Stable Audio, Meta has released its AudioCraft – a generative AI framework that consists of three music generation models.
And several months before that Google has released MusicLM to its beta testers in the AI Test Kitchen.