AI researchers at Google have developed a program called MusicLM, which is capable of producing musical compositions based on text prompts. The system can also convert a hummed or whistled tune into other musical instruments, similar to DALL-E's ability to generate images based on text inputs.
While MusicLM is not available for personal use, Google has released a number of samples generated by the model. These include 30-second song snippets created from short descriptions specifying genre, mood, and instruments, as well as longer pieces generated from just a few words, such as "melodic techno." One particularly interesting demo is the "story mode," in which the model is given a script of prompts and switches between them as the piece plays.
Considering they were created by a program, the samples are more than impressive. The demo site also showcases MusicLM's ability to generate short clips featuring individual instruments such as the cello or maracas, though the results vary in quality; the maracas clip, for instance, falls short. The site also offers eight-second samples of different genres, music suitable for a prison escape, and comparisons between the playing styles of a beginner and an advanced pianist. Additionally, MusicLM can interpret phrases like "futuristic club" and "accordion death metal." It can even simulate human vocals, although a grainy, static-like quality keeps them well short of the real thing.
AI-generated music actually has a rich history dating back decades, with systems credited with composing pop songs, mimicking Bach better than humans could in the '90s, and even accompanying live performances. One recent system uses the AI image generator Stable Diffusion to convert text prompts into spectrograms, which are then transformed into audio. According to the research paper, MusicLM surpasses other systems in both audio quality and adherence to the prompt, as well as in its ability to replicate melodies from audio inputs.
The researchers' most impressive demo is perhaps the one in which MusicLM transforms a hummed or whistled tune into an electronic synth lead, a string quartet, a guitar solo, and other instruments. The website lets you listen to both the original audio input and the model's output, and it's evident that the system performs remarkably well.
As with its other AI ventures, Google is exercising significant caution with MusicLM compared to some of its counterparts in the field. In the paper, the researchers conclude that they have no plans to release the models at this point, citing concerns about potential misappropriation of creative content (i.e., plagiarism) and the possibility of cultural appropriation or misrepresentation.
Although the technology may appear in one of Google's fun musical experiments in the future, for now the only ones who can benefit from the research are those building musical AI systems. Google plans to publicly release a dataset of roughly 5,500 music-text pairs, which can aid in training and evaluating other musical AI systems.