You might have heard the term “AI music,” but you probably never really thought about what it means. Here is what most people expect:
This amazing piece of AI music was created by a company named Aiva, which uses cutting-edge machine learning to compose tunes like that one. There are also AI bands releasing great music online using similar machine learning programs.
But what you may not realize is that the original version of the song does not sound anything like what you hear. These companies usually use music editing software to vastly improve on what the AI comes up with, as shown in the behind-the-scenes video below:
The main output of the AI is an original melody, usually in the form of a MIDI piano track. The problems start when you try to have the computer automatically add other instruments: it does not have a good sense of the beat, or of how to mix various instruments together so they follow the piano melody. Making things worse, MIDI files usually render with a fake, computerish sound, like old video game music. You would never confuse one with a real song you might hear on the radio.
Because of all of this, AI music companies and AI bands generally use a mixture of human and AI input (see some good examples in this Google blog posting). And what I have talked about so far does not even get into all the issues involved in having the AI write lyrics and sing the song.
For comparison purposes, here’s my AI masterpiece, where the computer created everything (song title, lyrics, melody, music, and vocals):
Yes, it totally sucks. I didn’t expect a top 40 hit, though. What matters is that it is one of the few songs I know of that was made 100% by AI from start to finish. You can listen to 10 of these songs, and also view the lyrics, on my AI Music Site.
Here’s how I did it:
The lyrics were written using the open-source GPT-2 Simple natural language program, which I trained on poems and song lyrics, as described in my article on how I created a lyrics generator for this lyrics site. The song title was then chosen using various non-AI rules, mostly based on which line is used the most in the lyrics.
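The "most repeated line" rule for picking a title can be sketched in a few lines of Python. This is only a minimal illustration of the idea described above — the function name is mine, and the real rules presumably include extra filters beyond raw line counts:

```python
from collections import Counter

def pick_title(lyrics: str) -> str:
    """Pick a song title as the most frequently repeated lyric line.

    A minimal sketch of the non-AI title rule; the actual rules
    likely add filters (line length, punctuation, etc.).
    """
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    # most_common(1) returns [(line, count)]; ties go to the first line seen
    title, _ = Counter(lines).most_common(1)[0]
    return title

lyrics = """I'm abandoned
Walking through the rain
I'm abandoned
Nothing left to say
I'm abandoned"""
print(pick_title(lyrics))  # -> I'm abandoned
```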
The vocals were one of the hardest parts. The software has to detect the pitch of the melody in the music, convert text to voice, and adjust the voice to match the notes of the melody. On top of that, it has to figure out exactly when the virtual singer should sing each word in order to stay on the beat.
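To make those steps concrete, here is a toy sketch of the alignment idea: assign one lyric syllable per melody note, and convert each MIDI pitch to a frequency the synthesized voice should hit. This is not how midi2voice actually works internally — both functions and the data format are my own simplification:

```python
def align_words_to_notes(words, notes):
    """Assign lyric syllables to melody notes one-to-one.

    `notes` is a list of (midi_pitch, start_sec, duration_sec) tuples.
    Returns (word, pitch, start, duration) events. A real virtual
    singer also handles melisma, rests, and timing adjustments.
    """
    return [(word, pitch, start, dur)
            for word, (pitch, start, dur) in zip(words, notes)]

def midi_to_hz(pitch):
    # Standard MIDI tuning: note 69 (A4) = 440 Hz, 12 notes per octave
    return 440.0 * 2 ** ((pitch - 69) / 12)

notes = [(60, 0.0, 0.5), (64, 0.5, 0.5), (67, 1.0, 1.0)]
words = ["I'm", "a-ban", "-doned"]
for word, pitch, start, dur in align_words_to_notes(words, notes):
    print(f"{start:.1f}s: sing {word!r} at {midi_to_hz(pitch):.1f} Hz for {dur:.1f}s")
```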
Another issue was that I couldn’t simply feed the MIDI music file to the vocal synthesis program directly. First, I had to convert the original MIDI file into a new MIDI containing only the single channel with the piano part, because that is usually the melody the singer needs to follow. I used an open-source program called banana-split to get this done.
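The core idea of that splitting step is just filtering note events by channel. Here is a conceptual sketch on a simplified event list — this is not banana-split's actual code, and real MIDI files would be handled with a MIDI library rather than plain dicts:

```python
def keep_channel(events, channel):
    """Keep only note events on a single MIDI channel.

    `events` is a simplified list of dicts standing in for real MIDI
    messages. This sketches the idea behind extracting the piano part;
    the actual banana-split tool operates on real MIDI files.
    """
    return [e for e in events if e["channel"] == channel]

events = [
    {"channel": 0, "note": 60, "time": 0},    # piano
    {"channel": 9, "note": 36, "time": 0},    # drums
    {"channel": 0, "note": 64, "time": 480},  # piano
]
piano_only = keep_channel(events, 0)
print(len(piano_only))  # -> 2
```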
Next, I used an open-source virtual singer program named midi2voice to create the vocals (a WAV file), using the lyrics and the music as input. The final step was to combine that voice file with the original MIDI music file to produce the song. I did this by converting the MIDI file to WAV, and then using ffmpeg’s amerge filter like this:
ffmpeg -i imabandoned.wav -i imabandoned_singer.wav -filter_complex amerge=inputs=2 -ac 2 imabandoned_final.mp3
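If you want to script this final mix-down step, you can build the commands programmatically and run them with `subprocess`. Note one assumption here: the article does not say which tool converts MIDI to WAV, so the `timidity` step below is my guess at one common option. The ffmpeg arguments mirror the command shown above:

```python
def build_render_commands(midi_path, vocals_wav, out_mp3):
    """Build the shell commands for the final mix-down.

    Returns a list of argv lists: first render the MIDI to WAV
    (timidity is an assumption, not the article's stated tool),
    then merge backing track and vocals with ffmpeg's amerge filter.
    """
    backing_wav = midi_path.rsplit(".", 1)[0] + ".wav"
    midi_to_wav = ["timidity", midi_path, "-Ow", "-o", backing_wav]
    merge = [
        "ffmpeg", "-i", backing_wav, "-i", vocals_wav,
        "-filter_complex", "amerge=inputs=2",
        "-ac", "2", out_mp3,
    ]
    return [midi_to_wav, merge]

cmds = build_render_commands("imabandoned.mid",
                             "imabandoned_singer.wav",
                             "imabandoned_final.mp3")
for cmd in cmds:
    print(" ".join(cmd))
```

In real use, you would pass each argv list to `subprocess.run(cmd, check=True)`, which requires timidity and ffmpeg to be installed.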
In the end, I accomplished my goal: an automated framework for creating an AI song with no human intervention needed. Not just the music, which is what everyone else is focusing on, but also lyrics and vocals, to make it into a real song. I knew ahead of time it was not going to sound very good, but this is just a start. Now that I have a demo version, I can work on making improvements to it. Maybe someday I will even launch an AI rock star, with CDs, merch, and virtual concerts. But I have a long way to go.
Some Additional AI Music Resources:
Pop Music Maker: This article has great info about this topic, plus a link to his open-source program.
Dadabots — They make AI music using raw audio instead of MIDI, so the results sound much more real. But much of the output does not sound good, so they have to manually curate many short snippets of music into a song.
Neural Story Teller (see the bottom of the page) — Part of the Songs From Pi project. They do the same kind of thing I did, but using very different methods, and explain in an academic paper how they did it.
Mellotron — High-quality voice synthesis from NVIDIA, for singing.
Adversarially Trained End-to-end Korean Singing Voice Synthesis System — It is crazy how real this sounds, but they did not release the code, so it would be very hard for me to replicate.