Use ICA to isolate the instruments in a 100 Gecs song.¶

I really like ICA, and I use it all the time. But it’s not the most intuitive algorithm, and it is relatively niche compared to supervised learning. While teaching my students about it, I wanted to give them an example that was more intuitive than, say, artifact removal in EEG data (after all, ICA can be used for much more than just artifact removal). So here’s my attempt at that, and also my attempt to pretend that I know what the kids are listening to these days.
ICA is a method that can be used to separate a complex signal (or image, or whatever else) into its constituent parts.
But there’s a catch. You need to have multiple “observations” of this signal. The canonical example would be an orchestra performance recorded with multiple microphones (a few in each section). Each microphone is one “observation”. Since each microphone picks up a blend of all the instruments, you can use ICA to separate the individual instruments (so you’d have the violins isolated, or the tuba, etc.).

And no, you probably can’t use ICA to separate the vocals from a song you downloaded from the internet.
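To make the multiple-microphone idea concrete, here is a minimal sketch on synthetic data. The two "instruments", the mixing weights, and the variable names are all made up for illustration. ICA only ever sees the two blended recordings, and the components it returns can come back in any order, sign, or scale, which is why the check at the end uses absolute correlations.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical "instruments": a sine wave and a sawtooth wave.
t = np.linspace(0, 1, 4000, endpoint=False)
sources = np.column_stack(
    [
        np.sin(2 * np.pi * 5 * t),  # instrument 1
        2 * ((7 * t) % 1) - 1,  # instrument 2 (sawtooth)
    ]
)  # shape (n_samples, n_sources)

# Each row of the (made-up) mixing matrix is one microphone's blend.
mixing = np.array([[0.7, 0.3], [0.4, 0.6]])
observations = sources @ mixing.T  # shape (n_samples, n_microphones)

# ICA is fit on the observations only; it never sees `sources` or `mixing`.
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observations)

# Absolute correlation between each recovered component and each true source.
corr = np.abs(np.corrcoef(recovered.T, sources.T)[:2, 2:])
best_match = corr.max(axis=0)  # best-matching component for each source
```

If the separation worked, each entry of `best_match` should be close to 1.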
Since an orchestra performance might bore you (and because I don’t have such a recording handy), let’s use a different example. Pretend you are in the studio with 100 gecs. You set up 4 microphones in the room, one each for the drums, bass, an effects track (FX), and vocals. Each microphone also picks up some of the other instruments, but that’s okay. You can use ICA to separate the components!
from functools import partial
from pathlib import Path
import IPython.display as ipd
import numpy as np
import pooch
from scipy.io import wavfile
from sklearn.decomposition import FastICA
Define some helper functions¶
(You can skip this section if you’re not interested in the details)
def load_audio(wav_path):
    """Load a wav file from disk."""
    return wavfile.read(wav_path)


def convert_to_mono(wav_array):
    """Convert stereo audio to mono by averaging the channels."""
    return np.mean(wav_array, axis=1)


def normalize_audio(wav_array):
    """Scale the amplitude to the range -1 to 1."""
    return wav_array / np.max(np.abs(wav_array))


def process_audio(wav_path):
    """Load a wav file, convert stereo to mono, and normalize the amplitude."""
    sfreq, wav_array = load_audio(wav_path)
    if len(wav_array.shape) > 1:
        wav_array = convert_to_mono(wav_array)
    return sfreq, normalize_audio(wav_array)


def mix_stems(*wavs, mix_matrix):
    """Blend the individual stems together using a mixing matrix."""
    return np.dot(mix_matrix, np.array(wavs, dtype=float))
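If you do want the details, here is a small sketch of what the helpers do to a toy signal. The sample values and weights are made up, and the helper bodies are repeated so the snippet runs on its own without a wav file.

```python
import numpy as np


def convert_to_mono(wav_array):
    return np.mean(wav_array, axis=1)


def normalize_audio(wav_array):
    return wav_array / np.max(np.abs(wav_array))


def mix_stems(*wavs, mix_matrix):
    return np.dot(mix_matrix, np.array(wavs, dtype=float))


# A made-up 3-sample stereo clip: rows are samples, columns are left/right.
stereo = np.array([[2, 4], [-8, 0], [6, 2]], dtype=np.int16)
mono = convert_to_mono(stereo)  # [3., -4., 4.]
scaled = normalize_audio(mono)  # [0.75, -1., 1.]

# Blend two toy tracks: 80% of the first plus 20% of the second.
other = np.array([1.0, 1.0, -1.0])
blend = mix_stems(scaled, other, mix_matrix=np.array([0.8, 0.2]))
```

With a 1-D weight vector, `mix_stems` returns a single blended track; here that is `[0.8, -0.6, 0.6]`.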
Load the mixed audio¶
We’ll define a data fetcher to download the stems pack from the 10,000 gecs album. Please note that this will download a large file (about 1.6 GB) to your machine. Please be patient!
print("Please be patient, this may take a while...")
# We will ignore the guitars stem because it is mostly silent
want_stems = ["Drums.wav", "Bass.wav", "Vocals.wav", "FX.wav"]
members = [
    f"10,000 gecs Stems/The Most Wanted Person in the United States/{stem}"
    for stem in want_stems
]
unpack = pooch.Unzip(
    extract_dir=".",  # Relative to the path where the zip file is downloaded
    members=members,
)
stem_fpaths = pooch.retrieve(
    url="https://www.100gecs.com/uploads/10000gecsstems.zip",
    known_hash="sha256:65d2f8dc5cf61a6cd2ac722c2c3bef465b76ca50f5d0363425acdbc5b100e754",
    progressbar=True,
    path=Path.home() / "100gecs",
    processor=unpack,
)
stems_dir = Path(stem_fpaths[0]).parent
Please be patient, this may take a while...
Downloading data from 'https://www.100gecs.com/uploads/10000gecsstems.zip' to file '/home/circleci/100gecs/c7223dbb9070231f8f3ff9630f4b4c13-10000gecsstems.zip'.
100%|█████████████████████████████████████| 1.56G/1.56G [00:00<00:00, 5.42TB/s]
Extracting '10,000 gecs Stems/The Most Wanted Person in the United States/Drums.wav' from '/home/circleci/100gecs/c7223dbb9070231f8f3ff9630f4b4c13-10000gecsstems.zip' to '/home/circleci/100gecs/.'
Extracting '10,000 gecs Stems/The Most Wanted Person in the United States/Bass.wav' from '/home/circleci/100gecs/c7223dbb9070231f8f3ff9630f4b4c13-10000gecsstems.zip' to '/home/circleci/100gecs/.'
Extracting '10,000 gecs Stems/The Most Wanted Person in the United States/Vocals.wav' from '/home/circleci/100gecs/c7223dbb9070231f8f3ff9630f4b4c13-10000gecsstems.zip' to '/home/circleci/100gecs/.'
Extracting '10,000 gecs Stems/The Most Wanted Person in the United States/FX.wav' from '/home/circleci/100gecs/c7223dbb9070231f8f3ff9630f4b4c13-10000gecsstems.zip' to '/home/circleci/100gecs/.'
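The `known_hash` argument passed to `pooch.retrieve` lets the download be checked against a SHA-256 digest. As a rough illustration of that kind of check, here is a sketch that uses only the standard library. It is not pooch's own implementation, and the file is a throwaway stand-in so nothing needs to be downloaded.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fid:
        for chunk in iter(lambda: fid.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in file with hypothetical contents, so the sketch runs offline.
tmp_dir = Path(tempfile.mkdtemp())
demo_file = tmp_dir / "demo.bin"
demo_file.write_bytes(b"not really a stems pack")

expected = hashlib.sha256(b"not really a stems pack").hexdigest()
actual = sha256_of_file(demo_file)
hash_matches = actual == expected
```

Reading in chunks keeps memory use flat, which matters for an archive of this size.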
Load the stems¶
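We’ll load the stems and process them by converting stereo to mono and normalizing the amplitude to the range -1 to 1. Then we’ll blend them to simulate each microphone picking up a bit of every instrument.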
sfreq, drums = process_audio(stems_dir / "Drums.wav")
# For memory purposes, let's cut the recording in half
n_samples = drums.shape[0]
crop = n_samples // 2
drums = drums[:crop]
bass = process_audio(stems_dir / "Bass.wav")[1][:crop]
fx = process_audio(stems_dir / "FX.wav")[1][:crop]
vocals = process_audio(stems_dir / "Vocals.wav")[1][:crop]
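# Weights for one simulated microphone: 50% of its own instrument,
# plus some bleed from the other three
mix_matrix = np.array([0.50, 0.20, 0.15, 0.15])
mix_func = partial(mix_stems, mix_matrix=mix_matrix)
# Note: each blend below reuses the tracks that were already blended above it
drums = mix_func(drums, bass, fx, vocals)
bass = mix_func(bass, fx, vocals, drums)
fx = mix_func(fx, vocals, drums, bass)
vocals = mix_func(vocals, drums, bass, fx)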
/home/circleci/project/examples/blogs/plot_most_wanted.py:56: WavFileWarning: Chunk (non-data) not understood, skipping it.
return wavfile.read(wav_path)
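The blending can also be thought of as a single mixing matrix with one row per microphone. The sketch below is a variant rather than a copy of the code above: it always blends from the clean stems, and it uses random noise as a stand-in for the audio. The names `mixing`, `clean`, and `blended` are illustrative only.

```python
import numpy as np

weights = np.array([0.50, 0.20, 0.15, 0.15])  # same weights as above

# Row i is the weight vector rotated by i places, so microphone i hears
# 50% of "its" instrument and smaller amounts of the other three.
mixing = np.array([np.roll(weights, i) for i in range(len(weights))])

# Toy stand-ins for the clean stems (uniform noise), one row per stem.
rng = np.random.default_rng(0)
clean = rng.uniform(-1, 1, size=(4, 1000))

blended = mixing @ clean  # one row per simulated microphone

# Because the matrix has full rank, the blend can be undone exactly
# when the matrix is known. ICA has to estimate the unmixing instead.
unmixed = np.linalg.solve(mixing, blended)
```

Writing it this way makes it easy to check that the blend is invertible, which is what ICA relies on.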
Here is one (blended) stem for reference¶
ipd.Audio(fx, rate=sfreq)
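`ipd.Audio` only renders a player inside a notebook. If you are running this as a plain script, one option is to write the track to a wav file with `scipy.io.wavfile.write` and open it in any audio player. Here is a sketch that uses a synthetic tone as a stand-in for the blended stem; the file name and sampling rate are made up.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io import wavfile

sfreq_demo = 44100  # assumed sampling rate for the stand-in signal
t = np.arange(sfreq_demo) / sfreq_demo
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

# Convert floats in [-1, 1] to 16-bit PCM, which most players understand.
pcm = (np.clip(tone, -1, 1) * 32767).astype(np.int16)

out_path = Path(tempfile.mkdtemp()) / "blended_stem_demo.wav"
wavfile.write(str(out_path), sfreq_demo, pcm)

# Read it back to confirm the round trip.
rate_back, data_back = wavfile.read(str(out_path))
```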