Separates a mixed audio file into individual stems (vocals, drums, bass, and other instruments) using Meta’s Demucs neural network. Useful for analyzing the individual elements of a mix or for preparing stems for masking analysis.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | string | required | Path to mixed audio file |
| output_dir | string | "./stems" | Directory to write separated stems |
| stems | string[] | ["vocals", "drums", "bass", "other"] | Which stems to extract |
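To make the parameter semantics concrete, here is a minimal Python sketch of how the arguments might be validated and turned into output paths. The function `resolve_stem_paths` is hypothetical and not part of Phantom. It assumes that the four stems in the table are the only valid values and that each stem is written as `<output_dir>/<stem>.wav`, as in the Example Output section.

```python
from pathlib import Path

# Assumption: the four stems listed in the parameter table are the only valid values.
VALID_STEMS = ("vocals", "drums", "bass", "other")


def resolve_stem_paths(output_dir="./stems", stems=None):
    """Hypothetical helper (not Phantom's code): map each requested stem to its output file."""
    if stems is None:
        stems = list(VALID_STEMS)  # default from the table: extract all four stems
    if not stems:
        raise ValueError("request at least one stem")
    unknown = [s for s in stems if s not in VALID_STEMS]
    if unknown:
        raise ValueError(f"unknown stem(s): {', '.join(unknown)}")
    out = Path(output_dir)
    # Assumed naming scheme: one <stem>.wav per requested stem inside output_dir.
    return {stem: out / f"{stem}.wav" for stem in stems}


# Mirrors the "Vocals only" use case: a single output path.
print(resolve_stem_paths(stems=["vocals"]))
```

Using `None` as the default for `stems` avoids sharing one mutable list across calls. Any other value is then checked against the known stem names.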
Example Output
$ separate_stems full-mix.wav
Stem Separation: full-mix.wav
Model: Demucs v4 (htdemucs)
Duration: 3:42
Processing… done (18.4s)
Output files:
  ./stems/vocals.wav (24-bit WAV)
  ./stems/drums.wav (24-bit WAV)
  ./stems/bass.wav (24-bit WAV)
  ./stems/other.wav (24-bit WAV)
Quality estimate:
  Vocals: high (clear separation)
  Drums: high (clean transients)
  Bass: medium (some bleed from kick)
  Other: medium (residual content)
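You can probe a note like "some bleed from kick" on the bass stem yourself. One crude check is the Pearson correlation between two stems' sample arrays. This is not how Phantom computes its estimate. Independent bass and drum parts are close to uncorrelated sample by sample, so a clearly positive value hints that one stem carries a copy of the other's content. The sketch assumes both stems are mono, time-aligned, and already loaded as equal-length lists of floats. Reading the WAV files is left out, and the function name `stem_correlation` is hypothetical.

```python
import math


def stem_correlation(a, b):
    """Pearson correlation between two equal-length, time-aligned mono stems.

    Crude bleed heuristic (an assumption, not Phantom's method): values near 0
    suggest the stems share little content; clearly positive values suggest one
    stem carries in-phase bleed from the other.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("stems must be non-empty and the same length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant (silent) stem has no meaningful correlation
    return cov / math.sqrt(var_a * var_b)
```

Sample-level correlation misses bleed that is delayed or phase-shifted, so treat the result as a rough signal rather than a replacement for listening.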
What the Numbers Mean
- Quality estimate: Phantom’s confidence in the separation quality of each stem. “High” means the stem is cleanly isolated. “Medium” means the stem contains some bleed from other sources. Bleed is most common between bass and kick drum, and in dense arrangements.
- Processing time: Stem separation runs a neural network and is CPU-intensive. Expect roughly 5-20 seconds of processing per minute of audio, depending on your machine.
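As a worked example of the 5-20 seconds-per-minute rule of thumb, the Python sketch below turns a track length in the `M:SS` form used in the example output into a rough time range. The helper name `estimate_processing_seconds` is hypothetical. The default rates are simply the bounds quoted above, not new measurements.

```python
def estimate_processing_seconds(duration, low_rate=5.0, high_rate=20.0):
    """Hypothetical helper: return (low, high) processing-time bounds in seconds.

    duration: track length as "M:SS", e.g. "3:42".
    low_rate / high_rate: seconds of processing per minute of audio.
    """
    minutes_part, seconds_part = duration.split(":")
    minutes = int(minutes_part) + int(seconds_part) / 60  # 3:42 -> 3.7 minutes
    return minutes * low_rate, minutes * high_rate


low, high = estimate_processing_seconds("3:42")
print(f"{low:.1f}s to {high:.1f}s")  # prints "18.5s to 74.0s"
```

For the 3:42 example track this gives roughly 18.5 to 74 seconds. The example run (18.4s) sits at the fast end of that range.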
Example Prompts
Full separation
Separate my mix into stems — I want to analyze each element individually
Vocals only
Extract just the vocals from song.wav so I can analyze them
Analysis pipeline
Separate this reference track into stems, then run masking analysis between my vocals and the reference vocals
Related Tools
- analyze_masking — Compare separated stems for frequency overlap
- multi_stem_masking — Analyze all separated stems at once
- batch_diagnostic — Run diagnostics on all separated stems
Pro tip
Stem separation quality drops with heavily compressed or limited audio, because dynamics processing leaves less transient information for the model to work with. For best results, use the highest-quality source available: a lossless WAV or FLAC file, ideally taken before any mastering processing.
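Crest factor offers a quick, standard way to compare candidate sources. It is the peak level divided by the RMS level, expressed in dB. Heavy compression and limiting shave off peaks and push this value down. The sketch below is a general-purpose measurement, not something Phantom is documented to compute. It assumes mono samples already decoded to floats, and the function name `crest_factor_db` is hypothetical. There is no universal threshold, so use it to compare versions of the same material rather than to grade a file in isolation.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; lower values indicate flatter, more limited audio."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        raise ValueError("signal is silent")
    return 20 * math.log10(peak / rms)


# A pure sine wave has a crest factor of sqrt(2), about 3.01 dB.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(round(crest_factor_db(sine), 2))  # prints 3.01
```

Between two versions of the same song, the one with the higher crest factor has generally been limited less and is the better candidate for separation.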