
Beat Tracking by Dynamic Programming
This page illustrates the use of the beat_* functions to implement a simple music audio beat tracker based on dynamic programming, as described in: D. Ellis "Beat Tracking by Dynamic Programming" J. New Music Research 36(1): 51-60, March 2007, DOI: 10.1080/09298210701653344.
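The core of a dynamic-programming beat tracker fits in a few lines. The Python below is a simplified illustration, not a port of beat_simple. It assumes an onset-strength list and a target beat period in frames. Each frame is scored by its onset strength plus the best score of a predecessor beat, and gaps that deviate from the target period are penalized by the squared log of their ratio. The weight `alpha`, the predecessor search range of one half to two periods, and the option of starting a fresh sequence at any frame are choices made for this sketch.

```python
import math


def dp_beat_track(onset, period, alpha=100.0):
    """Pick beat frames that balance strong onsets against a steady period.

    onset  -- onset strength, one value per frame
    period -- target beat period in frames (from a global tempo estimate)
    alpha  -- weight of the tempo-consistency penalty (value is an assumption)
    """
    n = len(onset)
    score = [0.0] * n     # best total score of any beat sequence ending at frame t
    backlink = [-1] * n   # previous beat on that best sequence (-1 = none)
    nearest = max(1, round(period / 2))   # closest allowed predecessor: t - period/2
    farthest = round(2 * period)          # farthest allowed predecessor: t - 2*period
    for t in range(n):
        best, best_prev = 0.0, -1         # 0.0 = start a fresh sequence at t
        for prev in range(max(0, t - farthest), t - nearest + 1):
            # log-squared penalty: zero when the gap equals the target period
            penalty = -(math.log((t - prev) / period) ** 2)
            candidate = score[prev] + alpha * penalty
            if candidate > best:
                best, best_prev = candidate, prev
        score[t] = onset[t] + best
        backlink[t] = best_prev
    # start the backtrace at the best-scoring frame within the last period
    t = max(range(max(0, n - round(period)), n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = backlink[t]
    return beats[::-1]


# toy onset function: a pulse every 10 frames plus one strong off-beat onset
onset = [1.0 if t % 10 == 0 else 0.0 for t in range(100)]
onset[45] = 5.0
print(dp_beat_track(onset, 10))   # expected: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

With a large `alpha` the recursion strongly prefers gaps close to the target period, so in this toy case the isolated off-beat onset at frame 45 does not pull the path away from the regular pulse; smaller values let the path follow the onsets more freely.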
% First, load an example sound file
wavfilename = 'train01.wav';
% (audioread replaces wavread, which current MATLAB releases no longer include)
[d,sr] = audioread(wavfilename);

% Now, calculate the "onset strength" function as the sum of the
% differentiated and half-rectified energy in a Mel-scale
% time-frequency transform:
[onset_fn, osr, sgram, tt, ff] = beat_onset(d,sr);
% osr returns the sampling rate of the frames of onset_fn;
% sgram returns the Mel spectrogram as an array, with tt and ff
% being the time and frequency labels.

% The dynamic programming approach needs an estimate of global
% tempo, so we calculate one by autocorrelating the onset function,
% applying a bias window, and choosing the biggest peak.
% Optionally, it plots the windowed autocorrelation for us:
subplot(211)
display = 1;
tempo = beat_tempo(onset_fn, osr, display);

% Now we can run the dynamic programming beat tracker
beats = beat_simple(onset_fn, osr, tempo);

% We have some helper functions: one to plot the beat times on top
% of the Mel spectrogram (which we saved from beat_onset)
subplot(212)
beat_plot(beats, '-r', tt, ff, sgram);

% And we can listen to the result; the system-found beats are
% marked by little bursts of white noise superimposed on the
% original:
beat_play(beats, d, sr);

% We can go straight from soundfile to beats with beat_track, which
% just provides a wrapper around the steps above:
beats = beat_track(wavfilename);
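The onset-strength step described in the comments above (differentiate each Mel band over time, half-wave rectify, sum across bands) looks roughly like this in Python. It is a sketch, not beat_onset itself: the input format, a list of frames of per-band log energies, is an assumption, and the Mel analysis, any log compression, and any later filtering of the onset function are left out.

```python
def onset_strength(mel_frames):
    """Onset strength from a (log) Mel spectrogram.

    mel_frames -- list of frames; each frame is a list of per-band energies
                  (hypothetical input format, already log-compressed)
    Returns one value per frame; the first frame has no predecessor, so it gets 0.
    """
    if not mel_frames:
        return []
    onset = [0.0]
    for prev, cur in zip(mel_frames, mel_frames[1:]):
        # first-order difference in each band, half-wave rectified, summed over bands
        onset.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return onset


# two bands: band 0 rises at frame 1, band 1 rises at frame 2, both fall at frame 3
print(onset_strength([[0, 0], [1, 0], [1, 2], [0, 0]]))
```

Because only increases in band energy count, a note onset contributes to the function while a note decay does not.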
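The global tempo estimate described in the code comments above (autocorrelate the onset function, weight it with a bias window, take the biggest peak) can be sketched as below. The bias window here, a Gaussian in log2-lag centred on 120 BPM with a width of 1.4 octaves, and the 30 to 300 BPM search range are assumptions for the sketch rather than values read from beat_tempo.

```python
import math


def estimate_tempo(onset, frame_rate, center_bpm=120.0, width_octaves=1.4,
                   min_bpm=30.0, max_bpm=300.0):
    """Global tempo (BPM) from the peak of a bias-weighted autocorrelation.

    onset      -- onset strength, one value per frame
    frame_rate -- frames per second of the onset function (osr in the MATLAB code)
    The log-Gaussian bias window and the BPM search range are assumed values.
    """
    n = len(onset)
    mean = sum(onset) / n
    x = [v - mean for v in onset]   # remove the mean so lag products reflect periodicity
    center_lag = 60.0 * frame_rate / center_bpm
    min_lag = max(1, int(60.0 * frame_rate / max_bpm))
    max_lag = min(n - 1, int(60.0 * frame_rate / min_bpm))
    best_lag, best_val = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        raw = sum(x[t] * x[t - lag] for t in range(lag, n))   # autocorrelation at this lag
        # bias window: Gaussian in octaves (log2 of the lag ratio) around the preferred tempo
        weight = math.exp(-0.5 * (math.log2(lag / center_lag) / width_octaves) ** 2)
        if weight * raw > best_val:
            best_lag, best_val = lag, weight * raw
    return 60.0 * frame_rate / best_lag


# toy onset function: one impulse every 50 frames at 100 frames per second (0.5 s period)
onset = [1.0 if t % 50 == 0 else 0.0 for t in range(1000)]
print(estimate_tempo(onset, 100.0))   # expected: 120.0
```

The weighting matters because a periodic onset function also correlates strongly at two and three times its beat period; the window is what tips the choice between these metrical levels toward a preferred range of tempos.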

Ground truth
We can also read in ground truth for the mirex06 McKinney/Moelants tapping data:
tapfilename = 'train01.txt';
truth = beat_ground_truth(tapfilename);
% By default this only returns the subset of tap records that are
% consistent with the most popular tempo. Multiple sequences are
% returned in separate cells of a cell array; we can plot and
% listen to them too (here, just the first 10 seconds of audio):
beat_plot(truth{1}, 'xb');
beat_play(truth{1}, d(1:10*sr), sr);

% We can also plot all the individual ground-truth taps in a
% "scatter" format:
subplot(211)
beat_gt_plot(truth)
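One way to keep only the tap records that agree with the most popular tempo is sketched below. The rule is hypothetical (beat_ground_truth may do this differently): each record's tempo is taken from its median inter-tap interval, the most popular tempo is the one that the most records agree with, and records within a tolerance of it (5% here, an arbitrary choice) are kept.

```python
import statistics


def record_tempo(taps):
    """Tempo in BPM implied by one record's median inter-tap interval (taps in seconds)."""
    intervals = [b - a for a, b in zip(taps, taps[1:])]
    return 60.0 / statistics.median(intervals)


def consistent_tap_records(records, tol=0.05):
    """Hypothetical filter: keep records within +/-tol of the most widely shared tempo."""
    tempos = [record_tempo(r) for r in records]

    def support(ref):
        # how many records have a tempo within the tolerance of ref
        return sum(1 for t in tempos if abs(t - ref) <= tol * ref)

    popular = max(tempos, key=support)
    return [r for r, t in zip(records, tempos) if abs(t - popular) <= tol * popular]


records = [
    [0.0, 0.5, 1.0, 1.5, 2.0],   # 120 BPM
    [0.1, 0.6, 1.1, 1.6],        # 120 BPM, different phase
    [0.02, 0.51, 1.0, 1.52],     # about 122 BPM
    [0.0, 1.0, 2.0, 3.0],        # 60 BPM: tapping at half the tempo
]
print(len(consistent_tap_records(records)))   # expected: 3
```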

Scoring results
We can score a beat track against a ground truth by counting how many true beats are missed (deletions), and how many system-generated beats don't correspond to true beats (insertions):
collar = 0.2;   % accept a beat within +/-20% of the tempo period
verbose = 1;
[err,ins,del,tru,hh,dd] = beat_score(beats,truth{1},collar,verbose);
% In this case, only the first beat is 'wrong', because the human
% was late to pick up the beat.

% We can score against all of the tap records for this tempo
% (length(truth) shows how many there are) by passing them all to
% the scoring function:
length(truth)
[err,ins,del,tru,hh,dd] = beat_score(beats,truth,collar,verbose);
% We don't line up with all the human taps, but then no single beat
% track ever could, because the human taps have too much spread.
Overall error= 3.2% ( 1 ins, 1 del, 63 true)
ans = 40
Overall error= 25.1% ( 330 ins, 89 del, 2279 true)
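The insertion/deletion count can be sketched as a greedy matching. This is an illustration under assumptions, not beat_score's code: each true beat claims the nearest unused system beat within ±collar times the median true inter-beat interval (the collar meaning follows the comment in the code above), unmatched true beats are deletions, leftover system beats are insertions, and the error is (insertions + deletions) / true beats. For a single track that ratio agrees with the first result above, (1 + 1) / 63 ≈ 3.2%. The multi-track figure is evidently not the pooled ratio ((330 + 89) / 2279 would be 18.4%), so it is presumably averaged per track.

```python
import statistics


def score_beats(system, truth, collar=0.2):
    """Count insertions/deletions of system beats against one true beat sequence.

    system, truth -- beat times in seconds (truth needs at least two beats)
    collar        -- tolerance as a fraction of the median true beat period
    Returns (error, insertions, deletions, number_of_true_beats).
    """
    period = statistics.median([b - a for a, b in zip(truth, truth[1:])])
    window = collar * period
    used = set()          # indices of system beats already matched
    deletions = 0
    for t in truth:
        nearby = [k for k, s in enumerate(system) if k not in used and abs(s - t) <= window]
        if nearby:
            used.add(min(nearby, key=lambda k: abs(system[k] - t)))   # claim the closest one
        else:
            deletions += 1   # no system beat close enough: a missed true beat
    insertions = len(system) - len(used)   # system beats that matched nothing
    return (insertions + deletions) / len(truth), insertions, deletions, len(truth)


truth = [0.5 * k for k in range(63)]              # 63 true beats, 0.5 s apart
system = [0.3] + [0.5 * k for k in range(1, 63)]  # first system beat is off the grid
err, ins, dels, n = score_beats(system, truth)
print(f"Overall error= {100 * err:.1f}% ({ins} ins, {dels} del, {n} true)")
# expected: Overall error= 3.2% (1 ins, 1 del, 63 true)
```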
Testing against a set of examples
We can evaluate the beat tracker against a whole set of examples:
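dirname = 'mirex06examples';
files = dir(fullfile(dirname,'*.wav'));
fnames = cell(1, length(files));
for i = 1:length(files)
  fnames{i} = fullfile(dirname,files(i).name);
end
beat_test(fnames);
% The overall average error rate is high, but it varies widely
% across examples: it is about 10-22% where the estimated tempo
% matches the users' tempo, and over 60% where the estimate is a
% different multiple or fraction of it (roughly 2x, 3x, 1/2 or 2/3).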
Error for train01 (tempo est=129.2, users=129.3 BPM, 35 true tracks): Overall error= 11.6% ( 159 ins, 84 del, 2130 true)
Error for train02 (tempo est=168.1, users= 83.9 BPM, 26 true tracks): Overall error= 113.3% (1129 ins, 13 del, 1016 true)
Error for train03 (tempo est=101.8, users=153.7 BPM, 26 true tracks): Overall error= 110.1% ( 740 ins, 1291 del, 1851 true)
Error for train04 (tempo est=127.6, users= 42.0 BPM, 28 true tracks): Overall error= 223.6% (1179 ins, 4 del, 538 true)
Error for train05 (tempo est= 68.4, users= 68.5 BPM, 34 true tracks): Overall error= 11.0% ( 71 ins, 36 del, 1087 true)
Error for train06 (tempo est=164.1, users= 82.0 BPM, 25 true tracks): Overall error= 164.6% (1128 ins, 22 del, 920 true)
Error for train07 (tempo est=103.4, users= 56.5 BPM, 34 true tracks): Overall error= 143.4% (1035 ins, 214 del, 879 true)
Error for train08 (tempo est=147.7, users=147.8 BPM, 26 true tracks): Overall error= 21.5% ( 238 ins, 155 del, 1815 true)
Error for train09 (tempo est=128.4, users=127.7 BPM, 25 true tracks): Overall error= 17.0% ( 165 ins, 88 del, 1498 true)
Error for train10 (tempo est= 61.2, users= 61.2 BPM, 20 true tracks): Overall error= 10.3% ( 41 ins, 15 del, 574 true)
Error for train11 (tempo est=139.2, users=139.9 BPM, 26 true tracks): Overall error= 20.0% ( 202 ins, 133 del, 1725 true)
Error for train12 (tempo est=122.0, users= 54.1 BPM, 27 true tracks): Overall error= 161.4% ( 999 ins, 81 del, 675 true)
Error for train13 (tempo est=119.5, users=181.3 BPM, 30 true tracks): Overall error= 112.3% (1039 ins, 1810 del, 2541 true)
Error for train14 (tempo est=130.0, users=130.7 BPM, 29 true tracks): Overall error= 12.2% ( 126 ins, 94 del, 1824 true)
Error for train15 (tempo est=185.4, users= 62.0 BPM, 28 true tracks): Overall error= 207.2% (1709 ins, 4 del, 831 true)
Error for train16 (tempo est=182.9, users= 90.7 BPM, 20 true tracks): Overall error= 131.3% (1022 ins, 28 del, 806 true)
Error for train17 (tempo est= 93.1, users= 46.1 BPM, 24 true tracks): Overall error= 127.0% ( 611 ins, 66 del, 535 true)
Error for train18 (tempo est=129.6, users= 61.2 BPM, 34 true tracks): Overall error= 151.1% (1320 ins, 177 del, 996 true)
Error for train19 (tempo est= 93.5, users=188.1 BPM, 34 true tracks): Overall error= 63.5% ( 221 ins, 1730 del, 3073 true)
Error for train20 (tempo est=110.0, users=219.8 BPM, 38 true tracks): Overall error= 76.9% ( 560 ins, 2551 del, 4043 true)
Overall average error: 94.5% (13694 ins, 8596 del, 29357 true)
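The printed results make the spread across examples easy to quantify. The snippet below copies each example's estimated tempo, the users' tempo and the overall error from the beat_test output, and splits the examples by whether the estimate is within 5% of the users' tempo (the 5% threshold is an arbitrary choice for this sketch).

```python
# (example, estimated BPM, users' BPM, overall error %) copied from the beat_test output
RESULTS = [
    ("train01", 129.2, 129.3, 11.6), ("train02", 168.1, 83.9, 113.3),
    ("train03", 101.8, 153.7, 110.1), ("train04", 127.6, 42.0, 223.6),
    ("train05", 68.4, 68.5, 11.0), ("train06", 164.1, 82.0, 164.6),
    ("train07", 103.4, 56.5, 143.4), ("train08", 147.7, 147.8, 21.5),
    ("train09", 128.4, 127.7, 17.0), ("train10", 61.2, 61.2, 10.3),
    ("train11", 139.2, 139.9, 20.0), ("train12", 122.0, 54.1, 161.4),
    ("train13", 119.5, 181.3, 112.3), ("train14", 130.0, 130.7, 12.2),
    ("train15", 185.4, 62.0, 207.2), ("train16", 182.9, 90.7, 131.3),
    ("train17", 93.1, 46.1, 127.0), ("train18", 129.6, 61.2, 151.1),
    ("train19", 93.5, 188.1, 63.5), ("train20", 110.0, 219.8, 76.9),
]


def split_by_tempo_agreement(results, tol=0.05):
    """Separate examples whose estimated tempo is within +/-tol of the users' tempo."""
    matched, mismatched = [], []
    for name, est, users, err in results:
        ratio = est / users
        if abs(ratio - 1.0) <= tol:
            matched.append((name, ratio, err))
        else:
            mismatched.append((name, ratio, err))
    return matched, mismatched


def mean_error(rows):
    """Average of the per-example error percentages."""
    return sum(err for _, _, err in rows) / len(rows)


matched, mismatched = split_by_tempo_agreement(RESULTS)
print(f"tempo matched:    {len(matched):2d} examples, mean error {mean_error(matched):.1f}%")
print(f"tempo mismatched: {len(mismatched):2d} examples, mean error {mean_error(mismatched):.1f}%")
for name, ratio, err in sorted(mismatched, key=lambda row: row[1]):
    print(f"  {name}: est/users = {ratio:.2f}, error = {err:.1f}%")
```

The seven examples with a matching tempo average 14.8% error; the other thirteen average about 137.4%, with tempo ratios mostly close to 2, 3, 1/2 or 2/3. Averaging all twenty per-example errors reproduces the reported overall average of 94.5%.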
Acknowledgements
This material was developed for my course ELEN E4896 Music Signal Processing, under partial support from the NSF under project IIS-1117015.
2012-03-28 Dan Ellis dpwe@ee.columbia.edu