Publications of Leigh M. Smith

This is a list of my recent publications on music cognition and musical rhythm representation.

Rebecca S. Schaefer, Shinichi Furuya, Leigh M. Smith, Blair Bohannan Kaneshiro and Petri Toiviainen

Psychomusicology: Music, Mind and Brain, 22(2):168–174, 2012

Recent neuroscience research has shown increasing use of multivariate decoding methods and machine learning. These methods, by uncovering the source and nature of informative variance in large data sets, invert the classical direction of inference that attempts to explain brain activity from mental state variables or stimulus features. However, these techniques are not yet commonly used among music researchers. In this position article, we introduce some key features of machine learning methods and review their use in the field of cognitive and behavioral neuroscience of music. We argue for the great potential of these methods in decoding multiple data types, specifically audio waveforms, electroencephalography, functional MRI, and motion capture data. By finding the most informative aspects of stimulus and performance data, hypotheses can be generated pertaining to how the brain processes incoming musical information and generates behavioral output, respectively. Importantly, these methods are also applicable to different neural and physiological data types such as magnetoencephalography, near-infrared spectroscopy, positron emission tomography, and electromyography.

Leigh M. Smith, Stephen T. Pope, Jay Leboeuf and Steve Tjoa

Proceedings of the 12th International Conference on Music Perception and Cognition, page 943, Thessaloniki, Greece, July 2012. ICMPC/ESCOM. (abstract).

A software system, MediaMined, is described for the efficient analysis and classification of auditory signals. This system has been applied to the tasks of musical instrument identification, classifying musical genre, distinguishing between music and speech, and detection of the gender of human speakers. For each of these tasks, the same algorithm is applied, consisting of low-level signal analysis, statistical processing and perceptual modeling for feature extraction, and then supervised learning of sound classes. Given a ground truth dataset of audio examples, textual descriptive classification labels are then produced. Such labels are suitable for use in automating content interpretation (auditioning) and content retrieval, mixing and signal processing. A multidimensional feature vector is calculated from statistical and perceptual processing of low-level signal analysis in the spectral and temporal domains. Machine learning techniques such as support vector machines are applied to produce classification labels given a selected taxonomy. The system is evaluated on large annotated ground truth datasets (n > 30000) and demonstrates success rates (F-measures) greater than 70% correct retrieval, depending on the task. Issues arising from labeling and balancing training sets are discussed. The performance of classification of audio using machine learning methods demonstrates the relative contribution of bottom-up, signal-derived features and data-oriented classification processes to human cognition. Such demonstrations then sharpen the question as to the contribution of top-down, expectation-based processes in human auditory cognition.

Leigh Smith

CCRMA MIR Workshop notes

Leigh M. Smith

Computational models of beat tracking of musical audio have been well explored; however, such systems often make "octave errors", identifying the beat period at double or half the beat rate actually recorded in the music. A method is described to detect if octave errors have occurred in beat tracking. Following an initial beat tracking estimation, a feature vector of metrical profile separated by spectral subbands is computed. A measure of subbeat quaver (1/8th note) alternation is used to compare half time and double time measures against the initial beat track estimation and indicate a likely octave error. This error estimate can then be used to re-estimate the beat rate. The performance of the approach is evaluated against the RWC database, showing successful identification of octave errors for an existing beat tracker. Using the octave error detector together with the existing beat tracking model improved beat tracking by reducing octave errors to 43% of the previous error rate.

ICMC 2010
Leigh M. Smith

A method for computing the similarity of metrical rhythmic patterns is described as applied to the audio signal of recorded music. For each rhythm, a combined feature vector of metrical profile and syncopation, separated by spectral subbands, hypermetrical profile, and tempo is compared. The descriptive capability of this feature vector is evaluated by its use in a machine learning rhythm classification task, identifying ballroom dance styles using a support vector machine algorithm. Results indicate that with the full feature vector a result of 67% is achieved. This improves on previous results using rhythmic patterns alone, but does not exceed the best reported results. By evaluating individual features, measures of metrical, syncopation and hypermetrical profile are found to play a greater role than tempo in aiding discrimination.

Connection Science
Martin Coath, Susan Denham, Leigh M. Smith, Henkjan Honing, Amaury Hazan, Piotr Holonowicz, Hendrik Purwins

Connection Science, 21(2 & 3):193–205, 2009

We describe a biophysically motivated model of auditory salience based on a model of cortical responses and present results that show that the derived measure of salience can be used to identify the position of perceptual onsets in a musical stimulus successfully. The salience measure is also shown to be useful to track beats and predict rhythmic structure in the stimulus on the basis of its periodicity patterns. We evaluate the method using a corpus of unaccompanied freely sung stimuli and show that the method performs well, in some cases better than state-of-the-art algorithms. These results deserve attention because they are derived from a general model of auditory processing and not an arbitrary model achieving best performance in onset detection or beat-tracking tasks.

Leigh M. Smith

A computational multi-resolution model of musical rhythm expectation has been recently proposed based on cumulative evidence of rhythmic time-frequency ridges (Smith & Honing 2008a). This model was shown to demonstrate the emergence of musical meter from a bottom-up data processing model, thus clarifying the role of top-down expectation. Such a multiresolution time-frequency model of rhythm has also been previously demonstrated to track musical rubato well, with both synthesised (Smith & Honing 2008b) and performed audio examples (Coath et al. 2009). The model is evaluated for its capability to generate accurate expectation from human musical performances. The musical performances consist of 63 monophonic rhythms from MIDI keyboard performances, and 50 audio recordings of popular music. The model generates expectations as forward predictions of times of future notes, a confidence weighting of the expectation, and a precision region. Evaluation consisted of generating successive expectations from an expanding fragment of the rhythm. In the case of the monophonic MIDI rhythms, these expectations were then scored by comparison against the onset times of notes actually then performed. The evaluation is repeated across each rhythm. In the case of the audio recording data, where beat annotations exist, but individual note onsets are not annotated, forward expectation is measured against the beat period. Scores were computed using information retrieval measures of precision, recall and F-score (van Rijsbergen 1979) for each performance. Preliminary results show mean PRF scores of (0.297, 0.370, 0.326) for the MIDI performances, indicating performance well above chance (0.177, 0.219, 0.195), but well below perfection. A model of expectation of musical rhythm has been shown to be computable. This can be used as a measure of rhythmic complexity, by measuring the degree of contradiction to expectation. As such, a rhythmic complexity measure is then applicable in models of rhythmic similarity used in music information retrieval applications.
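
The precision, recall and F-score scoring described in this abstract can be illustrated with a minimal sketch. It is not the evaluation code used for the reported results: the function name `score_expectations`, the 50 ms `tolerance` window and the greedy one-to-one matching of predictions to performed onsets are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def score_expectations(
    predicted: Sequence[float],
    performed: Sequence[float],
    tolerance: float = 0.05,  # seconds; an assumed matching window
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for predicted onset times.

    A prediction counts as a hit when it lies within `tolerance` seconds of a
    performed onset that has not already been matched (greedy matching, an
    assumption of this sketch).
    """
    unmatched = sorted(performed)
    hits = 0
    for time in sorted(predicted):
        # Find the closest performed onset that is still unmatched.
        best = min(unmatched, key=lambda onset: abs(onset - time), default=None)
        if best is not None and abs(best - time) <= tolerance:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(performed) if performed else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical data: four predicted note times scored against five performed onsets.
p, r, f = score_expectations([0.5, 1.0, 1.52, 2.3], [0.5, 1.0, 1.5, 2.0, 2.5])
print(p, r, f)
```

Because the abstract reports scores computed for each performance and then averaged, the mean F-score need not equal the harmonic mean of the mean precision and mean recall.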

Journal of Mathematics and Music
Leigh M. Smith and Henkjan Honing

Journal of Mathematics and Music, 2(2):81–97, 2008

A method is described that exhaustively represents the periodicities created by a musical rhythm. The continuous wavelet transform is used to decompose an interval representation of a musical rhythm into a hierarchy of short-term frequencies. This reveals the temporal relationships between events over multiple time-scales, including metrical structure and expressive timing. The analytical method is demonstrated on a number of typical rhythmic examples. It is shown to make explicit periodicities in musical rhythm that correspond to cognitively salient “rhythmic strata” such as the tactus. Rubato, including accelerations and retards, is represented as temporal modulations of single rhythmic figures, instead of timing noise. These time varying frequency components are termed ridges in the time-frequency plane. The continuous wavelet transform is a general invertible transform and does not represent rhythmic signals exclusively. This clarifies the distinction between what perceptual mechanisms a pulse tracker must model, compared to what information any pulse induction process is capable of revealing directly from the signal representation of the rhythm. A pulse tracker is consequently modelled as a selection process, choosing the most salient time-frequency ridges to use as the tactus. This set of selected ridges is then used to compute an accompaniment rhythm by inverting the wavelet transform of a modified magnitude and original phase back to the time domain.
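
As a rough illustration of how a continuous wavelet transform exposes the periodicities of a rhythm, the sketch below evaluates Morlet wavelet coefficients of a rhythm at a few candidate periods. It makes several assumptions that are not taken from the paper: the rhythm is treated as unit impulses at the onset times, the Morlet carrier frequency `OMEGA0` is set to 6, the small admissibility correction term of the wavelet is omitted, and the helper names `morlet`, `cwt_at` and `dominant_period` are hypothetical.

```python
import cmath
import math
from typing import Sequence

OMEGA0 = 6.0  # Morlet carrier frequency; a conventional value, assumed here


def morlet(t: float) -> complex:
    """Complex Morlet wavelet, omitting the small admissibility correction."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)


def cwt_at(onsets: Sequence[float], period: float, time: float) -> complex:
    """Wavelet coefficient of a unit-impulse rhythm at one period and time.

    Because the rhythm is modelled as impulses, the transform integral
    reduces to a sum of wavelet values at the onset times.
    """
    # Scale at which the wavelet's carrier oscillates once per `period`.
    scale = OMEGA0 * period / (2.0 * math.pi)
    total = sum(morlet((onset - time) / scale).conjugate() for onset in onsets)
    return total / math.sqrt(scale)


def dominant_period(
    onsets: Sequence[float], periods: Sequence[float], time: float
) -> float:
    """Return the candidate period with the largest coefficient magnitude."""
    return max(periods, key=lambda period: abs(cwt_at(onsets, period, time)))


# Hypothetical isochronous rhythm: sixteen onsets, one every 0.5 seconds.
onsets = [0.5 * k for k in range(16)]
for period in (0.35, 0.5, 0.75):
    print(period, abs(cwt_at(onsets, period, time=4.0)))
print(dominant_period(onsets, (0.35, 0.5, 0.75), time=4.0))
```

For this isochronous example the magnitude should be largest at the 0.5 second period, which is the kind of concentration of energy that the abstract calls a ridge when it is followed over time. Selecting the single largest magnitude at one instant is a much cruder stand-in for the ridge selection the paper describes.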