Computer Audition & Music Information Retrieval (MIR)
Computer Audition (CA) is general field of study of
algorithms and systems for audio understanding by machine. Since the
notion of what it means for a machine to "hear" is very broad and
somewhat vague, computer audition attempts to bring together several
disciplines that originally dealt with specific problems or had a
concrete application in mind. Inspired by models of human audition, CA
deals with questions of representation, transduction, grouping, use of
musical knowledge and general sound semantics for the purpose of
performing intelligent operations on audio and music signals by the
computer. Technically this requires a combination of methods from the
fields of signal processing, auditory modeling, music perception and
cognition, pattern recognition and machine learning, as well as more
traditional methods of artificial intelligence for musical knowledge
representation.
Computer Audition
Like Computer Vision
versus Image Processing, Computer Audition versus Audio Engineering
deals with understanding of audio rather than processing. It also
differs from problems of speech understanding by machine since it deals
with general audio signals, such as natural sounds and musical
recordings.
Applications of Computer Auditions are widely varying, and include
search for sounds, genre recognition, acoustic monitoring, music
trascription, score following, audio texture, music improvisation,
emotion in audio and so on.
Related Disciplines
Computer Audition overlaps with the following disciplines:
- Music Information Retrieval: methods for search and analysis of similarity between music signals.
- Auditory Scence Analysis: understanding and description of audio sources and events.
- Macine listening: methods for extracting auditory meaningful parameters from audio signals.
- Computational musicology and mathematical music theory: use of
algorithms that employ musical knowledge for analysis of music data.
- Computer music: use of computers in creative musical applications.
- Machine musicanship: audition driven interactive music systems.
Music Information Retrieval (MIR)
Music information retrieval or MIR is the interdisciplinary science of retrieving information from music.
This includes:
- Computational methods for classification, clustering, and modelling — Musical feature extraction for mono- and polyphonic music, similarity and pattern matching, retrieval
- Formal methods and databases — Applications of automated music
identification and recognition, such as score following, automatic
accompaniment, Routing and filtering for music and music queries, Query
languages, Standards and other metadata or protocols for music
information handling and retrieval, Multi-agent systems, distributed search)
- Software for music information retrieval — Semantic Web and musical digital objects, Intelligent agents, Collaborative software, Web-based search and semantic retrieval, query by humming, acoustic fingerprinting
- Human-computer interaction and interfaces — Multi-modal interfaces, User interfaces and usability, Mobile applications, User behaviour
- Music perception, cognition, affect, and emotions — Music similarity metrics, Syntactical parameters, Semantic parameters, Musical forms, structures, styles and genres, Music annotation methodologies
- Music analysis and knowledge representation — Automatic
summarization, citing, excerpting, downgrading, transformation, Formal
models of music, digital scores and representations, Music indexing and
metadata
- Music archives, libraries, and digital collections — Music digital libraries, Public access to musical archives, Benchmarks and research databases
- Intellectual property rights and music — National and international intellectual property right issues, Digital rights management, Identification and traceability
- Sociology and Economy of music — Music industry and use of MIR in
the production, distribution, consumption chain, User profiling,
Validation, User needs and expectations, evaluation of music IR
systems, building test collections, experimental design and metrics
Areas of Study
The study of CA could be roughly divided into the following sub-problems:
- Representation: signal and symbolic. This aspect deals with
time-frequency representations, both in terms of notes and spectral
models, including pattern playback and audio texture.
- Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma and auditory representations.
- Musical knowledge structures: analysis of tonality, rhythm and harmonies.
- Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation and clustering.
- Sequence modeling: matching and alignment between signals and note sequences.
- Source separation: methods of grouping of simultaneous sounds, such
as multiple pitch detection and time-frequency clustering methds.
- Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise and analysis of musical structure.
- Multi-modal analysis: finding correspondences between textual, visual and audio signals.
Representation Issues
Computer audition deals with audio signals that can be represented
in a variety of fashions, from direct encoding of digital audio in two
or more channels to symbolically represented synthesis instructions.
Audio signals are usually represented in terms of analogue or digital
recordings. Digital recordings are samples of acoustic waveform or
parameters of audio compression algorithms. One of the unique
properties of musical signals is that they often combine different
types of representations, such as graphical scores and sequences of
performance actions that are encoded as MIDI
files. Since audio signals usually comprise multiple sound sources,
then unlike speech signals that can be efficiently described in terms
of specific models (such as source-filter model), it is hard to devise
a parametric representation for general audio. Parametric audio
representations usually use filter banks or sinusoidal models to
capture multiple sound parameters, sometimes increasing the
representation size in order to capture internal structure in the
signal. Additional types of data that are relevant for computer
audition are textual descriptions of audio contents, such as
annotations, reviews, and visual information in the case of
audio-visual recordings.
Features
Description of contents of general audio signals usually requires
extraction of features that capture specific aspects of the audio
signal. Generally speaking, one could divide the features into signal
or mathematical descriptors such as energy, description of spectral
shape and etc., statistical characterization such as change or novelty
detection, special representations that are better adapted to the
nature of musical signals or the auditory system, such as logarithmic
growth of sensitivity (bandwidth) in frequency or octave invariance
(chroma). As mentioned above, since parametric models in audio usually
require very many parameters, the features are used to summarize
properties of multiple parameters in a more compact or salient
representation.
Musical Knowledge
Finding specific musical structures is possible by using musical
knowledge as well as supervised and unsupervised machine learning
methods. Examples of this include detection of tonality according to
distribution of frequencies that correspond to patterns of occurrence
of notes in musical scales, distribution of note onset times for
detection of beat structure, distribution of energies in different
frequencies to detect musical chords and so on.
Sound Similarity and Sequence Modeling
Comparison of sounds can be done by comparison of features with or
without reference to time. In some cases an overall similarity can be
assessed by close values of features between two sounds. In other cases
when temporal structure is important, methods of dynamic time warping
need to be applied to "correct" for different temporal scales of
acoustic events. Finding repretitions and similar sub-sequences of
sonic events is important for tasks such as texture synthesis and
machine improvisation.
Source Separation
Since one of the basic characteristics of general audio is that it
comprises multiple simultaneously sounding sources, such as multiple
musical instruments, people talking, machine noises or animal
vocalization, the ability to identify and separate individual source is
very desirable. Unfortunately, there are no methods that can solve this
problem in a robust fashion. Existing methods of source separation rely
sometime on correlation between different audio channels in
multi-channel recordings. The ability to separate sources from stereo
signals requires different techniques then those usually applied in
communications where multiple sensors are available. Other source
separation methods rely on training or clustering of features in mono
recording, such as tracking harmonically related partials for multiple
pitch detection.
Auditory Cognition
Listening to music and general audio is commonly not a task directed
activity. People enjoy music for various poorly understood reasons,
which are commonly referred to the emotional effect of music due to
creation of expectations and their realization or violation. Animals
attend to signs of danger in sounds, which could be either specific or
general notions of surprising and unexpected change. Generally, this
creates a situation where computer audition can not rely solely on
detection of specific features or sound properties and has to come up
with general methods of adapting to changing auditory environment and
monitoring its structure. This consists of analysis of larger
repetition and self similarity structures in audio to detect
innovation, as well as ability to predict local feature dynamics.
Multi-Modal analysis
Among the available data for describing music, there are textual
representations, such as liner notes, reviews and criticisms that
describe the audio contents in words. In other cases human reactions
such as emotional judgments or psycho-physiological measurements might
provide an insight into the contents and structure of audio. Computer
Audition tries to find relation between these different representations
in order to provide this additional understanding of the audio contents.
UCSD Computer Audition Lab
The lab conducts multidisciplinary research that combines machine
learning with musical signal processing for developing new approaches
to semantic musical annotation and modeling of musical and audio
structure.
Faculty Members
- Gert Lanckriet
- Shlomo Dubnov
- Lawrence Saul
- Bhaskar Rao
Links:
UCSD Computer Audition Lab [1]
Computer Audition Resources [2]
Tutorial on Computer Audition [3]
References
See also
External links
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Computer Audition"
|