Supported Datasets and Annotations
⭐ Dataset Quick Reference ⭐
This table is provided as a guide for selecting appropriate datasets. For brevity, we document only each dataset’s primary annotations and omit some metadata. The number of tracks counts the unique “tracks” in a dataset; it may not reflect a dataset’s actual size or diversity, since tracks vary greatly in length (from a few seconds to a few minutes) and may be homogeneous.
“Downloadable” possible values:
✅ : Freely downloadable
🔑 : Available upon request
📺 : Do-it-yourself download
🧮 : Features only
❌ : Not available
Find the API documentation for each of the below datasets in Initializing.
Dataset | Downloadable? | Annotation Types | Tracks | License
---|---|---|---|---
AcousticBrainz Genre | | | >4M |
BAF | | | 3425 |
Beatles | ❌ | | 180 |
Beatport EDM key | | | 1486 |
Billboard (McGill) | | | 890 |
Candombe | | | 35 |
cante100 | | | 100 |
CIPI | | | 652 |
(CompMusic) Carnatic Rhythm | | | 176 |
(CompMusic) Hindustani Rhythm | | | 151 |
(CompMusic) Indian Tonic | | | 2150 |
(CompMusic) Jingju A Cappella | | | 82 |
(CompMusic) OTMM Makam | | | 1000 |
(CompMusic) Raga | | | 780 |
Dagstuhl ChoirSet | | | 108 |
DALI | | | 5358 |
Da-TACOS | | | |
EGFxSet | | | 8970 |
Filosax | ❌ | | 48 |
Four-Way Tabla Stroke | | | 236 |
Freesound One-Shot Percussive Sounds | | | 10254 |
Giantsteps key | | Global Key | 500 |
Giantsteps tempo | | | 664 |
Good Sounds | | | 16308 |
Groove MIDI | | | 1150 |
Gtzan-Genre | ❌ | | 1000 |
Guitarset | | | 360 |
Ikala | | | 252 |
Haydn op20 | | | 24 |
IDMT-SMT-Audio Effects | | | 55044 |
IRMAS | | | 9579 |
MTG_jamendo_autotagging_moodtheme | | | 18448 |
MAESTRO | | Piano Notes | 1282 |
Medley-solos-DB | | | 21571 |
MedleyDB melody | | Melody F0 | 108 |
MedleyDB pitch | | | 103 |
Mridangam Stroke | | | 6977 |
Orchset | | Melody F0 | 64 |
PHENICX-Anechoic | | | 4 |
Queen | ❌ | | 51 |
RWC classical | | | 61 |
RWC jazz | | | 50 |
RWC popular | | | 100 |
Salami | | | 1359 |
Saraga Carnatic | | | 249 |
Saraga Hindustani | | | 108 |
Saraga-Carnatic-Melody-Synth (SCMS) | | | 2460 |
Slakh | | | 1710 |
Tinysol | | | 2913 |
Tonality ClassicalDB | | Global Key | 881 |
TONAS | | | 72 |
vocadito | | | 40 |
Annotation Types
The table above provides annotation types as a guide for choosing appropriate datasets,
but annotation types are difficult to categorize generically: they depend on varying
definitions, and their meaning can change with the type of music they describe.
Here we provide a rough guide to the types in this table, but we strongly recommend reading
the dataset-specific documentation to ensure the data is as you expect. To see how these annotation
types are implemented in mirdata, see Annotations.
Events
A generic annotation indicating whether a particular event is happening at a given time, for instance whether a particular instrument is playing or a particular note is sounding at a given time step. This information is implicit in annotations such as F0 or Vocal Notes (the instrument is active wherever the melody is nonzero), but some datasets provide it as a standalone event annotation.
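As a minimal sketch of the relationship described above, the snippet below derives a standalone event annotation from an F0 time series. The (time, frequency) pair layout and the 0-Hz-means-silence convention are illustrative assumptions, not a specific dataset’s format.

```python
# Derive "instrument active" intervals from an F0 annotation.
# Assumption: F0 is a list of (time_sec, frequency_hz) pairs, 0 Hz = unvoiced.
f0 = [(0.0, 0.0), (0.1, 220.0), (0.2, 221.5), (0.3, 0.0), (0.4, 440.0)]

events, start = [], None
for t, f in f0:
    if f > 0 and start is None:
        start = t  # instrument becomes active
    elif f == 0 and start is not None:
        events.append((start, t))  # instrument falls silent
        start = None
if start is not None:  # still active at the end of the annotation
    events.append((start, f0[-1][0]))
# events == [(0.1, 0.3), (0.4, 0.4)]
```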
Beats
Musical beats, typically encoded as a sequence of timestamps and corresponding beat positions. This implicitly includes downbeat information (the beginning of a musical measure).
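A minimal sketch of such an encoding, with invented timestamps: parallel lists of beat times and within-measure positions, from which downbeats fall out wherever the position is 1.

```python
# A toy beat annotation: times in seconds, positions within a 4/4 measure.
beat_times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
beat_positions = [1, 2, 3, 4, 1, 2, 3, 4]

# Downbeats are implicit: every beat whose position within the measure is 1.
downbeats = [t for t, p in zip(beat_times, beat_positions) if p == 1]
# downbeats == [0.5, 2.5]
```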
Chords
Musical chords, e.g. as might be played on a guitar. Typically encoded as a sequence of labeled events, where each event has a start time, an end time, and a label. The label taxonomy varies per dataset, but labels typically encode a chord’s root and its quality, e.g. A:m7 for “A minor 7”.
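To make the event layout concrete, here is a sketch assuming the common “root:quality” label convention mentioned above. The parser is a simplification for illustration (bare roots defaulting to major, “N” for no chord), not a full chord-label grammar.

```python
# Chord annotation as (start_sec, end_sec, label) events; "N" means no chord.
chords = [(0.0, 2.0, "A:m7"), (2.0, 4.0, "D:maj"), (4.0, 6.0, "N")]

def parse_label(label):
    """Split a 'root:quality' label; assumption: bare roots default to major."""
    if label == "N":
        return None, None
    root, _, quality = label.partition(":")
    return root, quality or "maj"

parsed = [parse_label(label) for _, _, label in chords]
# parsed == [("A", "m7"), ("D", "maj"), (None, None)]
```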
Drums
Transcription of the drums, typically encoded as a sequence of labeled events, where the labels indicate which drum instrument (e.g. cymbal, snare drum) is played. These events often overlap with one another, as multiple drums can be played at the same time.
F0
Musical pitch contours, typically encoded as a time series indicating the musical pitch over time. The time series typically has evenly spaced timestamps, each with a corresponding pitch value, which may be encoded in a number of formats/granularities, including MIDI note numbers and Hertz.
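The two granularities mentioned above are related by the standard A4 = 440 Hz mapping. The sketch below builds an evenly spaced contour and converts it from Hertz to fractional MIDI note numbers; the 10 ms hop size is an assumption and varies per dataset.

```python
import math

# An F0 contour on an evenly spaced time grid; 0 Hz marks unvoiced frames.
hop = 0.01  # 10 ms grid (assumed; actual hop size varies per dataset)
f0_hz = [0.0, 440.0, 466.16, 0.0]
times = [i * hop for i in range(len(f0_hz))]

def hz_to_midi(f):
    """Standard conversion: MIDI 69 == A4 == 440 Hz; None for unvoiced."""
    return 69 + 12 * math.log2(f / 440.0) if f > 0 else None

f0_midi = [hz_to_midi(f) for f in f0_hz]
# 440 Hz -> 69.0 (A4); 466.16 Hz -> ~70.0 (A#4)
```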
Effect
The audio effect applied to a recording, which may refer to a single stroke or to an entire track. It can include the effect name, type, parameters, and settings.
Genre
A typically global “tag”, indicating the genre of a recording. Note that the concept of genre is highly subjective and we refer those new to this task to this article.
Instruments
Labels indicating which instrument is present in a musical recording. This may refer to recordings of solo instruments, or to recordings with multiple instruments. The labels may be global to a recording, or they may vary over time, indicating the presence/absence of a particular instrument as a time series.
Key
Musical key. This can be defined globally for an audio file or as a sequence of events.
Lyrics
Lyrics corresponding to the singing voice of the audio. These may be raw text with no time information, or they may be time-aligned events. They may have varying levels of granularity (paragraph, line, word, phoneme, character) depending on the dataset.
Matches
Identifications of music in a query audio recording. This term is used in audio fingerprinting to refer to identifications of music from a reference database. Matches include which reference audio was identified and the start and end times of the match within the query.
Meter
Rhythmic meter for each measure. A classic example of meter in Western music is 4/4. It specifies how many subdivisions each measure contains and the duration of those subdivisions.
Melody
The musical melody of a song. Melody has no universal definition and is typically defined per dataset. It is typically encoded as F0 or as Notes. Other types of annotations such as Vocal F0 or Vocal Notes can often be considered as melody annotations as well.
Notes
Musical note events, typically encoded as sequences of start time, end time, label. The label typically indicates a musical pitch, which may be in a number of formats/granularities, including midi note numbers, Hertz, or pitch class.
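A minimal sketch of note events with invented values, showing one way the pitch label can be coarsened from MIDI note numbers to the pitch-class granularity mentioned above.

```python
# Note events as (start_sec, end_sec, pitch) triples, pitch as a MIDI number.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

notes = [(0.0, 0.5, 60), (0.5, 1.0, 64), (1.0, 2.0, 69)]  # C4, E4, A4

def midi_to_pitch_class(midi_number):
    """Coarsen a MIDI note number to its pitch class (octave discarded)."""
    return PITCH_CLASSES[midi_number % 12]

labels = [midi_to_pitch_class(p) for _, _, p in notes]
# labels == ["C", "E", "A"]
```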
Phonemes
Sung phonemes of the lead-vocal lyrics. As with lyrics annotations, they can be represented as a stream of characters, or time-aligned with start and end times and the phoneme contained in each interval.
Phrases
Musical phrase events, typically encoded as a sequence of timestamps indicating phrase boundary times, with labels often given as solfège symbols. These annotations are not intended to describe the complete melody, only the musical phrases present in the track.
Sections
Musical sections, which may be “flat” or “hierarchical”, typically encoded by a sequence of timestamps indicating musical section boundary times. Section annotations sometimes also include labels for sections, which may indicate repetitions and/or the section type (e.g. Chorus, Verse).
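As a concrete sketch of a flat section annotation, with invented section names and times: boundary times come from the interval edges, and repeated labels reveal repetition.

```python
# A flat section annotation as (start_sec, end_sec, label) events.
sections = [(0.0, 15.0, "intro"), (15.0, 45.0, "verse"),
            (45.0, 75.0, "chorus"), (75.0, 105.0, "verse")]

# Boundary times are the section edges (deduplicated and sorted).
boundaries = sorted({t for start, end, _ in sections for t in (start, end)})

# Labels that occur more than once indicate repeated sections.
all_labels = [label for _, _, label in sections]
repeated = {label for label in all_labels if all_labels.count(label) > 1}
# boundaries == [0.0, 15.0, 45.0, 75.0, 105.0]; repeated == {"verse"}
```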
Segments
Segments of particular musical events, e.g. segments of note stability, segments of a particular melodic event, and many more.
Technique
The playing technique used by a particular instrument, for example “Pizzicato”. This label may be global for a given recording or encoded as a sequence of labeled events.
Tempo
The tempo of a song, typically in units of beats per minute (bpm). This is often indicated globally per track, but in practice tracks may have tempos that change, and some datasets encode tempo as a time-varying quantity. Additionally, there may be multiple reasonable tempos at any given time (for example, 2x or 0.5x a tempo value will often also be “correct”). For this reason, some datasets provide two or more different tempo values.
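The 2x/0.5x ambiguity mentioned above is often handled at evaluation time by accepting octave-equivalent tempos. The sketch below shows one way to do this; the 4% relative tolerance is a common but assumed choice, not prescribed by any particular dataset.

```python
# Compare an estimated tempo against a reference, treating double and half
# tempo (octave equivalents) as correct matches.
def tempo_matches(reference_bpm, estimate_bpm, tol=0.04):
    """True if the estimate is within `tol` (relative) of the reference,
    its double, or its half. The 4% tolerance is an assumption."""
    return any(abs(estimate_bpm - r) / r <= tol
               for r in (reference_bpm, 2 * reference_bpm, reference_bpm / 2))
```

For example, against a 120 bpm reference, 121 bpm and 60 bpm both count as matches, while 90 bpm does not.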
Vocal Activity
A time series or sequence of events indicating when singing voice is present in a recording. This type of annotation is implicitly available when Vocal F0 or Vocal Notes annotations are available.
Stroke Name
An open “tag” to identify an instrument stroke name or type. Used for instruments that have specific stroke labels.
Syllables
A representation of the sung lyrics structured as syllables instead of complete sentences. Syllables can be annotated as time-aligned events, where each event is a syllable occurring in a given time interval, or represented as a stream of strings grouped into meaningful syllable structures.
Tonic
The absolute tonic of a track. It may refer to the tonic of a single stroke, or to the tonal center of a track.