Body / medical datasets



An Open Access Database for the Evaluation of Heart Sound Algorithms

The Michigan Heart Sound and Murmur database (MHSDB) was provided by the University of Michigan Health System. It includes only 23 heart sound recordings with a total duration of 1496.8 s and is available from

The PASCAL database comprises 176 recordings for heart sound segmentation and 656 recordings for heart sound classification. Although the number of recordings is relatively large, each recording is short, lasting from 1 s to 30 s. The recordings are also limited to frequencies below 195 Hz by a low-pass filter, which removes many heart sound components that are useful for clinical diagnosis. It is available from

The Cardiac Auscultation of Heart Murmurs database, provided by eGeneral Medical Inc., includes 64 recordings. It is not open and requires payment for access from:

unsorted cardio


A Progressively Expanded Database for Automated Lung Sound Analysis: An Update
ICBHI Respiratory Sound Database (The Respiratory Sound database - ICBHI 2017 Challenge)

The database consists of a total of 5.5 hours of recordings containing 6898 respiratory cycles, of which 1864 contain crackles, 886 contain wheezes, and 506 contain both crackles and wheezes, in 920 annotated audio samples from 126 subjects.
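The cycle counts above can be turned into a class breakdown with a little arithmetic. The sketch below assumes the three quoted categories are mutually exclusive (crackles only, wheezes only, both), so that the remainder is the number of cycles with neither sound; that reading is an assumption, not something the description states outright.

```python
# Counts quoted in the description above.
total_cycles = 6898
crackles = 1864
wheezes = 886
both = 506

# Assumption: the three categories do not overlap, so whatever is left
# over contains neither crackles nor wheezes.
neither = total_cycles - (crackles + wheezes + both)

counts = {
    "crackles": crackles,
    "wheezes": wheezes,
    "both": both,
    "neither": neither,
}

# Share of each category as a percentage of all respiratory cycles.
shares = {name: 100 * n / total_cycles for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:>8}: {n:5d} cycles ({shares[name]:.1f} %)")
```

Under that assumption, a little over half of the cycles contain neither crackles nor wheezes, which is worth keeping in mind when choosing evaluation metrics for an imbalanced classification task.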

unsorted lungs / respiratory

fat tissue


KSoF (The Kassel State of Fluency Dataset – A Therapy Centered Dataset of Stuttering)

RWCP-SSD-Onomatopoeia is a dataset of 155,568 onomatopoeic words paired with audio samples, intended for environmental sound synthesis.



The FSDnoisy18k dataset is an open dataset containing 42.5 hours of audio across 20 sound event classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data. The audio content is taken from Freesound, and the dataset was curated using the Freesound Annotator. The noisy set of FSDnoisy18k consists of 15,813 audio clips (38.8 h), and the test set consists of 947 audio clips (1.4 h) with correct labels. The dataset features two main types of label noise: in-vocabulary (IV) and out-of-vocabulary (OOV). IV applies when, given an observed label that is incorrect or incomplete, the true or missing label is part of the target class set. Analogously, OOV means that the true or missing label is not covered by those 20 classes.
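The IV/OOV distinction can be made concrete with a small sketch. The function name `label_noise_type`, the single-observed-label simplification, and the three-class toy vocabulary are all assumptions made for illustration; none of them come from the dataset's own tooling.

```python
from typing import Set


def label_noise_type(observed: str, true_labels: Set[str], vocabulary: Set[str]) -> str:
    """Classify the label noise of one clip as 'clean', 'IV' or 'OOV'.

    Hypothetical helper: assumes each clip carries exactly one observed
    label and that its full set of true labels is known.
    """
    if not true_labels:
        raise ValueError("true_labels must not be empty")

    # Labels the clip should carry that the observed label does not cover.
    # Non-empty means the observed label is incorrect or incomplete.
    true_or_missing = true_labels - {observed}
    if not true_or_missing:
        return "clean"

    # IV: at least one true or missing label is in the target class set.
    # A clip with both kinds of missing label is reported as IV here.
    if true_or_missing & vocabulary:
        return "IV"

    # OOV: none of the true or missing labels is covered by the vocabulary.
    return "OOV"


# Toy vocabulary standing in for the 20 target classes (illustrative only).
toy_vocabulary = {"Rain", "Wind", "Fire"}

print(label_noise_type("Rain", {"Rain"}, toy_vocabulary))          # correct label
print(label_noise_type("Rain", {"Wind"}, toy_vocabulary))          # incorrect, true label in vocabulary
print(label_noise_type("Rain", {"Rain", "Wind"}, toy_vocabulary))  # incomplete, missing label in vocabulary
print(label_noise_type("Rain", {"Shower"}, toy_vocabulary))        # true label outside vocabulary
```

The second and third calls both count as IV: one label is simply wrong, the other is incomplete, but in each case the true or missing label belongs to the target classes.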

STARSS22 (Sony-TAu Realistic Spatial Soundscapes 2022)

The Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset consists of recordings of real scenes captured with a high-channel-count spherical microphone array (SMA). The recordings were made by two different teams at two different sites: Tampere University in Tampere, Finland, and Sony facilities in Tokyo, Japan. Recordings at both sites share the same capturing and annotation process and a similar organization. They are organized in sessions, each corresponding, with a few exceptions, to distinct rooms, human participants, and sound-making props.


ARCA23K is a dataset of labelled sound events created to investigate real-world label noise. It contains 23,727 audio clips originating from Freesound, and each clip belongs to one of 70 classes taken from the AudioSet ontology. The dataset was created using an entirely automated process with no manual verification of the data. For this reason, many clips are expected to be labelled incorrectly.

ADVANCE (AuDio Visual Aerial sceNe reCognition datasEt)
ESC50 (ESC: Dataset for Environmental Sound Classification)

The dataset consists of 5-second-long recordings organized into 50 semantic classes (with 40 examples per class) loosely arranged into 5 major categories.
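The overall size follows directly from the figures above. The short calculation below is only a sanity check derived from those numbers; it assumes every class has exactly 40 clips and every clip is exactly 5 seconds long.

```python
# Figures quoted in the description above.
num_classes = 50
clips_per_class = 40
clip_seconds = 5

# Assumption: every class is full and every clip has the nominal length.
total_clips = num_classes * clips_per_class
total_seconds = total_clips * clip_seconds
total_hours = total_seconds / 3600

print(f"{total_clips} clips, {total_seconds} s, about {total_hours:.2f} h of audio")
```

At under three hours of audio in total, the dataset is small enough to load fully into memory on most machines.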


SoundingEarth consists of co-located aerial imagery and audio samples all around the world.



Urban Sound 8K is an audio dataset that contains 8732 labeled sound excerpts (<= 4 s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to


URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations, generated using the scaper library. The soundscapes total almost 30 hours and include close to 50,000 annotated sound events.

Room / home



Datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae)

Warblr is a dataset for the acoustic detection of birds. The dataset comes from a UK bird-sound crowdsourcing research spinout called Warblr. From this initiative the authors collected over 10,000 ten-second smartphone audio recordings from around the UK. The audio totals around 28 hours.


Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection (MIMII)

MIMII is a dataset of industrial machine sounds.


ToyADMOS2 is a dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions.

FSD50K (Freesound Dataset 50K)

Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra. It consists mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more.


USM-SED is a dataset for polyphonic sound event detection in urban sound monitoring use cases. Based on isolated sounds taken from the FSD50K dataset, 20,000 polyphonic soundscapes are synthesized, with sounds randomly positioned in the stereo panorama at different loudness levels.