Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation


Recent years have seen significant progress towards solving the cocktail-party problem. Much of the attention has gone to supervised learning methods trained on synthetic mixture datasets, even though such mixtures are not representative of real-world ones. The difficulty of building a realistic dataset has led researchers to unsupervised learning methods, which can handle realistic mixtures directly, but the results of these methods are still unconvincing.

This paper introduces a method for creating a realistic dataset with ground-truth sources for speech separation. The main obstacle in designing such a dataset is that ground truths for the individual speakers' signals are unavailable, so we propose a way to record two speakers simultaneously while still obtaining the ground truth for each speaker. The method uses a MATLAB function that exploits a full-duplex sound card to play back and record audio at the same time. We applied the method to the TIMIT (Texas Instruments/Massachusetts Institute of Technology) corpus to build the Realistic_TIMIT_2mix dataset. Evaluation on three datasets shows that the proposed dataset improves SI-SDR (Scale-Invariant Signal-to-Distortion Ratio) by more than 1.5 dB and PESQ (Perceptual Evaluation of Speech Quality) by approximately 0.5. We also measured performance at different distances between the microphone and the speakers and found that our method makes the learned model more stable as the distance changes.
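
For readers unfamiliar with the SI-SDR metric named above, the following is a minimal sketch of its commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. It is not taken from the paper; the function name `si_sdr`, the mean-removal step, and the `eps` stabiliser are assumptions made for illustration only.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio, in dB.

    `estimate` and `reference` are equal-length sequences of samples.
    `eps` is an illustrative guard against division by zero (an assumption).
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")

    n = len(reference)
    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-reference part and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is why the metric suits recordings whose level depends on the distance between microphone and speaker.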



Article Details

How to Cite
2024. Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation. Romanian Journal of Acoustics and Vibration. 20, 1 (Jun. 2024), 103–111.
