Examining Test-Time Adaptation for Personalized Child Speech Recognition
Abstract
Automatic speech recognition (ASR) models often degrade when the data domain shifts at test time, a challenge that is amplified for child speakers. Test-time adaptation (TTA) methods have shown great potential in bridging this domain gap. However, the use of TTA to adapt ASR models to the individual differences in each child’s speech has not yet been systematically studied. In this work, we investigate the effectiveness of two widely used TTA methods, SUTA and SGEM, in adapting off-the-shelf ASR models and their fine-tuned versions for child speech recognition, with the goal of enabling continuous, unsupervised adaptation at test time. Our findings show that TTA significantly improves the performance of both off-the-shelf and fine-tuned ASR models, both on average and across individual child speakers, compared to unadapted baselines. However, while TTA helps adapt to individual variability, it may remain limited on non-linguistic child speech.

BibTeX
@inproceedings{shi2025examining,
  title={Examining Test-Time Adaptation for Personalized Child Speech Recognition},
  author={Shi, Zhonghao and Shi, Xuan and Xu, Anfeng and Feng, Tiantian and Srivastava, Harshvardhan and Narayanan, Shrikanth and Mataric, Maja},
  booktitle={Proc. Interspeech 2025},
  pages={2820--2824},
  year={2025}
}