Researchers from the Centre for Speech Technology Research developed open-source speech synthesis technology that is widely used for commercial and clinical purposes.
Text-to-speech (TTS), also known as speech synthesis, has a wide range of uses. It is important for people who have lost the ability to speak and rely on computer-based aids to communicate. However, existing speech synthesis technologies offered only a small range of voices that many users disliked. Users wanted more personal, "normal"-sounding voices that they could identify with.
The Centre for Speech Technology Research (CSTR) is a pioneer in the field of speech synthesis and has made significant advances in this area since the 1980s.
The CSTR's research on speech synthesis draws on the work of many researchers and has resulted in several software tools that are freely available online. The two main tools are the Festival software toolkit and the HTS software toolkit.
Festival toolkit
Festival provides a complete TTS framework. It includes both stages of a TTS tool: text analysis and waveform, or sound, generation. Its waveform generation works by stringing together recorded speech sounds. Between its first release in 1996 and later updates in 2007, Festival was continually improved and developed. Advances included better intonation, larger lexicons, and more accurate letter-to-sound prediction.
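As a rough illustration of how the toolkit's two stages are typically run end to end, the minimal sketch below calls Festival from Python via its bundled text2wave script, assuming Festival is installed and the script is on the PATH; the function name and output filename are illustrative choices, not part of Festival itself.

```python
# Minimal sketch of driving Festival from Python. Assumes Festival is
# installed and its bundled text2wave script is available on the PATH;
# the output filename is arbitrary.
import subprocess


def synthesise(text: str, wav_path: str = "out.wav") -> str:
    """Run Festival's text analysis and waveform generation on `text`,
    writing the resulting audio to a WAV file."""
    subprocess.run(
        ["text2wave", "-o", wav_path],   # text2wave reads text from stdin
        input=text.encode("utf-8"),
        check=True,
    )
    return wav_path


if __name__ == "__main__":
    print(synthesise("Hello from the Festival speech synthesis system."))
```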
HTS toolkit
The HTS toolkit is based on a statistical model of synthesis called the Hidden Markov Model (HMM). HTS can be used together with Festival, but it does not rely on playback of recorded speech sounds for waveform generation. Instead, the statistical model can take a sample of a speaker's voice and generate new speech sounds to match it. This is significant because the software can create different speaking styles. The statistical model is also adaptive, which means that even speech samples that are low quality, short in duration, or of disordered speech can be used to create normal-sounding, personalised synthetic speech.
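The sketch below is a deliberately simplified illustration of this idea, not the HTS implementation: a toy model with invented state statistics generates a parameter trajectory by sampling from per-state Gaussians (rather than playing anything back), and a crude stand-in for adaptation shifts those statistics towards a speaker sample. All names, numbers, and the adaptation rule are assumptions made for illustration.

```python
# Conceptual sketch only, not the HTS implementation: statistical parametric
# synthesis generates acoustic parameters from a trained model instead of
# playing back stored recordings. The state statistics, durations and the
# adaptation weight below are invented for illustration.
import numpy as np

# A toy three-state model of one speech sound: each state stores the mean and
# variance of a single acoustic parameter plus an expected duration in frames.
AVERAGE_VOICE = [
    {"mean": 0.2, "var": 0.01, "frames": 5},
    {"mean": 0.8, "var": 0.02, "frames": 8},
    {"mean": 0.4, "var": 0.01, "frames": 4},
]


def generate_trajectory(model, seed=0):
    """Sample a frame-by-frame parameter trajectory from the model's
    per-state Gaussians (new sounds are generated, nothing is played back)."""
    rng = np.random.default_rng(seed)
    frames = []
    for state in model:
        frames.extend(
            rng.normal(state["mean"], np.sqrt(state["var"]), state["frames"])
        )
    return np.array(frames)


def adapt(model, target_mean, weight=0.5):
    """Crude stand-in for speaker adaptation: nudge each state's mean towards
    statistics estimated from a (possibly short or noisy) speaker sample."""
    return [
        {**state, "mean": (1 - weight) * state["mean"] + weight * target_mean}
        for state in model
    ]


print(generate_trajectory(AVERAGE_VOICE)[:5])
print(generate_trajectory(adapt(AVERAGE_VOICE, target_mean=0.6))[:5])
```

In the real toolkit the models are trained on recorded speech corpora and adapted with far more sophisticated statistical techniques; the point of the sketch is simply the contrast between generating new parameters from a model and playing back stored recordings.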
