Researchers from the Centre for Speech Technology Research developed open-source speech synthesis technology that is widely used for commercial and clinical purposes.
Text-to-speech (TTS), also known as speech synthesis, has a wide range of uses. It is important for people who have lost the ability to speak and rely on computer-based aids to communicate. However, existing speech synthesis technologies provide only a small range of voices, which many users dislike. Users want more personal, "normal"-sounding voices that they can identify with.
The Centre for Speech Technology Research (CSTR) is a pioneer in the field of speech synthesis. It has made significant advances in this field since the 1980s.
CSTR’s research on speech synthesis draws on the work of many researchers. It has resulted in several software tools that are freely available online. The two main tools are the Festival software toolkit and the HTS software toolkit.
Festival provides a complete TTS framework. It includes both stages of a TTS tool: text analysis and waveform, or sound, generation. The waveform generation plays back recorded speech sounds. From its first version in 1996 to more recent updates in 2007, Festival continued to improve and develop. Advances included better intonation, larger lexicons, and more accurate letter-to-sound prediction.
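The two stages described above can be illustrated with a deliberately tiny sketch. It is not Festival's API: the lexicon, the "recorded" units (short lists of numbers standing in for audio), and every function name are hypothetical, chosen only to show text analysis feeding a playback-style waveform generator.

```python
# Toy two-stage TTS pipeline. All names and data are hypothetical
# illustrations; this is not Festival's API.

# Stage 1 resource: a tiny pronunciation lexicon (word -> phone list).
LEXICON = {
    "hello": ["h", "eh", "l", "ow"],
    "world": ["w", "er", "l", "d"],
}

# Stage 2 resource: stand-ins for recorded speech units (phone -> samples).
RECORDED_UNITS = {
    "h": [0.1, 0.2], "eh": [0.3, 0.4], "l": [0.5], "ow": [0.6, 0.7],
    "w": [0.8], "er": [0.9, 1.0], "d": [1.1],
}
SILENCE = [0.0]


def letter_to_sound(word):
    """Crude fallback for words missing from the lexicon:
    keep each letter that happens to name a recorded unit."""
    return [ch for ch in word if ch in RECORDED_UNITS]


def analyse_text(text):
    """Stage 1: normalise the text and convert each word to phones."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [LEXICON.get(w) or letter_to_sound(w) for w in cleaned.split()]


def generate_waveform(phone_sequences):
    """Stage 2: play back (concatenate) the recorded unit for each phone."""
    samples = []
    for phones in phone_sequences:
        for phone in phones:
            samples.extend(RECORDED_UNITS[phone])
        samples.extend(SILENCE)  # short pause after each word
    return samples


def synthesise(text):
    """Run both stages: text analysis, then waveform generation."""
    return generate_waveform(analyse_text(text))
```

Calling `synthesise("Hello, world!")` returns one flat list of samples. In a real system the lexicon would be far larger, the letter-to-sound step would be a trained predictor, and the units would be real recordings selected and joined with much more care.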
The HTS toolkit is based on a statistical model of synthesis called the Hidden Markov Model (HMM). HTS is used with Festival. However, it does not rely on playback of recorded speech sounds for waveform generation. Instead, the statistical model can take a sample of a speaker’s voice, and create new speech sounds to match. This is significant because the software can create different speaking styles. The software’s statistical model is adaptive. This means that even speech samples that are low quality, short in duration, or of disordered speech can be used to create normal-sounding, personalised synthetic speech.
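One way to see why an adaptive statistical model can cope with short samples is the standard MAP-style update for a Gaussian mean, sketched below. This is a simplified illustration, not the adaptation method HTS itself uses; the function name, the single-number "frames", and the weight `tau` are assumptions made for the example.

```python
def adapt_mean(average_voice_mean, speaker_frames, tau=10.0):
    """MAP-style update of one model mean (illustrative only).

    average_voice_mean: mean from a hypothetical pre-trained average voice.
    speaker_frames: observed values from the target speaker's sample.
    tau: how strongly the prior (average voice) is trusted.
    """
    n = len(speaker_frames)
    # Weighted blend: with little data the result stays near the prior;
    # with more data it moves towards the speaker's own average.
    return (tau * average_voice_mean + sum(speaker_frames)) / (tau + n)


# With no speaker data the model is unchanged; more data pulls it further
# towards the speaker.
no_data = adapt_mean(100.0, [])
short_sample = adapt_mean(100.0, [120.0] * 10)
long_sample = adapt_mean(100.0, [120.0] * 90)
```

Because the starting point is a model trained on other speakers, even a brief or imperfect sample only needs to nudge the model rather than define it from scratch.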
Festival and HTS have been released as open source under unrestrictive licenses. This has enabled these tools to be widely used as a research and development framework within industry. Typically, over half of the papers on speech synthesis at industry and academic conferences in the area are based on research using the Festival and HTS toolkits.
Festival has formed the basis of several commercial products. It has also led to several companies being formed from the Centre for Speech Technology Research, including Rhetorical Systems (and its descendant Phonetic Arts), CereProc, and Speech Graphics. This research has also formed the basis of other technologies that are licensed to a wide range of companies, such as the Combilex dictionary system and voice databases. In addition, major corporations such as AT&T, Google, Nuance, and Microsoft make regular use of Festival and HTS.
CSTR’s speech synthesis technology has the unique ability to repair and reconstruct disordered speech. Communication products based on the Festival and HTS toolkits have been developed to assist people with speech disorders. In 2010, a pilot study took place with Euan MacDonald, who has Motor Neurone Disease. A synthetic voice was built from a 3-minute sample of his voice and installed in his eye-tracking-based communication device. The technology has undergone further small-scale trials and has been supported by funding from the Medical Research Council and the Motor Neurone Disease Association. Future research will take place at the Anne Rowling Regenerative Neurology Clinic, established thanks to funding from J. K. Rowling.