Technologies on Display

G2 - Innovative Applications of Spoken Language Technologies

(1) CU VOCAL: A Cantonese Text-to-speech Synthesizer
CU VOCAL is a Cantonese text-to-speech (TTS) engine that generates highly natural and intelligible synthetic speech from input Chinese text. It enables dynamic information delivery through spoken presentation in Cantonese. CU VOCAL adopts a syllable-based concatenative approach that considers both coarticulatory and tonal contexts. A sophisticated language processor has also been developed for word segmentation, appropriate concept verbalization, named-entity identification, automatic disambiguation among multiple pronunciations in Chinese, and mixed-language (Chinese and English) handling.

Unique Features

  • Highly natural and intelligible speech output
  • Technology applicable to Cantonese and Putonghua
  • Speech quality optimizable for specific domains


  • CU VOCAL Web Service:
    The First Chinese Text-to-speech Web Service
    - Voice-enabled applications and multimedia messaging over the Web
    - Highly interoperable with other Web services (e.g. message multicasting Web service)
    - No need for local installation and maintenance
    - Transparent TTS engine upgrades
  • Client-based CU VOCAL:
    The First Cantonese SAPI Compatible Engine
    - Easily invoked by Windows-based applications
    - Microsoft SAPI 5.1 compatible
    - Potential applications include story reading (eBook), webpage/screen readers and announcement systems


(2) Audio Search Engine
This project enables cross-media information retrieval whereby users can use textual or spoken queries to retrieve relevant video and audio documents. We integrate Chinese speech recognition with information retrieval technologies to develop the first system for Cantonese spoken document retrieval. Our novel approach indexes and retrieves spoken audio documents in real time.

Unique Features

  • First Cantonese speech retrieval system
  • Novel approach uniquely suitable for monosyllabic languages (e.g. Chinese)
  • Use of subword units circumvents the segmentation ambiguity in Chinese
  • Extensible to cross-language speech retrieval systems (e.g. using English queries to retrieve Chinese documents)
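
The idea of indexing subword units instead of segmented words can be illustrated with a minimal sketch. It is not the project's implementation: the real system indexes units recognized from speech, whereas this sketch assumes written characters stand in for syllables (each Chinese character is one syllable) and uses overlapping bigrams as the subword unit. The function names and sample documents are hypothetical.

```python
from collections import defaultdict

def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

def build_index(docs):
    """Map each bigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_bigrams(text):
            index[gram].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query bigrams they contain."""
    scores = defaultdict(int)
    for gram in set(char_bigrams(query)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents (transcripts), for illustration only.
docs = {
    "a": "香港中文大學",
    "b": "中文語音識別",
    "c": "天氣報告",
}
index = build_index(docs)
results = search(index, "中文大學")
print(results)
```

Because neither the documents nor the query are split into words, no decision about word boundaries is ever needed; matching relies only on shared overlapping units.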


  • Multimedia Information Search on Desktop and Handheld Computers
    - Textual queries may be input by typing or handwriting recognition
    - Spoken queries are recognized by the CUHK Cantonese speech recognition engine (CURSBB)


Principal Investigator
Professor Helen Meng
Department of Systems Engineering and Engineering Management