IT Fair 2003

Technologies on Display

G2 - Innovative Applications of Spoken Language Technologies

(1) CU VOCAL: A Cantonese Text-to-speech Synthesizer
CU VOCAL is a Cantonese text-to-speech (TTS) engine that generates highly natural and intelligible synthetic speech from input Chinese text. It enables dynamic information delivery via a spoken presentation in Cantonese. CU VOCAL adopts a syllable-based concatenative approach that considers both coarticulatory and tonal contexts. A sophisticated language processor has also been developed for word segmentation, appropriate concept verbalization, named-entity identification, automatic disambiguation among multiple pronunciations in Chinese, and handling of mixed-language (Chinese and English) text.

Unique Features

Highly natural and intelligible speech output
Technology applicable to Cantonese and Putonghua
Speech quality optimizable for specific domains

Applications

CU VOCAL Web Service:
The First Chinese Text-to-speech Web Service
- Voice-enabled applications and multimedia messaging over the Web
- Highly interoperable with other Web services (e.g. message multicasting Web service)
- No need for local installation and maintenance
- Transparent TTS engine upgrades
Client-based CU VOCAL:
The First Cantonese SAPI Compatible Engine
- Easily invoked by Windows-based applications
- Microsoft SAPI 5.1 compatible
- Potential applications include story reading (eBook), webpage/screen reader and announcement systems

Website
http://www.se.cuhk.edu.hk/cuvocal

(2) Audio Search Engine
This project enables cross-media information retrieval whereby users can use textual or spoken queries to retrieve relevant video and audio documents. We integrate Chinese speech recognition with information retrieval technologies to develop the first system for Cantonese spoken document retrieval. Our novel approach indexes and retrieves spoken audio documents in real time.

Unique Features

First Cantonese speech retrieval system
Novel approach uniquely suitable for monosyllabic languages (e.g. Chinese)
Use of subword units circumvents the segmentation ambiguity in Chinese
Extensible to cross-language speech retrieval systems (e.g. using English queries to retrieve Chinese documents)
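The subword idea in the list above can be illustrated with a minimal sketch. It assumes overlapping syllable bigrams as the subword unit and a simple inverted index with overlap-count ranking; the actual units, index structure and scoring used by the Audio Search Engine are not described here. The function names, document ids and Jyutping syllable sequences are all hypothetical. Because the index is keyed on syllable pairs rather than words, no word segmentation decision is needed for either documents or queries.

```python
from collections import defaultdict

def syllable_bigrams(syllables):
    """Return the overlapping bigrams of a syllable sequence."""
    return [tuple(syllables[i:i + 2]) for i in range(len(syllables) - 1)]

def build_index(documents):
    """Map each syllable bigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, syllables in documents.items():
        for bigram in syllable_bigrams(syllables):
            index[bigram].add(doc_id)
    return index

def search(index, query_syllables):
    """Rank documents by how many distinct query bigrams they contain."""
    scores = defaultdict(int)
    for bigram in set(syllable_bigrams(query_syllables)):
        for doc_id in index.get(bigram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical recognizer output: Cantonese syllables in Jyutping romanization.
documents = {
    "news_01": ["hoeng1", "gong2", "tin1", "hei3", "bou3", "gou3"],
    "news_02": ["gam1", "jat6", "gu2", "si5", "soeng5", "sing1"],
}
index = build_index(documents)
results = search(index, ["tin1", "hei3", "bou3", "gou3"])
```

In this sketch a textual query would first be converted to syllables, and a spoken query would arrive as recognized syllables, so both query types can share the same index.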

Applications

Multimedia Information Search on Desktop and Handheld Computers
- Textual queries may be input by typing or handwriting recognition
- Spoken queries are recognized by the CUHK Cantonese speech recognition engine (CURSBB)

Websites
(desktop) http://www.se.cuhk.edu.hk/hccl/audiosearch
(handheld) http://www.se.cuhk.edu.hk/hccl/mobileaudiosearch

Principal Investigator
Professor Helen Meng
Department of Systems Engineering and Engineering Management