| Research & Development in UTMK |
| Fundamental Linguistics |
- Phonology
- Phonetics
- Morphology
- Syntax
- Semantics
- Pragmatics
| Research Area |
- Machine translation
- Computer aided translation
- Document categorisation
- Question answering
- Information Extraction
- Word sense disambiguation
- Lexical Semantics
- Text analysis
- Speech analysis
- Speech recognition
- Speech synthesis
- Internet application
- Multimodal system
- Multilingual online language system
| Resources |
Language resources
- Wordlists (words, roots, abbreviations, phonemes, orthographic syllables)
- Text & Spoken corpora
- Machine readable dictionaries
- Malay WordNet prototype
- Lexicontologies
- Bilingual knowledge bank (English-Malay)
- Linguistic formalisms (STCG, SSTC, S-SSTC)
- Pronunciation dictionaries (Malay, Mandarin)
- Malay spoken dictionary
- Malay n-gram language model
Processing resources
- Statistical text analyser
- Malay text normaliser
- Interactive Malay spelling checker
- Malay affixed-words analyser
- Malay POS tagger
- Simple sense tagger
- Malay date / name identifier
- Malay phrase tree editor
- S-SSTC editor
- Text aligner
- Interactive annotator of wordlist
- Lexicontology API
- Conceptual vector API
- Bilingual knowledge bank generator
- Grapheme-to-phoneme utility (Pinyin to phoneme)
- Syllable-balanced text selection utility
- Speech segmenter
- Text converter: from raw to TEI format
- Dictionary format converter
- File distributor
- Frequency and effort tool
- Language identifier
Internet processing tools
- Rank aggregation model for meta search
- Result categorisation model for meta search
- Syntactical matching for Web services
- Category model construction
- Naming variant dictionary for XML element
- Document logical structure for HTML document
| Products |
- Example-based machine translation prototype
- Corpus processing toolkit
- Information warehouse
- Internet data syndicator
- Generic e-Business portal
- Malay speech synthesiser prototype
|
The Computer-Aided Translation Unit (UTMK) has conducted nearly three decades of research and development in Computational Linguistics (CL) and Natural Language Processing (NLP). Work in CL/NLP requires fundamental knowledge of linguistics: phonology, phonetics, morphology, syntax, semantics, and pragmatics. UTMK’s main research focuses on machine translation, document categorisation, question answering, information extraction, word sense disambiguation, lexical semantics, text analysis, speech analysis, speech recognition, speech synthesis, Internet applications, multimodal systems, and multilingual online language systems. To support this research, UTMK is developing monolingual and multilingual language resources and processing resources. Drawing on this linguistic knowledge and these resources, UTMK has implemented prototype NLP systems. Over the last 10 years, UTMK has concentrated on bringing NLP techniques into commercial applications.
|
| Research & Development Projects |
| Latest R & D Projects: |
- Voice-enabled Automatic Question Answering System, December 2006 – May 2008, E-Science Fund - RM188,854
- Mind Your Language: Corpora and Algorithms for the Fundamental Natural Language Processing Tasks in Information Retrieval and Extraction for the Indonesian and Malay languages, February 2007 – February 2008, ARF Grant - $32,771
- Construction of Ontology-based Multilingual Sense Dictionary for Supporting Online WSD Service and Sense Dictionary Look-up, December 2006 – May 2008, E-Science Fund - RM80,000
|
| Research & Development Co-operations |
| Latest R & D Co-operations: |
- National University of Singapore (NUS), Singapore, February 2007 - to date
|