Developing a Speech Corpus That Includes Information about Speakers' Dialects

Eiman Alsharhan; Allan Ramsay

Authors

Eiman Alsharhan, Kuwait University
Allan Ramsay, University of Manchester

Abstract

Arabic varieties differ substantially in all linguistic aspects. These differences call for dialect-specific modeling when building Arabic automatic speech recognition systems. This paper introduces the development of a multi-dialect annotated corpus of dialectal Arabic, with data obtained from the Linguistic Data Consortium (LDC). The annotation process was applied to the GALE (phase 3) broadcast news and broadcast conversational speech data, and it resulted in a dialect label for about 2900 speakers who contributed to this substantial Arabic resource. The final evaluation of the annotations shows that they achieved a substantial level of agreement. The annotations are fully available online for searching and downloading, along with a set of access tools to help extract specific information from the database. The researchers’ goal is for this dataset to be used in developing NLP applications that pay attention to issues arising from the wide range of Arabic accents.

Author Biographies

Eiman Alsharhan, Kuwait University

Assistant Professor, Dept. of Arabic Language & Literature, College of Arts, Kuwait University, Kuwait.

Allan Ramsay, University of Manchester

Professor, School of Computer Science, Faculty of Engineering and Physical Sciences, University of Manchester, UK.

An Exploratory Study of the Development of a Speech Corpus Annotated for the Main Arabic Dialects.
