Characteristics of Written Kuwaiti Arabic and their use in Creating Resources for Morphological Analysis.

Authors

  • Bashayer Alotaibi, Kuwait University
  • Eiman Alsharhan, Kuwait University

Keywords:

Written Arabic, morphological analyzer, NLP, social media, phonemic writing, written convention

Abstract

Kuwaiti Arabic (KA), like other Arabic dialects, is a spoken variety that, unlike Modern Standard Arabic (MSA), has no standardized written convention. With the emergence and spread of social media platforms, Arabic dialects have found their way into the written medium, and a need has arisen to process them alongside MSA. The biggest challenge facing NLP tools is that dialects lack consistent written conventions: writers usually spell phonetically, that is, they write words as they pronounce them. This produces variation within a single dialect, across dialects, and between dialects and MSA. Because clear written conventions are a prerequisite for analysing any language or dialect, efforts have been made to establish such conventions for Arabic dialects, but the Kuwaiti dialect has not received the required attention. The current study offers a practical solution for processing written KA. It identifies and extracts the written conventions of KA from natural data, a collection of over 100K Kuwaiti tweets, chosen because tweets are a good model of natural language. The morphological analyzer MADAMIRA, which is designed to process MSA, was enhanced with the extracted criteria. The study also enriched the analyzer with a dictionary of Kuwaiti terms and vocabulary ('lemmas') collected from the Encyclopaedia of Kuwaiti Arabic and from the most frequently used Kuwaiti words on Twitter (now X), which helps it process KA more accurately. The expanded analyzer (MADAMIRA-KA) is the first of its kind designed specifically to process the Kuwaiti dialect, and it performed well in analyzing over 100K Kuwaiti tweets.
The importance of this study lies in developing such a morphological analyzer, which can be used for automated translation, dialect recognition and sentiment analysis.
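
The general idea of pairing spelling normalization with a dialect lexicon can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the MADAMIRA interface: the normalization rules are common Arabic preprocessing steps chosen here as assumptions, and `KA_LEXICON`, `normalize` and `analyze` are hypothetical names with toy entries. Tokens missing from the lexicon are only labeled for hand-off to an MSA analyzer, which is not called.

```python
import re

# Arabic short-vowel marks, tanween, shadda, sukun (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Any character repeated three or more times (expressive lengthening, common in tweets).
_ELONGATION = re.compile(r"(.)\1{2,}")
# Hamza-carrying and madda alef variants, folded to bare alef.
_ALEF_VARIANTS = re.compile("[أإآ]")

# Hypothetical toy lexicon; a real resource would hold full morphological entries.
KA_LEXICON = {
    "وايد": {"lemma": "وايد", "gloss": "a lot"},
    "شلون": {"lemma": "شلون", "gloss": "how"},
}


def normalize(token: str) -> str:
    """Reduce common spelling variation before dictionary lookup (assumed rules)."""
    token = _DIACRITICS.sub("", token)      # drop diacritics and tatweel
    token = _ELONGATION.sub(r"\1", token)   # collapse runs of 3+ identical letters
    token = _ALEF_VARIANTS.sub("ا", token)  # unify alef spellings
    return token


def analyze(token: str, lexicon: dict = KA_LEXICON) -> dict:
    """Look the normalized token up in the dialect lexicon, else mark it for MSA analysis."""
    key = normalize(token)
    entry = lexicon.get(key)
    if entry is not None:
        return {"token": token, "normalized": key, "source": "ka_lexicon", **entry}
    # Placeholder: a real pipeline would pass the token to an MSA analyzer here.
    return {"token": token, "normalized": key, "source": "msa_fallback"}


if __name__ == "__main__":
    for word in ["واااايد", "شلون", "كتاب"]:
        print(analyze(word))
```

Checking the dialect lexicon first reflects the abstract's point that an MSA-oriented analyzer benefits from added KA vocabulary; the order of lookup and the choice of rules would need to be tuned against real data.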

Author Biographies

Bashayer Alotaibi, Kuwait University

Assistant Professor, Department of Arabic Language, College of Arts, Kuwait University, Kuwait.

Eiman Alsharhan, Kuwait University

Associate Professor, Department of Arabic Language, College of Arts, Kuwait University, Kuwait.

Published

2024

How to Cite

Alotaibi, B., & Alsharhan, E. (2024). Characteristics of Written Kuwaiti Arabic and their use in Creating Resources for Morphological Analysis. Arab Journal for the Humanities, 42(166), 275–301. Retrieved from https://journals.ku.edu.kw/ajh/index.php/ajh/article/view/365

Issue

Vol. 42 No. 166 (2024)

Section

Arabic Language and Literature