Developing Lexical Resources of Saraiki Verbs: A Corpus Based Study

Muhammad Awais
Musarrat Azher
Muhammad Farukh Arslan

Abstract

Saraiki is an Indo-Aryan language recognized as the fourth most widely spoken language in Pakistan. It is used extensively in Pakistan, especially in south Punjab and Sindh, and is also spoken in parts of Afghanistan and India. The language holds significant historical and geographical importance. Although numerous studies emphasize its distinctiveness, its linguistic features remain underexplored. The current corpus-based study aims to create synsets of Saraiki verbs by building an interface for their synonyms. A corpus of three million words was developed from literary and non-literary sources: material was gathered from online platforms and from scanned hard copies, then converted into machine-readable files. From the corpus, one hundred high-frequency verbs were selected and categorized using Fellbaum’s (1993) model, which comprises fifteen files organized by semantic domain. The verbs in each category were analyzed for their lexico-semantic relations to construct an interface of their synonyms. The study contributes to the development of verb synsets that encompass verb meanings, definitions of associated concepts, example sentences, and lexico-semantic relations. It should therefore be valuable for students, teachers, and researchers of Saraiki, as well as for those engaged in the creation of WordNet.
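
The selection and categorization steps described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `VerbSynset` and `top_verbs`, the `"VERB"` tag, and the placeholder tokens are all hypothetical, and the fifteen file names are those of the Princeton WordNet verb lexicographer files, which the study's categories are assumed to follow.

```python
from collections import Counter
from dataclasses import dataclass, field

# The fifteen verb lexicographer files of Princeton WordNet (semantic domains).
# Assumption: the study's fifteen categories correspond to these.
VERB_FILES = (
    "body", "change", "cognition", "communication", "competition",
    "consumption", "contact", "creation", "emotion", "motion",
    "perception", "possession", "social", "stative", "weather",
)


@dataclass
class VerbSynset:
    """Hypothetical record for one verb synset."""
    lemmas: list[str]                      # synonymous verbs
    semantic_file: str                     # one of VERB_FILES
    gloss: str                             # definition of the concept
    examples: list[str] = field(default_factory=list)
    # Relation name (e.g. "antonym") mapped to related lemmas.
    relations: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self):
        # Reject categories outside the fifteen semantic domains.
        if self.semantic_file not in VERB_FILES:
            raise ValueError(f"unknown semantic file: {self.semantic_file}")


def top_verbs(tagged_tokens, n=100, verb_tag="VERB"):
    """Return the n most frequent verbs from (word, tag) pairs."""
    counts = Counter(word for word, tag in tagged_tokens if tag == verb_tag)
    return [word for word, _ in counts.most_common(n)]


# Placeholder tokens only; no real corpus data is reproduced here.
sample = [("verb_a", "VERB"), ("noun_x", "NOUN"),
          ("verb_b", "VERB"), ("verb_a", "VERB")]
frequent = top_verbs(sample, n=2)
entry = VerbSynset(lemmas=frequent, semantic_file="motion",
                   gloss="placeholder definition")
```

Validating the category on construction keeps every synset inside one of the fifteen domains, while the frequency step depends only on a part-of-speech-tagged token stream, so any tagger could feed it.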

How to Cite
Awais, M., Azher, M., & Arslan, M. F. (2023). Developing Lexical Resources of Saraiki Verbs: A Corpus Based Study. Linguistic Forum - A Journal of Linguistics, 5(3), 136–158. https://doi.org/10.5281/zenodo.14757341
Author Biographies

Muhammad Awais, MPhil Scholar, Department of English, University of Sargodha, Punjab, Pakistan.

Muhammad Awais is a student of English Language and Literature. He completed his MPhil at Sargodha University in 2021. His article titled “Code-Switching as a Marker of Identity: A Linguistic Analysis of Pakistani TV Morning Shows” has been published in RJLS. He is also a firefighter in the Punjab Emergency Services Department and a poet. His interests include corpus linguistics, morphology, and discourse analysis.

Musarrat Azher, Sargodha University

Dr. Musarrat Azher is a Fulbright Post-Doctoral Research Scholar at Texas A&M University, USA, and an Associate Professor at UOS. Her interests include Pakistani English, corpus linguistics, register analysis, and multidimensional analysis.

Muhammad Farukh Arslan, Lecturer, NUML, PhD Scholar, Department of Applied Linguistics, Government College University, Faisalabad, Punjab, Pakistan.

Muhammad Farukh Arslan is a lecturer at the National University of Modern Languages, Faisalabad Campus, Pakistan. He also serves as a visiting lecturer at Government College University Faisalabad (GCUF) and is enrolled in the Ph.D. program in the Department of Applied Linguistics, GCUF, Pakistan.

References

Adeeba, F., & Hussain, S. (2011). Experiences in Building Urdu WordNet. Proceedings of the 9th Workshop on Asian Language Resources, pp. 31-35. Retrieved from https://aclanthology.org/W11-3406

Arslan, M. F., Mahmood, M. A., Shoaib, M., Sana, I., & Zunaira, T. (2023). Morphological Description of Nouns in Shahmukhi Punjabi; A Corpus Based Study. Journal of Positive School Psychology, 7(3), pp. 1259-1269. https://www.researchgate.net/publication/372549062

Fellbaum, C. (1990). English Verbs as a Semantic Net. International Journal of Lexicography, 3(4), pp. 278–301.

Fellbaum, C., Gross, D., & Miller, K. (1993). Adjectives in WordNet. International Journal of Lexicography, pp. 26-39.

Hasan, E., Iqbal, M., Azeemi, Q., & Javeed, A. (2015). An Online Punjabi Shahmukhi Lexical Resource. Science International, 25(3), pp. 2529-2535.

Hashmi, R. S., & Majeed, G. (2014). Saraiki Ethnic Identity: Genesis of Conflict with State. Journal of Political Studies, 21(1), pp. 79-101.

Kar, S., & Chakrabarty, A. (2011). Expansion of the First Hindi-Nepali Word-Net based Bi-Lingual Dictionary and the advancement of the Human-Machine Interface. International Journal of Computer Applications, (0975 – 8887) on Electronics, Information and Communication Engineering, pp. 8-11.

Kaur, R., Sharma, R. K., Preet, S., & Bhatia, P. (2010). Punjabi WordNet Relations and Categorization of Synsets.

Garcia, M. I. M. (2016). Saraiki: Language or Dialect? Eurasian Journal of Humanities, 1, pp. 40-53.

Mehta, D. (2021). Part of Speech Tagging – POS Tagging in NLP | byteiota. byteiota | From Bits to Bytes. Retrieved 12 March 2021, from https://byteiota.com/pos-tagging/.

Miller, G. (1995). WordNet. Communications of the ACM, 38(11), pp. 39-41.

Miller, G. A. (1993). Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, 3(4), pp. 245–264. https://doi.org/10.1093/ijl/3.4.245

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), pp. 235–244.

Moldovan, D., & Novischi, A. (2004). Word sense disambiguation of WordNet glosses. Computer Speech & Language, 18(3), pp. 301-317.

Mushtaq, M., & Shaheen, M. (2017). The Siraiki Province Movement in Punjab, Pakistan: Prospects and Challenges. Journal of the Punjab University Historical Society, 30(2), pp. 139-150.

Prabhu, V., Desai, S., Redkar, H., Prabhugaonkar, N., Nagvenkar, A., & Karmali, R. (2012). An Efficient Database Design for IndoWordNet Development Using Hybrid Approach, pp. 229–236.

Rattna, R. (2011). Creation of Punjabi Word-Net and Punjabi Hindi Bi-Lingual Dictionary.

Raza, G. (2016). Etymology of the Saraiki language name. Journal of Linguistics & Literature, 1(1), pp. 61-81.

Gul, S., Azher, M., & Sana, N. (2021). Development of Saraiki WordNet by Mapping of Word Senses: A Corpus-based Approach. Linguistics and Literature Review, 7(2), pp. 47-66. https://doi.org/10.32350/llr.72

Shackle, C. (1977). Siraiki: A Language Movement in Pakistan. Modern Asian Studies, 11(3), pp. 379-403. Retrieved April 19, 2021, from http://www.jstor.org/stable/311504

Shackle, C. (2015). Siraiki language. Encyclopedia Britannica. https://www.britannica.com/topic/Siraiki-language