Developing Lexical Resources of Saraiki Verbs: A Corpus Based Study
Abstract
Saraiki is an Indo-Aryan language and is recognized as the fourth most widely spoken language in Pakistan. It is used mainly in south Punjab and Sindh and is also spoken in parts of Afghanistan and India. The language holds significant historical and geographical importance. Although numerous studies emphasize its distinctiveness, its linguistic features remain underexplored. The current corpus-based study aims to create synsets of Saraiki verbs by establishing an interface for their synonyms. A corpus of three million words was developed from literary and non-literary sources: material was collected from online platforms, and hard copies of literary and non-literary works were scanned and converted into machine-readable files. From the corpus, one hundred high-frequency verbs were selected and categorized according to Fellbaum’s (1993) model, which comprises fifteen files organized by semantic domain. The verbs in each category were analyzed for their lexico-semantic relations to construct an interface of their synonyms. The study contributes synsets for verbs that include verb meanings, definitions of associated concepts, example sentences, and lexico-semantic relations. It should therefore be valuable to students, teachers, and researchers of Saraiki, as well as to those engaged in the creation of a WordNet.
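The selection and categorization steps described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the study's actual tooling: the tagged tokens are placeholders rather than real Saraiki data, the `top_verbs` helper and the `Synset` record are hypothetical, and the fifteen domain names are those of the verb lexicographer files in the Princeton WordNet, which the study's categories are assumed here to follow.

```python
from collections import Counter
from dataclasses import dataclass, field

# The fifteen verb lexicographer files of the Princeton WordNet.
# Assumption: the study's fifteen semantic domains use the same names.
VERB_LEXFILES = (
    "body", "change", "cognition", "communication", "competition",
    "consumption", "contact", "creation", "emotion", "motion",
    "perception", "possession", "social", "stative", "weather",
)

# Hypothetical (token, POS tag) pairs standing in for a tagged corpus.
# The tokens are placeholders, not real Saraiki words.
tagged_tokens = [
    ("verb_a", "VERB"), ("noun_x", "NOUN"), ("verb_b", "VERB"),
    ("verb_a", "VERB"), ("verb_c", "VERB"), ("verb_a", "VERB"),
    ("verb_b", "VERB"),
]


def top_verbs(tagged, n):
    """Return the n most frequent tokens that carry the VERB tag."""
    counts = Counter(token for token, tag in tagged if tag == "VERB")
    return [token for token, _ in counts.most_common(n)]


@dataclass
class Synset:
    """One synset entry: synonyms, domain, gloss, examples, relations."""
    lexfile: str                      # one of VERB_LEXFILES
    members: list                     # synonymous verbs
    gloss: str                        # definition of the shared concept
    examples: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # relation name -> targets

    def __post_init__(self):
        # Reject domains outside the fifteen-file inventory.
        if self.lexfile not in VERB_LEXFILES:
            raise ValueError(f"unknown verb domain: {self.lexfile}")


frequent = top_verbs(tagged_tokens, 2)
entry = Synset(
    lexfile="motion",
    members=frequent,
    gloss="placeholder gloss",
    relations={"troponym": ["verb_c"]},
)
```

In this sketch the study's one hundred high-frequency verbs would correspond to `top_verbs(corpus, 100)`, and each resulting synset would carry the meanings, definitions, example sentences, and lexico-semantic relations listed in the abstract.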
This work is licensed under a Creative Commons Attribution 4.0 International License.
License Terms
All articles published by MARS Publishers are made immediately available worldwide under an open access license. This means:
- everyone has free and unlimited access to the full-text of all articles published in MARS Publishers' journals;
- everyone is free to re-use the published material if proper accreditation/citation of the original publication is given.
References
Adeeba, F., & Hussain, S. (2011). Experiences in Building Urdu WordNet. Proceedings of the 9th Workshop on Asian Language Resources, pp. 31-35. Retrieved from https://aclanthology.org/W11-3406
Arslan, M. F., Mahmood, M. A., Shoaib, M., Sana, I., & Zunaira, T. (2023). Morphological Description of Nouns in Shahmukhi Punjabi; A Corpus Based Study. Journal of Positive School Psychology, 7(3), pp. 1259-1269. https://www.researchgate.net/publication/372549062
Fellbaum, C. (1990). English Verbs as a Semantic Net. International Journal of Lexicography, 3(4), pp. 278–301.
Fellbaum, C., Gross, D., & Miller, K. (1993). Adjectives in WordNet. International Journal of Lexicography, pp. 26-39.
Hasan, E., Iqbal, M., Azeemi, Q., & Javeed, A. (2015). An Online Punjabi Shahmukhi Lexical Resource. Science International, 25(3), pp. 2529-2535.
Hashmi, R. S., & Majeed, G. (2014). Saraiki Ethnic Identity: Genesis of Conflict with State. Journal of Political Studies, 21(1), pp. 79-101.
Kar, S., & Chakrabarty, A. (2011). Expansion of the First Hindi-Nepali Word-Net based Bi-Lingual Dictionary and the advancement of the Human-Machine Interface. International Journal of Computer Applications, (0975 – 8887) on Electronics, Information and Communication Engineering, pp. 8-11
Kaur, R., Sharma, R. K., Preet, S., & Bhatia, P. (2010). Punjabi WordNet Relations and Categorization of Synsets.
Garcia, M. I. M. (2016). Saraiki: Language or Dialect? Eurasian Journal of Humanities, 1, pp. 40-53.
Mehta, D. (2021). Part of Speech Tagging – POS Tagging in NLP | byteiota. byteiota | From Bits to Bytes. Retrieved 12 March 2021, from https://byteiota.com/pos-tagging/.
Miller, G. (1995). WordNet. Communications of the ACM, 38(11), pp. 39-41.
Miller, G. A. (1990). Nouns in WordNet: A Lexical Inheritance System. International Journal of Lexicography, 3(4), pp. 245–264. Retrieved from https://doi.org/10.1093/ijl/3.4.245
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3(4), pp. 235–244.
Moldovan, D., & Novischi, A. (2004). Word sense disambiguation of WordNet glosses. Computer Speech & Language, 18(3), pp. 301-317.
Mushtaq, M., & Shaheen, M. (2017). The Siraiki Province Movement in Punjab, Pakistan: Prospects and Challenges. Journal of the Punjab University Historical Society, 30(2), pp. 139-150.
Prabhu, V., Desai, S., Redkar, H., Prabhugaonkar, N., Nagvenkar, A., & Karmali, R. (2012). An Efficient Database Design for IndoWordNet Development Using Hybrid Approach, pp. 229–236.
Rattna, R. (2011). Creation of Punjabi Word-Net and Punjabi Hindi Bi-Lingual Dictionary.
Raza, G. (2016). Etymology of the Saraiki language name. Journal of Linguistics & Literature, 1(1), pp. 61-81.
Gul, S., Azher, M., & Sana, N. (2021). Development of Saraiki WordNet by Mapping of Word Senses: A Corpus-based Approach. Linguistics and Literature Review, 7(2), pp. 47-66. https://doi.org/10.32350/llr.72
Shackle, C. (1977). Siraiki: A Language Movement in Pakistan. Modern Asian Studies, 11(3), pp. 379-403. Retrieved April 19, 2021, from http://www.jstor.org/stable/311504
Shackle, C. (2015). Siraiki language. Encyclopedia Britannica. https://www.britannica.com/topic/Siraiki-language