Developing Lexical Resources of Saraiki Verbs: A Corpus-Based Study
Abstract
Saraiki is an Indo-Aryan language and is recognized as the fourth most widely spoken language in Pakistan. It is used extensively in Pakistan, especially in southern Punjab and Sindh, and is also spoken in parts of Afghanistan and India. The language holds significant historical and geographical importance. Although numerous studies emphasize its distinctiveness, Saraiki's unique linguistic features remain underexplored. The present corpus-based study aims to create synsets of Saraiki verbs by building an interface for their synonyms. A corpus of three million words was developed from literary and non-literary sources: texts were collected from online platforms, and hard copies of literary and non-literary works were scanned and converted into machine-readable files. From this corpus, one hundred high-frequency verbs were selected and categorized according to Fellbaum's (1993) model, which organizes verbs into fifteen files by semantic domain. The verbs in these categories were then analyzed for their lexico-semantic relations to construct an interface of their synonyms. The study contributes synsets for Saraiki verbs that include verb meanings, definitions of the associated concepts, example sentences, and lexico-semantic relations. It is therefore useful for students, teachers, and researchers of Saraiki, as well as for those building a WordNet.
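The selection and categorization steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the corpus has already been lemmatized and POS-tagged as (lemma, tag) pairs, and the function `top_verbs`, the class `VerbSynset`, and the tag name "VERB" are hypothetical. The fifteen domain names are the verb lexicographer files of Princeton WordNet, which correspond to Fellbaum's semantic domains for verbs.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# The 15 verb lexicographer files of Princeton WordNet (Fellbaum's semantic domains).
VERB_DOMAINS = frozenset({
    "verb.body", "verb.change", "verb.cognition", "verb.communication",
    "verb.competition", "verb.consumption", "verb.contact", "verb.creation",
    "verb.emotion", "verb.motion", "verb.perception", "verb.possession",
    "verb.social", "verb.stative", "verb.weather",
})


def top_verbs(tagged_tokens, n=100, verb_tag="VERB"):
    """Return the n most frequent verb lemmas (hypothetical helper).

    Assumes the corpus is a sequence of (lemma, tag) pairs; the tag
    name "VERB" is an assumption, not the study's tagset.
    """
    counts = Counter(lemma for lemma, tag in tagged_tokens if tag == verb_tag)
    return [lemma for lemma, _ in counts.most_common(n)]


@dataclass
class VerbSynset:
    """Hypothetical record for one synset of synonymous verbs."""
    domain: str                  # one of VERB_DOMAINS
    members: list[str]           # synonymous verbs sharing one sense
    definition: str              # gloss of the shared concept
    examples: list[str] = field(default_factory=list)
    # Lexico-semantic links to other synsets, e.g. {"troponym": [...]}
    relations: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self):
        # Reject domains outside the fifteen verb files.
        if self.domain not in VERB_DOMAINS:
            raise ValueError(f"unknown semantic domain: {self.domain}")
```

Frequency ranking here uses raw counts via `collections.Counter.most_common`; the study itself may have used a different frequency measure, tagging scheme, or storage format for its synsets.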

This work is licensed under a Creative Commons Attribution 4.0 International License.