Developing Lexical Resources of Saraiki Verbs: A Corpus-Based Study

Muhammad Awais
Musarrat Azher
Muhammad Farukh Arslan

Abstract

Saraiki is an Indo-Aryan language and is recognized as the fourth most widely spoken language in Pakistan. It is used extensively across the country, especially in southern Punjab and Sindh, and is also spoken in parts of Afghanistan and India. The language holds significant historical and geographical importance. Although numerous studies emphasize its distinctiveness, its linguistic features remain underexplored. The current corpus-based study aims to create synsets of Saraiki verbs by establishing an interface for their synonyms. A corpus of three million words was developed from literary and non-literary sources. Data were collected from online platforms and from hard copies of literary and non-literary works, which were scanned and converted into machine-readable files. From the corpus, one hundred high-frequency verbs were selected and categorized according to Fellbaum’s (1993) model, which comprises fifteen files organized by semantic domain. The verbs in each category were analyzed for their lexico-semantic relations in order to construct an interface of their synonyms. The study contributes to the development of verb synsets that encompass verb meanings, definitions of associated concepts, example sentences, and lexico-semantic relations. The research should therefore be valuable for students, teachers, and researchers of Saraiki, as well as for those engaged in the creation of WordNet.
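
As a rough illustration of the workflow summarized in the abstract, the Python sketch below picks the most frequent verbs from a part-of-speech-tagged token list and stores each one in a synset-like record. It is not the authors' implementation: the names `SynsetEntry` and `top_verbs`, the `"VERB"` tag, the record fields, and the placeholder tokens are all assumptions made for illustration, and no real Saraiki data is shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SynsetEntry:
    """Hypothetical record for one verb synset (field names are assumptions)."""
    lemma: str
    domain: str                                       # semantic-domain file, e.g. "motion"
    definition: str = ""
    examples: list[str] = field(default_factory=list)
    synonyms: set[str] = field(default_factory=set)   # lexico-semantic relation: synonymy


def top_verbs(tagged_tokens, n):
    """Return the n most frequent tokens whose tag is "VERB"."""
    counts = Counter(tok for tok, tag in tagged_tokens if tag == "VERB")
    return [tok for tok, _ in counts.most_common(n)]


# Placeholder tokens standing in for a tagged corpus (not real Saraiki words).
corpus = [
    ("v1", "VERB"), ("n1", "NOUN"), ("v2", "VERB"), ("v1", "VERB"),
    ("v1", "VERB"), ("v2", "VERB"), ("v3", "VERB"),
]

frequent = top_verbs(corpus, 2)

# One empty entry per selected verb; the domain label here is only an example.
entries = {v: SynsetEntry(lemma=v, domain="motion") for v in frequent}
```

In a real pipeline the tagged tokens would come from the machine-readable corpus files, `n` would be 100, and each entry's definition, examples, and synonyms would be filled in during the lexico-semantic analysis.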
