The scientific project "Annotated Corpus of the Megrelian Language with Formal Grammar and Electronic Dictionary" processes materials collected in Samegrelo during 2022-2023 using the FieldWorks Language Explorer software (FLEx; SIL International 2024). FLEx handles data management and facilitates the creation of an online grammar, dictionary, and database. The project's end products are an automatically generated Megrelian-English dictionary and a brief formal grammar.

Once annotated, the texts were made available online as a database, which constitutes the annotated corpus of the Megrelian language. The corpus includes the following information: sentence-by-sentence translations of the texts into Georgian and English; a transcription in the International Phonetic Alphabet (IPA); morpheme-by-morpheme glossing of words following the Leipzig Glossing Rules and the Eurotyp guidelines; and part-of-speech (POS) tags.
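The annotation layers listed above can be pictured as a simple record per sentence. The following Python sketch is illustrative only: the class and field names are hypothetical and do not reflect FLEx's internal model or export format, and the forms are placeholders rather than Megrelian data. It also checks one concrete requirement of the Leipzig Glossing Rules (Rule 2): a segmented word and its gloss must contain the same number of boundary markers ("-" for affixes, "=" for clitics).

```python
from dataclasses import dataclass


@dataclass
class GlossedWord:
    """One word of an interlinear line (hypothetical structure)."""
    form: str       # surface form as written in the text
    segmented: str  # morpheme breaks: "-" for affixes, "=" for clitics
    gloss: str      # Leipzig-style gloss aligned with `segmented`
    pos: str        # part-of-speech tag

    def is_aligned(self) -> bool:
        # Leipzig Rule 2: the segmented form and its gloss must contain
        # the same number of boundary markers of each kind.
        return all(
            self.segmented.count(mark) == self.gloss.count(mark)
            for mark in "-="
        )


@dataclass
class GlossedSentence:
    """One corpus sentence with its parallel annotation tiers."""
    ipa: str                  # IPA transcription
    words: list[GlossedWord]  # glossed and POS-tagged words
    translation_ka: str       # Georgian translation
    translation_en: str       # English translation

    def misaligned_words(self) -> list[str]:
        # Surface forms whose segmentation and gloss do not line up.
        return [w.form for w in self.words if not w.is_aligned()]


# Placeholder data, not real Megrelian forms.
sentence = GlossedSentence(
    ipa="...",
    words=[
        GlossedWord("formA", "root-sfx", "ROOT-PL", "N"),
        GlossedWord("formB", "pv-root-sfx", "PV-ROOT", "V"),
    ],
    translation_ka="...",
    translation_en="...",
)
print(sentence.misaligned_words())  # ['formB']
```

A check of this kind is one way to catch glossing slips before publication; it says nothing about whether the glosses themselves are correct.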

The FLEx corpus currently contains 149 texts of varying length, selected from the material recorded during fieldwork in 2022-2024; together they comprise 9,235 sentences, 97,329 tokens, and 30,479 unique word forms. A dictionary and a brief formal grammar were compiled in the course of working with these data.
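The figures above distinguish tokens (running words) from unique forms (distinct word types). The sketch below illustrates that distinction under simple assumptions of its own: a word is taken to be a maximal run of Unicode word characters, and forms are compared after lowercasing. FLEx's own wordform inventory may segment and normalize text differently, so this sketch will not necessarily reproduce the published counts.

```python
import re
from collections import Counter

# Assumption: a word is a maximal run of Unicode word characters.
WORD = re.compile(r"\w+")


def count_tokens_and_forms(sentences: list[str]) -> tuple[int, int]:
    """Return (number of tokens, number of unique forms)."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Lowercasing merges capitalized occurrences with the others.
        counts.update(word.lower() for word in WORD.findall(sentence))
    return sum(counts.values()), len(counts)


print(count_tokens_and_forms(["a b a.", "C a"]))  # (5, 3)
```

Every token adds to the first number, while only previously unseen forms add to the second, which is why the unique-form count is always the smaller of the two.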

The Megrelian-English dictionary is built from the morphosyntactically annotated data entered into FLEx. FLEx can produce lexicons organized around lexemes, around morphemes (roots and affixes), or around both. The brief formal grammar is based on the morphological segmentation of words and on the classification of morphemes within FLEx.
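The difference between a lexeme-based and a morpheme-based lexicon can be sketched in Python as two views over the same segmented words. This is a hypothetical illustration, not FLEx's data model or export format: the `Word` and `Morph` types, the root/prefix/suffix labels, and the placeholder forms are all assumptions.

```python
from collections import Counter
from typing import NamedTuple


class Morph(NamedTuple):
    form: str
    gloss: str
    kind: str  # hypothetical classification: "root", "prefix" or "suffix"


class Word(NamedTuple):
    surface: str
    lemma: str
    morphs: tuple[Morph, ...]


def lexeme_lexicon(words: list[Word]) -> dict[str, set[str]]:
    """Lexeme-based view: each lemma with its attested surface forms."""
    lexicon: dict[str, set[str]] = {}
    for w in words:
        lexicon.setdefault(w.lemma, set()).add(w.surface)
    return lexicon


def morpheme_lexicon(words: list[Word]) -> Counter:
    """Morpheme-based view: each (form, gloss, kind) entry with its frequency."""
    return Counter(m for w in words for m in w.morphs)


# Placeholder data, not real Megrelian forms.
ROOT_A = Morph("rootA", "ROOT_A", "root")
ROOT_B = Morph("rootB", "ROOT_B", "root")
PL = Morph("sfx", "PL", "suffix")
words = [
    Word("rootAsfx", "LEMMA_A", (ROOT_A, PL)),
    Word("rootA", "LEMMA_A", (ROOT_A,)),
    Word("rootBsfx", "LEMMA_B", (ROOT_B, PL)),
]

print(sorted(lexeme_lexicon(words)["LEMMA_A"]))  # ['rootA', 'rootAsfx']
print(morpheme_lexicon(words)[PL])               # 2
```

Because both views are derived from the same segmented data, a dictionary can list lexemes, morphemes, or both without re-annotating anything.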



