Research of Slovenian Language in Lexicography and Lexicology based on Digital Language Resources

Principal Investigator at ZRC SAZU
Project Team
Ljudmila Bokal, BA, Marjeta Humar, PhD, Nevenka Jerman, Nataša Gliha Komac, PhD, Helena Smole, Mario Žganec, Jerneja Žganec Gros
ARIS Project ID

L6-5405
Duration
1 January 2003–31 December 2005
Project Leader

Varja Cvetko-Orešnik
Partners
Scientific Research Centre of the Slovenian Academy of Sciences and Arts, ALPINEON R&D

Description

The lexicographic and lexicological analysis of Slovenian vocabulary planned for this project is based on digital language resources prepared in the past at the Fran Ramovš Institute of the Slovenian Language. The results of this research will be used to update and expand the following monolingual dictionaries: the Desk Dictionary of Standard Slovenian, the Dictionary of Slovenian Synonyms, the Phraseological Dictionary of Slovenian, the Dictionary of Slovenian Verbal Valency, the Lemmatisation Dictionary, and various terminological dictionaries (such as the General Technical Dictionary, the Dictionary of Theatre Terms, the Dictionary of Mountaineering, the Dictionary of Biology, the Terminological Dictionary of History of Arts, the Dictionary of Geographical Terms, the Historical Dictionary of Law Terms, and the Dictionary of Geological Terms), as well as for the lexicographical treatment of two dialectal dictionaries: the Dictionary of the Old Standard Prekmurje Language by V. Novak and the Dictionary of the Kostelsko Speech by J. Gregorič. These activities represent an upgrade of the existing research and can be carried out at a methodologically current level through new uses of digital language resources. The importance of these resources for lexicological and lexicographical research has changed significantly in the past few years. In the selection of lexical material, digital resources no longer serve merely as an auxiliary aid. The decision on headword inclusion or exclusion has to be based on actual corpus frequency, as well as on the corpus context, the level of corpus text markup, relevant additional data, the variety of styles and registers in the language resources, and the formal depth of electronic lexical data records. Within the framework of the research project, a considerable upgrade and expansion of the language resources at the Institute is foreseen.
The expanded language resources will be used for the preparation of the planned monolingual dictionaries and for the improvement of the electronic dictionary of word forms with a speech component. Along with the orthographic transcription, word forms will additionally be represented both by their phonetic transcription using IPA symbols and by their spoken representation in the form of a spoken image (an audio rendering). A Slovenian text-to-speech system will be used for grapheme-to-allophone transcription and for the generation of spoken images of word forms. The additional word form representations will introduce an interesting multimedia component into the existing web presentation of electronic dictionaries. Furthermore, they will serve educational purposes by giving foreigners an impression of how a Slovenian word form is pronounced.
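
As an illustration only, the three representations of a word form described above (orthographic, IPA, and spoken) could be held in a single record. The sketch below is a minimal Python example under stated assumptions: the class name `WordformEntry`, its field names, the helper `build_index`, the sample IPA strings, and the audio file names are all hypothetical and are not taken from the project's actual dictionary format or text-to-speech system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class WordformEntry:
    """One word form with its three representations (hypothetical layout)."""
    lemma: str                        # dictionary headword
    orthographic: str                 # written form
    ipa: str                          # phonetic transcription in IPA symbols
    audio_path: Optional[str] = None  # file holding the "spoken image", if any


def build_index(entries: Iterable[WordformEntry]) -> Dict[str, List[WordformEntry]]:
    """Group entries by lower-cased written form so homographs stay together."""
    index: Dict[str, List[WordformEntry]] = {}
    for entry in entries:
        index.setdefault(entry.orthographic.lower(), []).append(entry)
    return index


# Illustrative sample data; the transcriptions are simplified and not authoritative.
ENTRIES = [
    WordformEntry("miza", "miza", "ˈmiːza", "audio/miza.wav"),
    WordformEntry("miza", "mize", "ˈmiːzɛ", "audio/mize.wav"),
    WordformEntry("miza", "mizo", "ˈmiːzɔ"),  # no audio generated yet
]

INDEX = build_index(ENTRIES)

for entry in INDEX["mize"]:
    print(entry.orthographic, entry.ipa, entry.audio_path)
```

Keeping the audio reference optional reflects the idea that spoken images would be generated by the text-to-speech system and attached to existing entries over time; a real implementation would follow whatever record format the Institute's electronic dictionaries actually use.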

Research Project