Formant Combinatorics in Slovenian

Principal Investigator at ZRC SAZU
Boris Kern, PhD, Asst. Prof.
Original Title

Formant Combinatorics in Slovenian
Project Team
Tomaž Erjavec, PhD; Boris Kern, PhD, Asst. Prof.; Nina Ledinek, PhD, Asst. Prof.; Andreja Žele, PhD, member, SAZU; Matej Martinc; Andraž Pelicon; Senja Pollak, PhD; Irena Stramljič Breznik, PhD; Ines Voršič, PhD; Marko Pranjić
ARIS Project ID

J6-3131
Duration
1 October 2021–30 September 2025
Financial Source

Description

The objective of the project was to investigate the combinatorics of word-formational formants. This made it possible to describe the word-formation and semantic extension mechanisms of Slovenian on the basis of contemporary language material, including all contemporary dictionaries of Slovenian and contemporary corpora, while integrating state-of-the-art research methods from linguistics and language technologies. Slovenian, like other Slavic languages, is characterised by an exceptionally rich morphemic structure of words resulting from multistage word formation. For example, the adjective mlad ‘young’ yields the noun mladost ‘youth’ at the first stage; mladost yields the adjective mladosten ‘youthful’ at the second stage; mladosten yields the noun mladostnik ‘adolescent’ at the third stage; and mladostnik yields the possessive adjective mladostnikov ‘adolescent’s’ at the fourth stage. The example illustrates the compatibility of the following four suffixal formants: -ost + -en + -ik + -ov. The compatibility of formants is understood as the ability of different word-formational formants to co-occur within multistage word formation, taking into account the semantic extension aspect. In this way, the project established a new field of research in Slovenian linguistics: morphotactics.
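The four-stage example can also be written down as data. The following is a minimal sketch that assumes a hypothetical representation in which each derivational stage is a (derivative, formant) pair; the names `Stage`, `formant_chain`, and `describe` are illustrative and do not reflect the project's own formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    derivative: str  # the word produced at this stage
    formant: str     # the suffixal formant added at this stage


# Derivational chain taken from the example in the text, starting from 'mlad'.
BASE = "mlad"
CHAIN = [
    Stage("mladost", "-ost"),
    Stage("mladosten", "-en"),
    Stage("mladostnik", "-ik"),
    Stage("mladostnikov", "-ov"),
]


def formant_chain(stages):
    """Return the formants in the order in which they were attached."""
    return [stage.formant for stage in stages]


def describe(base, stages):
    """Render the chain as 'base > derivative (formant) > ...'."""
    parts = [base] + [f"{s.derivative} ({s.formant})" for s in stages]
    return " > ".join(parts)


print(" + ".join(formant_chain(CHAIN)))  # -ost + -en + -ik + -ov
print(describe(BASE, CHAIN))
```

Keeping the formant separate from the surface form of the derivative matters because the two do not always line up letter for letter at morpheme boundaries, as in mladosten and mladostnik.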

The language-technology objective of the project was to create the first training dataset and the first language-technology application for the automatic morphemic segmentation of Slovenian words. This is of key importance for the development of semantic language resources and language technologies for Slovenian, and it is also of considerable importance for linguistics. The language-technology objectives further confirmed the strategic relevance of the project, since the development of diverse language technologies is imperative for all modern languages. The results also enabled integration into the international research space, including through comparisons with research on morphologically and genetically more closely related languages (other Slavic languages) as well as less closely related languages (non-Slavic languages).

As part of the project, the 23rd International Scientific Conference of the Commission for Word Formation of the International Committee of Slavists was also organised. It brought together 53 participants, including members of the Commission for Word Formation, collaborators from the KOBOS project, and invited linguists from Slovenia, Germany, Croatia, and Poland, who contributed 43 papers focusing on the diachronic, and especially the synchronic, treatment of multistage derivatives. That so many researchers devoted their attention to a topic at the core of the project proposal is a notable achievement.

Understanding morphological segmentation also proved to have considerable potential for improving deep learning models, on which most modern language-technology tools are based. The results of the project have practical value for numerous subsequent applied projects, including speech synthesisers, which are important both for including language users from vulnerable social groups and for the development of smart devices, robotics, and related technologies.

Project Stages

PHASE 1: DIGITISATION AND FORMALISATION OF EXISTING PRINTED WORD-FORMATIONAL RESOURCES FOR SLOVENIAN

(October 2021 – January 2022)

data were encoded in a purpose-built formalism;
the schema of the word-formational database was developed on the basis of existing resources;
statistically relevant data on the frequency distribution and combinatorics of suffixal morphemes in Slovenian were extracted.
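The extraction of frequency and combinatorics data can be illustrated with a small sketch. It assumes a hypothetical input in which each derivative is mapped to its ordered list of suffixal formants. Only the mlad family from the project description is used here, and the names `CHAINS`, `formant_frequencies`, and `adjacent_pairs` are illustrative rather than part of the project's database schema.

```python
from collections import Counter

# Toy input: derivative -> ordered suffixal formants (assumed representation).
# A real word-formational database would hold many word families.
CHAINS = {
    "mladost": ["-ost"],
    "mladosten": ["-ost", "-en"],
    "mladostnik": ["-ost", "-en", "-ik"],
    "mladostnikov": ["-ost", "-en", "-ik", "-ov"],
}


def formant_frequencies(chains):
    """Count how many times each formant occurs across all derivatives."""
    return Counter(f for seq in chains.values() for f in seq)


def adjacent_pairs(chains):
    """Count ordered pairs of formants that stand next to each other."""
    return Counter(
        pair for seq in chains.values() for pair in zip(seq, seq[1:])
    )


print(formant_frequencies(CHAINS).most_common())
print(adjacent_pairs(CHAINS).most_common())
```

Counting ordered adjacent pairs rather than unordered sets keeps the direction of attachment, which is what a statement such as "-ost can be followed by -en" depends on.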

PHASE 2: REVIEW AND EVALUATION OF THE EXTRACTED SET OF COMBINATIONS OF WORD-FORMATIONAL FORMANTS

(January 2022 – June 2022)

the final set of combinations of word-formational formants was established;
typical phonological alternations at morpheme boundaries were inventoried;
morpheme truncations were inventoried.

PHASE 3: USE OF OTHER DICTIONARY RESOURCES

(January 2023 – March 2023)

derivatives of the same type were identified in other dictionary resources;
new types of derivatives and combinations of word-formational formants were recorded.

PHASE 4: ANALYSIS OF THE OBTAINED DATA

(April 2022 – September 2023)

the frequency and combinatorics of suffixal formants in other dictionary resources were analysed;
the semantic functions of word-formational formants and the combinatorial restrictions arising from them were analysed;
the combinatorics of individual components of derivatives were analysed with regard to borrowedness;
the combinatorics of individual components of derivatives were analysed with regard to the connotative value of individual components;
the formation of feminine forms was analysed, especially the distributional regularities of competing formants.

PHASE 5: USE OF CORPUS RESOURCES

(October 2022 – December 2022)

derivatives of the same type were identified in corpus resources;
new types of derivatives and combinations of word-formational formants were recorded;
the combinatorics of suffixal formants in the previously obtained material and in specialised corpora, such as Janes, were compared;
a linguistic analysis of the obtained data was carried out.

PHASE 6: PREPARATION OF THE TRAINING DATASET FOR MORPHEMIC ANALYSIS OF WORDS AND THE USE OF MACHINE LEARNING

(October 2024 – February 2025)

a training dataset for the automatic morphemic analysis of unrestricted vocabulary was prepared;
machine learning was carried out on the basis of these data, including tests of deep-learning methods.
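The format of the project's training dataset is not described here. One common way to frame morphemic segmentation as a machine-learning task is to label each character as either beginning a morpheme ("B") or continuing one ("I"). The sketch below assumes that encoding; the functions `encode` and `decode` and the label names are hypothetical, and the segmentations are derived only from the formants -ost and -en named in the description.

```python
def encode(segments):
    """Turn a list of morphemes into (word, per-character B/I labels)."""
    if any(not seg for seg in segments):
        raise ValueError("morphemes must be non-empty")
    word = "".join(segments)
    labels = []
    for seg in segments:
        # First character opens a morpheme, the rest continue it.
        labels.extend(["B"] + ["I"] * (len(seg) - 1))
    return word, labels


def decode(word, labels):
    """Rebuild the list of morphemes from a word and its B/I labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    segments = []
    for char, label in zip(word, labels):
        if label == "B" or not segments:
            segments.append(char)
        else:
            segments[-1] += char
    return segments


word, labels = encode(["mlad", "ost", "en"])
print(word, "".join(labels))  # mladosten BIIIBIIBI
print(decode(word, labels))   # ['mlad', 'ost', 'en']
```

With this encoding, any sequence-labelling model that predicts one label per character can be trained on segmented words, and its output can be converted back into morphemes with `decode`.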

PHASE 7: PUBLICATION OF THE SYNTHESIS OF THE FINDINGS OF THE WORD-FORMATIONAL ANALYSIS

(October 2022 – September 2025)

a synthesis of the analysis of data obtained from dictionary resources and corpora was prepared;
the results of the automatic morphemic segmentation of unrestricted vocabulary were evaluated;
the research results were presented at conferences;
a monograph is being prepared that presents the combinatorics of word-formational formants on contemporary language material and thus the functioning of the word-formation and semantic extension mechanisms in Slovenian.
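The evaluation procedure used in this phase is not specified here. A widely used way to score automatic segmentation is to compare predicted and gold morpheme boundaries and report precision, recall, and F1. The sketch below assumes that metric; the function names are hypothetical, and the predicted segmentation is an invented erroneous output used only for illustration.

```python
def boundaries(segments):
    """Return the character offsets at which a morpheme boundary falls."""
    cuts, position = set(), 0
    for seg in segments[:-1]:
        position += len(seg)
        cuts.add(position)
    return cuts


def boundary_prf(gold, predicted):
    """Boundary precision, recall, and F1 for one word."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted must spell the same word")
    g, p = boundaries(gold), boundaries(predicted)
    true_positives = len(g & p)
    # Convention assumed here: an empty set scores 1.0 on its own side.
    precision = true_positives / len(p) if p else 1.0
    recall = true_positives / len(g) if g else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["mlad", "ost", "en"]       # segmentation implied by the formants
predicted = ["mla", "dost", "en"]  # invented erroneous prediction
print(boundary_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

In practice such scores would be aggregated over a held-out set of words rather than computed for a single item.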

Results

[1] ERJAVEC, Tomaž, PRANJIĆ, Marko, PELICON, Andraž, KERN, Boris, STRAMLJIČ BREZNIK, Irena, POLLAK, Senja. Automating derivational morphology for Slovenian. In: MEDVEĎ, Marek (ed.), et al. eLex 2023: electronic lexicography in the 21st century (eLex 2023): proceedings of the eLex 2023 conference: [Brno], 27–29 June 2023. Brno: Lexical Computing CZ, 2023. 449–465. COBISS.SI-ID – 158849795

[2] KERN, Boris. Besednodružinski slovar slovenskega jezika kot tujega jezika. In: Slavistična prepletanja 4 [Electronic resource]. Gjoko NIKOLOVSKI (ed.), Natalija ULČNIK (ed.). Maribor: Univerza v Mariboru, Univerzitetna založba, 2022. 153–168. COBISS.SI-ID – 65082627

[3] KERN, Boris. Feminativi v izsamostalniških besedotvornih nizih. In: JOŽEF-BEG, Jožica (ed.), HOČEVAR, Mia (ed.), KOČNIK, Neža (ed.). Naslavljanje raznolikosti v jeziku in književnosti: [Slovenski slavistični kongres, Maribor, 28.–30. september 2023]. Ljubljana: Zveza društev Slavistično društvo Slovenije, 2023. 197–205. COBISS.SI-ID – 167240707

[4] KERN, Boris. Considering word formation in compiling dictionaries. In: ŠTRKALJ DESPOT, Kristina (ed.). Lexicography and Semantics: proceedings of the XXI EURALEX International Congress, 8–12 October 2024, Cavtat, Croatia. Zagreb: Institut za hrvatski jezik, 2024. 438–448. COBISS.SI-ID – 223546883

[5] KERN, Boris. Priponski nizi izsamostalniških stopenjskih tvorjenk z izhodiščnim obrazilom -iti v slovenščini. Južnoslovenski filolog. 2024, vol. 80, no. 2. 127–139. DOI: 10.2298/JFI2402127K. COBISS.SI-ID – 224795907

[6] KERN, Boris, LEDINEK, Nina. Izprislovne stopenjske tvorjenke. In: MARUŠIČ, Franc (ed.), et al. Škrabčevi dnevi 13: zbornik prispevkov s simpozija 2023. Nova Gorica: Založba Univerze, 2025. 16–28. COBISS.SI-ID – 178755587

[7] KERN, Boris, UHLIK, Mladen, GABROVŠEK, Dejan. 17. mednarodni slavistični kongres v Parizu. Slavistična revija. 2025, vol. 73, no. 4. 632–638. COBISS.SI-ID – 264217091

[8] KERN, Boris, STRAMLJIČ BREZNIK, Irena. Kombinatorika izhodiščnega obrazila -ati v slovenščini. Slavistična revija. 2025, vol. 73, no. 2. 329–346. DOI: 10.57589/srl.v73i2.4253. COBISS.SI-ID – 242271747

[9] KERN, Boris. Izsamostalniški besedotvorni nizi v slovenščini z vidika morfotaktike. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[10] KERN, Boris. Slovarček terminov s področja stopenjskega besedotvorja v slovanskih jezikih. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[11] KERN, Boris, DIVJAK RACE, Duša, OMAN, Jera, ŽIBRED, Maruša (eds.). Stopenjsko besedotvorje: 23. mednarodna znanstvena konferenca Komisije za besedotvorje pri Mednarodnem slavističnem komiteju: zbornik povzetkov: Ljubljana, 17.–20. 9. 2024 = Multistage word formation: 23rd International Conference of the Commission on Word Formation of the International Committee of Slavists: book of abstracts. Ljubljana: Založba ZRC, 2024. COBISS.SI-ID – 204688899

[12] KULOVEC, Marjetka, JERKO, Boštjan, KERN, Boris. Stopenjske tvorjenke v slovenščini in sestavljene kretnje v slovenskem znakovnem jeziku. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[13] POLLAK, Senja, VORŠIČ, Ines, KERN, Boris, ULČAR, Matej. Novel Slovenian COVID-19 vocabulary from the perspective of naming possibilities and word formation. In: MEDVEĎ, Marek (ed.), et al. eLex 2023: electronic lexicography in the 21st century (eLex 2023): proceedings of the eLex 2023 conference: [Brno], 27–29 June 2023. Brno: Lexical Computing CZ, 2023. 419–438.

[14] PRANJIĆ, Marko, POLLAK, Senja. Advancements in Automatic Morphological Segmentation for Slovenian. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[15] STRAMLJIČ BREZNIK, Irena. Izmedmetna tvorba glagolov – tvorbene značilnosti skupine s pomenom ‘oglašati se’ in ‘govoriti’. In: ARIZANKOVSKA, Lidija (ed.). Tendencii vo zboroobrazuvanjeto na glagolite vo slovenskite jazici = Word-formation tendencies of verbs in Slavic languages = Tendencii slovoobrazovanija glagolov v slavjanskih jazykah: 29 maj – 3 juni 2023 g., Kongresen centar na Univerzitetot „Sv. Kiril i Metodij“ – Skopje vo Ohrid: [zbornik na trudovi]. Skopje: Filološki fakultet "Blaže Koneski", 2024. 327–343. COBISS.SI-ID – 218801155

[16] STRAMLJIČ BREZNIK, Irena, LEDINEK, Nina. Kvantitativni podatki o besedotvornih modelih in priponskih nizih izmedmetnih tvorjenk v Besednodružinskem slovarju slovenskega jezika za iztočnice na b. Slovenski jezik – Slovene linguistic studies. 2024, 16. 59–85. DOI: 10.3986/16.1.03. COBISS.SI-ID – 220034819

[17] STRAMLJIČ BREZNIK, Irena. Značilnosti izmedmetih priponskih nizov v Besednodružinskem slovarju slovenskega jezika za iztočnice na b. Slavia Centralis. 2024, vol. 17, no. 1. 1–17. DOI: 10.18690/scn.17.1.1-17.2024. COBISS.SI-ID – 200354051

[18] STRAMLJIČ BREZNIK, Irena. Ustreznost strojno izluščenih štiripriponskih izpridevniških besedotvornih nizov iz učne množice BSSJ. In: DRAGIĆEVIĆ, Rajna (ed.). Derivaciona gnezda u slovenskim jezicima: Sistemnost tvorbene produktivnosti: tematski blok: XVII međunarodni kongres slavista (Pariz, 25–30. VIII 2025) = Derivational nests in Slavic languages: Systematicity of word formation productivity: thematic block: XVII International Congress of Slavists (Paris, 25–30. August 2025). Beograd: Savez slavističkih društava Srbije, 2025. 163–180. DOI: 10.18485/ssds_mks17_dg.2025.ch8. COBISS.SI-ID – 253354755

[19] STRAMLJIČ BREZNIK, Irena. Primerjava najdaljših tvorbenih nizov za izbrana obrazila izpridevniških tvorjenk z gradivom v Pleteršnikovem slovarju. In: JESENŠEK, Marko (ed.). Imenitnost slovenščine sto let po Pleteršniku: zbornik povzetkov: Pišece, 22. 9. 2025, Pleteršnikova domačija. Maribor: Slavistično društvo, 2025. F. [14]. COBISS.SI-ID – 250472963

[20] STRAMLJIČ BREZNIK, Irena. Značilnosti prvostopenjskih izpridevniških samostalniških izpeljank in tvorbena kombinatorika obrazil -ica ter -ec v slovenščini. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[21] ŽELE, Andreja, LEDINEK, Nina, VORŠIČ, Ines. Med besedotvorjem in skladnjo: kolokabilnost slovenskih glagolskih zvez tipa GlagPrisl in njihova sistemska pretvorba v zveze tipa PridSam. In: KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932.

[22] KERN, Boris (ed.). Stopenjsko besedotvorje v slovanskih jezikih. Ljubljana: Založba ZRC, ZRC SAZU, 2026. DOI: 10.3986/9789610510932. COBISS.SI-ID – 269400835
