Tomaž Erjavec, PhD

Senior Research Associate, Associate Professor of Language Technologies, Consultant at the Centre for Language Technologies Development

+386 1 477 35 07

Jožef Stefan Institute, Jamova cesta 39, Ljubljana

Researcher's IDs
05023 (ARRS)

Tomaž Erjavec (born 12 August 1960) is primarily employed at the Department of Knowledge Technologies, Jožef Stefan Institute, and part-time at the Fran Ramovš Institute of the Slovenian Language, ZRC SAZU. His research lies in the areas of Digital Humanities and Computational Linguistics. Within the Centre for Language Technologies Development at ZRC SAZU, he advises on how to encode dictionaries digitally in line with international standards.

Research Areas

  • standards for encoding of language resources
  • development of language corpora
  • encoding of complex digital editions
  • technical and legal questions of language resource distribution
  • development of language technologies for annotating Slovenian language texts

Education, Academic and Scientific Titles

  • 2019: senior research fellow, ZRC SAZU, Fran Ramovš Institute of the Slovenian Language
  • 2019: expert and research counsellor, Jožef Stefan Institute
  • 2015: associate professor of Language Technologies, University of Ljubljana
  • 1997: PhD thesis: Unification, inheritance hierarchies and paradigms in the formalisation of language morphology, Faculty of Computer Science and Informatics, University of Ljubljana
  • 1992: MSc thesis: Treatments of Slovene verb morphology in inheritance models, Centre for Cognitive Science, University of Edinburgh
  • 1990: MSc thesis: Computer systems for Slovenian language morphological analysis and synthesis, Faculty of Computer Science and Informatics, University of Ljubljana
  • 1984: BSc, Faculty of Computer Science and Informatics, University of Ljubljana

Employments, Leadership Positions and Competences


  • 2018–: part-time employment at the Fran Ramovš Institute of the Slovenian Language, ZRC SAZU
  • 1984–: full-time employment at the Jožef Stefan Institute
  • 2014–2020: professor, Jožef Stefan International Postgraduate School
  • 2006–2013: visiting professor at the University of Graz
  • 2005–2006, 2008–2012: visiting professor, University of Nova Gorica
  • 2004: visiting researcher, EU Joint Research Centre, Ispra (6 months)
  • 2002: visiting researcher, University of Tokyo (6 months)
  • 1992–1993: research assistant on the project »Integrated Language Database«, University of Edinburgh (1 year)

Leadership Positions (from 2000 onwards):

  • 2017–2020: J7-8280 FRENK - Resources, methods and tools for the understanding, identification and classification of various forms of socially unacceptable discourse in the information society: ARRS basic research project (principal investigator)
  • 2016–2018: L6-7134 Forbidden Books in the Slovenian Lands in the Early Modern Period: ARRS applied research project (principal investigator at JSI)
  • 2016–2018: J6-7094 KAS - Slovene scientific texts: resources and description: ARRS basic research project (principal investigator)
  • 2014–: national coordinator of the Slovenian research infrastructure for language resources and technologies CLARIN.SI
  • 2014–2017: J6-6842 JANES - Resources, Tools and Methods for the Research of Nonstandard Internet Slovene: ARRS basic research project (principal investigator at JSI)
  • 2014–2015: Compiling Corpora and Lexica of Non-Standard Serbian and Slovenian: Bilateral project SI-SR (principal investigator on the Slovenian side)
  • 2014–2015: Constructing a Bilingual Lexicon of Closely-Related Languages From Existing Language Resources: Bilateral project SI-HR (principal investigator on the Slovenian side)
  • 2013–2017: PARSEME: PARSing and Multi-Word Expressions: IC1207 COST Action (Slovenian representative)
  • 2013–2016: J6-5561 Slovenian Literature in Unknown Early Modern Manuscripts. Information Technology Aided Analyses and Scholarly Editions: ARRS basic research project (principal investigator at JSI)
  • 2011–2015: NetWordS: The European Network on Word Structure: EU ESF Network (Slovenian representative)
  • 2011–2014: J6-4019 The Leading Humanists in the Slovenian Territory between the 16th and mid-19th Centuries and their Social and Cultural Environment: ARRS basic research project (principal investigator at JSI)
  • 2011–2013: Developing Models of Historical Slovenian: Google Inc. (principal investigator at JSI)
  • 2010–2012: IMPACT: Improving Access to Text: EU FP7 (principal investigator at JSI)
  • 2009–2010: Definition of Syntactic-Semantic Structure of Slovene Verb: bilateral French-Slovenian project Proteus (principal investigator on the Slovenian side)
  • 2009–2012: Slovene Translation Studies - Resources and Research: ARRS basic research project (principal investigator at JSI)
  • 2008–2011: FlaReNet - Fostering Language Resources Network: EU FP7 IST NoE (Slovenian representative)
  • 2008–2010: Mondilex - Conceptual Modelling of Networking of Centres for High-Quality Research in Slavic Lexicography and their Digital Resources: EU FP7 SSA (principal investigator on the Slovenian side)
  • 2008–2009: Japanese-Slovene Resources for Students of Japanese: Japanese-Slovenian bilateral project JSPS (principal investigator on the Slovenian side)
  • 2008–2011: Unknown Manuscripts of the 17th and 18th Centuries: Information and Technology Supported Registry, Text-Critical Editions and Analyses: ARRS applied research project (principal investigator at JSI)
  • 2007–2009: Linguistic Annotation of Slovenian Language: Methods and Resources: ARRS basic research project (principal investigator)
  • 2007–2008: Development of Language Resources and Models for Machine Translation for South-Slavic and Balkan Languages: EU SEE-ERA.NET (principal investigator on the Slovenian side)
  • 2007–2009: Digital Text Centre with Multimedia Communication: Slovenian target research project (principal investigator at JSI)
  • 2006–2008: VoiceTRAN II: Multilingual Speech Communicator: Slovenian target research project (principal investigator at JSI)
  • 2004–2005: Development of Linguistic Resources for Machine Translation between Slovene and Serbian: Serbian-Slovenian bilateral project (principal investigator on the Slovenian side)
  • 2004–2007: Scholarly Digital Editions of Slovenian Literature: ARRS applied project (principal investigator at JSI)
  • 2004–2006: Development of the Slovenian Corpora Network: Slovenian target research project (principal investigator at JSI)
  • 2004–2006: VoiceTRAN: Multilingual Speech Communicator: Slovenian target research project (principal investigator at JSI)
  • 2002: Localisation of Open-Source Spell-Checker Ispell and Aspell: Slovenian target research project (principal investigator at JSI)


  • MSc supervisor: Helena Plahuta, Mateja Košir
  • PhD co-supervisor: Jernej Vičič


  • 2015: invited lecture at ConSOLE XXIII, The 23rd Conference of the Student Organization of Linguistics in Europe, Paris
  • 2015: chair of the programme committee of the Workshop on Replicability and Reusability in Natural Language Processing, IJCAI, the 24th International Joint Conference on Artificial Intelligence, Buenos Aires
  • 2014: chair of the programme committee of the session »Phonology, Morphology, and Segmentation« at EMNLP, The Conference on Empirical Methods in Natural Language Processing, Doha
  • 2013: invited lecture at the Seventh International Conference NLP, Corpus Linguistics, E-Learning, Bratislava
  • 2013: chair of the programme committee of the session »NLP for the languages of Central and Eastern Europe and the Balkans« at ACL, The Conference of the Association for Computational Linguistics, Sofia
  • 2007: chair of the programme committee of ESSLLI, The 19th European Summer School in Logic, Language and Information, Dublin

Work in Editorial Boards and Expert Commissions:

  • 2013–: technical editor of the Slovenian Biography portal at SAZU
  • 2012: member of the expert group for the preparation of the National programme for the language policy of the Republic of Slovenia
  • 2005–2006: member of the expert group of the European Observatory for the Humanities and Social Sciences, European Strategy Forum on Research Infrastructures (ESFRI)
  • 2005–: member of the editorial board of the journal Language Resources and Evaluation, Springer
  • 2004–2016: regular reviewer for projects and project proposals in EU, Croatia, Czech Republic and Poland in the area of language technologies
  • 2002–: member of the Technical committee »Informatics, documentation and general terminology« at the Slovenian Institute for Standardisation, mainly involved in contributing to the development and confirmation of standards of ISO TC37/SC4 Language resource management
  • 2001–2005: member of the editorial board of the journal Computers and the Humanities, Kluwer
  • 2000–2002: member of the Council of the Text Encoding Initiative Consortium
  • 2000–2002: member of the board of the European Chapter of the Association for Computational Linguistics
  • 1998–: member of the editorial board of the International Journal of Corpus Linguistics, John Benjamins
  • 1998–2006: founding president of the Slovenian Language Technologies Society

2020: The Janes project: language resources and tools for Slovene user generated content, Darja Fišer, Nikola Ljubešić, Tomaž Erjavec

2017: MULTEXT-East, Tomaž Erjavec

2017: Slavic corpus and computational linguistics, Dagmar Divjak, Serge Sharoff, Tomaž Erjavec

2016: Modernising historical Slovene words, Yves Scherrer, Tomaž Erjavec

2015: The IMP historical Slovene language resources, Tomaž Erjavec

2004: Machine learning of morphosyntactic structure: lemmatizing unknown Slovene words, Tomaž Erjavec, Sašo Džeroski


Research areas
Linguistics H350
Artificial intelligence P176

computer processing of natural language • encoding standards for language resources • corpus linguistics • digital humanities