Category: Lexicons

  • Serbian lexicon: srLex

    srLex is an inflectional lexicon of Serbian.
    The size of the lexicon is 169,328 lemmas, or 6,905,941 surface forms.
    Each entry in the lexicon is an 8-tuple: (word form, lemma, MSD, MSD features, UPOS, morphological features, absolute frequency, in-million frequency). The frequencies were estimated on the Serbian web corpus srWaC.
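
    As an illustration of the 8-tuple layout described above, here is a minimal sketch of reading one entry in Python. It assumes the raw text file holds one entry per line with tab-separated fields in the order listed; that file layout is an assumption, not something stated here. The names LexEntry and parse_entry are hypothetical, and the sample line is invented for demonstration rather than taken from srLex.

```python
from typing import NamedTuple


class LexEntry(NamedTuple):
    """One lexicon entry; field order follows the 8-tuple described above."""
    word_form: str
    lemma: str
    msd: str
    msd_features: str
    upos: str
    morph_features: str
    abs_freq: int
    per_million_freq: float


def parse_entry(line: str) -> LexEntry:
    """Parse one line, ASSUMING eight tab-separated fields per entry."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-separated fields, got {len(fields)}")
    # The first six fields stay as strings; the two frequencies become numbers.
    return LexEntry(*fields[:6], int(fields[6]), float(fields[7]))


# Invented sample line (not real lexicon data), used only to exercise the parser.
SAMPLE_LINE = "kuće\tkuća\tMSD\tMSD-FEATS\tNOUN\tMORPH-FEATS\t1200\t2.5\n"
entry = parse_entry(SAMPLE_LINE)
print(entry.word_form, entry.lemma, entry.abs_freq)
```

    If the downloaded file turns out to use a different separator or column order, only the split call and the field order in LexEntry would need to change.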

    The set of morphosyntactic tags used in the lexicon follows the MULTEXT-East V6 tagset for the Serbo-Croatian macrolanguage, available here.

    Authors
    Nikola Ljubešić
    Availability
    For local use, srLex can be downloaded as a raw text file here.
    srLex can also be queried through our web services, which double as an API (application programming interface).
    Publications
    The lexicon and its construction process have been described in detail in the following paper:
    Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia. [Link] [.bib]
  • Croatian lexicon: hrLex

    hrLex is an inflectional lexicon of Croatian.
    The size of the lexicon is 164,206 lemmas, or 6,427,709 4,970,520 surface forms.
    Each entry in the lexicon consists of a (word form, lemma, MSD, MSD features, UPOS, morphological features, absolute frequency, in-million frequency) 8-tuple. The frequencies were estimated on the Croatian web corpus hrWaC.
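
    The relation between the two frequency fields can be sketched as follows. This assumes the conventional definition of an in-million frequency (occurrences per one million corpus tokens); the exact normalisation used for hrLex is not stated here. The function name per_million and the corpus size in the example are hypothetical and are not the actual size of hrWaC.

```python
def per_million(abs_freq: int, corpus_tokens: int) -> float:
    """Scale an absolute count to occurrences per one million corpus tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return abs_freq * 1_000_000 / corpus_tokens


# Hypothetical numbers: a form seen 500 times in a 250-million-token corpus.
print(per_million(500, 250_000_000))
```

    Normalising in this way makes frequencies comparable across corpora of different sizes, which is the usual reason for reporting both values.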

    The set of morphosyntactic tags used in the lexicon follows the MULTEXT-East V6 tagset for the Serbo-Croatian macrolanguage, available here.

    Authors
    Nikola Ljubešić
    Availability
    For local use, hrLex can be downloaded as a raw text file here.
    hrLex can also be queried through our web services, which double as an API (application programming interface).
    Publications
    The lexicon and its construction process have been described in detail in the following paper:
    Nikola Ljubešić, Filip Klubička, Željko Agić, Ivo-Pavao Jazbec (2016). New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16). Portorož, Slovenia. [Link] [.bib]