Multilingual Named Entity Recognition for Historical Texts

Revision as of 14:24, 11 May 2026 by NeoWiki (Importing NeoWiki demo data)

"Multilingual Named Entity Recognition for Historical Texts" proposes a training and evaluation methodology for NER models that must cope with the spelling variation, code-switching, and non-standard orthography typical of pre-modern European sources. The paper benchmarks several transformer-based architectures against a curated set of historical corpora.

The work underpins the Multilingual NER Toolkit project, which packages the methodology as reusable models and evaluation harnesses for downstream digital humanities use.