Named Entity Recognition Using Wikipedia In Bibliography

We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes.

We first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy.

We transform the links between articles into NE annotations by projecting the target article's classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards.
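The link projection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `ARTICLE_TYPES` mapping, the `project_links` function, the type labels and the IOB tag scheme are all assumptions made for illustration, and the input is assumed to be whitespace-tokenised text using the standard `[[Target|anchor]]` wiki link syntax.

```python
import re

# Hypothetical output of the article classification step:
# article title -> NE type, or None for articles that are not entities.
ARTICLE_TYPES = {
    "James Cook": "PER",
    "Sydney": "LOC",
    "University of Sydney": "ORG",
    "Ship": None,
}

# Matches [[Target]] and [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def project_links(wikitext, article_types):
    """Return (token, tag) pairs, tagging anchor text with the target's type."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Text between links carries no annotation in this sketch.
        tagged.extend((tok, "O") for tok in wikitext[pos:match.start()].split())
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        ne_type = article_types.get(target)  # None if unknown or non-entity
        for i, tok in enumerate(anchor.split()):
            if ne_type is None:
                tagged.append((tok, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((tok, prefix + ne_type))
        pos = match.end()
    # Remaining text after the last link.
    tagged.extend((tok, "O") for tok in wikitext[pos:].split())
    return tagged


sentence = "[[James Cook|Cook]] sailed his [[Ship|ship]] near [[Sydney]] ."
result = project_links(sentence, ARTICLE_TYPES)
for token, tag in result:
    print(token, tag)
```

In this sketch, unlinked mentions of an entity stay tagged `O`, which is one reason raw link projection falls short of gold-standard annotation and why inferring additional links helps.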

We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL shared task data and other gold-standard corpora. Our approach outperforms other approaches to automatic NE annotation (Richman and Schone, 2008 [61]; Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.

Named entity recognition in Wikipedia

Authors: Dominic Balasuriya, University of Sydney, NSW, Australia
Nicky Ringland, University of Sydney, NSW, Australia
Joel Nothman, University of Sydney, NSW, Australia
Tara Murphy, University of Sydney, NSW, Australia
James R. Curran, University of Sydney, NSW, Australia
2009 Article

Published in:
People's Web '09: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
Pages 10-18

Suntec, Singapore, August 7, 2009
Association for Computational Linguistics, Stroudsburg, PA, USA ©2009
ISBN: 978-1-932432-55-8

algorithms, content analysis and feature selection, design, document representation, evaluation of retrieval results, languages, learning paradigms, machine learning, natural language processing, performance
