Information_extraction Information_extraction

Information extraction - Definition and Overview

Related Words: Cobol, Esp, Fortran, Acquaintance, Advice, Answer, Arraignment, Assembler, Bail, Bit, Blame

Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured or semistructured information from unstructured machine-readable documents.

A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. Current approaches to IE use natural language processing techniques that focus on very restricted domains. For example, the Message Understanding Conference (MUC) is a competition-based conference that focused on the following domains in the past:

  • MUC-1 (1987), MUC-2 (1989): Naval operations messages.
  • MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries.
  • MUC-5 (1993): Joint ventures and microelectronics domain.
  • MUC-6 (1995): News articles on management changes.
  • MUC-7 (1998): Satellite launch reports.

Typical subtasks of IE are:

  • Named Entity Recognition: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.
  • Coreference: identification chains of noun phrases that refer to the same object. For example, anaphora is a type of coreference.

Copyright 2009 WordIQ.com - Privacy Policy  :: Terms of Use  :: Contact Us  :: About Us
This article is licensed under the GNU Free Documentation License. It uses material from the this Wikipedia article.