Ontological Document Reading

An Experience Report

  • David W. Embley, FamilySearch International, Lehi, Utah, USA
  • Stephen W. Liddle, Brigham Young University, Provo, Utah, USA
  • Deryle W. Lonsdale, Brigham Young University, Provo, Utah, USA
  • Scott N. Woodfield, Brigham Young University, Provo, Utah, USA
Keywords: Document Reading, Information Extraction, Conceptual Modeling, Ontology Conceptualization, Extraction Ontology


Ontological document reading is defined as automatically and appropriately populating a conceptual model that represents an ontological conceptualization of some fragment of the real world. Appropriately populating the conceptualization involves not only extracting information for the declared object and relationship sets of the conceptual model but also checking the extracted information for real-world constraint violations, standardizing the data, and inferring unwritten information that the document author intended to convey. Appropriately populating an ontology may, in addition, require adjustments to the ontology itself. This approach to document reading is presented in terms of an effort to build a system that extracts genealogical information from family history books. The status of the reading system is reported, along with an explanation of how its results can be imported into, and thus contribute to the construction of, a large repository of worldwide family interrelationships. The reading system’s potential use for constructing similar knowledge repositories in other domains is foreshadowed.

Invited Contribution