1. NLP - Entity Recognition and Extraction in Hebrew Texts
PI: Dr. Reut Tsarfaty
Department of Mathematics and Computer Science
Project Description: Building an automatic Hebrew text analyzer that recognizes named entities of different types: Person, Location, Organization, etc. The analyzer will be based on sequence labeling models trained with statistical learning. To bootstrap the statistical model, a small sample of examples will have to be annotated manually.
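As a rough illustration of what sequence labeling means here, the sketch below assumes a BIO tagging scheme (B- opens an entity, I- continues it, O is outside any entity). The scheme, the tag names PER and LOC, the function name, and the transliterated example sentence are all assumptions made for illustration; the project description does not specify them.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs.

    Assumed scheme (not stated in the project description):
    "B-X" opens an entity of type X, "I-X" continues it, "O" is outside.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def close_current() -> None:
        # Emit the entity collected so far, if there is one.
        if current_tokens and current_type is not None:
            spans.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_current()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity;
            # such stray "I-" tokens are dropped in this sketch.
            close_current()
            current_tokens, current_type = [], None
    close_current()
    return spans


# Hypothetical, transliterated example sentence.
tokens = ["David", "Ben-Gurion", "visited", "Tel", "Aviv"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(bio_to_spans(tokens, tags))
```

In a trained system the tags would be predicted by the statistical model rather than written by hand; the manually annotated sample mentioned above would supply token and tag pairs of this kind for training.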
Expected Outputs: A statistical, robust, broad-coverage named entity recognizer that fits the peculiar structure of Hebrew texts, and its application to extracting dictionaries of named entities (per period, genre, etc.).
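One possible way to assemble such dictionaries from recognizer output is sketched below. The record layout (a grouping key such as genre or period, the entity text, and the entity type), the function name, and the sample values are hypothetical and serve only to show the grouping step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (group, entity_text, entity_type),
# where group could be a genre or a period label.
Record = Tuple[str, str, str]


def build_entity_dictionaries(records: Iterable[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group recognized entities into one dictionary per group and entity type."""
    grouped = defaultdict(lambda: defaultdict(set))
    for group, text, entity_type in records:
        grouped[group][entity_type].add(text)  # the set removes duplicates
    # Convert to plain dicts with sorted lists for stable output.
    return {
        group: {etype: sorted(names) for etype, names in by_type.items()}
        for group, by_type in grouped.items()
    }


# Illustrative records only; not actual project data.
records = [
    ("press", "Tel Aviv", "LOC"),
    ("press", "David Ben-Gurion", "PER"),
    ("press", "Tel Aviv", "LOC"),
    ("letters", "Haifa", "LOC"),
]
print(build_entity_dictionaries(records))
```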
Future Implications: Named entity recognition is the basis for smart search and for the construction of a knowledge graph. It is also the first step towards relational information extraction (tuples of the form “X does Y to Z”), which can support searching for events in time and not only for static entities.
Development Notes: The project will build upon the technology for morphological analysis and disambiguation that is currently being developed in Dr. Reut Tsarfaty’s lab, and upon the annotated resources provided by the MILA center at the Technion. Licensing terms derive from the original licenses of these projects.