NLP - Entity Recognition and Extraction in Hebrew Texts

 
 
PI: Dr. Reut Tsarfaty
Department of Mathematics and Computer Science
 
Project Description: The project will build an automatic Hebrew text analyzer that recognizes named entities of different types: Person, Location, Organization, etc. The analyzer will be based on sequence labeling models trained with statistical learning. To bootstrap the statistical model, a small sample of examples will have to be manually annotated.
Expected Outputs: A robust, broad-coverage statistical named entity recognizer that fits the peculiar structure of Hebrew texts, and its use for extracting various dictionaries of named entities (per period, genre, etc.).
Future Implications: Named entity recognition is the basis for any smart search and for building a knowledge graph. It is also the first step towards relational information extraction (tuples of the form “X does Y to Z”), which can support searching for events in time and not only for static entities.
Development Notes: The project will build upon the technology for morphological analysis and disambiguation that is currently being developed in Dr. Reut Tsarfaty’s lab, and on the annotated resources provided by the MILA center at the Technion. Licensing terms are derived from the original licenses of these projects.
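
To illustrate what the output of a sequence labeling model looks like and how it could feed the entity dictionaries mentioned above, here is a minimal sketch in Python. It is not the project's implementation: the BIO tag scheme, the function names bio_to_spans and build_dictionary, and the example sentence are all assumptions made for illustration, and the tokens are assumed to have already been morphologically segmented (so the prefix "ב" is separate from "פולין").

```python
from collections import defaultdict


def bio_to_spans(tokens, tags):
    """Turn per-token BIO tags into a list of (entity_type, entity_text) pairs.

    Hypothetical helper: assumes tags of the form "B-TYPE", "I-TYPE" or "O".
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open entity, then open a new one.
            # A stray "I-" tag is treated as the start of an entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the currently open entity.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


def build_dictionary(sentences):
    """Group the extracted entity strings by entity type."""
    dictionary = defaultdict(set)
    for tokens, tags in sentences:
        for entity_type, text in bio_to_spans(tokens, tags):
            dictionary[entity_type].add(text)
    return dict(dictionary)


# Illustrative, hand-tagged sentence (assumed segmentation and labels).
tokens = ["דוד", "בן", "גוריון", "נולד", "ב", "פולין"]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]

spans = bio_to_spans(tokens, tags)
entity_dictionary = build_dictionary([(tokens, tags)])
```

In a real system the tags would come from the trained model rather than being written by hand, and the dictionary could be keyed additionally by period or genre.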
 

Team Members

Dr. Reut Tsarfaty is an Associate Professor at the Computer Science Department at Bar-Ilan University and a Research Scientist at AI2. During 2014-2019 she was a senior lecturer at the Computer Science Department and the head of the ONLP Lab at the Open University of Israel. Dr. Tsarfaty holds a B.Sc. from the Technion and an M.Sc. and Ph.D. from the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. She held postdoctoral research fellowships at Uppsala University in Sweden and at the Weizmann Institute. Her research focuses on statistical parsing, broadly interpreted to cover morphological, syntactic and semantic parsing, and their applications, including (but not limited to) natural language programming, automated essay scoring, and the analysis and generation of opinionated text in social media. The research in Dr. Tsarfaty's lab is kindly supported by ERC Starting Grant #677352 and ISF individual grant #1739/26.
 
Dan Bareket
 
Publications related to this project:
Dan Bareket and Reut Tsarfaty, Morphologically-Aware Named Entity Recognition (NER) for Modern Hebrew, ISCOL 2019 – The 2019 Israel Seminar of Computational Linguistics, IBM Research - Haifa, Israel.