Grammar implementations guided by linguistic theory will normally fail to
cover some well-formed utterances, since no current theory exhaustively
characterizes all of the phenomena in any language. For
many uses of a grammar, approximate or robust analyses of out-of-grammar
utterances would be better than nothing, and a variety of approaches have been
developed for such robust parsing. In
this paper I present an implemented method that adds two simple
"bridging" rules to an existing broad-coverage grammar, the English
Resource Grammar; together these rules allow any two constituents to combine. This method relies on a parser which can efficiently
pack the full parse forest for an utterance, and then selectively unpack the
most likely N analyses guided by a statistical model trained on a manually
constructed treebank. Initial
experimental results with two types of annotated corpus data show both promise
and some remaining challenges for this approach.
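The core idea of a bridging rule can be illustrated with a toy chart parser. The sketch below is a minimal illustration under several assumptions. It uses a hypothetical context-free grammar (`RULES`, `LEXICON`), not the English Resource Grammar. It adds one catch-all `BRIDGE` rule that joins any two adjacent constituents at a hand-set penalty (`BRIDGE_PENALTY`). It keeps only the best-scoring edge per category and span (Viterbi), which simplifies the approach described above, where the full parse forest is packed and N analyses are selectively unpacked. The scores are invented log-probabilities, not a model trained on a treebank.

```python
from itertools import product

# Hypothetical toy grammar: (left, right) -> (parent, log-prob score).
RULES = {
    ("Det", "N"): ("NP", -0.5),
    ("V", "NP"): ("VP", -0.7),
    ("NP", "VP"): ("S", -0.2),
}
# Hypothetical lexicon: word -> list of (category, log-prob score).
LEXICON = {
    "the": [("Det", 0.0)],
    "dog": [("N", -0.1)],
    "cat": [("N", -0.1)],
    "saw": [("V", -0.3)],
}
# Hypothetical penalty: bridging is always possible but strongly dispreferred.
BRIDGE_PENALTY = -5.0


def parse(words):
    """CKY chart; each cell maps category -> (best score, best tree)."""
    n = len(words)
    chart = {}
    for i, w in enumerate(words):
        chart[(i, i + 1)] = {cat: (s, (cat, w)) for cat, s in LEXICON.get(w, [])}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                left = chart[(i, k)]
                right = chart[(k, j)]
                for (lc, (ls, lt)), (rc, (rs, rt)) in product(left.items(), right.items()):
                    # Ordinary grammar rule, if one licenses this pair.
                    candidates = []
                    if (lc, rc) in RULES:
                        candidates.append(RULES[(lc, rc)])
                    # Bridging rule: any two adjacent constituents may combine.
                    candidates.append(("BRIDGE", BRIDGE_PENALTY))
                    for parent, rule_score in candidates:
                        score = ls + rs + rule_score
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (parent, lt, rt))
            chart[(i, j)] = cell
    return chart[(0, n)] if n else {}


def best_analysis(words):
    """Return (category, score, tree) of the top-scoring full-span edge, or None."""
    root = parse(words)
    if not root:
        return None
    cat = max(root, key=lambda c: root[c][0])
    score, tree = root[cat]
    return cat, score, tree


if __name__ == "__main__":
    # Grammatical input: the ordinary S analysis outscores every bridged one.
    print(best_analysis("the dog saw the cat".split()))
    # Out-of-grammar input: no rule combines NP + NP, so a BRIDGE edge spans it.
    print(best_analysis("the dog the cat".split()))
```

Because every bridged edge pays the penalty, a conventional analysis wins whenever the grammar licenses one. A bridged analysis surfaces only when no ordinary edge covers the whole input. Words missing from the toy lexicon still yield no analysis at all.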