Understanding Constituency Parsing
Language is governed by syntactic and semantic rules. It is made up of sentences, and every sentence follows a structure.
Why do we need a Sentence Structure?
- We need to understand sentence structure in order to interpret language correctly.
- Humans communicate complex ideas by composing words together into bigger units.
- We need to know what is connected to what.
Here we will focus on the English language, where sentences consist of phrases such as:
- Noun Phrase (NP)
- Verb Phrase (VP), etc.
What is Constituency Parsing?
Constituency Parsing is the process of analyzing a sentence by breaking it down into sub-phrases, also known as constituents. Each constituent belongs to one of the phrase categories defined above.
Let us understand this with an example sentence.
“The bank of the river Nile was very fertile.”
In the parse tree for this sentence, the nodes directly above the words are POS tags from the Penn Treebank (WSJ) tag set, whereas the intermediate nodes are the phrases.
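The tree described above can also be written in bracketed form. The sketch below is a hand-written analysis of the example sentence, not the output of any parser, and it is loaded with NLTK's `Tree` class. Both the use of NLTK and the exact bracketing are assumptions made for illustration.

```python
from nltk import Tree

# Hand-written bracketing of the example sentence (an illustrative
# assumption, not parser output). Labels follow Penn Treebank conventions.
bracketed = """
(S
  (NP
    (NP (DT The) (NN bank))
    (PP (IN of) (NP (DT the) (NN river) (NNP Nile))))
  (VP (VBD was) (ADJP (RB very) (JJ fertile)))
  (. .))
"""
tree = Tree.fromstring(bracketed)

# The nodes directly above the words carry the POS tags.
pos_tags = tree.pos()

# Every node higher up is a phrase (constituent). A node sitting directly
# above a word has height 2, so phrases are the subtrees taller than that.
phrase_labels = [st.label() for st in tree.subtrees(lambda t: t.height() > 2)]

print(pos_tags)       # (word, tag) pairs
print(phrase_labels)  # phrase labels in top-down, left-to-right order
tree.pretty_print()   # ASCII drawing of the tree
```

Reading the tree top-down shows how the sentence (S) splits into a noun phrase, a verb phrase, and the final punctuation, and how the noun phrase in turn contains a prepositional phrase.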
Constituency parsers help us break sentences into their constituents.
How to build a Constituency Parser?
We can build a constituency parser in two ways.
- Using Phrase Structure Grammar
- A Context-Free Grammar for English
Let us write a simple CFG to parse the sentence above. Here the grammar covers only noun phrases. The steps are:
- Define a basic noun phrase extraction grammar rule.
- Tokenize the sentence and get the POS tags.
- Apply the constituency rules.
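The three steps above can be sketched with NLTK's `RegexpParser`, which applies a regular-expression rule over POS tags. It is a shallow chunker, not a full CFG parser. This is a minimal sketch and not necessarily the original notebook's code: the choice of NLTK, the rule itself, and the hand-written POS tags are all assumptions. In practice the tags would come from a tagger, for example `nltk.pos_tag(nltk.word_tokenize(sentence))`, which needs NLTK's tokenizer and tagger data to be downloaded and may assign slightly different tags.

```python
import nltk

# Step 1: a basic noun-phrase rule (assumed for illustration):
# an optional determiner, any number of adjectives, then one or more nouns.
grammar = r"NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

# Step 2: tokens with POS tags. These are written by hand so the example
# runs without downloading tagger models; a real tagger may differ.
tagged = [
    ("The", "DT"), ("bank", "NN"), ("of", "IN"),
    ("the", "DT"), ("river", "NN"), ("Nile", "NNP"),
    ("was", "VBD"), ("very", "RB"), ("fertile", "JJ"), (".", "."),
]

# Step 3: apply the rule. The result is a tree whose NP subtrees are the
# extracted noun phrases; all other tokens stay directly under the root.
tree = chunker.parse(tagged)

noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Because the rule only describes noun phrases, the verb phrase "was very fertile" is left unchunked, which is the limitation of defining a grammar for a single phrase type.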
For some sentences, there can be more than one valid parse tree, and resolving this ambiguity with hand-written rules is hard. This is where ML-based and hybrid approaches come in.
One parser I like, and which works pretty well, is the Berkeley Neural Parser (benepar), which is based on a self-attention network.
Import benepar and spaCy. Here I am using TensorFlow 1.x.
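A minimal sketch of that setup follows. It assumes the current benepar release (PyTorch-based, used with spaCy 3) and not the TensorFlow 1.x version mentioned above, whose loading code differs. It also assumes the `en_core_web_md` spaCy model and the `benepar_en3` parser model are already installed (`python -m spacy download en_core_web_md` and `benepar.download("benepar_en3")`). The helper names `build_parser` and `parse_sentences` are hypothetical.

```python
def build_parser(spacy_model="en_core_web_md", benepar_model="benepar_en3"):
    """Load a spaCy pipeline with the benepar constituency parser attached."""
    # Imported here because both libraries are heavy and need their models
    # installed. Importing benepar registers the "benepar" spaCy component.
    import benepar  # noqa: F401
    import spacy

    nlp = spacy.load(spacy_model)
    nlp.add_pipe("benepar", config={"model": benepar_model})
    return nlp


def parse_sentences(nlp, text):
    """Return one bracketed parse string per sentence found in `text`."""
    doc = nlp(text)
    # benepar exposes the tree of each sentence span as `_.parse_string`.
    return [sent._.parse_string for sent in doc.sents]
```

With the models installed, `parse_sentences(build_parser(), "The bank of the river Nile was very fertile.")` returns a list with one bracketed tree per sentence, in the same notation as the Penn Treebank.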
Summary
Constituency parsing breaks sentences into logical phrases that are easier for humans to analyze and understand. It can help with entity extraction tasks and with finding other semantic relationships between sentences. Notebook Link.