To use the GraphAware NLP tools, the first step is to define an NLP pipeline:
CALL ga.nlp.processor.addPipeline({
    name: "named_entity_extraction",
    textProcessor: "com.graphaware.nlp.processor.stanford.StanfordTextProcessor",
    processingSteps: {tokenize: true, ner: true}
})
Here, we specify the following:
- The pipeline name, named_entity_extraction.
- The text processor to be used. GraphAware supports both Stanford NLP and OpenNLP; here, we are using Stanford models.
- The processing steps:
    - Tokenization: Extracts tokens from the text. As a first approximation, a token can be seen as a word.
    - NER: The key step, which identifies named entities such as persons or locations.
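To make the two processing steps concrete, here is a minimal Python sketch of what tokenization and NER produce conceptually. It is only an illustration under stated assumptions: the GAZETTEER lookup table, the sample sentence, and the function names are hypothetical, and a plain dictionary lookup stands in for the trained statistical models that Stanford NLP actually uses.

```python
import re

# Hypothetical gazetteer: a tiny lookup table standing in for a trained NER model.
GAZETTEER = {
    "Alice": "PERSON",
    "Neo4j": "ORGANIZATION",
    "London": "LOCATION",
}


def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_entities(tokens):
    """Pair each token with an entity label, or 'O' when it is not a known entity."""
    return [(token, GAZETTEER.get(token, "O")) for token in tokens]


tokens = tokenize("Alice presented Neo4j in London.")
tagged = tag_entities(tokens)

# Keep only the tokens that were recognized as named entities.
entities = [(token, label) for token, label in tagged if label != "O"]
print(tokens)
print(entities)
```

In the sketch, tokenization runs first and NER then works on the resulting tokens, which mirrors why the pipeline enables both tokenize and ner.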
We can now run this pipeline on the README text by calling the ga.nlp.annotate procedure as follows:
MATCH (n:Document)
CALL ga.nlp.annotate({
    text: n.text,
    id: id(n),
    checkLanguage: false,
    pipeline: "named_entity_extraction"
}) YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT...