What is NER? How is it related to information extraction?

Asked by FujiwaraBan in Data Science, Asked on Jan 2, 2020

I am trying to explain a complex topic to someone who is new to natural language processing. How can I describe the concept of NER (named entity recognition) and its relationship to information extraction in a simple way?

Answered by Fujiwara Ban

Named entity recognition (NER) is a natural language processing technique that identifies named entities in text and classifies them into predefined categories such as person names, organization names, locations, dates, and numeric expressions. It plays a crucial role in information extraction, the task of identifying and pulling specific pieces of information out of unstructured text: NER supplies the entities (who, where, when) that later steps can turn into structured data. NER systems use machine learning algorithms, linguistic rules, or both to recognize and tag named entities in text.
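
To make the "linguistic rules" side concrete, here is a minimal sketch of rule-based NER using only the Python standard library. It is a toy illustration, not how spaCy or any production system works: the `GAZETTEER` dictionary, the `DATE_PATTERN` regex, and the `toy_ner` function are all hypothetical names made up for this example.

```python
import re

# Hypothetical mini-gazetteer (a lookup list of known names).
# A real system would use a trained model or a far larger dictionary.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "Steve Wozniak": "PERSON",
    "Apple Inc.": "ORG",
    "Cupertino": "LOCATION",
}

# Simple rule for dates written like "April 1, 1976".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def toy_ner(text):
    """Return a list of (entity_text, label, start_offset), sorted by position."""
    entities = []
    # Rule 1: exact matches against the gazetteer.
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            entities.append((match.group(), label, match.start()))
    # Rule 2: anything that looks like a "Month D, YYYY" date.
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE", match.start()))
    return sorted(entities, key=lambda entity: entity[2])


sentence = "Apple Inc. was founded by Steve Jobs and Steve Wozniak on April 1, 1976."
for entity_text, label, start in toy_ner(sentence):
    print(f"{start:3d}  {label:8s}  {entity_text}")
```

Each result is a labeled span of the input text, which is the raw material information extraction builds on. Rules like these break down quickly on unseen names, which is why practical NER systems lean on statistical models.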

Here is a Python example:

First, make sure spaCy is installed (pip install spacy) and download the English language model (python -m spacy download en_core_web_sm):

import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Sample text corpus for NER and information extraction
text_corpus = """
Apple Inc. is an American multinational technology company headquartered in Cupertino, California.
It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne on April 1, 1976.
The company designs, develops, and sells consumer electronics, computer software, and online services.
Apple is known for its innovative products like the iPhone, iPad, Mac, and Apple Watch.
"""

# Process the text corpus using the spaCy NLP pipeline
doc = nlp(text_corpus)

# Initialize dictionaries to store extracted information (entity text -> count)
persons = {}
organizations = {}
dates = {}

# Iterate through the entities identified by NER and count them by label
for ent in doc.ents:
    if ent.label_ == "PERSON":
        persons[ent.text] = persons.get(ent.text, 0) + 1
    elif ent.label_ == "ORG":
        organizations[ent.text] = organizations.get(ent.text, 0) + 1
    elif ent.label_ == "DATE":
        dates[ent.text] = dates.get(ent.text, 0) + 1

# Print extracted information
print("Persons:")
for person, count in persons.items():
    print(f"{person}: {count}")
print("\nOrganizations:")
for org, count in organizations.items():
    print(f"{org}: {count}")
print("\nDates:")
for date, count in dates.items():
    print(f"{date}: {count}")
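
The three per-label dictionaries in the Python example can also be folded into a single structure, which scales better once more labels are of interest. The sketch below assumes the entities have already been reduced to (text, label) pairs; `sample_entities` is hypothetical stand-in data, not real spaCy output, and `count_by_label` is a name invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical (text, label) pairs standing in for (ent.text, ent.label_).
# The entities a real model returns depend on the model and its version.
sample_entities = [
    ("Steve Jobs", "PERSON"),
    ("Apple", "ORG"),
    ("April 1, 1976", "DATE"),
    ("Apple", "ORG"),
]


def count_by_label(entities):
    """Group entity mentions by label and count how often each text occurs."""
    counts = defaultdict(Counter)
    for text, label in entities:
        counts[label][text] += 1
    return counts


for label, counter in count_by_label(sample_entities).items():
    print(f"{label}:")
    for text, count in counter.items():
        print(f"  {text}: {count}")
```

With spaCy, the input list could be built as `[(ent.text, ent.label_) for ent in doc.ents]`, so no label has to be hard-coded in an if/elif chain.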

Here is the same example in Java, using Stanford CoreNLP:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class NERExample {
    public static void main(String[] args) {
        // Set up the Stanford NLP pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Sample text for NER and information extraction
        String text = "Apple Inc. is an American multinational technology company headquartered in Cupertino, California. " +
                "It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne on April 1, 1976. " +
                "The company designs, develops, and sells consumer electronics, computer software, and online services. " +
                "Apple is known for its innovative products like the iPhone, iPad, Mac, and Apple Watch.";

        // Create an Annotation with the text
        Annotation document = new Annotation(text);

        // Run the Stanford NLP pipeline
        pipeline.annotate(document);

        // Initialize maps to store extracted information (token text -> count)
        Map<String, Integer> persons = new HashMap<>();
        Map<String, Integer> organizations = new HashMap<>();
        Map<String, Integer> dates = new HashMap<>();

        // Iterate through sentences in the document
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // Iterate through tokens in the sentence
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String namedEntity = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                if (!"O".equals(namedEntity)) { // "O" marks a non-entity token
                    if ("PERSON".equals(namedEntity)) {
                        persons.put(word, persons.getOrDefault(word, 0) + 1);
                    } else if ("ORGANIZATION".equals(namedEntity)) {
                        organizations.put(word, organizations.getOrDefault(word, 0) + 1);
                    } else if ("DATE".equals(namedEntity)) {
                        dates.put(word, dates.getOrDefault(word, 0) + 1);
                    }
                }
            }
        }

        // Print extracted information
        System.out.println("Persons:");
        for (Map.Entry<String, Integer> entry : persons.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        System.out.println("\nOrganizations:");
        for (Map.Entry<String, Integer> entry : organizations.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        System.out.println("\nDates:");
        for (Map.Entry<String, Integer> entry : dates.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
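
Note that the Java example counts individual tokens, so a two-word name such as "Steve Jobs" is recorded as two separate entries ("Steve" and "Jobs"). One common remedy is to merge runs of adjacent tokens that carry the same tag. The sketch below shows the idea in plain Python on hypothetical (word, tag) pairs; `merge_tagged_tokens` is a name invented here, and the input list is illustrative data, not actual CoreNLP output.

```python
def merge_tagged_tokens(tagged_tokens):
    """Join runs of adjacent tokens that share a non-"O" tag into (text, tag) spans."""
    spans = []
    current_words, current_tag = [], "O"
    for word, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            # Same entity type as the previous token: extend the current span.
            current_words.append(word)
            continue
        if current_words:
            # The tag changed, so the span collected so far is complete.
            spans.append((" ".join(current_words), current_tag))
        current_words = [word] if tag != "O" else []
        current_tag = tag
    if current_words:
        # Flush a span that runs to the end of the input.
        spans.append((" ".join(current_words), current_tag))
    return spans


# Hypothetical token-level tags of the kind a per-token tagger returns.
tokens = [
    ("founded", "O"), ("by", "O"),
    ("Steve", "PERSON"), ("Jobs", "PERSON"),
    ("and", "O"),
    ("Steve", "PERSON"), ("Wozniak", "PERSON"),
]
print(merge_tagged_tokens(tokens))
```

This simple approach has two known limits: two different entities of the same type that sit directly next to each other are merged into one span, and joining with a single space does not restore the original spacing around punctuation.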


