What is NER? How is it related to information extraction?

1.2K Asked by FujiwaraBan in Data Science, Asked on Jan 2, 2020

I am currently trying to explain a complex topic to someone who is new to natural language processing. How can I describe the concept of NER (named entity recognition) and its relationship with information extraction in an easy way?

Answered by Fujiwara Ban

Named entity recognition (NER) is a natural language processing technique that identifies named entities in text and classifies them into predefined categories such as person names, organization names, locations, dates, and numeric expressions. It plays a crucial role in information extraction, which is the task of identifying and extracting specific pieces of information from unstructured text data: NER supplies the entities that the extracted information is about. NER systems use machine learning algorithms and linguistic rules to recognise and tag named entities in text.
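To make the "linguistic rules" side of NER concrete, here is a minimal sketch that uses only the Python standard library. It is a toy, not how a production NER system works: the `GAZETTEER` entries, the `DATE_PATTERN` rule, the `LOCATION` label, and the `toy_ner` function are all made up for illustration.

```python
import re

# Hypothetical mini-gazetteer: known names mapped to entity labels.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "Apple Inc.": "ORG",
    "Cupertino": "LOCATION",
}

# Simple pattern rule for dates written like "April 1, 1976".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def toy_ner(text):
    """Return (start, entity_text, label) tuples sorted by position in text."""
    found = []
    # Rule 1: exact lookup of every gazetteer name.
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    # Rule 2: regular-expression match for dates.
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return sorted(found)


sample = "Apple Inc. was founded in Cupertino by Steve Jobs on April 1, 1976."
entities = toy_ner(sample)
for start, entity_text, label in entities:
    print(f"{entity_text} -> {label}")
```

A lookup table like this cannot handle names it has never seen, which is one reason practical NER systems rely on trained statistical models, sometimes combined with rules.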

Here is a Python example using spaCy:

First, make sure that you have spaCy installed and have downloaded the English language model (python -m spacy download en_core_web_sm):

import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Sample text corpus for NER and information extraction
text_corpus = """
Apple Inc. is an American multinational technology company headquartered in Cupertino, California.
It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne on April 1, 1976.
The company designs, develops, and sells consumer electronics, computer software, and online services.
Apple is known for its innovative products like the iPhone, iPad, Mac, and Apple Watch.
"""

# Process the text corpus using the spaCy NLP pipeline
doc = nlp(text_corpus)

# Initialize dictionaries to store extracted information (entity text -> count)
persons = {}
organizations = {}
dates = {}

# Iterate through entities identified by NER
for ent in doc.ents:
    if ent.label_ == "PERSON":
        persons[ent.text] = persons.get(ent.text, 0) + 1
    elif ent.label_ == "ORG":
        organizations[ent.text] = organizations.get(ent.text, 0) + 1
    elif ent.label_ == "DATE":
        dates[ent.text] = dates.get(ent.text, 0) + 1

# Print extracted information
print("Persons:")
for person, count in persons.items():
    print(f"{person}: {count}")

print("\nOrganizations:")
for org, count in organizations.items():
    print(f"{org}: {count}")

print("\nDates:")
for date, count in dates.items():
    print(f"{date}: {count}")
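NER only tags the entities; information extraction then arranges them into a structured record. The sketch below illustrates that second step under simplifying assumptions: the `entities` list is hand-written in the style of spaCy's `ent.text` / `ent.label_` pairs (it is not captured model output), and `build_record` and the `founding_fact` template are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Illustrative (text, label) pairs; hand-written assumptions, not model output.
entities = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Steve Wozniak", "PERSON"),
    ("Ronald Wayne", "PERSON"),
    ("April 1, 1976", "DATE"),
]


def build_record(entity_pairs):
    """Group entity texts by label, keeping first-seen order without duplicates."""
    grouped = defaultdict(list)
    for text, label in entity_pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


record = build_record(entities)

# Hypothetical "founding" template filled naively from the grouped entities.
founding_fact = {
    "organization": record["ORG"][0],
    "founders": record["PERSON"],
    "founded_on": record["DATE"][0],
}
print(founding_fact)
```

Filling the template by position works only because the sample text is tiny. A real information extraction system would typically add a relation extraction step to decide which person founded which organization and when.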

Here is the same example in Java, using Stanford CoreNLP:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class NERExample {
    public static void main(String[] args) {
        // Set up Stanford NLP pipeline
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Sample text for NER and information extraction
        String text = "Apple Inc. is an American multinational technology company headquartered in Cupertino, California. " +
                "It was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne on April 1, 1976. " +
                "The company designs, develops, and sells consumer electronics, computer software, and online services. " +
                "Apple is known for its innovative products like the iPhone, iPad, Mac, and Apple Watch.";

        // Create an Annotation with the text
        Annotation document = new Annotation(text);

        // Run the Stanford NLP pipeline
        pipeline.annotate(document);

        // Initialize maps to store extracted information (token text -> count)
        Map<String, Integer> persons = new HashMap<>();
        Map<String, Integer> organizations = new HashMap<>();
        Map<String, Integer> dates = new HashMap<>();

        // Iterate through sentences in the document
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            // Iterate through tokens in the sentence.
            // Note: tags are read per token, so a multi-word entity such as
            // "Steve Jobs" is counted as two separate tokens here.
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String namedEntity = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                String word = token.get(CoreAnnotations.TextAnnotation.class);
                if (!namedEntity.equals("O")) {  // O indicates non-entity
                    if (namedEntity.equals("PERSON")) {
                        persons.put(word, persons.getOrDefault(word, 0) + 1);
                    } else if (namedEntity.equals("ORGANIZATION")) {
                        organizations.put(word, organizations.getOrDefault(word, 0) + 1);
                    } else if (namedEntity.equals("DATE")) {
                        dates.put(word, dates.getOrDefault(word, 0) + 1);
                    }
                }
            }
        }

        // Print extracted information
        System.out.println("Persons:");
        for (Map.Entry<String, Integer> entry : persons.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }

        System.out.println("\nOrganizations:");
        for (Map.Entry<String, Integer> entry : organizations.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }

        System.out.println("\nDates:");
        for (Map.Entry<String, Integer> entry : dates.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
