Explain how to perform sentence segmentation in Python using spaCy.

Asked by GabrielleDowd in Data Science on Dec 18, 2019
Answered by Gabrielle Dowd

In NLP, sentence segmentation is the process of deciding where sentences begin and end. It is also known as sentence breaking or sentence boundary detection. In Python it can be done with spaCy in the following way.

First, we need to import spaCy.
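
A minimal sketch of this step, assuming spaCy v3 is installed. To avoid a model download it uses a blank English pipeline with the rule-based `sentencizer` component; that choice is an assumption for illustration, and a trained pipeline such as `en_core_web_sm` would also provide sentence boundaries.

```python
import spacy

# Assumption: a blank English pipeline plus the rule-based "sentencizer",
# so no trained model has to be downloaded. A trained pipeline loaded with
# spacy.load("en_core_web_sm") would also set sentence boundaries.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Show which components will run on each text.
print(nlp.pipe_names)
```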


Next, let us pass in a short text and perform segmentation on it.
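
A sketch of what this could look like, again assuming spaCy v3 and the rule-based `sentencizer`. The sample text is illustrative and not taken from the original answer.

```python
import spacy

# Assumption: blank English pipeline with the rule-based sentencizer.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Illustrative input containing three sentences.
text = "This is the first sentence. This is another sentence. This is the last sentence."
doc = nlp(text)

# doc.sents yields one Span per sentence.
for sent in doc.sents:
    print(sent.text)
```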


Here we can see that the three sentences in the input are separated when we iterate over doc.sents.

It is important to note that doc.sents is a generator. That is, it yields the sentences one at a time as you iterate over it instead of holding them in an indexable sequence. This means that, where you could print the second Doc token with print(doc[1]), you can't get the "second Doc sentence" with print(doc.sents[1]).
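
The difference can be sketched as follows, assuming spaCy v3 and the same illustrative setup (blank English pipeline with `sentencizer`). Indexing the Doc works, while indexing the generator raises a TypeError.

```python
import spacy

# Assumption: blank English pipeline with the rule-based sentencizer.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("This is the first sentence. This is another sentence. This is the last sentence.")

# A Doc supports indexing, so this prints the second token.
print(doc[1])

# doc.sents is a generator and cannot be indexed.
indexing_failed = False
try:
    print(doc.sents[1])
except TypeError as err:
    indexing_failed = True
    print("TypeError:", err)
```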


However, you can build a sentence collection by iterating over doc.sents and saving the result to a list.
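
A minimal sketch, under the same assumptions as before (spaCy v3, blank English pipeline with `sentencizer`, illustrative text). The name `doc_sents` is chosen here for the example.

```python
import spacy

# Assumption: blank English pipeline with the rule-based sentencizer.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("This is the first sentence. This is another sentence. This is the last sentence.")

# Materialize the generator into a list of Span objects.
doc_sents = list(doc.sents)

print(len(doc_sents))
# The list can be indexed, so this prints the second sentence.
print(doc_sents[1].text)
```

Each item in the list is a Span, so token-level attributes such as `doc_sents[1].start` and `doc_sents[1].end` remain available.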



