What is tokenization and how does it work?

Asked by OkamotoBando in Data Science, asked on Jan 2, 2020
Answered by Okamoto Bando

Tokenization is the process of breaking up the original text into component pieces, which are known as tokens. Tokens have a variety of useful attributes and methods, such as whether a token is punctuation or where it sits in the text. One important thing to know is that tokens are pieces of the original text: they are not converted to their base forms, as happens with lemmatization and stemming.
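As a minimal sketch of the idea, assuming a simple regular-expression splitter rather than any particular library's tokenizer, the Python code below breaks text into tokens and attaches a few attributes (character offsets, `is_alpha`, `is_punct`). The `Token` and `tokenize` names are illustrative, not a real library API. Every token's text is an unchanged slice of the input, so "running" stays "running" instead of being reduced to a stem such as "run".

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    """A token: a verbatim slice of the input text plus a few attributes."""
    text: str       # exact substring of the original text (never stemmed)
    start: int      # offset of the token's first character
    end: int        # offset just past the token's last character
    is_alpha: bool  # True if the token consists only of letters
    is_punct: bool  # True for a single non-word, non-space char (e.g. ".")


# Hypothetical rule: a word (letters/digits/underscore, optionally followed by
# one apostrophe part as in "aren't") or any single character that is neither
# a word character nor whitespace (e.g. punctuation).
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
PUNCT_PATTERN = re.compile(r"[^\w\s]")


def tokenize(text: str) -> list[Token]:
    """Split text into tokens without changing any of their characters."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        piece = match.group()
        tokens.append(
            Token(
                text=piece,
                start=match.start(),
                end=match.end(),
                is_alpha=piece.isalpha(),
                is_punct=PUNCT_PATTERN.fullmatch(piece) is not None,
            )
        )
    return tokens


if __name__ == "__main__":
    sample = "Tokens aren't stems: running stays running."
    for tok in tokenize(sample):
        print(tok.text, tok.start, tok.end, tok.is_alpha, tok.is_punct)
```

Real tokenizers use much richer rules (for example, they often split a contraction such as "aren't" into "are" and "n't"), but the principle that every token is a verbatim piece of the text is the same.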

The flowchart below shows how tokenization works.
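One common rule-based design, which spaCy's documentation describes, first splits the text on whitespace and then works through each chunk, checking for special-case exceptions and splitting off prefixes and suffixes (spaCy also handles infixes, which this sketch omits). The code below is a heavily simplified, hypothetical version of that loop: `rule_based_tokenize` and its tiny rule tables are made up for illustration and are not spaCy's actual rules.

```python
# Hypothetical, deliberately tiny rule tables (illustration only, not spaCy's).
PREFIXES = set("\"'([")
SUFFIXES = set("\"'),.!?;:]")
SPECIAL_CASES = {"don't": ["do", "n't"], "can't": ["ca", "n't"]}


def rule_based_tokenize(text: str) -> list[str]:
    """Whitespace split, then peel special cases, prefixes and suffixes."""
    tokens: list[str] = []
    for chunk in text.split():  # step 1: split on whitespace
        front: list[str] = []   # tokens peeled from the start of the chunk
        back: list[str] = []    # tokens peeled from the end, kept in order
        while chunk:
            if chunk in SPECIAL_CASES:      # step 2: exact-match exception
                front.extend(SPECIAL_CASES[chunk])
                chunk = ""
            elif chunk[0] in PREFIXES:      # step 3: split off one prefix char
                front.append(chunk[0])
                chunk = chunk[1:]
            elif chunk[-1] in SUFFIXES:     # step 4: split off one suffix char
                back.insert(0, chunk[-1])
                chunk = chunk[:-1]
            else:                           # no rule applies: keep the rest whole
                front.append(chunk)
                chunk = ""
        tokens.extend(front + back)
    return tokens


if __name__ == "__main__":
    print(rule_based_tokenize('She said, "don\'t go!"'))
```

Because each rule only separates existing characters, the output is still made of pieces of the original text; the special-case table lets a contraction be split in a way simple punctuation rules could not.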
