A user wants to compute tf-idf on a very large dataset and write a one-column CSV file that contains each term with its tf-idf score, in non-decreasing order. How can this be done?

2.3K    Asked by shivangiMehta in Data Science , Asked on Dec 26, 2019
Answered by shivangi Mehta

The above code works only on small inputs and crashes on a large collection of documents.

To solve this problem, do not coerce the term-document matrix (TDM) to a dense matrix. With this many documents, that will most likely cause an integer overflow. The tm package uses the slam package to represent TDMs and DTMs as sparse matrices, and slam provides functions for row-wise and column-wise operations that work without coercing to a dense matrix.

The following code should fix the problem:
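
As a minimal sketch of this approach (not necessarily the answerer's exact code): the toy corpus `docs`, the output file name `tfidf.csv`, and the choice to score each term by summing its tf-idf weights across all documents are assumptions made for illustration. The key point is that `slam::row_sums()` works directly on the sparse matrix, so `as.matrix()` is never called.

```r
library(tm)    # corpus handling and tf-idf weighting
library(slam)  # sparse row/column operations used by tm

# Hypothetical toy corpus standing in for the large dataset
docs <- c("apple banana apple", "banana cherry", "cherry apple date")
corpus <- VCorpus(VectorSource(docs))

# Build a term-document matrix with tf-idf weights; it stays sparse
tdm <- TermDocumentMatrix(corpus, control = list(weighting = weightTfIdf))

# Assumption: a term's score is the sum of its tf-idf weights over all documents.
# row_sums() operates on the sparse representation, with no dense coercion.
result <- data.frame(
  term  = Terms(tdm),
  tfidf = as.numeric(slam::row_sums(tdm)),
  stringsAsFactors = FALSE
)

# Sort in non-decreasing order of tf-idf
result <- result[order(result$tfidf), ]

# Write a single CSV column holding "term:score" pairs
csv_path <- file.path(tempdir(), "tfidf.csv")
out <- data.frame(term_tfidf = paste(result$term, round(result$tfidf, 6), sep = ":"))
write.csv(out, csv_path, row.names = FALSE)
```

If a different per-term summary is needed (for example the mean weight), `slam::row_means()` can be swapped in the same way.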
