How can the MySentences class be used to extract sentences from all files in a directory and train a word2vec model on them?

Asked by SumikoLacoste in Data Science on Dec 23, 2019
Answered by Sumiko Lacoste

My dataset is unlabeled. Below is the code:

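import os

import gensim


class MySentences(object):
    """Stream tokenized lines from every file in a directory."""

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        # The files are re-read on every pass, so the corpus never has to fit in memory.
        for fname in os.listdir(self.dirname):
            # 'with' closes each file as soon as its lines have been read.
            with open(os.path.join(self.dirname, fname)) as f:
                for line in f:
                    yield line.split()


sentences = MySentences('wos_abstracts')  # a memory-friendly iterator
model = gensim.models.Word2Vec(sentences)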
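The "memory-friendly iterator" comment matters because training passes over the corpus more than once (once to build the vocabulary, then once per epoch). A class with an __iter__ method starts over on every pass, while a plain generator is used up after the first one. The sketch below illustrates this with the standard library only; the temporary directory and file name are hypothetical stand-ins for wos_abstracts.

```python
import os
import tempfile


class MySentences:
    """Yield each line of every file in a directory as a list of tokens."""

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in sorted(os.listdir(self.dirname)):
            with open(os.path.join(self.dirname, fname)) as f:
                for line in f:
                    yield line.split()


# Hypothetical stand-in for the 'wos_abstracts' directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as f:
    f.write("first abstract here\nsecond abstract here\n")

sentences = MySentences(demo_dir)
first_pass = list(sentences)
second_pass = list(sentences)  # iterating again restarts from the first file

one_shot = iter(sentences)     # a single generator, by contrast
consumed = list(one_shot)
leftover = list(one_shot)      # the generator is already exhausted
```

Here first_pass and second_pass hold the same sentences, while leftover is empty, which is why the class instance, not a generator, is what gets handed to the model.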

But I get the following error:


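This problem can be solved with TaggedLineDocument, a class added to the gensim library (in gensim.models.doc2vec). It reads a file containing one document per line and yields each line as a TaggedDocument, which is the input Doc2Vec uses to transform sentences to vectors.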


Now we can train the model
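One way the training step could look is sketched below, under some assumptions. TaggedLineDocument expects a single file with one document per line, so the directory is first flattened into such a file. The helper names write_line_corpus and train_doc2vec, the demo files, and the hyperparameter values are all hypothetical and chosen for illustration only.

```python
import os
import tempfile


def write_line_corpus(dirname, out_path):
    """Write every non-empty line found in dirname to out_path, one document per line."""
    count = 0
    with open(out_path, "w") as out:
        for fname in sorted(os.listdir(dirname)):
            with open(os.path.join(dirname, fname)) as f:
                for line in f:
                    tokens = line.split()
                    if tokens:
                        out.write(" ".join(tokens) + "\n")
                        count += 1
    return count


def train_doc2vec(corpus_path):
    """Train a Doc2Vec model on a one-document-per-line file (requires gensim)."""
    # Imported here so the helper above stays usable without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedLineDocument

    # Each line becomes a TaggedDocument tagged with its line number.
    documents = TaggedLineDocument(corpus_path)
    # vector_size, min_count and epochs are illustrative assumptions.
    return Doc2Vec(documents=documents, vector_size=100, min_count=2, epochs=10)


# Hypothetical stand-in for the 'wos_abstracts' directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as f:
    f.write("first abstract here\n\nsecond abstract here\n")
with open(os.path.join(demo_dir, "b.txt"), "w") as f:
    f.write("third abstract here\n")

# The output file lives outside demo_dir so it is not read back as input.
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
n_docs = write_line_corpus(demo_dir, corpus_path)
```

With gensim installed, model = train_doc2vec(corpus_path) trains on the flattened file, and model.infer_vector(['some', 'words']) returns a vector for a new tokenized sentence.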




