How to convert a DataFrame back to normal RDD in pyspark?

I need to use the `rdd.partitionBy(npartitions, custom_partitioner)` method, which is not available on the DataFrame. All of the DataFrame methods return only DataFrame results. So how do I create an RDD from the DataFrame data?

Answered by Raquel Strauss

To convert a PySpark DataFrame to an RDD, simply use the `.rdd` property:

  rdd = df.rdd

The caveat is that this does not give you an RDD of plain tuples: it returns an RDD of `Row` objects. To get a regular RDD of tuples (or lists), map over it:

  rdd = df.rdd.map(tuple)

or

  rdd = df.rdd.map(list)
