How to convert a DataFrame back to normal RDD in pyspark?

I need to use the `rdd.partitionBy(npartitions, custom_partitioner)` method, which is not available on the DataFrame. All of the DataFrame methods return only DataFrame results. So how do I create an RDD from the DataFrame data?

Answered by Raquel Strauss

To convert a PySpark DataFrame to an RDD, simply use the `.rdd` property:

  rdd = df.rdd

The caveat is that this does not give you an RDD of plain tuples: it returns an RDD of `Row` objects. To get a regular RDD of tuples (or lists), map over it:

  rdd = df.rdd.map(tuple)

or

  rdd = df.rdd.map(list)
