How does createOrReplaceTempView work in Spark?

Asked by Jiten Testing in Spark, asked on May 10, 2021

I am new to Spark and Spark SQL.

How does createOrReplaceTempView work in Spark?

If we register an RDD of objects as a table, will Spark keep all the data in memory?

Answered by jay Trivedi

Often we want to register a Spark DataFrame as a table and query it with SQL. To turn a DataFrame into a temporary view that is available only within the current Spark session, call registerTempTable (deprecated since Spark 2.0) or createOrReplaceTempView (Spark >= 2.0) on the DataFrame.

createOrReplaceTempView is used when you want the table to exist only for a specific Spark session.
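The session scoping can be checked directly. Below is a minimal sketch, assuming a local Spark installation recent enough to have spark.catalog.tableExists (2.1 or later); the object name TempViewScope and the view name nums are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the name is illustrative only.
object TempViewScope {
  // Returns (visible in the creating session, visible in a new session).
  def check(): (Boolean, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally, not on a cluster
      .appName("temp-view-scope")
      .getOrCreate()
    import spark.implicits._

    // Register the view in the current session.
    Seq(1, 2, 3, 4).toDF("num").createOrReplaceTempView("nums")

    // newSession() shares the SparkContext but keeps its own temporary views.
    val other = spark.newSession()

    (spark.catalog.tableExists("nums"), other.catalog.tableExists("nums"))
  }

  def main(args: Array[String]): Unit = {
    val (here, there) = check()
    println(s"visible in creating session: $here, visible in new session: $there")
  }
}
```

If a view has to be shared across sessions of the same application, createOrReplaceGlobalTempView is the variant to look at; its views live in the global_temp database.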

createOrReplaceTempView creates (or replaces, if a view with that name already exists) a lazily evaluated "view" that you can then use like a Hive table in Spark SQL. It does not persist the data to memory unless you cache the Dataset that underpins the view.

scala> val s = Seq(1,2,3,4).toDF("num")
s: org.apache.spark.sql.DataFrame = [num: int]
scala> s.createOrReplaceTempView("nums")
scala> s.createOrReplaceTempView("nums")
scala> spark.table("nums")
res6: org.apache.spark.sql.DataFrame = [num: int]
scala> spark.table("nums").cache
res7: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [num: int]
scala> spark.table("nums").count
res8: Long = 4

Because cache is itself lazy, the data is fully cached only after an action runs, here the .count call.
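The same point can be seen through the catalog API. This is a rough sketch, assuming a local Spark 2.x or later session; the object name TempViewCaching is hypothetical. Note that isCached reports whether the view has been marked for caching, not whether the blocks have already been materialized.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; the name is illustrative only.
object TempViewCaching {
  // Returns (cached before cacheTable, cached after cacheTable,
  //          row count, cached after uncacheTable).
  def run(): (Boolean, Boolean, Long, Boolean) = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally
      .appName("temp-view-caching")
      .getOrCreate()
    import spark.implicits._

    Seq(1, 2, 3, 4).toDF("num").createOrReplaceTempView("nums")

    // Registering the view alone does not cache anything.
    val before = spark.catalog.isCached("nums")

    // Marks the view for caching; nothing is computed yet.
    spark.catalog.cacheTable("nums")
    val marked = spark.catalog.isCached("nums")

    // The first action populates the cache.
    val rows = spark.table("nums").count()

    // Release the cached data when it is no longer needed.
    spark.catalog.uncacheTable("nums")
    val after = spark.catalog.isCached("nums")

    (before, marked, rows, after)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In SQL, CACHE TABLE nums caches eagerly by default, while CACHE LAZY TABLE nums behaves like the lazy calls above.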

For comparison with a persistent table, the relevant quote is: "Unlike the createOrReplaceTempView command, saveAsTable will materialize the contents of the DataFrame and create a pointer to the data in the Hive metastore."
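To see the difference side by side, here is a small sketch. It makes several assumptions: a local session without Hive support (so the table is recorded in Spark's built-in catalog rather than a Hive metastore), a throwaway warehouse directory, and made-up names (TempViewVsSavedTable, nums_view, nums_saved).

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; all names are illustrative only.
object TempViewVsSavedTable {
  // Returns a map from table/view name to its isTemporary flag.
  def run(): Map[String, Boolean] = {
    // Assumption: a fresh temp directory as the warehouse, so reruns start clean.
    // This setting only takes effect if no session exists in the JVM yet.
    val warehouse = Files.createTempDirectory("spark-warehouse-demo").toString
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("temp-view-vs-saved-table")
      .config("spark.sql.warehouse.dir", warehouse)
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3, 4).toDF("num")

    // Only registers a name for the query plan; no data is written.
    df.createOrReplaceTempView("nums_view")

    // Writes the rows to the warehouse directory and records the table in the catalog.
    df.write.mode("overwrite").saveAsTable("nums_saved")

    // listTables() returns both temporary views and catalog tables.
    val flags = spark.catalog.listTables().collect()
      .filter(t => t.name == "nums_view" || t.name == "nums_saved")
      .map(t => t.name -> t.isTemporary)
      .toMap

    // Clean up the persistent table created for this demo.
    spark.sql("DROP TABLE IF EXISTS nums_saved")
    flags
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The temporary view disappears with the session, whereas a table written with saveAsTable stays on disk until it is dropped.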




