Reading/Writing Different File Formats in HDFS Using PySpark

Issue – How to read and write different file formats in HDFS using PySpark

| File format | Action | Procedure | Example without compression |
|---|---|---|---|
| Text file | Read | sc.textFile() | orders = sc.textFile("/user/BDD/navnit/data-master/retail_db/orders") |
| Text file | Write | rdd.saveAsTextFile() | orders.saveAsTextFile("/user/BDD/navnit/saveTextFile/orders") |
| Sequence file | Read | sc.sequenceFile() | ordersSF = sc.sequenceFile("/user/BDD/navnit/saveSequenceFile/orders") |
| Sequence file | Write | PipelinedRDD.saveAsSequenceFile() | ordersKV.saveAsSequenceFile("/user/BDD/navnit/saveSequenceFile/orders") |
| Avro file | Read | sqlContext.read.format("com.databricks.spark.avro").load() | orders = sqlContext.read.format("com.databricks.spark.avro").load("/home/BDD/navnit/orders/") |
| Avro file | Write | dataFrame.write.format("com.databricks.spark.avro").save() | orders.write.format("com.databricks.spark.avro").save("/user/BDD/navnit/saveAvroFile/orders") |
| Parquet file | Read | sqlContext.read.parquet() | ordersParquet = sqlContext.read.parquet("/user/BDD/navnit/saveparquetFile/orders") |
| Parquet file | Write | dataFrame.write.parquet() | orders.write.parquet("/user/BDD/navnit/saveparquetFile/orders") |
| ORC file | Read | sqlContext.read.orc() | ordersOrc = sqlContext.read.orc("/user/BDD/navnit/saveorcFile/orders") |
| ORC file | Write | dataFrame.write.orc() | orders.write.orc("/user/BDD/navnit/saveorcFile/orders") |
| JSON file | Read | sqlContext.read.json() | ordersJSON = sqlContext.read.json("/user/BDD/navnit/saveJSONFile/orders") |
| JSON file | Write | dataFrame.write.json() | orders.write.json("/user/BDD/navnit/saveJSONFile/orders") |
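The sequence file write uses an RDD named ordersKV, but the post does not show how that RDD is built. saveAsSequenceFile() needs an RDD of (key, value) pairs, so the text RDD has to be mapped into pairs first. Below is a minimal sketch of one way to do that. It assumes each line of the orders data is comma-separated with an identifier in the first field, which is not stated above; the helper names to_key_value and save_and_reload_sequence_file are hypothetical.

```python
def to_key_value(line):
    """Return (first comma-separated field, full line) for one text record.

    Assumption: records are comma-separated and the first field is a usable key.
    """
    key = line.split(",", 1)[0]
    return (key, line)


def save_and_reload_sequence_file(sc, text_path, sequence_path):
    """Write a text dataset as a sequence file, then read it back.

    `sc` is an existing SparkContext; both paths are HDFS locations.
    """
    orders = sc.textFile(text_path)
    # saveAsSequenceFile() only works on an RDD of (key, value) pairs.
    ordersKV = orders.map(to_key_value)
    ordersKV.saveAsSequenceFile(sequence_path)
    # Reading back yields an RDD of (key, value) tuples.
    return sc.sequenceFile(sequence_path)
```

In the pyspark shell, where sc already exists, this could be called as save_and_reload_sequence_file(sc, "/user/BDD/navnit/data-master/retail_db/orders", "/user/BDD/navnit/saveSequenceFile/orders"). The write fails if the target directory already exists, so the output path has to be new or removed beforehand.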

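The Parquet, ORC and JSON writes in the table all follow the same pattern on a DataFrame, so they can be grouped in one helper. The sketch below assumes orders is already a DataFrame (for example, the result of one of the read calls above). The function names output_path and write_all_formats are hypothetical, and the path layout simply mirrors the directory names used in the examples.

```python
def output_path(base_dir, file_format, dataset="orders"):
    """Build a path such as <base_dir>/saveparquetFile/orders."""
    return "{}/save{}File/{}".format(base_dir.rstrip("/"), file_format, dataset)


def write_all_formats(orders, base_dir):
    """Write the DataFrame `orders` as Parquet, ORC and JSON under base_dir.

    Returns a dict mapping each format name to the path that was written.
    """
    paths = {fmt: output_path(base_dir, fmt) for fmt in ("parquet", "orc", "JSON")}
    orders.write.parquet(paths["parquet"])
    orders.write.orc(paths["orc"])
    orders.write.json(paths["JSON"])
    return paths
```

With the default save mode each write raises an error if its target path already exists; calling orders.write.mode("overwrite") before the format method replaces existing output instead.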
 
