March 28, 2024

Creating DataFrames from CSV in Apache Spark

from pyspark.sql import SparkSession


spark = SparkSession.builder.appName("CSV Example").getOrCreate()


sc = spark.sparkContext  # SparkContext handle (not needed by the CSV examples below)


Spark SQL provides spark.read.csv("file_name") to read a file or a directory of files in CSV format into a Spark DataFrame, and dataframe.write.csv("path") to write a DataFrame out as CSV. The option() method customizes reading or writing behavior, such as the header row, the delimiter character, the character set, and so on.


# A CSV dataset is pointed to by path.

# The path can be either a single CSV file or a directory of CSV files

path = "D:/spark/data/csv/sales.csv"


df = spark.read.csv(path)

df.show()


# Read a CSV with an explicit delimiter and a header row

df_header = spark.read.option("delimiter", ",").option("header", True).csv(path)

df_header.show()



