from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("CSV Example").getOrCreate()
sc = spark.sparkContext
Spark SQL provides spark.read.csv("file_name") to read a single file or a directory of files in CSV format into a Spark DataFrame, and df.write.csv("path") to write a DataFrame out as CSV. The option() method customizes reading and writing behavior, such as whether the first row is a header, the delimiter character, the character set, and so on. (Note that in PySpark, read and write are properties, not methods, so they are accessed without parentheses.)
# A CSV dataset is pointed to by path.
# The path can be either a single CSV file or a directory of CSV files
path = "D:/spark/data/csv/sales.csv"
df = spark.read.csv(path)
df.show()
# Read a CSV with an explicit delimiter and a header row
df_header = spark.read.option("delimiter", ",").option("header", True).csv(path)
df_header.show()