March 07, 2024

How to Install Apache Spark on Microsoft Windows 10




Apache Spark is an open-source big data processing framework for handling large volumes of data from multiple sources. Spark is used in distributed computing for machine learning applications, data analytics, and graph-parallel processing, on single-node machines or on clusters. 

This blog post will show you how to install Apache Spark on Windows 10 and test the installation.

Step 1: Install Java 8

1.1 Download Java 8 from https://java.com/en/download/.

1.2 Install Java

1.3 Set the JAVA_HOME environment variable to the Java JDK directory (for example, C:\Program Files\Java\<jdk_version>).
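One way to set this is with setx from Command Prompt (a sketch; the JDK path below is the same placeholder used above, so substitute your actual install directory):

```shell
:: Set JAVA_HOME for the current user.
:: The path is an example placeholder -- use your actual JDK directory.
setx JAVA_HOME "C:\Program Files\Java\<jdk_version>"
```

Note that setx only affects new Command Prompt windows, not the one that is already open. You can also set the variable through the Environment Variables dialog in System Properties.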




1.4 Check the Java version from Command Prompt:

java -version



Step 2: Install Python

2.1 Download Python 3.11 from https://www.python.org/.

2.2 Install Python 3.11.

2.3 Check the Python version from Command Prompt:

python --version



Step 3: Configure Hadoop

3.1 Download the winutils.exe file from https://github.com/cdarlint/winutils.

3.2 Create the folder C:\Hadoop\bin.

3.3 Copy winutils.exe to C:\Hadoop\bin.

3.4 Set the HADOOP_HOME environment variable to C:\Hadoop.




3.5 Add %HADOOP_HOME%\bin to the Path environment variable.
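Both of these can be set from Command Prompt with setx (a sketch; editing them through the Environment Variables dialog works just as well):

```shell
:: Set HADOOP_HOME and add its bin folder to the user Path.
:: Note: setx stores the expanded value, so use the Environment Variables
:: dialog instead if you want %HADOOP_HOME% kept as a variable reference.
setx HADOOP_HOME "C:\Hadoop"
setx Path "%Path%;C:\Hadoop\bin"
```

Open a new Command Prompt afterwards so the changes take effect.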




Step 4: Install Spark

4.1 Download a pre-built Spark package from https://spark.apache.org/downloads.html.

4.2 Create a new folder named C:\Spark.

4.3 Extract the downloaded Spark archive to C:\Spark.

4.4 Set the SPARK_HOME environment variable to the extracted Spark directory (for example, C:\Spark\spark-3.5.0-bin-hadoop3).

4.5 Add %SPARK_HOME%\bin to the Path environment variable.
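As with Hadoop, these can be set with setx from Command Prompt (a sketch; the folder name matches the example above, so adjust it for the Spark version you downloaded):

```shell
:: Set SPARK_HOME and add its bin folder to the user Path.
:: The directory name is an example -- match it to your extracted Spark folder.
setx SPARK_HOME "C:\Spark\spark-3.5.0-bin-hadoop3"
setx Path "%Path%;C:\Spark\spark-3.5.0-bin-hadoop3\bin"
```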


Step 5: Launch Spark with Command Prompt

5.1 Open Command Prompt and run the spark-shell script:

C:\Spark\spark-3.5.0-bin-hadoop3\bin\spark-shell






5.2 Browse to http://localhost:4040/.

You should see the Apache Spark shell Web UI. 
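As an extra check, you can run one of the example jobs that ships with Spark (a sketch; the exact jar name under %SPARK_HOME%\examples\jars depends on the Spark and Scala versions you downloaded, so adjust it to match your installation):

```shell
:: Run the bundled SparkPi example to confirm spark-submit works.
:: Adjust the jar file name to match your Spark download.
spark-submit --class org.apache.spark.examples.SparkPi %SPARK_HOME%\examples\jars\spark-examples_2.12-3.5.0.jar 10
```

Near the end of the job's output you should see a line like "Pi is roughly 3.14...".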



Creating DataFrames and Datasets in Apache Spark

https://avishkarm.blogspot.com/2024/03/creating-dataframes-and-datasets-in.html
