Apache Spark is an open-source framework for processing large volumes of data from multiple sources. It is used in distributed computing for machine learning, data analytics, and graph-parallel processing, on a single machine or across a cluster.
This blog post will show you how to install Apache Spark on Windows 10 and test the installation.
Step 1: Install Java 8
1.1 Download Java from https://java.com/en/download/.
1.2 Install Java
1.3 Set the JAVA_HOME environment variable to the JDK directory (for example, C:\Program Files\Java\<jdk_version>).
1.4 Check the Java version from Command Prompt:
java -version
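Step 1.3 can also be scripted from Command Prompt with setx, as a sketch; the JDK path below is only an example, so substitute your actual install folder (the System Properties → Environment Variables dialog works equally well):

```shell
:: Set JAVA_HOME for the current user (example path; adjust to your JDK folder)
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_351"
:: Add the JDK's bin folder to the user Path so java.exe is found
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_351\bin"
```

Note that setx writes to the user environment and only takes effect in newly opened Command Prompt windows.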
Step 2: Install Python
2.1 Download Python 3.11 from https://www.python.org/
2.2 Install Python 3.11 (select "Add python.exe to PATH" in the installer)
2.3 Check the Python version from Command Prompt:
python --version
Step 3: Configure Hadoop
3.1 Download the winutils.exe file from https://github.com/cdarlint/winutils (pick the folder matching your Spark build's Hadoop version)
3.2 Create folder C:\Hadoop\bin
3.3 Copy the winutils.exe file to C:\Hadoop\bin
3.4 Configure Environment variable HADOOP_HOME for directory C:\Hadoop
3.5 Add %HADOOP_HOME%\bin to the Path environment variable
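Steps 3.2 through 3.5 can be sketched as a short Command Prompt session; the download location of winutils.exe below is an assumption, so adjust it to wherever you saved the file:

```shell
:: Create the Hadoop bin folder (step 3.2)
mkdir C:\Hadoop\bin
:: Copy winutils.exe into it (step 3.3; source path is an example)
copy "%USERPROFILE%\Downloads\winutils.exe" C:\Hadoop\bin\
:: Set HADOOP_HOME and extend the user Path (steps 3.4 and 3.5)
setx HADOOP_HOME "C:\Hadoop"
setx PATH "%PATH%;C:\Hadoop\bin"
```

As before, the setx changes only apply to new Command Prompt windows.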
Step 4: Install Spark
4.1 Download Spark from https://spark.apache.org/downloads.html
4.2 Create a new folder C:\Spark
4.3 Extract the downloaded Spark archive to C:\Spark
4.4 Set the SPARK_HOME environment variable to the extracted Spark directory (for example, C:\Spark\spark-3.5.0-bin-hadoop3).
4.5 Add %SPARK_HOME%\bin to the Path environment variable
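Steps 4.4 and 4.5 can be scripted the same way; the folder name below matches the 3.5.0 build used as the example in this post, so change it if you downloaded a different release:

```shell
:: Set SPARK_HOME to the extracted Spark directory (example version)
setx SPARK_HOME "C:\Spark\spark-3.5.0-bin-hadoop3"
:: Add Spark's bin folder to the user Path (takes effect in new windows)
setx PATH "%PATH%;C:\Spark\spark-3.5.0-bin-hadoop3\bin"
```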
Step 5: Launch Spark with Command Prompt
5.1 Open Command Prompt and run spark-shell:
C:\Spark\spark-3.5.0-bin-hadoop3\bin\spark-shell
5.2 Browse to http://localhost:4040/.
You should see the Spark Web UI for the running shell.
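To verify more than just the shell, you can run the SparkPi example that ships with every Spark distribution via spark-submit; the jar file name below assumes the 3.5.0 build used in this post, so adjust it to match your version:

```shell
:: Compute an approximation of pi on 2 local cores using the bundled example jar
spark-submit --class org.apache.spark.examples.SparkPi --master local[2] %SPARK_HOME%\examples\jars\spark-examples_2.12-3.5.0.jar 10
```

If the installation is healthy, the job output includes a line starting with "Pi is roughly".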
Creating DataFrames and Datasets in Apache Spark
https://avishkarm.blogspot.com/2024/03/creating-dataframes-and-datasets-in.html