October 18, 2018

HDFS Cheat Sheet

Commonly used & most useful HDFS commands



1. Create a directory in HDFS at the given path(s). The -p option also creates any missing parent directories.

    cd $HADOOP_HOME

    hdfs dfs -mkdir -p /usr/local/hadoopdata



2. Upload a file to HDFS and download it back.

   Upload:

    hdfs dfs -put /usr/local/localdata/names.csv /usr/local/hadoopdata/names.csv

   Download:

    hdfs dfs -get /usr/local/hadoopdata/names.csv /usr/local/localdata/hnames.csv



3. List the contents of a directory.

    hdfs dfs -ls /usr/local/hadoopdata



4. Copy a file from the local file system to HDFS, and from HDFS back to the local file system.

The -copyFromLocal command works like -put, except that its source must be a local file; -copyToLocal works like -get, except that its destination must be a local file.

    hdfs dfs -copyFromLocal /usr/local/localdata/names.csv /usr/local/hadoopdata/hnames.csv

    hdfs dfs -copyToLocal /usr/local/hadoopdata/hnames.csv /usr/local/localdata/fromhdnames2.csv
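To confirm that a file survives the trip unchanged, the two commands above can be chained with a local byte-for-byte comparison. The sketch below is a hypothetical helper, `roundtrip_check`, not part of Hadoop; it assumes `hdfs` is on the PATH and uses the `-f` option of `-copyFromLocal` to overwrite an existing HDFS file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: copy a local file into HDFS,
# copy it back out, and compare the two byte for byte.
# Returns 0 if the round trip preserved the file, 1 otherwise.
set -eu

roundtrip_check() {
  local src="$1" hdfs_dest="$2" workdir rc
  workdir="$(mktemp -d)"
  # -f overwrites hdfs_dest if it already exists in HDFS.
  hdfs dfs -copyFromLocal -f "$src" "$hdfs_dest"
  # Copy back into a fresh scratch directory so the local target never exists yet.
  hdfs dfs -copyToLocal "$hdfs_dest" "$workdir/copy"
  if cmp -s "$src" "$workdir/copy"; then rc=0; else rc=1; fi
  rm -rf "$workdir"
  return "$rc"
}
```

For example, `roundtrip_check /usr/local/localdata/names.csv /usr/local/hadoopdata/hnames.csv && echo "copy verified"`. Because of `-f`, the HDFS destination is overwritten.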

5. Display the contents of a file.

    hdfs dfs -cat /usr/local/hadoopdata/hnames.csv

6. Display the last kilobyte of a file.

    hdfs dfs -tail /usr/local/hadoopdata/hnames.csv
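Because `hdfs dfs -tail` works in bytes (the last kilobyte) rather than lines, the first line it prints may be a fragment. The local sketch below needs no cluster: it uses `tail -c 1024` as a stand-in for `hdfs dfs -tail` on a hypothetical 200-row sample file, then shows the line-based alternative of piping `-cat` through `tail -n`.

```shell
#!/usr/bin/env bash
# Local stand-in, no cluster needed: `hdfs dfs -tail` prints the last
# kilobyte of a file, much as `tail -c 1024` does for a local file.
set -eu

sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT

# Hypothetical sample: 200 rows of 28 bytes each (5600 bytes in total).
i=1
while [ "$i" -le 200 ]; do
  printf 'row %03d,some,example,values\n' "$i"
  i=$((i + 1))
done > "$sample"

# Byte-based tail, like `hdfs dfs -tail`: here it starts mid-row.
first_tail_line="$(tail -c 1024 "$sample" | head -n 1)"
echo "first line of the last 1 KB: $first_tail_line"

# Line-based tail. Against HDFS the same idea is:
#   hdfs dfs -cat <hdfs-path> | tail -n 5
last_five="$(tail -n 5 "$sample")"
echo "$last_five"
```

The byte-based output begins with `,example,values`, the tail end of row 164, while the line-based output is rows 196 to 200 in full.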

7. Display the size of a file. For a directory, -du lists the size of each entry; add -s for an aggregate total and -h for human-readable units.

    hdfs dfs -du /usr/local/hadoopdata/hnames.csv


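8. Run one of the bundled MapReduce examples, WordCount. The input file must already be in HDFS, and the output path (namescount.csv here) is a directory that the job creates, not a single CSV file.

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /usr/local/hdfsdata/names/names.csv /usr/local/hdfsdata/names/namescount.csv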
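MapReduce refuses to start a job whose output directory already exists, so running the WordCount command a second time with the same output path fails. One way to make reruns repeatable is a small wrapper that removes the old output first. `rerun_wordcount` and `EXAMPLES_JAR` are hypothetical names, and the sketch assumes it is run from $HADOOP_HOME with the 2.7.6 examples jar.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: rerun the bundled WordCount job.
# The job fails if its output directory already exists, so remove it first.
# Assumes the current directory is $HADOOP_HOME and the jar version is 2.7.6.
set -eu

EXAMPLES_JAR="share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar"

rerun_wordcount() {
  local input="$1" output="$2"
  # `-test -d` exits 0 when the path exists and is a directory.
  if hdfs dfs -test -d "$output"; then
    hdfs dfs -rm -r "$output"
  fi
  hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}
```

For example, `rerun_wordcount /usr/local/hdfsdata/names/names.csv /usr/local/hdfsdata/names/namescount.csv`. The removal is only recoverable if the HDFS trash is enabled, so point it only at output directories you mean to replace.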

9. Examine the output files.

    hdfs dfs -ls /usr/local/hdfsdata/names/namescount.csv

    hdfs dfs -cat /usr/local/hdfsdata/names/namescount.csv/part-r-00000
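Each line of a part-r-* file holds a word, a tab, and that word's count, and within each part file the words appear in sorted (byte) order. The sketch below approximates that output locally with standard Unix tools, so the format can be checked without a cluster. `wordcount_local` is a hypothetical name, and splitting on spaces and tabs only approximates the example job's whitespace tokenizer.

```shell
#!/usr/bin/env bash
# Local approximation of the "word<TAB>count" lines the bundled WordCount
# job writes to part-r-00000. Assumption: splitting on spaces and tabs is
# close enough to the job's whitespace tokenizer for plain text.
set -eu

wordcount_local() {
  # One token per line (runs of separators squeezed), blank lines dropped,
  # then byte-order sort (LC_ALL=C) so duplicates are adjacent for uniq -c.
  tr -s ' \t' '\n\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Hypothetical input text.
counts="$(printf 'the cat sat\non the mat\n' | wordcount_local)"
printf '%s\n' "$counts"
```

Note that a CSV line with no spaces, such as a row of names.csv, is treated as a single token, so running WordCount on a CSV counts whole rows rather than individual fields.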

10. Example: Top 5 words in a text file

   10.1 Copy a local file to HDFS
    hdfs dfs -copyFromLocal /usr/local/localdata/data.txt /usr/local/hdfsdata/names/data.txt

   10.2 Run WordCount to produce the word counts
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /usr/local/hdfsdata/names/data.txt /usr/local/hdfsdata/names/datacount

   10.3 Sort by count and keep the top 5 (-k2,2n compares only the count column, numerically; -r puts the largest first)
    hdfs dfs -cat /usr/local/hdfsdata/names/datacount/part-r-00000 | sort -k2,2nr | head -n 5

   10.4 Save the top 3 to a local file (quote the glob so that hdfs dfs, not the local shell, expands part*)
    hdfs dfs -cat '/usr/local/hdfsdata/names/datacount/part*' | sort -k2,2nr | head -n 3 > /usr/local/localdata/top3.txt
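The counts must be compared as numbers. A reverse text sort on the count column (`sort -k2,2r`) compares character by character, so `9` ranks above both `25` and `10`, whereas `sort -k2,2nr` compares the values numerically. The local sketch below shows both orders on a small hypothetical sample in the part-r-00000 format; `sample_counts` is an illustrative name.

```shell
#!/usr/bin/env bash
# Compares a text sort and a numeric sort on the count column of a
# hypothetical part-r-00000 sample (word<TAB>count).
set -eu

sample_counts() {
  printf 'apple\t9\nbanana\t10\ncherry\t2\ndate\t25\n'
}

# Reverse text sort on field 2: "9" > "25" > "2" > "10" character by character.
text_top="$(sample_counts | LC_ALL=C sort -k2,2r | head -n 1)"

# Reverse numeric sort on field 2: 25 > 10 > 9 > 2.
numeric_top="$(sample_counts | sort -k2,2nr | head -n 1)"

echo "text sort picks:    $text_top"
echo "numeric sort picks: $numeric_top"
```

The text sort reports `apple` (9) as the most frequent word; the numeric sort correctly reports `date` (25).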
