In the previous post we successfully installed Apache Hadoop 2.6.1 on Ubuntu 13.04. The main agenda of this post is to run the famous MapReduce word count sample program on our single node Hadoop cluster set-up. Running the word count problem is the equivalent of the "Hello World" program of the MapReduce world. Before executing the word count MapReduce sample program, we need to download the input files and upload them to the Hadoop file system.
Download input data :- Download each text file from the following URLs and store the files in some directory. For me, they are downloaded in /home/zytham/Downloads/hadoop_data.
1. http://www.gutenberg.org/cache/epub/20417/pg20417.txt
2. http://www.gutenberg.org/files/5000/5000-8.txt
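If you prefer the command line, the same files can be fetched with wget, for example:
zytham@ubuntu:~$ mkdir -p /home/zytham/Downloads/hadoop_data
zytham@ubuntu:~$ cd /home/zytham/Downloads/hadoop_data
zytham@ubuntu:~/Downloads/hadoop_data$ wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt
zytham@ubuntu:~/Downloads/hadoop_data$ wget http://www.gutenberg.org/files/5000/5000-8.txt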
Upload input files to HDFS :-
Switch to hduser1 if you are not already in that context. Remember, while doing the Hadoop 2.6.1 installation on Ubuntu 13.04, we created hduser1 and set up Hadoop in the context of hduser1.
Start hadoop services :- First start the Hadoop cluster using the following commands.
hduser1@ubuntu:~$ cd /usr/local/hadoop2.6.1/sbin
hduser1@ubuntu:/usr/local/hadoop2.6.1/sbin$ ./start-all.sh
Note:- If you do not start the services and try to upload files to HDFS, you will get an error something like
Call From ubuntu/127.0.1.1 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused;
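You can confirm that all the daemons came up with the jps command; on this single node set-up the listing should look something like the following (process ids will differ):
hduser1@ubuntu:/usr/local/hadoop2.6.1/sbin$ jps
4099 NameNode
4221 DataNode
4403 SecondaryNameNode
4555 ResourceManager
4677 NodeManager
4989 Jps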
Copy local files to HDFS :- Copy the downloaded files from /home/zytham/Downloads/hadoop_data to the Hadoop file system (a file system managed by Hadoop). Execute the following commands to create an HDFS directory and copy the files from the local file system into the newly created HDFS directory.
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hdfs dfs -mkdir -p /user/hduser1/hdfsdata/hadoop_data
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -copyFromLocal /home/zytham/Downloads/hadoop_data /user/hduser1/hdfsdata/hadoop_data
Verify that you have copied all three files into HDFS: execute the following command and you should see all three files.
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -ls /user/hduser1/hdfsdata/hadoop_data
Do you also want to verify why an HDFS directory is special and different from the local file system? Try to execute the following command: you will not be able to access that directory. Remember, it is visible to Hadoop and managed by it.
hduser1@ubuntu:/usr/local/hadoop2.6.1$ cd /user/hduser1/hdfsdata/hadoop_data
bash: cd: /user/hduser1/hdfsdata/hadoop_data: No such file or directory
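The path exists only in the HDFS namespace, so it has to be accessed through the HDFS shell (or browsed via the NameNode web UI, by default at http://localhost:50070 in Hadoop 2.x), for example:
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hdfs dfs -ls /user/hduser1/hdfsdata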
Run the Hadoop word count MapReduce example :-
For convenience I have created a word count sample program jar; download it and save it in some directory of your choice. I have placed it at "/home/zytham/hadoop_poc/WordcountSample.jar". Now execute the word count jar on the single node Hadoop pseudo cluster with the following command (a sketch of the source is shown below, in case you are curious what is inside the jar):
./hadoop jar <word_count_sample_jar> <classNameOfSampleJar> <Input_files_location> <Output_directory_location>
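Here <classNameOfSampleJar> is the name of the driver class inside the jar (WordCountExample in the command below). For reference, this is a minimal sketch of what such a driver, with its mapper and reducer, might look like against the standard Hadoop 2.x MapReduce API; treat it as an illustration under those assumptions, not necessarily the exact source of WordcountSample.jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountExample {

    // Mapper: emits (word, 1) for every whitespace-separated token in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word. Also used as a combiner to
    // pre-aggregate map output before the shuffle.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: wires mapper, combiner and reducer together and takes the
    // HDFS input directory and (not yet existing) output directory as arguments.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountExample.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With WordcountSample.jar in place, the actual run on our input data looks like this: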
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop jar /home/zytham/hadoop_poc/WordcountSample.jar WordCountExample /user/hduser1/hdfsdata/hadoop_data /user/hduser1/wordcountOuput
15/10/04 15:29:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/04 15:29:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/10/04 15:29:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/10/04 15:29:37 INFO input.FileInputFormat: Total input paths to process : 3
..........................
..........................
15/10/04 15:29:43 INFO mapred.LocalJobRunner: reduce task executor complete.
15/10/04 15:29:43 INFO mapreduce.Job: map 100% reduce 100%
15/10/04 15:29:43 INFO mapreduce.Job: Job job_local884144492_0001 completed successfully
15/10/04 15:29:44 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=4011472
FILE: Number of bytes written=8420485
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11928267
HDFS: Number of bytes written=883509
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=78578
Map output records=629920
Map output bytes=6083556
Map output materialized bytes=1462980
Input split bytes=397
Combine input records=629920
Combine output records=101397
Reduce input groups=82616
Reduce shuffle bytes=1462980
Reduce input records=101397
Reduce output records=82616
Spilled Records=202794
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=180
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=807419904
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3676562
File Output Format Counters
Bytes Written=883509
If you get output something similar to the above, you are on the right track, and the output of this MapReduce program is stored in "/user/hduser1/wordcountOuput". We will now see the output processed by Hadoop.
First verify the output directory and see what files it contains. Execute the following command for the same.
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -ls /user/hduser1/wordcountOuput
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/10/04 15:33:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hduser1 supergroup 0 2015-10-04 15:29 /user/hduser1/wordcountOuput/_SUCCESS
-rw-r--r-- 1 hduser1 supergroup 883509 2015-10-04 15:29 /user/hduser1/wordcountOuput/part-r-00000
Now, execute the following command to see the processed output in the terminal. The empty _SUCCESS file above is just a marker indicating that the job completed successfully; part-r-00000 contains the actual word counts, one word and its count per line, separated by a tab. (The output shown below is just a partial one; you have to scroll to see the complete output.)
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -cat /user/hduser1/wordcountOuput/part-r-00000
........
.......
worst 10
worst. 1
worsted 2
worsted! 1
worsting 1
worth 36
worth. 5
worth._ 2
worthful 1
worthier 1
worthless. 1
worthy 21
worthy, 1
æsthetic 1
è 3
état_. 1
� 5
�: 1
�crit_ 1
�pieza; 1
Using "getmerge" command we can download mapreduce output to local file system. Use following command to merge output files present in hdfs output folder.
hduser@ubuntu:/usr/local/hadoop/bin$ ./hadoop dfs -getmerge /user/hduser1/wordcountOuput /tmp/wordCountLocal
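You can then inspect the merged result with ordinary local tools, for example:
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ head /tmp/wordCountLocal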