MapReduce, a data processing framework (engine), can be used to analyse many kinds of data sources (logs, feedback, sales details, etc.). In the previous post, we analysed time-temperature statistics and generated a report with the max/min temperature for various cities. In this post we will analyse customer feedback/review comments for various mobile phones, stored in a text file, and work out which mobile could be a good buy.
Note:- The data used for the sample program is fictitious and ONLY for educational purposes; it does not convey any message about whether a product is good or bad.
Problem statement:- Using MapReduce, analyse a text file that stores customer feedback about various mobile phones from various vendors, and separate the positive and negative comments into separate files, grouped by mobile phone and price. For each mobile set, also display the total number of comments.
Download sample input file.
Input schema :- <Mobile_set_detail><TAB><Price><TAB><Vendor><TAB><Comment>
Example:- Lenovo A6000 Plus<TAB>Rs. 7,499.00<TAB>Flipkart<TAB>Satisfied with phone.
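To make the schema concrete, here is a minimal plain-Java sketch (no Hadoop needed) of how one input line can be split into its fields. The names FeedbackLineParser and Feedback are our own illustrative names, not part of the sample program. Joining the mobile detail and the price with ":" is an assumption taken from the "detail:price" form shown in the expected output.

```java
import java.util.Optional;

class FeedbackLineParser {
    /** Illustrative holder for the parts of one input line. */
    static final class Feedback {
        final String key;      // "<Mobile_set_detail>:<Price>"
        final String vendor;
        final String comment;

        Feedback(String key, String vendor, String comment) {
            this.key = key;
            this.vendor = vendor;
            this.comment = comment;
        }
    }

    /** Returns the parsed line, or empty when fewer than four TAB-separated fields are present. */
    static Optional<Feedback> parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 4) {
            return Optional.empty();
        }
        return Optional.of(new Feedback(fields[0] + ":" + fields[1], fields[2], fields[3]));
    }

    public static void main(String[] args) {
        String line = "Lenovo A6000 Plus\tRs. 7,499.00\tFlipkart\tSatisfied with phone.";
        parse(line).ifPresent(f -> System.out.println(f.key + " -> " + f.comment));
        // prints: Lenovo A6000 Plus:Rs. 7,499.00 -> Satisfied with phone.
    }
}
```

Returning an empty Optional for short lines is one possible choice; it lets a caller skip malformed records instead of failing on them.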
Expected output:-
In positiveFeedback_file : <Mobile_detail><TAB><Comment_count><TAB><All_comments_separated by ||>
Apple Iphone 4s - 16 Gb:Rs. 12,243.00 Comments(2) Amazingly smooth and has a much better battery life. || good for style and long term uses. ||
In negativeFeedback_file : <Mobile_detail><TAB><Comment_count><TAB><All_comments_separated by ||>
Lenovo VIBE P1m (Black, 16 GB):Rs. 7,999 Comments(3) Poor service so do not buy. ||Poor service so do not buy. ||Do not prefer and not reccomend ||
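As a rough illustration of how one such output line is put together, the plain-Java sketch below builds a record from a mobile detail string and its comments. FeedbackRecordFormatter is a hypothetical helper of ours. We assume the count is appended to the mobile detail as " Comments(n)", as in the sample lines above, and that a TAB separates it from the comment text.

```java
import java.util.Arrays;
import java.util.List;

class FeedbackRecordFormatter {
    /** Builds "<detail> Comments(<n>)" TAB "<comment> || <comment> ||". */
    static String format(String mobileDetail, List<String> comments) {
        StringBuilder joined = new StringBuilder();
        for (String comment : comments) {
            joined.append(comment).append(" || ");
        }
        String header = mobileDetail + " Comments(" + comments.size() + ")";
        // trim() drops only the trailing space left after the last separator
        return header + "\t" + joined.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(format("Apple Iphone 4s - 16 Gb:Rs. 12,243.00",
                Arrays.asList("Amazingly smooth and has a much better battery life.",
                              "good for style and long term uses.")));
    }
}
```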
As part of solving this use case we will learn about -
1. Life cycle of mapper and reducer classes - setup() -> map()/reduce() -> cleanup()
For each mapper/reducer task, these three methods always execute in the same order: setup() runs once before the first record, map()/reduce() runs once per input record (per key, for a reducer), and cleanup() runs once after the last one. The setup() method provides an opportunity to prepare or modify the input or supporting data for the mapper or reducer class.
In the cleanup() method, resources can be released.
2. MultipleOutputs - more than one reducer output file can be generated using MultipleOutputs
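The call order from point 1 can be sketched in plain Java without Hadoop. Task below is a hypothetical stand-in for a mapper or reducer task, not a Hadoop class; it only records the order in which its methods are called, under the assumption that setup and cleanup each run once around the per-record calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {
    /** Hypothetical stand-in for a mapper/reducer task; records every call it receives. */
    static abstract class Task {
        final List<String> calls = new ArrayList<>();

        void setup() { calls.add("setup"); }

        abstract void process(String record);

        void cleanup() { calls.add("cleanup"); }

        /** setup once, one process call per record, cleanup once (even if a record fails). */
        final void run(List<String> records) {
            setup();
            try {
                for (String record : records) {
                    process(record);
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Runs a task over the records and returns the recorded call order. */
    static List<String> trace(List<String> records) {
        Task task = new Task() {
            @Override
            void process(String record) { calls.add("map(" + record + ")"); }
        };
        task.run(records);
        return task.calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList("line1", "line2")));
        // prints: [setup, map(line1), map(line2), cleanup]
    }
}
```

Because setup and cleanup run once per task rather than once per record, they are the natural place for expensive preparation and for releasing resources.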
Sample code for mapper, reducer and driver class
Mapper class :- In the mapper class below, the input file is read and the map method is executed for each line. It parses the input line and writes the result to the context. Both key and value are of type Text.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    /*
     * Mapper executes these methods for each task in sequence: setup -> map -> cleanup.
     * Only map is overridden here.
     */
    class ReviewMapperClass extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split once: <Mobile_set_detail> TAB <Price> TAB <Vendor> TAB <Comment>
            String[] fields = value.toString().split("\\t");
            if (fields.length < 4) {
                return; // skip malformed lines instead of failing the task
            }
            String productId = fields[0];
            String price = fields[1];
            String feedback = fields[3];
            String mapperKey = productId + ":" + price;
            context.write(new Text(mapperKey), new Text(feedback));
        }
    }
Reducer class:- In the reducer class, the setup() method creates the positive and negative word lists (positive and negative comments are separated based on these words).
In the reduce method, the feedback list is iterated and each comment is matched against POS_QUALIFY_PATTERN, which is built from wordList.get(0) (the positive comment words), and against NEG_QUALIFY_PATTERN, which is built from wordList.get(1) (the negative comment words). If a match is found, the corresponding comment string (sbfPos/sbfNeg) and its count are updated.
Once the for loop terminates, both files (positiveReview and negativeReview) are updated with the comment count and comment string.
    import java.io.IOException;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Pattern;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    /*
     * Reducer executes on mapper output in sequence: setup -> reduce -> cleanup.
     * All three are overridden here.
     */
    class ReviewReducerClass extends Reducer<Text, Text, Text, Text> {
        MultipleOutputs<Text, Text> multiOutput;
        List<String> wordList = new LinkedList<String>();
        Pattern posQualifyPattern;
        Pattern negQualifyPattern;

        @Override
        protected void setup(Context context) {
            multiOutput = new MultipleOutputs<Text, Text>(context);
            Configuration conf = context.getConfiguration();
            wordList.add(conf.get("positiveWords"));
            wordList.add(conf.get("negativeWords"));
            // Compile the patterns once per task rather than once per key.
            final String POS_QUALIFY_PATTERN = "(" + wordList.get(0) + ")";
            final String NEG_QUALIFY_PATTERN = "(" + wordList.get(1) + ")";
            posQualifyPattern = Pattern.compile(POS_QUALIFY_PATTERN, Pattern.CASE_INSENSITIVE);
            negQualifyPattern = Pattern.compile(NEG_QUALIFY_PATTERN, Pattern.CASE_INSENSITIVE);
        }

        @Override
        public void reduce(Text key, Iterable<Text> feedbackList, Context con)
                throws IOException, InterruptedException {
            int countPos = 0;
            int countNeg = 0;
            StringBuilder sbfPos = new StringBuilder();
            StringBuilder sbfNeg = new StringBuilder();
            for (Text strVal : feedbackList) {
                String comment = strVal.toString();
                boolean positive = posQualifyPattern.matcher(comment).find();
                boolean negative = negQualifyPattern.matcher(comment).find();
                if (positive && !negative) {
                    sbfPos.append(comment).append(" || ");
                    countPos++;
                } else if (negative && !positive) {
                    sbfNeg.append(comment).append("||");
                    countNeg++;
                }
                // Comments that match both word lists, or neither, are skipped.
            }
            /* Write to the positive and negative feedback files */
            if (countPos != 0) {
                multiOutput.write(PositiveAndNegativeReview.positiveReview,
                        new Text(key.toString() + " Comments(" + countPos + ")"),
                        new Text(sbfPos.toString()));
            }
            if (countNeg != 0) {
                multiOutput.write(PositiveAndNegativeReview.negativeReview,
                        new Text(key.toString() + " Comments(" + countNeg + ")"),
                        new Text(sbfNeg.toString()));
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Closing MultipleOutputs flushes the named output files.
            multiOutput.close();
            wordList = null;
        }
    }
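The matching step used by the reducer can be tried outside Hadoop. The plain-Java sketch below is a simplified illustration, not the sample program itself: FeedbackClassifier and Sentiment are hypothetical names, and the word lists are shortened assumptions. As in the reducer, a comment counts as positive only when no negative word matches, and a comment that matches both lists (or neither) is left out.

```java
import java.util.regex.Pattern;

class FeedbackClassifier {
    enum Sentiment { POSITIVE, NEGATIVE, SKIPPED }

    // Shortened, illustrative word lists; the trailing spaces act as crude word boundaries.
    static final String POSITIVE_WORDS = "good |satisfied |smooth |stylish ";
    static final String NEGATIVE_WORDS = "not good |do not |poor |worst ";

    static final Pattern POSITIVE =
            Pattern.compile("(" + POSITIVE_WORDS + ")", Pattern.CASE_INSENSITIVE);
    static final Pattern NEGATIVE =
            Pattern.compile("(" + NEGATIVE_WORDS + ")", Pattern.CASE_INSENSITIVE);

    static Sentiment classify(String comment) {
        boolean positive = POSITIVE.matcher(comment).find();
        boolean negative = NEGATIVE.matcher(comment).find();
        if (positive && !negative) {
            return Sentiment.POSITIVE;
        }
        if (negative && !positive) {
            return Sentiment.NEGATIVE;
        }
        return Sentiment.SKIPPED; // matched both lists, or neither
    }

    public static void main(String[] args) {
        System.out.println(classify("Satisfied with phone."));       // POSITIVE
        System.out.println(classify("Poor service so do not buy.")); // NEGATIVE
        System.out.println(classify("Not good at all."));            // SKIPPED
    }
}
```

Note that "Not good at all." contains both "good " and "not good ", so it is skipped rather than counted as negative. This is one place where the word-list approach could be refined.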
Driver class :-
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class PositiveAndNegativeReview {
        public static final String positiveReview = "positiveReview";
        public static final String negativeReview = "negativeReview";

        /**
         * Use of setup and cleanup in Mapper and Reducer
         */
        public static void main(String[] args) {
            final String POSITIVE_WORD = "good |satisfied |classic|class|happy |thanks | recommend |good to go|best |rocking |yo |fancy |stylish |must buy | amazing |smooth |awesome |damn good ";
            final String NEGATIVE_WORD = "not good |Do not |donot |poor | not satisfied |very poor|not happy |worst | not recommend |do not buy|not-satisfied|waste |bad | false |not stylish |should not buy |not amazing | not smooth |wasted |damn bad ";
            Configuration conf = new Configuration();
            conf.set("positiveWords", POSITIVE_WORD);
            conf.set("negativeWords", NEGATIVE_WORD);
            try {
                Job job = Job.getInstance(conf, "Filter file with good feedback!!");
                job.setMapperClass(ReviewMapperClass.class);
                job.setReducerClass(ReviewReducerClass.class);
                job.setJarByClass(PositiveAndNegativeReview.class);
                /*
                 * Set the four properties below carefully, otherwise the job
                 * fails silently after the first context.write
                 */
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(Text.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(Text.class);
                /* Optional, but good to set */
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);
                /* Multiple output setting */
                MultipleOutputs.addNamedOutput(job, negativeReview,
                        TextOutputFormat.class, Text.class, Text.class);
                MultipleOutputs.addNamedOutput(job, positiveReview,
                        TextOutputFormat.class, Text.class, Text.class);
                Path pathInput = new Path(
                        "hdfs://localhost:54310/user/hduser1/feedbackPosNeg.txt");
                Path pathOutputDir = new Path(
                        "hdfs://localhost:54310/user/hduser1/testfs/output_dir_feedback");
                FileInputFormat.setInputPaths(job, pathInput);
                FileOutputFormat.setOutputPath(job, pathOutputDir);
                /* Exit with 0 on success and 1 on failure */
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            } catch (IOException | ClassNotFoundException | InterruptedException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

Start the Hadoop services (./start-all.sh from the sbin directory) and execute the driver program. Verify the output directory - it should contain the two named output files (negativeReview-r-00000 and positiveReview-r-00000).
Download sample output file.
    hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -cat /user/hduser1/testfs/output_dir_feedback/positiveReview-r-00000
    Apple Iphone 4s - 16 Gb - Black:Rs. 12,617.00 Comments(2) Yo like it. || Amazingly smooth and has a much better battery life. ||
    Apple iPhone 5s 40 16GB 41:Rs. 38,269.00 Comments(1) Good phone. ||
    Lenovo A2010 (Black, 8 GB):Rs. 4,990 Comments(4) Very stylish and fancy. || Very stylish and fancy. || Good phone. || Very good in low end. ||
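As a small follow-up to the listing above, one naive way to pick a "good buy" is to take the mobile with the most positive comments. The plain-Java sketch below is our own assumption and not part of the sample program: BestBuyRanker is a hypothetical helper that reads lines in the "<Mobile_detail> Comments(n)" form shown above and returns the entry with the highest count. It ignores the negative file entirely, so treat it as a starting point only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BestBuyRanker {
    // Captures "<Mobile_detail>" and the number inside "Comments(<n>)".
    private static final Pattern RECORD = Pattern.compile("^(.*?) Comments\\((\\d+)\\)");

    /** Returns the comment count of one output line, or 0 if the line has no count. */
    static int commentCount(String line) {
        Matcher m = RECORD.matcher(line);
        return m.find() ? Integer.parseInt(m.group(2)) : 0;
    }

    /** Returns the mobile detail with the highest count; the first one wins on a tie. */
    static Optional<String> mostPraised(List<String> lines) {
        String best = null;
        int bestCount = 0;
        for (String line : lines) {
            Matcher m = RECORD.matcher(line);
            if (m.find()) {
                int count = Integer.parseInt(m.group(2));
                if (count > bestCount) {
                    bestCount = count;
                    best = m.group(1);
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Apple iPhone 5s 40 16GB 41:Rs. 38,269.00 Comments(1)\tGood phone. ||",
                "Lenovo A2010 (Black, 8 GB):Rs. 4,990 Comments(4)\tVery stylish and fancy. ||");
        System.out.println(mostPraised(lines).orElse("no records"));
        // prints: Lenovo A2010 (Black, 8 GB):Rs. 4,990
    }
}
```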