MapReduce: Analyse customer feedback about mobile phones stored in a text file and write positive and negative feedback to separate files

MapReduce, a data processing framework (engine), can be used to analyse various kinds of data sources (logs, feedback, sales details, etc.). In the previous post, we analysed time-temperature statistics and generated a report with the max/min temperature for various cities. In this post we will analyse customer feedback/review comments for various mobile phones, stored in a text file, and conclude which mobile could be a good buy.
Note:- The data used in the sample program is fictitious and ONLY for educational purposes; it does not convey any message about whether a product is good or bad.
Problem statement:- Using MapReduce, analyse a text file storing customer feedback about various mobile phones from various vendors, and write the positive and negative comments to separate files, grouped by mobile phone and price. For each mobile phone, also display the total number of comments. Download sample input file.

Input schema:- <Mobile_set_detail><TAB><Price><TAB><Vendor><TAB><Comment>
Example:-  Lenovo A6000 Plus Rs. 7,499.00 Flipkart Satisfied with  phone.
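The input schema above can be tried out with a small stand-alone sketch. The class name FeedbackLineParser and the choice to return null for malformed lines are assumptions made for illustration; they are not part of the original program.

```java
/** Hypothetical helper (not part of the original post) that splits one feedback record. */
final class FeedbackLineParser {

    /** Returns {mobile detail, price, vendor, comment}, or null if the line is malformed. */
    static String[] parse(String line) {
        // A limit of 4 keeps any tab characters that occur inside the comment itself
        String[] fields = line.split("\t", 4);
        if (fields.length < 4) {
            return null;
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Lenovo A6000 Plus\tRs. 7,499.00\tFlipkart\tSatisfied with  phone.";
        String[] fields = parse(line);
        // The mapper in this post uses <detail>:<price> as key and the comment as value
        System.out.println("key   = " + fields[0] + ":" + fields[1]);
        System.out.println("value = " + fields[3]);
    }
}
```

Splitting once and indexing into the resulting array avoids re-parsing the same line for every field.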

Expected output:-
In positiveFeedback_file : <Mobile_detail>:<Price> Comments(<Comment_count>)<TAB><All_comments_separated by ||>
Apple Iphone 4s - 16 Gb:Rs. 12,243.00 Comments(2) Amazingly smooth and has a much better battery life. || good for style and long term uses. || 
In negativeFeedback_file : <Mobile_detail>:<Price> Comments(<Comment_count>)<TAB><All_comments_separated by ||>
Lenovo VIBE P1m (Black, 16 GB):Rs. 7,999 Comments(3)  Poor service so do not buy. ||Poor service so do not buy. ||Do not prefer and not reccomend ||
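As a rough illustration of how one output line is put together, the sketch below builds the positive-file line shown above from its parts. FeedbackOutputFormat and formatLine are hypothetical names, and the tab between the "Comments(n)" part and the comment text is an assumption based on the default key/value separator of TextOutputFormat.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the original post) that formats one output line. */
final class FeedbackOutputFormat {

    /** Builds "<detail>:<price> Comments(n)<TAB><c1> || <c2> || ". */
    static String formatLine(String mobileDetail, String price, List<String> comments) {
        StringBuilder joined = new StringBuilder();
        for (String comment : comments) {
            joined.append(comment).append(" || ");
        }
        return mobileDetail + ":" + price + " Comments(" + comments.size() + ")\t" + joined;
    }

    public static void main(String[] args) {
        System.out.println(formatLine("Apple Iphone 4s - 16 Gb", "Rs. 12,243.00",
                Arrays.asList("Amazingly smooth and has a much better battery life.",
                        "good for style and long term uses.")));
    }
}
```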

As part of solving this use case we will learn about -
1. Life cycle of mapper and reducer classes - setup() -> map()/reduce() -> cleanup()
For each mapper/reducer task, these three methods execute in the same order: setup() runs once before the first record, map()/reduce() runs once per input record (once per key in the reducer), and cleanup() runs once at the end. The setup() method is the place to prepare supporting data for the mapper or reducer class.
In the cleanup() method, resources can be released.
2. MultipleOutputs - a reducer can write more than one output file using MultipleOutputs
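The call order described in point 1 can be sketched without a cluster. LifecycleTask below is a simplified stand-in written for this illustration, not the Hadoop Mapper or Reducer class; it only models the order in which the framework invokes the three hooks for one task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo class; it only records the order in which the hooks run. */
final class LifecycleDemo {

    /** Runs a tracing task over the given records and returns the calls in order. */
    static List<String> trace(List<String> records) {
        final List<String> calls = new ArrayList<>();
        LifecycleTask task = new LifecycleTask() {
            @Override protected void setup() { calls.add("setup"); }
            @Override protected void process(String record) { calls.add("map(" + record + ")"); }
            @Override protected void cleanup() { calls.add("cleanup"); }
        };
        task.run(records);
        return calls;
    }

    public static void main(String[] args) {
        // prints [setup, map(line1), map(line2), cleanup]
        System.out.println(trace(Arrays.asList("line1", "line2")));
    }
}

/** Simplified stand-in for a task; this is NOT the Hadoop Mapper or Reducer class. */
abstract class LifecycleTask {
    protected void setup() { }
    protected abstract void process(String record);
    protected void cleanup() { }

    /** setup once, process once per record, cleanup once (even if processing fails). */
    final void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                process(record);
            }
        } finally {
            cleanup();
        }
    }
}
```

Because setup() and cleanup() run once per task rather than once per record, they are the natural place for one-time work such as building word lists or closing output writers.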

Sample code for mapper, reducer and driver class

Mapper class :- In the mapper class below, the input file is read and the map method is executed for each line. It parses the input line and writes <Mobile_set_detail>:<Price> as the key and the comment as the value to the context. Both key and value are of type Text.
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/*
* Each mapper task runs in sequence : setup -> map (once per input line) -> cleanup
*/
class ReviewMapperClass extends Mapper<Object, Text, Text, Text> {
@Override
protected void map(Object key, Text value, Context context)
  throws IOException, InterruptedException {
 // Input line: <Mobile_set_detail><TAB><Price><TAB><Vendor><TAB><Comment>
 String[] fields = value.toString().split("\\t");
 if (fields.length < 4) {
  return; // skip malformed lines instead of failing the task
 }
 String productId = fields[0];
 String price = fields[1];
 String feedback = fields[3];
 String mapperKey = productId + ":" + price;
 context.write(new Text(mapperKey), new Text(feedback));
}
}

Reducer class:- In the reducer class, the setup() method creates the positive and negative word lists (positive and negative comments are separated based on these words).
In the reduce method, the comments for a key are iterated and each one is matched against POS_QUALIFY_PATTERN, which is built from wordList.get(0) (the positive words), and NEG_QUALIFY_PATTERN, which is built from wordList.get(1) (the negative words). If a match is found, the comment is appended to the corresponding buffer (sbfPos/sbfNeg) and its count is incremented; a comment that also matches a negative word is never counted as positive.
Once the for loop terminates, both outputs (positiveReview and negativeReview) are written with the comment count and the comment string.
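import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

/*
* Each reducer task runs on mapper output in sequence : setup -> reduce (once
* per key) -> cleanup. Both setup and cleanup are overridden below.
*/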
class ReviewReducerClass extends Reducer<Text, Text, Text, Text> {
MultipleOutputs<Text, Text> multiOutput;
List<String> wordList = new LinkedList<String>();

@Override
protected void setup(Context context) {
 multiOutput = new MultipleOutputs<Text, Text>(context);
 Configuration conf = context.getConfiguration();
 wordList.add(conf.get("positiveWords"));
 wordList.add(conf.get("negativeWords"));
}

@Override
public void reduce(Text key, Iterable<Text> feedbackList, Context con)
  throws IOException, InterruptedException {
 // find() searches anywhere in the comment, so a plain alternation of the
 // configured words is sufficient.
 final String POS_QUALIFY_PATTERN = "(" + wordList.get(0) + ")";
 final String NEG_QUALIFY_PATTERN = "(" + wordList.get(1) + ")";
 Pattern posQualifyPattern = Pattern.compile(POS_QUALIFY_PATTERN,
   Pattern.CASE_INSENSITIVE);
 Pattern negQualifyPattern = Pattern.compile(NEG_QUALIFY_PATTERN,
   Pattern.CASE_INSENSITIVE);

 int countPos = 0;
 int countNeg = 0;
 StringBuilder sbfPos = new StringBuilder();
 StringBuilder sbfNeg = new StringBuilder();
 for (Text strVal : feedbackList) {
  String comment = strVal.toString();
  boolean hasPositive = posQualifyPattern.matcher(comment).find();
  boolean hasNegative = negQualifyPattern.matcher(comment).find();
  if (hasNegative) {
   // Negative phrases such as "not good " contain a positive word, so a
   // negative match takes precedence over a positive one.
   sbfNeg.append(comment).append("||");
   countNeg++;
  } else if (hasPositive) {
   sbfPos.append(comment).append(" || ");
   countPos++;
  }
 }
 /* Write to the positive and negative feedback files */
 if (countPos != 0) {
  multiOutput.write(PositiveAndNegativeReview.positiveReview,
    new Text(key.toString() + " Comments(" + countPos + ")"),
    new Text(sbfPos.toString()));
 }
 if (countNeg != 0) {
  multiOutput.write(PositiveAndNegativeReview.negativeReview,
    new Text(key.toString() + " Comments(" + countNeg + ")"),
    new Text(sbfNeg.toString()));
 }
}

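@Override
protected void cleanup(Context context) throws IOException,
  InterruptedException {
 // Close MultipleOutputs so that the named output files are flushed
 multiOutput.close();
 wordList = null;
}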
}
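The matching rule can also be exercised outside Hadoop. The sketch below is a stand-alone approximation, assuming that a comment matching both word lists should be treated as negative; FeedbackClassifier, Sentiment and the shortened word lists are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-alone classifier (not part of the original post). */
final class FeedbackClassifier {

    enum Sentiment { POSITIVE, NEGATIVE, UNCLASSIFIED }

    // Shortened, illustrative word lists in the same "word |word |" style as the driver
    private static final Pattern POSITIVE_WORDS =
            Pattern.compile("good |satisfied |stylish |smooth ", Pattern.CASE_INSENSITIVE);
    private static final Pattern NEGATIVE_WORDS =
            Pattern.compile("not good |do not |poor |not satisfied ", Pattern.CASE_INSENSITIVE);

    /** Assumption: a negative match wins, because "not good " also contains "good ". */
    static Sentiment classify(String comment) {
        if (NEGATIVE_WORDS.matcher(comment).find()) {
            return Sentiment.NEGATIVE;
        }
        if (POSITIVE_WORDS.matcher(comment).find()) {
            return Sentiment.POSITIVE;
        }
        return Sentiment.UNCLASSIFIED;
    }

    public static void main(String[] args) {
        System.out.println(classify("Good phone."));                 // POSITIVE
        System.out.println(classify("Poor service so do not buy.")); // NEGATIVE
        System.out.println(classify("Not good at all"));             // NEGATIVE
        System.out.println(classify("Arrived on Monday"));           // UNCLASSIFIED
    }
}
```

Keyword matching of this kind is easy to follow but coarse: it depends on the trailing spaces in the word lists and cannot handle misspellings or sarcasm.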

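Driver class:-
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
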
public class PositiveAndNegativeReview {
public static String positiveReview = "positiveReview";
public static String negativeReview = "negativeReview";

/**
 * Configures and submits the job that splits feedback into positive and
 * negative output files.
 */
public static void main(String[] args) {
 final String POSITIVE_WORD = "good |satisfied |classic|class|happy |thanks |"
   + "recommend |good to go|best |rocking |yo |fancy |stylish |must buy |"
   + "amazing |smooth |awesome |damn good ";
 final String NEGATIVE_WORD = "not good |Do not |donot |poor |"
   + "not satisfied |very poor|not happy |worst |"
   + "not recommend |do not buy|not-satisfied|waste |bad |"
   + "false |not stylish |should not buy |not amazing |"
   + "not smooth |wasted |damn bad ";

 Configuration conf = new Configuration();
 conf.set("positiveWords", POSITIVE_WORD);
 conf.set("negativeWords", NEGATIVE_WORD);
 try {
  Job job = Job.getInstance(conf, "Filter file with good feedback!!");
  job.setMapperClass(ReviewMapperClass.class);
  job.setReducerClass(ReviewReducerClass.class);
  job.setJarByClass(PositiveAndNegativeReview.class);
  /*
   * Set the four properties below carefully, otherwise the job fails
   * silently after the first context.write
   */
  job.setMapOutputKeyClass(Text.class);
  job.setMapOutputValueClass(Text.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);

  /* Optional, it's good to set */
  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);

  /* Multiple output setting */
  MultipleOutputs.addNamedOutput(job, negativeReview,
    TextOutputFormat.class, Text.class, Text.class);
  MultipleOutputs.addNamedOutput(job, positiveReview,
    TextOutputFormat.class, Text.class, Text.class);

  Path pathInput = new Path(
  "hdfs://localhost:54310/user/hduser1/feedbackPosNeg.txt");
  Path pathOutputDir = new Path(
  "hdfs://localhost:54310/user/hduser1/testfs/output_dir_feedback");
  FileInputFormat.setInputPaths(job, pathInput);
  FileOutputFormat.setOutputPath(job, pathOutputDir);
  // Exit code 0 signals success, non-zero signals failure
  System.exit(job.waitForCompletion(true) ? 0 : 1);
 } catch (IOException e) {
  e.printStackTrace();
 } catch (ClassNotFoundException e) {
  e.printStackTrace();
 } catch (InterruptedException e) {
  e.printStackTrace();
 }
}
}
Start the Hadoop services (./start-all.sh from the sbin directory) and execute the driver program. Verify the output directory - it should contain the two named output files (negativeReview-r-00000 and positiveReview-r-00000). Download sample output file.
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -cat /user/hduser1/testfs/output_dir_feedback/positiveReview-r-00000
Apple Iphone 4s - 16 Gb - Black:Rs. 12,617.00 Comments(2) Yo like it.  || Amazingly smooth and has a much better battery life. || 
Apple iPhone 5s 40 16GB 41:Rs. 38,269.00 Comments(1) Good phone.  || 
Lenovo A2010 (Black, 8 GB):Rs. 4,990 Comments(4) Very stylish and fancy.  || Very stylish and fancy.  || Good phone.  || Very good in low end.  || 
