Jan 31, 2016
 

I recently finished some data aggregation work using Apache Hive, and as a way to get MapReduce work off the ground quickly, it’s pretty good. Hive’s goal is to abstract MapReduce away behind basic SQL queries, and on that front it succeeds. The fact that I’m ultimately running MapReduce jobs is hidden, apart from what would look like a minor quirk if I didn’t know MapReduce was running under the covers. That said, I noticed a couple of things, both during development and when running the jobs on Amazon’s EMR service, that are worth noting.


Posted at 10:05 PM