ApacheCon NA 2010 Session

How to make your map-reduce jobs perform as well as Pig: Lessons from Pig optimizations

Pig makes Hadoop easy to use with its high-level data flow language. Many features and optimizations have been added to Pig in response to the challenges Pig users face in getting the best performance out of their Hadoop clusters. Map-reduce programmers are likely to face the same challenges. In this presentation we will discuss how Pig has dealt with some of them, such as balancing skew in joins and sharing scans across multiple grouping operations.