Splitting the file in Map-Reduce

Imagine that you have a file, “data.csv”, stored on Hadoop, and you need to split it into several smaller files, each holding a different kind of data, so that they can be processed separately. To do this with Pig or Hive, you have to specify the file's schema so that it can be described as a table, which may not be what you want (for instance, if different rows have different schemas). Here’s an example of how it can be done with a MapReduce job that uses MultipleOutputs.
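To make the idea concrete before the Hadoop specifics, here is a minimal local sketch in plain Java of the routing decision such a job has to make for every row. It is not the job itself and needs no Hadoop libraries. The class name `SplitSketch`, the sample rows, and the rule "the first comma-separated field names the target file" are all assumptions made for illustration; the post does not say how rows should be grouped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitSketch {

    // Assumed rule (hypothetical): the first comma-separated field of a row
    // decides which output file the row belongs to.
    public static String outputNameFor(String row) {
        int comma = row.indexOf(',');
        String tag = (comma < 0 ? row : row.substring(0, comma)).trim();
        // Keep only letters and digits so the result is safe as a file name.
        String safe = tag.replaceAll("[^A-Za-z0-9]", "");
        return safe.isEmpty() ? "unknown" : safe;
    }

    // Groups rows by output name, preserving the order in which names first appear.
    // No schema is needed: rows of different shapes simply land in different groups.
    public static Map<String, List<String>> split(List<String> rows) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue; // skip blank lines
            }
            outputs.computeIfAbsent(outputNameFor(row), k -> new ArrayList<>()).add(row);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Made-up sample rows with different shapes per row type.
        List<String> rows = Arrays.asList(
                "user,1,alice",
                "order,17,1,9.99",
                "user,2,bob");
        split(rows).forEach((name, lines) -> System.out.println(name + " -> " + lines));
    }
}
```

In an actual MapReduce job, the mapper would not collect rows in memory like this. It would compute the same name for each input line and hand it to `MultipleOutputs.write(key, value, baseOutputPath)` as the base output path, so that Hadoop writes each group to its own set of files.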