Skip to main content
Announcements
Accelerate Your Success: Fuel your data and AI journey with the right services, delivered by our experts. Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Limit rows for Pig

Hello, so I have use tPigLoad and I would like to limit the amount of rows to just the first 10 rows. How can I go around doing so? 

I have tried creating auto-incremental keys but they seem to only be possible using tMap and not in tPigMap.
I have also tried tPigCode to limit the rows but can't figure out what to put in there.

Labels (2)
2 Replies
Anonymous
Not applicable
Author

hey,
ever heard of Apache Pig - Limit Operator
Apache Pig is an abstraction over MapReduce. It is a tool/platform which is used to analyze larger sets of data representing them as data flows. Pig is generally used with Hadoop; we can perform all the data manipulation operations in Hadoop using Apache Pig.

To write data analysis programs, Pig provides a high-level language known as Pig Latin. This language provides various operators using which programmers can develop their own functions for reading, writing, and processing data.

To analyze data using Apache Pig, programmers need to write scripts using Pig Latin language. All these scripts are internally converted to Map and Reduce tasks. Apache Pig has a component known as Pig Engine that accepts the Pig Latin scripts as input and converts those scripts into MapReduce jobs.
regards
julie
Anonymous
Not applicable
Author

The retail demo dataset has information about customer orders. The data for each order contains the postal code of the customer. In this example you run Pig Latin statements to return the top ten postal codes by revenue.