Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Rank() in Hadoop

Hi,

This might not be a right question to ask here but Hadoop forum did not help me either.

We need to filter records from Hadoop (source) table before loading it to a QVD since volume is too big to load as it is from source.

In the existing script they have used Rank() and it was working because it was fetching data from oracle. But when I tried to replicate Rank() in Hadoop it keeps throwing generic error. I do not have any experience in Hadoop and do not have client to execute SQL and check. If anybody tried hadoop analytic function before please help me out. It would be great if you share any workaround.

Sample data:

ServiceID      Date          Time

A                12/14/2014   2:13 PM

A                12/14/2014   3:15 PM

B                12/15/2014   1:15 PM

B                12/15/2014    2:20 PM  

I need to bring only 2 records which are highlighted above (recent records).

Thanks,

Nirmal

1 Solution

Accepted Solutions
Not applicable
Author

Hi,

Sorry for late response.

Admin team suggested row_number() function instead of rank() and it worked.

Thanks

View solution in original post

4 Replies
saumyashah90
Specialist
Specialist

Have you tried using pig script like this:

register datafu-1.1.0.jar;
define Enumerate datafu.pig.bags.Enumerate('1');

data = load 'data' using PigStorage(',') as (id:chararray, rating:int);
data = group data by id;
data = foreach data {
  sorted = order data by rating DESC;
  generate group, sorted;
}
data = foreach data generate FLATTEN(Enumerate(sorted));
data = foreach data generate $2, $0, $1;
dump data;

Not applicable
Author

Hi,

Thanks for your response.

As I said before, do not have any experience in Hadoop scripting and do not have any client tool in my dev environment to try as well. It is on Hive so please suggest if I can query something in Qlikview and would like to check if Hadoop version in my environment supports analytic function as well.

saumyashah90
Specialist
Specialist

in qlikview you can use Rank() directly....

What you can use is you take the number for how many rows you need ranks and store it in a variable.

and use that variable on UI

Rank(Sum(Field))<=vVar

Not applicable
Author

Hi,

Sorry for late response.

Admin team suggested row_number() function instead of rank() and it worked.

Thanks