Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
This might not be a right question to ask here but Hadoop forum did not help me either.
We need to filter records from Hadoop (source) table before loading it to a QVD since volume is too big to load as it is from source.
In the existing script they have used Rank() and it was working because it was fetching data from oracle. But when I tried to replicate Rank() in Hadoop it keeps throwing generic error. I do not have any experience in Hadoop and do not have client to execute SQL and check. If anybody tried hadoop analytic function before please help me out. It would be great if you share any workaround.
Sample data:
ServiceID Date Time
A 12/14/2014 2:13 PM
A 12/14/2014 3:15 PM
B 12/15/2014 1:15 PM
B 12/15/2014 2:20 PM
I need to bring only 2 records which are highlighted above (recent records).
Thanks,
Nirmal
Hi,
Sorry for late response.
Admin team suggested row_number() function instead of rank() and it worked.
Thanks
Have you tried using pig script like this:
register datafu-1.1.0.jar;
define Enumerate datafu.pig.bags.Enumerate('1');
data = load 'data' using PigStorage(',') as (id:chararray, rating:int);
data = group data by id;
data = foreach data {
sorted = order data by rating DESC;
generate group, sorted;
}
data = foreach data generate FLATTEN(Enumerate(sorted));
data = foreach data generate $2, $0, $1;
dump data;
Hi,
Thanks for your response.
As I said before, do not have any experience in Hadoop scripting and do not have any client tool in my dev environment to try as well. It is on Hive so please suggest if I can query something in Qlikview and would like to check if Hadoop version in my environment supports analytic function as well.
in qlikview you can use Rank() directly....
What you can use is you take the number for how many rows you need ranks and store it in a variable.
and use that variable on UI
Rank(Sum(Field))<=vVar
Hi,
Sorry for late response.
Admin team suggested row_number() function instead of rank() and it worked.
Thanks