Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All,
Is there a way that in the load script as I proceed, I can accumulate a distinct string value.
For Example :
Rownumber Subject
1 Physics
2 Maths
3 Maths
4 Computer Science
6 Maths
I am looking for one more field that will have a correct distinct count of all the subjects that have been encountered until then
This is what I am looking for.
Rownumber Subject SubjectCount
1 Physics 1
2 Maths 2
3 Maths 2
4 Computer Science 3
6 Maths 3
I tried using numsum and peek to accumelate the distinct subjects. Kindly suggest what would be the right way to do this?
Thanks,
Rajesh Vaswani
Maybe not the most elegant way, but you could actually do it like you said, performing a distinct count of Subject up to the row number, maybe like this:
INPUT:
LOAD * INLINE [
Rownumber, Subject
1, Physics
2, Maths
3, Maths
4, Computer Science
6, Maths
];
for i = 1 to FieldValueCount( 'Rownumber' )
RESULT:
Load
FieldValue('Rownumber',$(i)) as Rownumber,
count(distinct Subject) as DCount
Resident INPUT where Rownumber <= FieldValue('Rownumber',$(i));
Next
join (RESULT) load * resident INPUT;
drop table INPUT;
corrected where condition, I assume Rownumber is not necessarily continuously, like in your sample
Hi Rajesh,
How about this:
Table:
Load * Inline [
Rownumber,Subject
1, Physics
2, Maths
3, Maths
4, Computer Science
6, Maths
];
Left Join (Table)
Load
Subject,
Count(Subject) as SubjectCount
Resident Table
Group by Subject;
Regards,
Stephen
Hi Stephen,
Thanks for the reply. In my case there is only one table, the no. of subjects are variable and I do not know them before hand.
Thanks,
Rajesh Vaswani
Hi,
I have large no. of records. This is something that takes time. Any way to speed this?
thanks,
Rajesh Vaswani
Hi Rajesh,
I am not making myself clear. The first part was me loading some test data to match yours.
It is the second part - the Left Join from the resident table - that creates the SubjectCount field that you were asking for.
Regards,
Stephen
Only by adding more hardware.
Regards,
Stephen
Maybe...
You can actually try to make use of the way how QV stores the field values compressed as distinct values.
This has some potential issues, first, you need to assure that the internal storage of Subject values is in correct Rownumber order.
Then try:
INPUT:
LOAD * INLINE [
Rownumber, Subject
1, Physics
2, Maths
3, Maths
4, Computer Science
6, Sports
7, Maths
8, Physics
];
RESULT:
left join LOAD
Rownumber, if(FieldIndex('Subject', Subject) < peek(DistinctCount), peek(DistinctCount), FieldIndex('Subject',Subject)) as DistinctCount
resident INPUT;
Hi Stephen,
This will not work. I need the no. of subjects until that row is reached.
Thanks,
Rajesh Vaswani
Ah.
Your example showed the same number beside Maths twice so therefore I thought that you were looking for the total.
You will need to order the data coming in and then use Peek and Previous:
Temp_Table:
Load * Inline [
Rownumber,Subject
1, Physics
2, Maths
3, Maths
4, Computer Science
6, Maths
];
Table:
Load
Rownumber,
Subject,
if(Previous(Subject)=Subject, Peek('SubjectCount')+1, 1) As SubjectCount
Resident
Temp_Table
Order by Subject, Rownumber;
Drop Table Temp_Table;
Regards,
Stephen