Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have clean/high quality data in a difficult to use format. I am hoping someone clever can give me a leg up on making my data more useful.
Each PersonID have values that get updated over time. The SequenceNumber *always* grows - it's a primary key. The problem is this only tracks changes. If there are no changes the cell is left blank.
<SequenceNumber> | <PersonID> | <Var1> | <Var2> | <Var3> |
12315 | 1 | aa | ||
12366 | 1 | bb | cc | |
12367 | 1 | bb1 | ||
12499 | 1 | bb2 | cc1 |
The desired result is
<PersonID> | <Var1> | <Var2> | <Var3> |
1 | aa | bb2 | cc1 |
If this were just 3 or 4 variables and a thousand rows I'd probably just bite the bullet and brute-force this with loops and inspecting each variable. HOWEVER - I have 500,000 rows and several hundred variables. So I need something more clever (group by PersonID with highest sequence number where value isn't blank - for each variable).
This is either super easy and I'm just missing the obvious, or this is neigh impossible within Qlik without inspecting each value in each row.
I'm not sure of an easier way, but this is what i would do:
//unpivot source data using crosstable
source_data:
crosstable (varname, varvalue, 2)
load *
;
load
*
from https://community.qlik.com/t5/QlikView-App-Dev/Data-Prep-Get-most-recent-value/td-p/1843663
(html, utf8, embedded labels, table is @1)
;
//flag max sequence number for each person/variable
left join (source_data)
//flag_max_records:
load
max([<SequenceNumber>]) as [<SequenceNumber>]
,[<PersonID>]
,varname
,1 as keep_record
resident source_data
where varvalue > 0
group by [<PersonID>]
,varname
;
//pivot using generic load on only the the max records
max_records:
generic load
[<PersonID>],varname, varvalue resident source_data
where keep_record = 1;
//join generic load tables into single table
set vListOfTables = ;
for vTableNo = 0 to NoOfTables()
let vTableName = Tablename($(vTableNo));
If Subfield(vTableName,'.',1)='max_records' Then
Let vListOfTables = vListOfTables & If(Len(vListOfTables)>0,',') & Chr(39) & vTableName & Chr(39) ;
End If
Next vTableNo
;
trace vTableName $(vTableName);
final_data:
load distinct
[<PersonID>]
resident source_data;
For each vTableName in $(vListOfTables)
Left Join (final_data) Load * Resident [$(vTableName)];
Drop Table [$(vTableName)];
Next vTableName
drop table source_data;
exit script;
Ick! That looks like an awful mess.
...but still a thousand times better than any solution I was thinking of!!
The unpivot/cross-table makes perfect sense here.
As a sidebar, I've seen quite a few Qlik tutorials that use cross-tables (multi-language support comes to mind). I should probably find a way to work then into my repotoir.
It's not that messy, but it's not a single function for sure.
If it was a few fixed number of columns i would use peek function sorting by sequency#, but being you have so many columns i think this could be the way to go.
If it works for you, please mark as solution.
Good luck.