Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik GA: Multivariate Time Series in Qlik Predict: Get Details
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Lookup missing values in an existing column

Hello!

I am trying to populate a sparse column based on values of the same column. I attached an example.

I have a table like the following:

Id, Before, Group

1, None, 1

2, 1, -

3, None, 2

4, 2, -

5, 4, -

6, 3, -

To fill the missing values I intend to lookup the value in Before in the Id column and receive the value of Group. For example in line 2 i have the value 1 in "Before", so it should look in line 1 (as there is "Id" 1) and return 1 for "Group".

I want to receive the following result:

Id, Before, Group

1, None, 1

2, 1, 1,

3, None, 2

4, 2, 1

5, 4, 1

6, 3, 2

I am using LEFT JOIN right now to lookup the missing groups and I understand that this will not work because the column group already exsits. But in fact I do not know how to deal with it. Can anyone help?

20 Replies
swuehl
MVP
MVP

Maybe you are mixing up AUTONUMBER( expression [, AutonumberID] ) and AUTONUMBERHASH128( expression {, expression}).

AUTONUMBER will create a unique number per expression value, starting from 1 and incremented by 1 for each new expression value as it occurs.

But you can create several autonumber counters that are used independently from each other, in my example, I am using two counters, the default counter (without second argument) and one counter named 'Lookup'.

I am doing so because the first counter should create a number for each occurence of 'None' in Before field, the second should essentially create and index / counter for each record.

Not applicable
Author

Using your advice for my small dataset works perfectly but not for my production data. I really cannot figure out, what the problem is. For some entries the group is assigned well but for some reason something weird appears. Is it possible that I'll send you my data in a personal message?

swuehl
MVP
MVP

You can send me a message with a link where I can download your file.

Not applicable
Author

Thanks a lot!

Right now I could not send you a message?

Not applicable
Author

I used your approach to get the right group-value for each id.

But in the attached file you could see that (for some reason I could not understand) the wrong group is picked. Using the selection I made in the attatched file you see that id "scza140|1|2|22.09. 05:46" has the group-value 606 although the value before (scza140|1|1|22.09. 05:14) belongs to group 1 and for that reason they should belong to the same group.

I hope you can figure out what is going wrong cause i really do not get the point (because 16 values are correctly assigned to group 1)

swuehl
MVP
MVP

Dorothea, your sample Id values are not unique (and the above approach relies on unique Ids).

For example these IDs with frequency added:

SCZB185|26|1|26.09. 03:063
SCZB152|76|1|26.09. 09:002
SCZB152|8|1|25.09. 13:342
SCZC185|59|1|24.09. 03:592
SCZA155|48|1|23.09. 16:492
SCZC185|22|1|25.09. 19:522
SCZA155|63|1|22.09. 18:292
SCZA152|25|1|22.09. 18:202
SCZA152|19|1|24.09. 07:222
SCZA155|27|1|24.09. 12:062
SCZA145|46|1|22.09. 02:132

Is this what you expect in your data? Maybe the timestamp is just not granular enough?

If you expect this, how would you then like to do the lookup?

swuehl
MVP
MVP

Another issue is that you sort your data by Time, but then Before lookup requires to lookup ahead in time:

Id Before Group
SCZA152|51|1|23.09. 05:26SCZA152|51|2|23.09. 06:00
Not applicable
Author

Yes I noticed that my Ids are not unique but in fact they should be. The Ids are not given in my raw data, they are created (and that is still one of my big problems).

I attached a sample file of my raw data which explains how "SCZB185|26|1|26.09. 03:06" could be connected trice. Be aware that the columns "Line" and "Linked Line" were added manually by me and is not in the data present.

Each line of my data set does not have a unique identifier, so I created one using a combination of LP-Nr, LPT, Nr and Arrival. Arrival or Departure are both possible, but one timestamp is need as the combination of LP-Nr, LPT and Nr could exist for each day.

Additionally each line has the columns Linked LP-Nr, Linked LPT and Linked Nr to show the following action. But as I mentioned before it is not unique without any timestamp. So the challenge is to finde the right timestamp.

To finde the right timestamp i have to distinguish between 2 cases:

1. Combination of LP-Nr & LPT is equal to combination of Linked LP-Nr & Linked LPT (they only differ concerning the Nr): In this case i use the Arrival of the action whose departure is at the same DATE as the arrival. I could not find any mistakes for those connections.

2. This case is much more difficult and does not work 100% correctly. In this case the LPT and Linked LPT are not equal or LP-Nr and Linked LP-Nr are not equal. This case is displayed in the attached file. The lines 1-3 are all linked to SCZB185|26|1 which exists trice (on 24, 25 and 26 September). In the script I connect each line to all possible lines (1-4, 1-5, 1-6, 2-4, 2-5, 2-6, 3-4, 3-5, 3-6) and compute the difference between arrival of the first action and departure of the connected action and only keep those who are >= 0 (1-5, 1-6, 2-5, 2-6, 3-5, 3-6). And then I'll take the minimum difference. But in this case for lines 1-3 the difference is minimal always for the combination with line 5 and for that reason it is not unique.

The right solution would be 1-4, 2-5, 3-6 as each connection should only be used once.

If you had any ideas for that problem i would be very thankful!

And if i understand you right: if my ids are unique, the assigning of the group should work perfectly?

swuehl
MVP
MVP

I think there must be a different key to solve your issues. In your present data or in data you may need to access.

Just looking at your file: Wouldn't Abbhf and Anbhf tell you the connections needed?

  

AbbhfAnbhf
AB
CD
EF
BB
DC
FF
Not applicable
Author

No the combination of Abbhf and Anbhf is not unique and with that combination i will never find the subsequent action of one action (the goal is to display kind of a timetable) because for the subsequent action i only have the combination of Linked LP-Nr, Linked LPT and Linked Nr as hint.

So you do not have any idea how to solve the issue to get unique connections between my actions?