Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
svm_1412
Contributor II
Contributor II

Qlikview Data Cleanup

Hallo everyone!

I am fairly new to Qlikview. Have a question regarding Data Clean-up. 

The below Table shows the values,

Row values marked with the font RED sum up to "Zero", in such cases I want to set the values(that made up the sum) to "Zero".

The combination could be from Bucket_0_30 until Bucket_2880.

Attached is an Excel File explaining it in detail. I  have around 200K Rows, and was not able to find a solution.

AccountBucket_0_30Bucket_31_60Bucket_61_90Bucket_91_120Bucket_121_150Bucket_151_180Bucket_181_360Bucket_361_720Bucket_721_1080Bucket_1081_1440Bucket_1441_1800Bucket_1801_2160Bucket_2161_2520Bucket_2521_2880Bucket_2880Total Buckets/faellig_gesamt
1000-5,990000-4,9957,8100-15,61-37,210-5,99
20000000000000-66,130-66,13
3-339,3659,98169,270000-19,9819,980000-22,2122,21-110,11
43,272,8730,28055,10-100-27,61-12,390000400-8,48
51,495,280,791,671,02002,485,2520,2020,89-40,62-99,350-80,9
61,573,343,892,592,097,44-5,862,67-31,124,49000-17,4817,48-8,9
7-933,36-363,58-77,88001327,27-1761,931761,9300-891,61-304,011195,6200-47,55
800-20,67000-36,992,86-13,7947,92000-11,6411,64-20,67
9000000-31,34017,990000-19,9419,94-13,35

 

Thanks in Adv

Regards

Manju

Labels (2)
1 Solution

Accepted Solutions
jensmunnichs
Creator III
Creator III

So this ended up being a bit easier than I thought it would be, though again I'm not sure about the performance of the new app.

Changelog:

  • Used your Excel sheet instead of Inline data
  • FlagVar (and everything related to FlagVar) removed, no longer needed
  • Loops reversed, now start at 15 and work their way back to 1 ('step -1') to make sure we read right to left
  • $(b) is now the first bucket in the row, $(a) is the last (used to be flipped)
  • CurFlag variable removed, as we now only ever have 1 flag field at a time
  • Flagged values are now replaced immediately after flagging, so that the old values are no longer considered in flagging remaining values

The results I got seemed to all have the same totals as before, so I think this has been a success. I will say though that when I compare the values using an Excel formula, I'm getting some 'FALSE's returned, so I think the formatting might be a bit off. However, this should easily be fixed by reloading the data using formatting functions (like text() or num()).

Hope this now does what you were looking for, if not let me know!

View solution in original post

9 Replies
jensmunnichs
Creator III
Creator III

What's the logic behind summing those specific values? Or are the values  that need to be summed marked red in the source data as well?

svm_1412
Contributor II
Contributor II
Author

Hey,

No, the values are not marked with Red. I used it to frame the question so I could explain clearly.

My thought process is: sum of  the combinations should give a ZERO. It can be two combinations, or any number of combinations. 

But its always Continuous. I mean, there may be 0 in between the values(like in the first row below), but they are always Continuous.

Bucket_0_30Bucket_31_60Bucket_61_90Bucket_91_120Bucket_121_150Bucket_151_180Bucket_181_360Bucket_361_720Bucket_721_1080Bucket_1081_1440Bucket_1441_1800Bucket_1801_2160Bucket_2161_2520Bucket_2521_2880Bucket_2880
000-5,990000-4,9957,8100-15,61-37,210
0000000000000-66,130
-339,3659,98169,270000-19,9819,980000-22,2122,21

 

 thx for your time!

jensmunnichs
Creator III
Creator III

Hey,

PFA

Sorry it took a while, busy weekend.

This was quite a tricky but fun one to do. I'm about 1000% sure there are more efficient solutions to this problem than the one I came up with, but it works! I hope it holds up when using all of your data. Keep in mind that if your real data has more buckets, you'll have to change the script a bit (should be fairly self-explanatory).

I found this video explaining a different logical solution to this problem, though I was unfortunately not able to make it work in Qlikview. If you are able to implement it somehow, it would probably be a lot more efficient than my solution, so I thought I'd post it anyway.

Hope this helps! Let me know if you have any questions about this.

svm_1412
Contributor II
Contributor II
Author

Hey Jens!

thx much for the App, its very well Documented and I could understand what its doing in each and every Step.

I connected it with the actual Data, it took around 2 hours on my Local machine but it came up with the output !!!

Looks very good, but there is one small glitch:

I found that there is a difference in Sum (Buckets Total) between that of Excel and that of Qlikview(around 100.000  in Total for 200k Rows). Looked up deeper and could find some Rows, that have a difference in SUM.

In the attached Sample I have included some Rows, Sample Account: 13071, 31085, 46627  (highlighted in Yellow).

For Ex. In the Row(13071) it has to sum up into -1,5 but turns out to be 0 in Output. And I could not figure out what exactly is making it to 0.

Thanks & Regards

 

 

 

jensmunnichs
Creator III
Creator III

Hi there,

I think this is the result of checking all continuous combinations before replacing. If you look at columns K through P (for 13071), you'll see that those values sum up to be 0 as well, so those values get replaced too.

I'm sure it wouldn't be impossible to replace after every check, but the question is, in what order should the rows be checked? Depending on your answer, we might have to change the loop logic a bit as well. For example, I could swap them around so that it first checks every combination of continuous rows with a length of 1 value, then 2, then 3, etc. Or we could stick with what it does now, which is to first check every row of any length starting from the first column, then every row of any length starting from the second column, and so on.

Also, should we check left to right, right to left?

I hope this makes sense, if you have any questions please let me know.

svm_1412
Contributor II
Contributor II
Author

Hi Jens,

ach! i understand. That is the reason why its adding up to Zero.

I would say of the Order from Right to Left. Because the first  Open Payment is 238,86 and then the customer paid -240,36 and so forth.

The option with checking every combination of continuous rows with a length of 1 value, then 2, then 3 sounds promising.

So for my understanding,  the loop first looks up for the Subset in RED(beginning from Right), replaces it and then looks up for the next subset(in our case Green= -1,5 ) correct? 

1307100000000-1,51,50-210,42210,42-240,36238,86

 

But i cannot wrap my head around how much complexity it would consist.

Thanks and regards

 

jensmunnichs
Creator III
Creator III

Hi,

>The option with checking every combination of continuous rows with a length of 1 value, then 2, then 3 sounds promising.

I believe we might actually have to use the other method I described, as I think this would not return the result you described.

>So for my understanding,  the loop first looks up for the Subset in RED(beginning from Right), replaces it and then looks up for the next subset(in our case Green= -1,5 ) correct? 

In this case, if we are checking every row of 1 value, then every row of 2 values, etc., we would actually find and replace -210,42/210,42 and -1,5/1,5 before finding the row you marked in red (as this row consists of 6 values), even if we are going right to left.

I'll therefore try to make it so we check every row starting from the final column, then every row starting 1 column in, then 2, etc. No promises yet but I'll see if I can make it work. One thing I can already tell you though is that it will probably perform worse than the previous app I posted, but we'll just try to get the logic right first and worry about performance later.

jensmunnichs
Creator III
Creator III

So this ended up being a bit easier than I thought it would be, though again I'm not sure about the performance of the new app.

Changelog:

  • Used your Excel sheet instead of Inline data
  • FlagVar (and everything related to FlagVar) removed, no longer needed
  • Loops reversed, now start at 15 and work their way back to 1 ('step -1') to make sure we read right to left
  • $(b) is now the first bucket in the row, $(a) is the last (used to be flipped)
  • CurFlag variable removed, as we now only ever have 1 flag field at a time
  • Flagged values are now replaced immediately after flagging, so that the old values are no longer considered in flagging remaining values

The results I got seemed to all have the same totals as before, so I think this has been a success. I will say though that when I compare the values using an Excel formula, I'm getting some 'FALSE's returned, so I think the formatting might be a bit off. However, this should easily be fixed by reloading the data using formatting functions (like text() or num()).

Hope this now does what you were looking for, if not let me know!

svm_1412
Contributor II
Contributor II
Author

Hey,

Perfekt! does exactly what I was looking for.

And the Performance is better than the first, it took around 40 min for 200k+ Rows.

Ya, there was a formatting error in Excel, i corrected them and I have an exact match.

Thank you so much for your time.

Regards,

svm