Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hallo everyone!
I am fairly new to Qlikview. Have a question regarding Data Clean-up.
The below Table shows the values,
Row values marked with the font RED sum up to "Zero", in such cases I want to set the values(that made up the sum) to "Zero".
The combination could be from Bucket_0_30 until Bucket_2880.
Attached is an Excel File explaining it in detail. I have around 200K Rows, and was not able to find a solution.
Account | Bucket_0_30 | Bucket_31_60 | Bucket_61_90 | Bucket_91_120 | Bucket_121_150 | Bucket_151_180 | Bucket_181_360 | Bucket_361_720 | Bucket_721_1080 | Bucket_1081_1440 | Bucket_1441_1800 | Bucket_1801_2160 | Bucket_2161_2520 | Bucket_2521_2880 | Bucket_2880 | Total Buckets/faellig_gesamt |
1 | 0 | 0 | 0 | -5,99 | 0 | 0 | 0 | 0 | -4,99 | 57,81 | 0 | 0 | -15,61 | -37,21 | 0 | -5,99 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -66,13 | 0 | -66,13 |
3 | -339,36 | 59,98 | 169,27 | 0 | 0 | 0 | 0 | -19,98 | 19,98 | 0 | 0 | 0 | 0 | -22,21 | 22,21 | -110,11 |
4 | 3,27 | 2,87 | 30,28 | 0 | 55,1 | 0 | -100 | -27,61 | -12,39 | 0 | 0 | 0 | 0 | 40 | 0 | -8,48 |
5 | 1,49 | 5,28 | 0,79 | 1,67 | 1,02 | 0 | 0 | 2,48 | 5,25 | 20,2 | 0 | 20,89 | -40,62 | -99,35 | 0 | -80,9 |
6 | 1,57 | 3,34 | 3,89 | 2,59 | 2,09 | 7,44 | -5,86 | 2,67 | -31,12 | 4,49 | 0 | 0 | 0 | -17,48 | 17,48 | -8,9 |
7 | -933,36 | -363,58 | -77,88 | 0 | 0 | 1327,27 | -1761,93 | 1761,93 | 0 | 0 | -891,61 | -304,01 | 1195,62 | 0 | 0 | -47,55 |
8 | 0 | 0 | -20,67 | 0 | 0 | 0 | -36,99 | 2,86 | -13,79 | 47,92 | 0 | 0 | 0 | -11,64 | 11,64 | -20,67 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | -31,34 | 0 | 17,99 | 0 | 0 | 0 | 0 | -19,94 | 19,94 | -13,35 |
Thanks in Adv
Regards
Manju
So this ended up being a bit easier than I thought it would be, though again I'm not sure about the performance of the new app.
Changelog:
The results I got seemed to all have the same totals as before, so I think this has been a success. I will say though that when I compare the values using an Excel formula, I'm getting some 'FALSE's returned, so I think the formatting might be a bit off. However, this should easily be fixed by reloading the data using formatting functions (like text() or num()).
Hope this now does what you were looking for, if not let me know!
What's the logic behind summing those specific values? Or are the values that need to be summed marked red in the source data as well?
Hey,
No, the values are not marked with Red. I used it to frame the question so I could explain clearly.
My thought process is: sum of the combinations should give a ZERO. It can be two combinations, or any number of combinations.
But its always Continuous. I mean, there may be 0 in between the values(like in the first row below), but they are always Continuous.
Bucket_0_30 | Bucket_31_60 | Bucket_61_90 | Bucket_91_120 | Bucket_121_150 | Bucket_151_180 | Bucket_181_360 | Bucket_361_720 | Bucket_721_1080 | Bucket_1081_1440 | Bucket_1441_1800 | Bucket_1801_2160 | Bucket_2161_2520 | Bucket_2521_2880 | Bucket_2880 |
0 | 0 | 0 | -5,99 | 0 | 0 | 0 | 0 | -4,99 | 57,81 | 0 | 0 | -15,61 | -37,21 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -66,13 | 0 |
-339,36 | 59,98 | 169,27 | 0 | 0 | 0 | 0 | -19,98 | 19,98 | 0 | 0 | 0 | 0 | -22,21 | 22,21 |
thx for your time!
Hey,
PFA
Sorry it took a while, busy weekend.
This was quite a tricky but fun one to do. I'm about 1000% sure there are more efficient solutions to this problem than the one I came up with, but it works! I hope it holds up when using all of your data. Keep in mind that if your real data has more buckets, you'll have to change the script a bit (should be fairly self-explanatory).
I found this video explaining a different logical solution to this problem, though I was unfortunately not able to make it work in Qlikview. If you are able to implement it somehow, it would probably be a lot more efficient than my solution, so I thought I'd post it anyway.
Hope this helps! Let me know if you have any questions about this.
Hey Jens!
thx much for the App, its very well Documented and I could understand what its doing in each and every Step.
I connected it with the actual Data, it took around 2 hours on my Local machine but it came up with the output !!!
Looks very good, but there is one small glitch:
I found that there is a difference in Sum (Buckets Total) between that of Excel and that of Qlikview(around 100.000 € in Total for 200k Rows). Looked up deeper and could find some Rows, that have a difference in SUM.
In the attached Sample I have included some Rows, Sample Account: 13071, 31085, 46627 (highlighted in Yellow).
For Ex. In the Row(13071) it has to sum up into -1,5 but turns out to be 0 in Output. And I could not figure out what exactly is making it to 0.
Thanks & Regards
Hi there,
I think this is the result of checking all continuous combinations before replacing. If you look at columns K through P (for 13071), you'll see that those values sum up to be 0 as well, so those values get replaced too.
I'm sure it wouldn't be impossible to replace after every check, but the question is, in what order should the rows be checked? Depending on your answer, we might have to change the loop logic a bit as well. For example, I could swap them around so that it first checks every combination of continuous rows with a length of 1 value, then 2, then 3, etc. Or we could stick with what it does now, which is to first check every row of any length starting from the first column, then every row of any length starting from the second column, and so on.
Also, should we check left to right, right to left?
I hope this makes sense, if you have any questions please let me know.
Hi Jens,
ach! i understand. That is the reason why its adding up to Zero.
I would say of the Order from Right to Left. Because the first Open Payment is 238,86 and then the customer paid -240,36 and so forth.
The option with checking every combination of continuous rows with a length of 1 value, then 2, then 3 sounds promising.
So for my understanding, the loop first looks up for the Subset in RED(beginning from Right), replaces it and then looks up for the next subset(in our case Green= -1,5 ) correct?
13071 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1,5 | 1,5 | 0 | -210,42 | 210,42 | -240,36 | 238,86 |
But i cannot wrap my head around how much complexity it would consist.
Thanks and regards
Hi,
>The option with checking every combination of continuous rows with a length of 1 value, then 2, then 3 sounds promising.
I believe we might actually have to use the other method I described, as I think this would not return the result you described.
>So for my understanding, the loop first looks up for the Subset in RED(beginning from Right), replaces it and then looks up for the next subset(in our case Green= -1,5 ) correct?
In this case, if we are checking every row of 1 value, then every row of 2 values, etc., we would actually find and replace -210,42/210,42 and -1,5/1,5 before finding the row you marked in red (as this row consists of 6 values), even if we are going right to left.
I'll therefore try to make it so we check every row starting from the final column, then every row starting 1 column in, then 2, etc. No promises yet but I'll see if I can make it work. One thing I can already tell you though is that it will probably perform worse than the previous app I posted, but we'll just try to get the logic right first and worry about performance later.
So this ended up being a bit easier than I thought it would be, though again I'm not sure about the performance of the new app.
Changelog:
The results I got seemed to all have the same totals as before, so I think this has been a success. I will say though that when I compare the values using an Excel formula, I'm getting some 'FALSE's returned, so I think the formatting might be a bit off. However, this should easily be fixed by reloading the data using formatting functions (like text() or num()).
Hope this now does what you were looking for, if not let me know!
Hey,
Perfekt! does exactly what I was looking for.
And the Performance is better than the first, it took around 40 min for 200k+ Rows.
Ya, there was a formatting error in Excel, i corrected them and I have an exact match.
Thank you so much for your time.
Regards,
svm