I have a table containing ServiceDate and ChargeAmount.
I also have a variable where a user can input a number between 0 and 90 (vDaysBack) to indicate the number of days back they want to go from the last day of the month to sum charges. That sum is then divided by the selected number of days (vDaysBack) to give a value of AverageDailyCharges.
This is to be displayed in a straight table chart:
Dimension: Date(Floor(MonthEnd(ServiceDate))), labeled [Month Year]
Expression 1 ([Charges]): sum of ChargeAmount where ServiceDate >= (latest ServiceDate in the current dimension row) - vDaysBack
Expression 2 ([Avg Daily Charges]): Expression 1 / vDaysBack
For example, given the following data:
ServiceDate ChargeAmount
2/15/2012 100.00
2/1/2012 75.00
1/15/2012 125.00
1/1/2012 50.00
12/15/2011 200.00
12/1/2011 250.00
If a user selected 20 for vDaysBack, I want the chart to display
[Month Year] [Charges] [Avg Daily Charges]
Feb '12 175.00 8.75
Jan '12 125.00 6.25
Dec '11 200.00 10.00
If a user selected 45 for vDaysBack, I want the chart to display
Feb '12 350.00 7.78
Jan '12 175.00 3.89
Dec '11 450.00 10.00
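To spell out the arithmetic behind the 45-day case: each row sums every charge from that month's latest ServiceDate back vDaysBack days, inclusive. For the Feb '12 row:

```text
latest ServiceDate in Feb '12        = 2/15/2012
window start = 2/15/2012 - 45 days   = 1/1/2012
[Charges]           = 100.00 + 75.00 + 125.00 + 50.00 = 350.00
[Avg Daily Charges] = 350.00 / 45                     = 7.78 (rounded)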
I need to be able to somehow reference the latest ServiceDate within each month of the dimension, calculate the date vDaysBack earlier, and then sum all of the ChargeAmount values where the ServiceDate falls between these two dates, even if that ServiceDate is outside of the current dimension's [Month Year].
I have tried every combination of If() statements, set analysis, and Aggr() that I can think of in my limited experience with QlikView, and have so far not been able to accomplish this.
I think this should be possible using a data island; see the attached sample file.
It doesn't exactly reproduce your expected sample results, but I couldn't completely follow your description either.
Maybe it's at least a starting point.
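Roughly sketched, the idea is this (table and field names here are illustrative; the attached file may differ). In the script, build an unlinked month table, pre-aggregating the latest ServiceDate per month:

```qlikview
// Island table: IslandMonth / IslandLatestDate share no field name with
// the fact table, so this table stays unassociated, and using IslandMonth
// as a chart dimension does not restrict the charge rows each row can see.
// (Assumes the fact table was loaded as Charges: ServiceDate, ChargeAmount.)
MonthIsland:
LOAD Date(Floor(MonthEnd(ServiceDate))) as IslandMonth,
     Max(ServiceDate)                   as IslandLatestDate
RESIDENT Charges
GROUP BY Date(Floor(MonthEnd(ServiceDate)));
```

With IslandMonth as the chart dimension, every chart row still sees all charge rows, so a plain If() can pick out the ones inside that month's window:

```qlikview
// [Charges] - window anchored at the month's latest ServiceDate:
Sum(If(ServiceDate >= IslandLatestDate - vDaysBack
   and ServiceDate <= IslandLatestDate, ChargeAmount))

// [Avg Daily Charges] - reference the first expression by its label:
[Charges] / vDaysBack
```

The trade-off is that the If() is evaluated for every fact row in every dimension row, so it cannot use QlikView's optimized aggregation path.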
Regards,
Stefan
Stefan,
Thank you very much, that is exactly what I was looking for.
I'm working on testing this with my largest customer dataset right now. In my much smaller test/development dataset, this technique is much slower than using a "non-island" [Service Date] for the dimension, but is still acceptable (2 or 3 seconds). It will be interesting to see how it performs with a much larger dataset.
My takeaway from this is that it is not possible when the dimension is actually related to the data being summed (in this case [Service Date]), whether via set analysis, If() tests, Aggr(), or any combination of them: an expression cannot reference related data that falls outside the bounds of the current row's dimension value.
Well, I think it might be possible to access values outside the current dimension value using advanced aggregation, since the aggr() function is evaluated first, and its results are only then limited to the current dimension value.
But I don't think that will improve performance. Another major issue is that inside the aggr() you need the current dimension value as the upper limit of your period aggregation. That is hard, if not impossible, when you have to use the same dimension in your aggr() function (here I really would like to see a GetCurrentTableDimensionValue() function in QV).
I don't have your sample data at hand, so maybe you could use an advanced aggregation, maybe not.
Maybe I missed something even better; let me revisit this if I find time.
edit:
One possibility might be interesting anyway: use a conditional function like pick() to check the dimension value and branch into one of several set expressions (it looks ugly, but should perform well). Like discussed here:
With a dataset of about 50K rows, using the data island with if tests technique took about 2 to 3 seconds to render the chart. With a dataset of about 3.5 million rows, it takes around 2 minutes, which probably won't be acceptable.
If you can think of any alternatives to explore, I will be very grateful. Thank you very much for your help.
Attached is a sample of how these 'picked' set expressions could look (handcrafted for the few sample dimension values).
For production you would need a generated expression, like John suggested in the blog post mentioned above.
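Handcrafted for the three sample months, the 'picked' expression might look something like this (the Match() literals must be written in whatever display format your [Month Year] dimension actually uses, and the anchor dates are hardcoded from the sample data, so this is only a sketch of the shape, not the attached file's exact contents):

```qlikview
// Hypothetical hand-crafted branch per dimension value. Each branch is a
// static set expression (the $(=...) limits are expanded once, before the
// chart is computed), so QlikView can use optimized set analysis instead
// of evaluating a per-row If().
Pick(Match([Month Year], 'Feb 2012', 'Jan 2012', 'Dec 2011'),
  Sum({<ServiceDate = {">=$(=Date(MakeDate(2012,2,15)  - vDaysBack))<=$(=Date(MakeDate(2012,2,15)))"}>}  ChargeAmount),
  Sum({<ServiceDate = {">=$(=Date(MakeDate(2012,1,15)  - vDaysBack))<=$(=Date(MakeDate(2012,1,15)))"}>}  ChargeAmount),
  Sum({<ServiceDate = {">=$(=Date(MakeDate(2011,12,15) - vDaysBack))<=$(=Date(MakeDate(2011,12,15)))"}>} ChargeAmount))
```

Note that the search strings compare ServiceDate in its display format, so the Date() calls must format the limits the same way ServiceDate is formatted in the data model.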
Regards,
Stefan
Stefan,
Once again, thank you for your quick and helpful responses!
I will have to push off further exploration of this to another Sprint when my product owner can re-prioritize this work.
I'll respond back with the results once I'm able to do so.
Dale
Stefan,
I've been able to get to the point of "handcrafting" similar expressions to test with my largest customer dataset, which has about 5 million rows in the table I need to chart. It's also a much more complicated data model than these simple examples.
While this technique is working well in terms of producing the results, it is too slow to work with the full dataset. It takes 3 minutes or so to render the charts that I've built across the full 5 million rows.
For now, I've added a calculation condition to the charts so that they don't attempt to render unless the user has made enough selections to narrow the number of rows being considered to about 500K. With that, it's still taking about 15 to 20 seconds to render the charts, but that may be acceptable. My product owner is reviewing now with our client, and hopefully this will fly.
Thanks again for your help,
Dale
Thanks for the feedback on performance.
Have you noticed John's remarks on performance with and without variables, and on getting rid of the Match() function (in the link referenced above)? If you have already compared the performance, what was your experience?