Solved: Pivot or Straight Chart Question for Disparate dat... - Qlik Community

Report Inappropriate Content · ‎2010-01-27

I have two tables (one is a table that has benchmark info. and the other is a table that has current performance info.), that do not have a common field to link on, and I'm trying to figure out the best way to combine them in either a Straight table or Pivot table. I've attached an example showing a simplified version of the real thing. The simplified version works fine because I only have 12 rows in 1 table and 8 rows in the other. The dataset I'm actually working with is in the hundreds of thousands and the chart refresh is very slow. Is there a better way to build the chart, in the attached example? Keep in mind there is no way to join the two disparate tables together.

thanks,

rob

johnw · ‎2010-01-28

This won't make a fundamental difference, but your expressions can be simplified, which may improve performance somewhat.

Dimension:

if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name)

Benchmark:

num(Benchmark,if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%'))

Current Performance:

num(pick(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6)
,count(if("Opt-in"=1,tran_id))
,count(if(tran_desc='reach',tran_id))/count(tran_id)
,count(if(tran_desc='engage' and "Opt-in"=0,tran_id))/count(tran_id)
,count(if(tran_desc='convert',tran_id))/count(tran_id)
,count(if("Opt-in"=1,tran_id))/count(tran_id)
,count(if("Opt-in"=0,tran_id))/count(tran_id)
,count(if(match(tran_desc,'engage','convert'),tran_id))/count(tran_id)
,count(if(match(tran_desc,'engage','convert','reach'),tran_id))/count(tran_id)
),if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%'))

Status:

"Current Performance"/"Benchmark"-1

But even that's a lot of work for QlikView to do in the chart. Better is to move as much of that work as possible into the load for maximum chart performance. For the dimension, in your sample data, those are the only possible values for qlik_id, so there's no reason to check them. You could just use Row_Name as your dimension. I'm assuming, though, that you have some IDs you don't want to use in your real data set. Even then, you could do:

LEFT JOIN ([Benchmark])
LOAD
qlik_id
,if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name) as Show_Row_Name
RESIDENT [Benchmark]
;

And then use Show_Row_Name as your dimension. That way, you're only checking the IDs on the load, instead of every time the chart is calculated. You can use similar tricks to avoid the rest of the conditions in your chart, since IF statements are going to be slow:

LEFT JOIN ([Benchmark])
LOAD
qlik_id
,if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name) as Show_Row_Name
,if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%') as Format
RESIDENT [Benchmark]
;
LEFT JOIN ([Current_Performance])
LOAD
tran_id
,if("Opt-in"=1,1) as opt_in
,if("Opt-in"=0,1) as opt_out
,if(tran_desc='reach',1) as tran_reach
,if(tran_desc='engage' and "Opt-in"=0,1) as tran_engage_in
,if(tran_desc='convert',1) as tran_convert
,if(match(tran_desc,'engage','convert'),1) as tran_engage_convert
,if(match(tran_desc,'engage','convert','reach'),1) as tran_engage_convert_reach
RESIDENT [Current_Performance]
;

And then use these expressions:

Dimension:

Show_Row_Name

Benchmark:

num("Benchmark","Format")

Current Performance:

num(pick(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6)
,sum(opt_in)
,sum(tran_reach)/count(tran_id)
,sum(tran_engage_in)/count(tran_id)
,sum(tran_convert)/count(tran_id)
,sum(opt_in)/count(tran_id)
,sum(opt_out)/count(tran_id)
,sum(tran_engage_convert)/count(tran_id)
,sum(tran_engage_convert_reach)/count(tran_id)
),"Format"))

Status:

"Current Performance"/"Benchmark"-1

You might also try count() instead of sum() in the Current Performance expressions. Either should return the same value, and one or the other may be faster in your environment.

Another possible enhancement would be to specify the "Current Performance" calculations in your Benchmark table, but I forget how to evaluate a string as an expression, and I should really get back to work.

View solution in original post

johnw · ‎2010-01-28

This won't make a fundamental difference, but your expressions can be simplified, which may improve performance somewhat.

Dimension:

if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name)

Benchmark:

num(Benchmark,if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%'))

Current Performance:

num(pick(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6)
,count(if("Opt-in"=1,tran_id))
,count(if(tran_desc='reach',tran_id))/count(tran_id)
,count(if(tran_desc='engage' and "Opt-in"=0,tran_id))/count(tran_id)
,count(if(tran_desc='convert',tran_id))/count(tran_id)
,count(if("Opt-in"=1,tran_id))/count(tran_id)
,count(if("Opt-in"=0,tran_id))/count(tran_id)
,count(if(match(tran_desc,'engage','convert'),tran_id))/count(tran_id)
,count(if(match(tran_desc,'engage','convert','reach'),tran_id))/count(tran_id)
),if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%'))

Status:

"Current Performance"/"Benchmark"-1

But even that's a lot of work for QlikView to do in the chart. Better is to move as much of that work as possible into the load for maximum chart performance. For the dimension, in your sample data, those are the only possible values for qlik_id, so there's no reason to check them. You could just use Row_Name as your dimension. I'm assuming, though, that you have some IDs you don't want to use in your real data set. Even then, you could do:

LEFT JOIN ([Benchmark])
LOAD
qlik_id
,if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name) as Show_Row_Name
RESIDENT [Benchmark]
;

And then use Show_Row_Name as your dimension. That way, you're only checking the IDs on the load, instead of every time the chart is calculated. You can use similar tricks to avoid the rest of the conditions in your chart, since IF statements are going to be slow:

LEFT JOIN ([Benchmark])
LOAD
qlik_id
,if(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6),Row_Name) as Show_Row_Name
,if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%') as Format
RESIDENT [Benchmark]
;
LEFT JOIN ([Current_Performance])
LOAD
tran_id
,if("Opt-in"=1,1) as opt_in
,if("Opt-in"=0,1) as opt_out
,if(tran_desc='reach',1) as tran_reach
,if(tran_desc='engage' and "Opt-in"=0,1) as tran_engage_in
,if(tran_desc='convert',1) as tran_convert
,if(match(tran_desc,'engage','convert'),1) as tran_engage_convert
,if(match(tran_desc,'engage','convert','reach'),1) as tran_engage_convert_reach
RESIDENT [Current_Performance]
;

And then use these expressions:

Dimension:

Show_Row_Name

Benchmark:

num("Benchmark","Format")

Current Performance:

num(pick(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6)
,sum(opt_in)
,sum(tran_reach)/count(tran_id)
,sum(tran_engage_in)/count(tran_id)
,sum(tran_convert)/count(tran_id)
,sum(opt_in)/count(tran_id)
,sum(opt_out)/count(tran_id)
,sum(tran_engage_convert)/count(tran_id)
,sum(tran_engage_convert_reach)/count(tran_id)
),"Format"))

Status:

"Current Performance"/"Benchmark"-1

You might also try count() instead of sum() in the Current Performance expressions. Either should return the same value, and one or the other may be faster in your environment.

Another possible enhancement would be to specify the "Current Performance" calculations in your Benchmark table, but I forget how to evaluate a string as an expression, and I should really get back to work.

Report Inappropriate Content · ‎2010-02-01

Thanks for your reply John. I'll definitely give this a try.

Report Inappropriate Content · ‎2010-02-02

Hey John,

Your solution above works on my production set of data that has over 8,4M records. The only problem is it literally takes a shad over 5 minutes for my formulas to return, which isn't surprising based on how I have the data laid out and the number of records. The reason I'm using that qlik_id is to control the cell the number goes in for the Benchmark and Current Performance columns. Is there a way to control the cell the Current performance data goes into? I.E. instead of having to call that qlik_id in the current Performance column is there a way to just have an if statement, like the one you have in first chart in the example you provided, that displays the results for each if statement for each row? So instead of matching the sum(opt_in) to qlik_id 5 is there a way to say sum(opt_in) goes on the first row of the Current Performance column?

Thanks again for your help,

rob

johnw · ‎2010-02-02

I think any performance difference would be negligible, but perhaps I'm wrong. If I understood what you were asking for, you could do it like this:

num(pick(rowno()
,sum(opt_in)
,sum(tran_reach)/count(tran_id)
,sum(tran_engage_in)/count(tran_id)
,sum(tran_convert)/count(tran_id)
,sum(opt_in)/count(tran_id)
,sum(opt_out)/count(tran_id)
,sum(tran_engage_convert)/count(tran_id)
,sum(tran_engage_convert_reach)/count(tran_id)
),Format)

Report Inappropriate Content · ‎2010-02-02

You were right, the performance was negligible.

One thing I've done to get around the performance issue is to create a straight table for each row and then move the rows together to give it the ilusion that it's all in one chart. That seems to work except for the fact that the client wants to be able to download the chart to excel which presents a problem.

johnw · ‎2010-02-02

You know, I really wouldn't expect that to take five minutes, even with 8,4 million records. At least if it's anything close to the example. I also wouldn't expect the performance to be more than trivially different from having a straight table for each row, so if doing a straight table for each row is fast, that's a big clue (even if I'm not sure what it tells us).

The Benchmark expression is trivial. Heck, you could make it a dimension instead of an expression, though that breaks the rowno() modification. Shouldn't make anything any worse having it as an expression.

There's really nothing special, processing wise, to the current performance expression either. The qlik_id check would only happen once per row, so should add only negligible time. Checking rowno() should be faster, but not by much. The only thing that would take time are the sum() and count() expressions, which have been made as simple as possible (?), and should be the same in the one chart per row solution (?).

The status expression is a simple division and subtraction, once per row. The gauge settings are fixed, so nothing complicated there.

I just can't think where the time might be going.

Can you maybe post a reduced and scrambled version of the real application, showing both the one table for each row solution and the attempt to put it all in one chart? Or maybe just give the real expressions for both approaches?

Report Inappropriate Content · ‎2010-02-02

I'm not sure what the post size limit is on this forum but my app is over 175 MB so I'm guessing it's too large to post...

Your above post made me take a second look at my structure and the formula I'm using. It is not only pulling from the table with 8.4M records but it's using a flag from a 2.2M record table. I should be able to add the flag to the 8.4 M record table. Below is the actual formula I'm using. The InteractionDate and InteractionID are both from the table with 8.4M records and the Optin_flg is pulling from the 2.2M record table. I hard coded the other rows so I could focus on that first formula.


num(pick(match(qlik_id,5,6,6.1,6.2,6.3,6.4,6.5,6.6)
 ,count(if(InteractionDate >= addmonths(today(),-6) and OptIn_flg=1,InteractionID))
 ,.91
 ,.021
 ,.045
 ,.026
 ,.012
 ,.012
 ,.008
 ),if(qlik_id=5,'#,###; (#,###)','#,###.#%;-#,###.#%'))

johnw · ‎2010-02-02

Well, I doubt this is the whole solution, but count(if(...)) is often slow. That's why I was pushing all of the if() statements into the load. Since today() by default refers to the current date at the time of the load, it shouldn't make any difference in the results, but should run faster. To do it in the load will take adding the flag to the 8.4M record table, which is what you were discussing doing anyway. That seems like a good idea to me. Something to try, anyway. I bet it at least helps.

Report Inappropriate Content · ‎2010-02-15

John,

Would it be possible to push the entire formula into the load statement?


test:
LOAD
/*#2*/
pick(rowno(),5,6,6.1,6.2,6.3,6.4,6.5,6.6) as qlik_id
, (
 count(if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 -
 count(if(bounceback_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 )
 /
 count(if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID)) as static_1b
/*#3*/
,(
 count(distinct if(sent_flg=1 and Email_Cat='C' and ECon_OptIn_flg=1 and EmailInteractionDate >= addmonths(today(),-6) ,EmailPersonID))
 -
 count(distinct if(bounceback_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 )
 /
 count(distinct if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID)) as static_1c
/*#4*/ ,count(if(open_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 /
 count(if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID)) as static_1d
/*#5*/ ,count(if(clickthrough_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 /
 count(if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID)) as static_1e
/*#6*/ ,count(if(open_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID))
 /
 count(if(sent_flg=1 and Email_Cat='C' and EmailInteractionDate >= addmonths(today(),-6),EmailPersonID)) as static_1f
 ,0 as static_1g
 ,0 as static_1h
,1 as qlik_counter
Resident EmailInteraction;

I'm trying to get the first formula to pair with qlik_id 5 and the second formula to pair with 6, etc. Any ideas on what I might be missing in my load? Is this even possible to do?

thanks,

rob

Pivot or Straight Chart Question for Disparate datasets