Solved: Re: Correlation Sample Size - Qlik Community

GLicks_DG · ‎2019-06-12

Hi all!

Currently i'm working a lot with correlations (correl() function). Though I'm pretty much correlating tons of data to see what pops out. Because of the large amount of variables i am correlating, I'm not quite sure about how many samples (pairs) my correlations are actually using.

For example, some of my variables may use hundreds of samples and other might only have as few as 2 or 3 samples. In order to somehow validate my results, I need a way to make sure i have a large enough sample for that particular correl.

So my question is: Is there a way of getting the number of pair-points the correl function is using? Or perhaps some way to only keep a correlation value if i have a given sample size and discard the rest?

@EDIT

If I had the following dataset and I then calculated the correlation coefficient between 'Macro1' and the 3 'Micro' columns, I would like to know how many 'pair-points' did the correlation used.

Data	Macro1	Micro1	Micro2	Micro3
1/1/2018	1000.00	100.00	-	25.00
2/1/2018	1010.00	99.00	-	-
3/1/2018	1005.00	102.00	10.00	-
4/1/2018	1010.00	105.00	12.00	30.00
5/1/2018	995.00	98.00	9.00	29.00
6/1/2018	-	-	-	30.00
7/1/2018	1005.00	101.00	11.00	31.00
8/1/2018	995.00	100.00	10.00	28.00
9/1/2018	1010.00	102.00	-	32.00
10/1/2018	1000.00	98.00	10.00	-

For instance:

Correl(Macro1,Micro1) would use 9 'pair-points' (all but 6/1/2018 row);

Correl(Macro1,Micro2) would use only 6 'pair-points' (3/1, 4/1, 5/1, 7/1, 8/1 and 10/1);

Correl(Macro1,Micro3) would use 6 'pair-points' also (1/1, 4/1, 5/1, 7/1, 8/1, and 9/1);

Is there anyway to get this number of 'pair-points' automatically? Or maybe create some clause to only compute the correlation if there is at least a certain number of 'pair-points'?

Thanks in advance!

GLicks_DG · ‎2019-06-19

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

MM:
LOAD Data,
Year(Data) as Ano,
Macro1,
Micro1,
Micro2,
Micro3
FROM
[Teste_Correl.xlsx]
(ooxml, embedded labels, table is Macros);

Correl:
LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;

View solution in original post

Sergey_Shuklin · ‎2019-06-14

Hello!
Did you try to check variable length? Depend of data store type in your variables there are different ways to check this out. Can you provide a sample of variable value for better understanding content of it?

GLicks_DG · ‎2019-06-18

@Sergey_Shuklin Thank you, but I did not quite get what you meant by variable length. I did add some information to my original post, hopefully it can shed some light on my question.

Sergey_Shuklin · ‎2019-06-19

Hello!

Don't know how exactly expected output should looks like, but there is one of many solutions:

Load additional table with "dim" field with two values - 1 and 2.

After that you will be able to calculate all in one table.

In "Micro3" expression I used an IF-condition for checking count of pairs.

If you'd like to see same result via textboxes you should use Set Analysis and Aggr() function with Data aggregation.

If the number of columns "Micro" is dynamic you should use a CrossTable in the script and get a transposed table with "Micro" field. After that the same result can be achieved with Pivot Table chart where "Micro" field will be dragged as columns.

GLicks_DG · ‎2019-06-19