Announcements
Product Release Webinar: Qlik Insider airing December 6! REGISTER TODAY!
cancel
Showing results for
Did you mean:
Contributor III

## Correlation Sample Size

Hi all!

Currently i'm working a lot with correlations (correl() function). Though I'm pretty much correlating tons of data to see what pops out. Because of the large amount of variables i am correlating, I'm not quite sure about how many samples (pairs) my correlations are actually using.

For example, some of my variables may use hundreds of samples and other might only have as few as 2 or 3 samples. In order to somehow validate my results, I need a way to make sure i have a large enough sample for that particular correl.

So my question is: Is there a way of getting the number of pair-points the correl function is using? Or perhaps some way to only keep a correlation value if i have a given sample size and discard the rest?

@EDIT

If I had the following dataset and I then calculated the correlation coefficient between 'Macro1' and the 3 'Micro' columns, I would like to know how many 'pair-points' did the correlation used.

 Data Macro1 Micro1 Micro2 Micro3 1/1/2018 1000.00 100.00 - 25.00 2/1/2018 1010.00 99.00 - - 3/1/2018 1005.00 102.00 10.00 - 4/1/2018 1010.00 105.00 12.00 30.00 5/1/2018 995.00 98.00 9.00 29.00 6/1/2018 - - - 30.00 7/1/2018 1005.00 101.00 11.00 31.00 8/1/2018 995.00 100.00 10.00 28.00 9/1/2018 1010.00 102.00 - 32.00 10/1/2018 1000.00 98.00 10.00 -

For instance:

Correl(Macro1,Micro1) would use 9 'pair-points' (all but 6/1/2018 row);

Correl(Macro1,Micro2) would use only 6 'pair-points' (3/1, 4/1, 5/1, 7/1, 8/1 and 10/1);

Correl(Macro1,Micro3) would use 6 'pair-points' also (1/1, 4/1, 5/1, 7/1, 8/1, and 9/1);

Is there anyway to get this number of 'pair-points' automatically? Or maybe create some clause to only compute the correlation if there is at least a certain number of 'pair-points'?

1 Solution

Accepted Solutions
Contributor III
Author

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

MM:
Year(Data) as Ano,
Macro1,
Micro1,
Micro2,
Micro3
FROM
[Teste_Correl.xlsx]
(ooxml, embedded labels, table is Macros);

Correl:
LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;

5 Replies
Specialist

Hello!
Did you try to check variable length? Depend of data store type in your variables there are different ways to check this out. Can you provide a sample of variable value for better understanding content of it?

Contributor III
Author

@Sergey_Shuklin Thank you, but I did not quite get what you meant by variable length. I did add some information to my original post, hopefully it can shed some light on my question.

Specialist

Hello!

Don't know how exactly expected output should looks like, but there is one of many solutions:

Load additional table with "dim" field with two values - 1 and 2.

After that you will be able to calculate all in one table.

In "Micro3" expression I used an IF-condition for checking count of pairs.

If you'd like to see same result via textboxes you should use Set Analysis and Aggr() function with Data aggregation.

If the number of columns "Micro" is dynamic you should use a CrossTable in the script and get a transposed table with "Micro" field. After that the same result can be achieved with Pivot Table chart where "Micro" field will be dragged as columns.

Contributor III
Author

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

MM:
Year(Data) as Ano,
Macro1,
Micro1,
Micro2,
Micro3
FROM
[Teste_Correl.xlsx]
(ooxml, embedded labels, table is Macros);

Correl:
LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;

Specialist

Hello!

Glad you found a solution. Looks succinctly!

Community Browser