Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
GLicks_DG
Contributor III
Contributor III

Correlation Sample Size

Hi all!

Currently i'm working a lot with correlations (correl() function). Though I'm pretty much correlating tons of data to see what pops out. Because of the large amount of variables i am correlating, I'm not quite sure about how many samples (pairs) my correlations are actually using. 

For example, some of my variables may use hundreds of samples and other might only have as few as 2 or 3 samples. In order to somehow validate my results, I need a way to make sure i have a large enough sample for that particular correl.

So my question is: Is there a way of getting the number of pair-points the correl function is using? Or perhaps some way to only keep a correlation value if i have a given sample size and discard the rest?

@EDIT

If I had the following dataset and I then calculated the correlation coefficient between 'Macro1' and the 3 'Micro' columns, I would like to know how many 'pair-points' did the correlation used.

DataMacro1Micro1Micro2Micro3
1/1/20181000.00100.00-25.00
2/1/20181010.0099.00--
3/1/20181005.00102.0010.00-
4/1/20181010.00105.0012.0030.00
5/1/2018995.0098.009.0029.00
6/1/2018---30.00
7/1/20181005.00101.0011.0031.00
8/1/2018995.00100.0010.0028.00
9/1/20181010.00102.00-32.00
10/1/20181000.0098.0010.00-

For instance:

Correl(Macro1,Micro1) would use 9 'pair-points' (all but 6/1/2018 row);

Correl(Macro1,Micro2) would use only 6 'pair-points' (3/1, 4/1, 5/1, 7/1, 8/1 and 10/1); 

Correl(Macro1,Micro3) would use 6 'pair-points' also (1/1, 4/1, 5/1, 7/1, 8/1, and 9/1);

 

Is there anyway to get this number of 'pair-points' automatically? Or maybe create some clause to only compute the correlation if there is at least a certain number of 'pair-points'?

Thanks in advance!

1 Solution

Accepted Solutions
GLicks_DG
Contributor III
Contributor III
Author

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

MM:
LOAD Data,
Year(Data) as Ano,
Macro1,
Micro1,
Micro2,
Micro3
FROM
[Teste_Correl.xlsx]
(ooxml, embedded labels, table is Macros);

Correl:
LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;

 

View solution in original post

5 Replies
Sergey_Shuklin
Specialist
Specialist

Hello!
Did you try to check variable length? Depend of data store type in your variables there are different ways to check this out. Can you provide a sample of variable value for better understanding content of it?

GLicks_DG
Contributor III
Contributor III
Author

@Sergey_Shuklin Thank you, but I did not quite get what you meant by variable length. I did add some information to my original post, hopefully it can shed some light on my question.

Sergey_Shuklin
Specialist
Specialist

Hello!

Don't know how exactly expected output should looks like, but there is one of many solutions:

Load additional table with "dim" field with two values - 1 and 2.

After that you will be able to calculate all in one table.

correl_calc_pic1.png

In "Micro3" expression I used an IF-condition for checking count of pairs.

If you'd like to see same result via textboxes you should use Set Analysis and Aggr() function with Data aggregation.

If the number of columns "Micro" is dynamic you should use a CrossTable in the script and get a transposed table with "Micro" field. After that the same result can be achieved with Pivot Table chart where "Micro" field will be dragged as columns.

GLicks_DG
Contributor III
Contributor III
Author

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

MM:
LOAD Data,
Year(Data) as Ano,
Macro1,
Micro1,
Micro2,
Micro3
FROM
[Teste_Correl.xlsx]
(ooxml, embedded labels, table is Macros);

Correl:
LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;
LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;

 

Sergey_Shuklin
Specialist
Specialist

Hello!

Glad you found a solution. Looks succinctly!