Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Qlik Community
- :
- All Forums
- :
- QlikView App Dev
- :
- Re: Correlation Sample Size

Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

GLicks_DG

Contributor III

2019-06-12
04:43 PM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Correlation Sample Size

Hi all!

Currently i'm working a lot with correlations (correl() function). Though I'm pretty much correlating tons of data to see what pops out. Because of the large amount of variables i am correlating, I'm not quite sure about how many samples (pairs) my correlations are actually using.

For example, some of my variables may use hundreds of samples and other might only have as few as 2 or 3 samples. In order to somehow validate my results, I need a way to make sure i have a large enough sample for that particular correl.

So my question is: Is there a way of getting the number of pair-points the correl function is using? Or perhaps some way to only keep a correlation value if i have a given sample size and discard the rest?

@EDIT

If I had the following dataset and I then calculated the correlation coefficient between 'Macro1' and the 3 'Micro' columns, I would like to know how many 'pair-points' did the correlation used.

Data | Macro1 | Micro1 | Micro2 | Micro3 |

1/1/2018 | 1000.00 | 100.00 | - | 25.00 |

2/1/2018 | 1010.00 | 99.00 | - | - |

3/1/2018 | 1005.00 | 102.00 | 10.00 | - |

4/1/2018 | 1010.00 | 105.00 | 12.00 | 30.00 |

5/1/2018 | 995.00 | 98.00 | 9.00 | 29.00 |

6/1/2018 | - | - | - | 30.00 |

7/1/2018 | 1005.00 | 101.00 | 11.00 | 31.00 |

8/1/2018 | 995.00 | 100.00 | 10.00 | 28.00 |

9/1/2018 | 1010.00 | 102.00 | - | 32.00 |

10/1/2018 | 1000.00 | 98.00 | 10.00 | - |

For instance:

Correl(Macro1,Micro1) would use 9 'pair-points' (all but 6/1/2018 row);

Correl(Macro1,Micro2) would use only 6 'pair-points' (3/1, 4/1, 5/1, 7/1, 8/1 and 10/1);

Correl(Macro1,Micro3) would use 6 'pair-points' also (1/1, 4/1, 5/1, 7/1, 8/1, and 9/1);

Is there anyway to get this number of 'pair-points' automatically? Or maybe create some clause to only compute the correlation if there is at least a certain number of 'pair-points'?

Thanks in advance!

1,391 Views

1 Solution

Accepted Solutions

GLicks_DG

Contributor III

2019-06-19
08:38 AM

Author

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

*MM:**LOAD Data,**Year(Data) as Ano, **Macro1, **Micro1, **Micro2, **Micro3**FROM**[Teste_Correl.xlsx]**(ooxml, embedded labels, table is Macros);*

*Correl:**LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;**LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;**LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;*

5 Replies

Sergey_Shuklin

Specialist

2019-06-14
01:59 AM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello!

Did you try to check variable length? Depend of data store type in your variables there are different ways to check this out. Can you provide a sample of variable value for better understanding content of it?

1,365 Views

GLicks_DG

Contributor III

2019-06-18
11:10 AM

Author

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Sergey_Shuklin Thank you, but I did not quite get what you meant by variable length. I did add some information to my original post, hopefully it can shed some light on my question.

1,351 Views

Sergey_Shuklin

Specialist

2019-06-19
01:05 AM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello!

Don't know how exactly expected output should looks like, but there is one of many solutions:

Load additional table with "dim" field with two values - 1 and 2.

After that you will be able to calculate all in one table.

In "Micro3" expression I used an IF-condition for checking count of pairs.

If you'd like to see same result via textboxes you should use Set Analysis and Aggr() function with Data aggregation.

If the number of columns "Micro" is dynamic you should use a CrossTable in the script and get a transposed table with "Micro" field. After that the same result can be achieved with Pivot Table chart where "Micro" field will be dragged as columns.

1,319 Views

GLicks_DG

Contributor III

2019-06-19
08:38 AM

Author

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Sergey_Shuklin Thanks again for the reply!

I kept on testing and found a easy enough way to calculate the sample size of my correlations. All it took was calculating the difference between fields and then counting the numeric values this calculation got. My understanding of it was that the calculation only returned a numeric value when both columns had numeric values to begin with, so if i count those values I should get the exact number of pair-points I have for that particular correlation.

The code and sample table is as it follows:

*MM:**LOAD Data,**Year(Data) as Ano, **Macro1, **Micro1, **Micro2, **Micro3**FROM**[Teste_Correl.xlsx]**(ooxml, embedded labels, table is Macros);*

*Correl:**LOAD 'Macro1' as Macro, 'Micro1' as Micro, NumericCount(Macro1-Micro1) as Samples, Correl(Macro1,Micro1) as Correl Resident MM;**LOAD 'Macro1' as Macro, 'Micro2' as Micro, NumericCount(Macro1-Micro2) as Samples, Correl(Macro1,Micro2) as Correl Resident MM;**LOAD 'Macro1' as Macro, 'Micro3' as Micro, NumericCount(Macro1-Micro3) as Samples, Correl(Macro1,Micro3) as Correl Resident MM;*

Sergey_Shuklin

Specialist

2019-06-20
12:25 AM

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello!

Glad you found a solution. Looks succinctly!

1,298 Views