Hello everyone,
I want to calculate the correlation of two variables, but I know that there is a time shift/delay between them.
Here is some example data:
DAY | VAR1 | VAR2 |
01/01/2002 | 1 | 2 |
02/01/2002 | 2 | 3 |
03/01/2002 | 3 | 4 |
04/01/2002 | 4 | 5 |
05/01/2002 | 5 | 4 |
06/01/2002 | 4 | 3 |
07/01/2002 | 3 | 2 |
08/01/2002 | 2 | 1 |
09/01/2002 | 1 | 2 |
10/01/2002 | 2 | 3 |
11/01/2002 | 3 | 4 |
12/01/2002 | 4 | 5 |
13/01/2002 | 5 | 4 |
14/01/2002 | 4 | 3 |
15/01/2002 | 3 | 2 |
16/01/2002 | 2 | 1 |
17/01/2002 | 1 | 2 |
18/01/2002 | 2 | 3 |
19/01/2002 | 3 | 4 |
20/01/2002 | 4 | 5 |
21/01/2002 | 5 | 4 |
22/01/2002 | 4 | 3 |
23/01/2002 | 3 | 2 |
24/01/2002 | 2 | 1 |
25/01/2002 | 1 | 2 |
If I use the traditional correlation, the outcome is 0.67, but I know that if I shift the first variable by one day, the correlation will be 1.
Is there a way to calculate the "real" correlation (meaning 1) without knowing the time shift?
Thanks in advance,
Panagiotis
Yeah, I would prefer Python over Qlik if you don't have any UI calculations. Also, in Qlik you don't need to create each lagged field by hand:
VAR1_1 = Above(VAR1, 1)
VAR1_2 = Above(VAR1, 2)
Instead, you write a small for loop over n, where n is the number of lagged variables, and it creates as many lagged fields as you want.
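For comparison, the same loop idea can be sketched in plain Python. This is only an illustration, not Qlik script: the helper name `add_lagged_fields` and the `VAR1_n` field naming are assumptions, and `None` stands in for the null you would get when there is no row n steps above.

```python
def add_lagged_fields(rows, field, max_lag):
    """Return copies of rows with field_1 .. field_<max_lag> added.

    field_n holds the value of `field` from n rows above, or None when
    no such row exists (hypothetical stand-in for Above(field, n)).
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # copy so the input rows stay untouched
        for n in range(1, max_lag + 1):
            new_row[f"{field}_{n}"] = rows[i - n][field] if i - n >= 0 else None
        out.append(new_row)
    return out


rows = [{"VAR1": v} for v in [1, 2, 3, 4, 5]]
lagged = add_lagged_fields(rows, "VAR1", max_lag=2)
print(lagged[2])  # {'VAR1': 3, 'VAR1_1': 2, 'VAR1_2': 1}
```

Raising `max_lag` adds more lagged fields without writing each one out by hand.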
The only drawback of Python processing is that we are consuming the Qlik analytics through an HTML mashup, and the users accessing it will not be from our team.
So the actual cross-correlation is something we can test in Python, but the final users will only have access to the approximation using specific time shifts.
Using a loop to create multiple time-shift variables with a large n will reduce the approximation error. We might also try smaller time intervals for the shifts, e.g. hours instead of days.
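As a rough sketch of the Python-side test, one could scan a range of candidate shifts, compute the ordinary Pearson correlation for each, and keep the shift with the highest value. The function names (`pearson`, `lagged_correlation`, `best_lag`) and the `max_lag=3` window are illustrative assumptions. The data is the example from the first post, and a lag of k here means pairing VAR1 on day t+k with VAR2 on day t.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlate x[t + lag] with y[t] over the days where both exist."""
    if lag >= 0:
        xs, ys = x[lag:], y[:len(y) - lag]
    else:
        xs, ys = x[:lag], y[-lag:]
    return pearson(xs, ys)


def best_lag(x, y, max_lag):
    """Try every integer lag in [-max_lag, max_lag]; return the best one and all scores."""
    scores = {k: lagged_correlation(x, y, k) for k in range(-max_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores


# Example data from the first post (25 days).
var1 = [1, 2, 3, 4, 5, 4, 3, 2] * 3 + [1]
var2 = [2, 3, 4, 5, 4, 3, 2, 1] * 3 + [2]

lag, scores = best_lag(var1, var2, max_lag=3)
print(lag, round(scores[lag], 3), round(scores[0], 3))  # 1 1.0 0.675
```

Note that this data repeats every 8 days, so a window wider than the period would find several shifts that fit equally well. Keeping `max_lag` below the period avoids that ambiguity.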