Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have this requirement that i would like to share with you in case you can find me a solution:
I load accounts and ballances for three consequent months. Some accounts may be open in the first month but they could be closed in the second and third so they do not exist in the dataset of the last two months...
What i want is to create the missing records for every account in every period and for the missing periods fill the balance field with the value of the last period populated.
A small example:
Period 1
---------------
Acc, Period, Balance
a ,201405, 10
b ,201405, 20
c ,201405, 15
Period 2
---------------
Acc, Period, Balance
a ,201406, 10
b ,201406, 30
d ,201406, 40
Period 3
---------------
Acc, Period, Balance
a ,201407, 20
b ,201407, 30
e ,201407, 50
What i want to create is the following table:
All Periods
------------------
Acc, Period, Balance, Activity_Flag
a ,201405, 10, Y
b ,201405, 20, Y
c ,201405, 15, Y
a ,201406, 10, Y
b ,201406, 30, Y
c ,201406, 15, N
d ,201406, 40, Y
a ,201407, 20, Y
b ,201407, 30, Y
c ,201407, 15, N
d ,201407, 40, N
e ,201407, 50, Y
I have already read this article but i cannot adapt it in order to work for my case...
Any ideas???
Hi Demetrios,
Solution below
tmp:
LOAD *, 'Y' AS FLAG INLINE [
Acc, Period, Balance
a ,201405, 10
b ,201405, 20
c ,201405, 15
a ,201406, 10
b ,201406, 30
d ,201406, 40
a ,201407, 20
b ,201407, 30
e ,201407, 50
];
LEFT JOIN (tmp)
LOAD DISTINCT Acc,
Max(Period) AS MaxPeriod
Resident tmp
GROUP BY Acc;
NoConcatenate
tmp2:
LOAD Acc,
Period,
Balance
Resident tmp
Where Period=MaxPeriod;
[tmpMaxPer]:
LOAD Max(Period) AS LasPeriod
Resident tmp;
LET vLastPer=Peek('LasPeriod');
DROP Table tmpMaxPer;
Concatenate(tmp)
LOAD Acc,
Period+IterNo() AS Period,
Balance,
'N' AS FLAG
Resident tmp2
WHILE Period+IterNO()<=$(vLastPer);
;
DROP Tables tmp2;
DROP Field MaxPeriod;
PFA an example.
Hi Demetrios,
you should have a reference_table - inline would suffice, I guess, if the list of accounts you have doesn't change too often - with the list of accounts, and then you can, after sorting the data_table by (first account and then period) check, based on the current >> Period <<, whether you have (a record for) each and all of the accounts. If the record for one does not exist, then you can create a hard-coded new record (within your LOAD statement), using the balance from the previous record (see the PREVIOUS() function for that) and an 'N' for the >> activity_flag <<.
All that will be quite difficult in just one LOAD statement, so I'd suggest you try in several steps at first - first load your data and see if you can find out which records are missing.
HTH
Best regards,
DataNibbler
Hello, DataNibbler!
Can you please elaborate some more on your solution?
Thank you very much in adavance!
Hi Demetrios,
Solution below
tmp:
LOAD *, 'Y' AS FLAG INLINE [
Acc, Period, Balance
a ,201405, 10
b ,201405, 20
c ,201405, 15
a ,201406, 10
b ,201406, 30
d ,201406, 40
a ,201407, 20
b ,201407, 30
e ,201407, 50
];
LEFT JOIN (tmp)
LOAD DISTINCT Acc,
Max(Period) AS MaxPeriod
Resident tmp
GROUP BY Acc;
NoConcatenate
tmp2:
LOAD Acc,
Period,
Balance
Resident tmp
Where Period=MaxPeriod;
[tmpMaxPer]:
LOAD Max(Period) AS LasPeriod
Resident tmp;
LET vLastPer=Peek('LasPeriod');
DROP Table tmpMaxPer;
Concatenate(tmp)
LOAD Acc,
Period+IterNo() AS Period,
Balance,
'N' AS FLAG
Resident tmp2
WHILE Period+IterNO()<=$(vLastPer);
;
DROP Tables tmp2;
DROP Field MaxPeriod;
PFA an example.
Hi Demetrios,
[a bit too late, but I have already written this, so here it is]
it is a bit difficult since you have a different list of accounts for every period - only 3 in 05, 4 in 06 and 5 in 07.
=> You need to have a reference_table with the accounts (letters) and the valid_from period. Thus you'd have
- acc. a - valid_from 201405
- acc. b - valid_from 201405
- acc. c - valid_from 201405
- acc. d - valid from 201406
- acc. e - valid from 201407
=> So you can calculate like >> [current_period] - [valid_from_period] + 1 << to know how many times you need to consider each account - that would be 3 for acc. a, b, c; 2 for acc. d; 1 for acc. e
=> Adding those results together would tell you there should be 12 rows in your data_table.
=> You could build a loop going through the data_table 12 times and checking whether there is a record for
- 201405, acc. a
- 201406, acc. a
- 201407, acc. a
-- 201406, acc. d
- 201407, acc. d
and so on.
If one period-acct-combination cannot be found, you could bild an IF_THEN_ENDIF block with a RESIDENT LOAD, appending a hard-coded record for just that combination.
HTH
Best regards,
DataNibbler
You are great, this is just what i needed!!!
Hi Sergey
I see your solution worked in this case, however I am having issues implementing this scenario with my data.
My data is structured as follows:
sku_soh_date, sku_id, soh ---- [date recorded, the unique SKU id, stock on hand recorded]
Example entries:
2014-10-01, 1, 3
2014-10-03, 1, 7
2014-10-05, 1, 5
And I would want it to load as:
2014-10-01, 1, 3
2014-10-02, 1, 3
2014-10-03, 1, 7
2014-10-04, 1, 7
2014-10-05, 1, 5
I think the problem may lie with my date - since it isn't in the same format as the example above?
Please find my code below. I'm hoping you can help with this!
Thanks very much in advance
tmp:
LOAD *, 'Y' AS FLAG ;
SQL SELECT GOES HERE [date is loaded in format - YYYY-MM-DD]
LEFT JOIN (tmp)
LOAD DISTINCT sku_id,
Max(date(sku_soh_date)) AS MaxSkuSohDate
Resident tmp
GROUP BY sku_id;
NoConcatenate
tmp2:
LOAD sku_id,
sku_soh_date,
soh
Resident tmp
Where sku_soh_date=MaxSkuSohDate;
[tmpMaxSkuSohDate]:
LOAD Max(date(sku_soh_date)) AS LasPeriod
Resident tmp;
LET vLastPer=date(Peek('LasPeriod')+10);
DROP Table tmpMaxSkuSohDate;
Concatenate(tmp)
LOAD sku_id,
date(sku_soh_date+IterNo()) AS sku_soh_date,
soh,
'N' AS FLAG
Resident tmp2
WHILE date(sku_soh_date+IterNo())<=date($(vLastPer));
;
DROP Tables tmp2;
DROP Field MaxSkuSohDate;
Hi Shaun,
Please share your app. It would be faster
Regards,
Sergey
Hi Sergey
Great. Please find attached.
The inline load is identical to what is loaded from the DB using SQL.
The solutions is below
tmp:
LOAD *, 'Y' AS FLAG INLINE [
sku_soh_date, sku_id, soh
2014-10-01, 10, 1
2014-10-01, 11, 2
2014-10-02, 10, 3
2014-10-03, 11, 1
2014-10-05, 10, 6
2014-10-05, 11, 6
];
LEFT JOIN (tmp)
LOAD DISTINCT sku_id,
Max(Num(sku_soh_date)) AS MaxSkuSohDate
Resident tmp
GROUP BY sku_id;
NoConcatenate
tmp2:
LOAD sku_id,
sku_soh_date,
soh
Resident tmp
Where Num(sku_soh_date)=MaxSkuSohDate;
[tmpMaxSkuSohDate]:
LOAD Max(Num(sku_soh_date)) AS LasPeriod
Resident tmp;
LET vLastPer=Peek('LasPeriod')+10;
DROP Table tmpMaxSkuSohDate;
//Concatenate(tmp)
LOAD sku_id,
date(Num(sku_soh_date)+IterNo()) AS sku_soh_date,
soh,
'N' AS FLAG
Resident tmp2
WHILE Num(sku_soh_date)+IterNo()<=$(vLastPer);
;
DROP Tables tmp2;
DROP Field MaxSkuSohDate;