Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
dapostolopoylos
Creator III
Creator III

Create missing records for accounts per period

I have this requirement that i would like to share with you in case you can find me a solution:

I load accounts and ballances for three consequent months. Some accounts may be open in the first month but they could be closed in the second and third so they do not exist in the dataset of the last two months...

What i want is to create the missing records for every account in every period and for the missing periods fill the balance field with the value of the last period populated.

A small example:

Period 1

---------------

Acc, Period, Balance

a     ,201405,     10

b     ,201405,     20

c     ,201405,     15

Period 2

---------------

Acc, Period, Balance

a     ,201406,     10

b     ,201406,     30

d     ,201406,     40

Period 3

---------------

Acc, Period, Balance

a     ,201407,     20

b     ,201407,     30

e     ,201407,     50

What  i want to create is the following table:

All Periods

------------------

Acc, Period, Balance, Activity_Flag

a     ,201405,     10,     Y

b     ,201405,     20,     Y

c     ,201405,     15,     Y

a     ,201406,     10,     Y

b     ,201406,     30,     Y

c     ,201406,     15,     N

d     ,201406,     40,     Y

a     ,201407,     20,     Y

b     ,201407,     30,     Y

c     ,201407,     15,     N

d     ,201407,     40,     N

e     ,201407,     50,     Y

I have already read this article but i cannot adapt it in order to work for my case...

Any ideas???

Father/Husband/BI Developer
1 Solution

Accepted Solutions
SergeyMak
Partner Ambassador
Partner Ambassador

Hi Demetrios,

Solution below

tmp:
LOAD *, 'Y' AS FLAG  INLINE [
Acc, Period, Balance
a    ,201405,    10
b    ,201405,    20
c    ,201405,    15
a    ,201406,    10
b    ,201406,    30
d    ,201406,    40
a    ,201407,    20
b    ,201407,    30
e    ,201407,    50
];





LEFT JOIN (tmp)
LOAD DISTINCT Acc,
 
Max(Period) AS MaxPeriod
Resident tmp
GROUP BY Acc;

NoConcatenate
tmp2:
LOAD Acc,
Period,
Balance
Resident tmp
Where Period=MaxPeriod;


[tmpMaxPer]:
LOAD Max(Period) AS LasPeriod
Resident tmp;

LET vLastPer=Peek('LasPeriod');

DROP Table tmpMaxPer;


Concatenate(tmp)
LOAD Acc,
Period+IterNo() AS Period,
Balance,
'N'
AS FLAG
Resident tmp2
WHILE Period+IterNO()<=$(vLastPer);
;
DROP Tables tmp2;

DROP Field MaxPeriod;

PFA an example.



Regards,
Sergey

View solution in original post

10 Replies
datanibbler
Champion
Champion

Hi Demetrios,

you should have a reference_table - inline would suffice, I guess, if the list of accounts you have doesn't change too often - with the list of accounts, and then you can, after sorting the data_table by (first account and then period) check, based on the current >> Period <<, whether you have (a record for) each and all of the accounts. If the record for one does not exist, then you can create a hard-coded new record (within your LOAD statement), using the balance from the previous record (see the PREVIOUS() function for that) and an 'N' for the >> activity_flag <<.

All that will be quite difficult in just one LOAD statement, so I'd suggest you try in several steps at first - first load your data and see if you can find out which records are missing.

HTH

Best regards,

DataNibbler

dapostolopoylos
Creator III
Creator III
Author

Hello, DataNibbler!

Can you please elaborate some more on your solution?

Thank you very much in adavance!

Father/Husband/BI Developer
SergeyMak
Partner Ambassador
Partner Ambassador

Hi Demetrios,

Solution below

tmp:
LOAD *, 'Y' AS FLAG  INLINE [
Acc, Period, Balance
a    ,201405,    10
b    ,201405,    20
c    ,201405,    15
a    ,201406,    10
b    ,201406,    30
d    ,201406,    40
a    ,201407,    20
b    ,201407,    30
e    ,201407,    50
];





LEFT JOIN (tmp)
LOAD DISTINCT Acc,
 
Max(Period) AS MaxPeriod
Resident tmp
GROUP BY Acc;

NoConcatenate
tmp2:
LOAD Acc,
Period,
Balance
Resident tmp
Where Period=MaxPeriod;


[tmpMaxPer]:
LOAD Max(Period) AS LasPeriod
Resident tmp;

LET vLastPer=Peek('LasPeriod');

DROP Table tmpMaxPer;


Concatenate(tmp)
LOAD Acc,
Period+IterNo() AS Period,
Balance,
'N'
AS FLAG
Resident tmp2
WHILE Period+IterNO()<=$(vLastPer);
;
DROP Tables tmp2;

DROP Field MaxPeriod;

PFA an example.



Regards,
Sergey
datanibbler
Champion
Champion

Hi Demetrios,

[a bit too late, but I have already written this, so here it is]

it is a bit difficult since you have a different list of accounts for every period - only 3 in 05, 4 in 06 and 5 in 07.

=> You need to have a reference_table with the accounts (letters) and the valid_from period. Thus you'd have

- acc. a - valid_from 201405

- acc. b - valid_from 201405

- acc. c - valid_from 201405

- acc. d - valid from 201406

- acc. e - valid from 201407

=> So you can calculate like >> [current_period] - [valid_from_period] + 1 << to know how many times you need to consider each account - that would be 3 for acc. a, b, c; 2 for acc. d; 1 for acc. e

=> Adding those results together would tell you there should be 12 rows in your data_table.

=> You could build a loop going through the data_table 12 times and checking whether there is a record for

- 201405, acc. a

- 201406, acc. a

- 201407, acc. a

-- 201406, acc. d

- 201407, acc. d

and so on.

If one period-acct-combination cannot be found, you could bild an IF_THEN_ENDIF block with a RESIDENT LOAD, appending a hard-coded record for just that combination.

HTH

Best regards,

DataNibbler

dapostolopoylos
Creator III
Creator III
Author

You are great, this is just what i needed!!!

Father/Husband/BI Developer
Anonymous
Not applicable

Hi Sergey

I see your solution worked in this case, however I am having issues implementing this scenario with my data.

My data is structured as follows:

sku_soh_date, sku_id, soh       ---- [date recorded, the unique SKU id, stock on hand recorded]

Example entries:

2014-10-01, 1, 3

2014-10-03, 1, 7

2014-10-05, 1, 5

And I would want it to load as:

2014-10-01, 1, 3

2014-10-02, 1, 3

2014-10-03, 1, 7

2014-10-04, 1, 7

2014-10-05, 1, 5

I think the problem may lie with my date - since it isn't in the same format as the example above?

Please find my code below. I'm hoping you can help with this!

Thanks very much in advance

tmp:

LOAD *, 'Y' AS FLAG  ;

SQL SELECT GOES HERE [date is loaded in format - YYYY-MM-DD]

LEFT JOIN (tmp)

LOAD DISTINCT sku_id,

  Max(date(sku_soh_date)) AS MaxSkuSohDate

Resident tmp

GROUP BY sku_id;

NoConcatenate

tmp2:

LOAD sku_id,

sku_soh_date,

soh

Resident tmp

Where sku_soh_date=MaxSkuSohDate;

[tmpMaxSkuSohDate]:

LOAD Max(date(sku_soh_date)) AS LasPeriod

Resident tmp;

LET vLastPer=date(Peek('LasPeriod')+10);

DROP Table tmpMaxSkuSohDate;

Concatenate(tmp)

LOAD sku_id,

date(sku_soh_date+IterNo()) AS sku_soh_date,

soh,

'N' AS FLAG

Resident tmp2

WHILE date(sku_soh_date+IterNo())<=date($(vLastPer));

;

DROP Tables tmp2;

DROP Field MaxSkuSohDate;

SergeyMak
Partner Ambassador
Partner Ambassador

Hi Shaun,

Please share your app. It would be faster

Regards,

Sergey

Regards,
Sergey
Anonymous
Not applicable

Hi Sergey

Great. Please find attached.

The inline load is identical to what is loaded from the DB using SQL.

SergeyMak
Partner Ambassador
Partner Ambassador

The solutions is below

tmp:
LOAD *, 'Y' AS FLAG INLINE [
sku_soh_date, sku_id, soh
2014-10-01, 10, 1
2014-10-01, 11, 2
2014-10-02, 10, 3
2014-10-03, 11, 1
2014-10-05, 10, 6
2014-10-05, 11, 6

];

LEFT JOIN (tmp)
LOAD DISTINCT sku_id,
  
Max(Num(sku_soh_date)) AS MaxSkuSohDate
Resident tmp
GROUP BY sku_id;

NoConcatenate
tmp2:
LOAD sku_id,
sku_soh_date,
soh
Resident tmp
Where Num(sku_soh_date)=MaxSkuSohDate;


[tmpMaxSkuSohDate]:
LOAD Max(Num(sku_soh_date)) AS LasPeriod
Resident tmp;

LET vLastPer=Peek('LasPeriod')+10;

DROP Table tmpMaxSkuSohDate;


//Concatenate(tmp)

LOAD sku_id,
date(Num(sku_soh_date)+IterNo()) AS sku_soh_date,
soh,
'N'
AS FLAG
Resident tmp2
WHILE Num(sku_soh_date)+IterNo()<=$(vLastPer);
;
DROP Tables tmp2;

DROP Field MaxSkuSohDate;
 


Regards,
Sergey