Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
Client is a small bank and they requested me to showcase how the deposit balance is growing over a last 5 years. I have calculated End of day balance at each customer level , since the bank had nearly 100,000 customers, my data has blown to 100,000*5*365= 182.5 Million records. In the end this made the application performance slow. Kindly advice me how to handle this issue.
You might not have transactions of every customer for everyday.
Suppose I opened my account on 1st Jan 2017 with balance 1000.
Then I added 500 on 15th Jan, 500 on 1st Feb, 500 on 15th Feb and 500 on 1st March.
Your Final fact table will have 5 rows for my data. (By your method it would have been 65)
*Last Valid to is today's date.
Now You have to Link Key with Calendar with an intermediate Table that contains all dates valid from and valid for a key.
Use interval match to make such table.
I have created the sample data in attached excel.
This should reduce your application size significantly.
Hi Guruprem,
I think need to agree the bank changing tasks so that you can reduce the number of records at load time at the expense of preliminary calculations and generalizations of source data. Not the fact that the bank needed to the calculation accuracy before each record.
Regards,
Andrey
Thanks for reply.
Do you mean I need to agree on reduce the number of Customers records or dates that comes on tho the App? It seems this is a possible solution.
Also let me know if there are any other ideas?
You might not have transactions of every customer for everyday.
Suppose I opened my account on 1st Jan 2017 with balance 1000.
Then I added 500 on 15th Jan, 500 on 1st Feb, 500 on 15th Feb and 500 on 1st March.
Your Final fact table will have 5 rows for my data. (By your method it would have been 65)
*Last Valid to is today's date.
Now You have to Link Key with Calendar with an intermediate Table that contains all dates valid from and valid for a key.
Use interval match to make such table.
I have created the sample data in attached excel.
This should reduce your application size significantly.
Fantastic new approach!
I really appreciate you for creating a very simple example for understanding the problem correctly. I will now work with your example and see if I can manage to load a less number of records.
Hi Guruprem,
I had a pretty similar issue than yours; I will show you how I handled it and I hope that would help you :
Here is what I have :
this is my 'customer table', each custmer (id) has a situation (A or B), in a certain date (you can have only one date and create a period in the script to use it in the intervalMatch later):
this is my 'Fact' table:
Now, the point is to track the sales of each client and see their state (if they were actif or blocked (A or B)
But if a client (ID 1) was actif like this :
10/01/2017 | 10/01/2017 | A |
11/01/2017 | 11/01/2017 | A |
12/01/2017 | 12/01/2017 | A |
13/01/2017 | 13/01/2017 | A |
14/01/2017 | 14/01/2017 | B |
I would rather have a table like this :
client | |period |situation
1 | 10/01/2017 13/01/2017 |A
1 | 14/01/2017 14/01/2017 |B
Which will significally reduce the size of my table
And then , I will match the sales dates of the fact table with these periods to see why my clients are not having any sales; (in fact they were blocked)
Hope the situation is rather clear, so you can alter my script to you need:
Here is the script:
Fact:
LOAD
id,
"date",
sales
FROM [lib://source/source.xlsx]
(ooxml, embedded labels, table is Feuil2);
Table:
LOAD
id,
debut,
fin,
reclamation,
situation
FROM [lib://source/source.xlsx]
(ooxml, embedded labels, table is Feuil1);
TempTable:
LOAD *,
If(id = Previous(id),
If(situation = Previous(situation), Peek('Flag'), RangeSum(Peek('Flag'), 1)), 1) as Flag
Resident Table
Order By id, debut;
load*,
DATE(MinDate)&' - '&date(MaxDate) as Period;
FinalTable:
LOAD id,
situation,
reclamation,
Flag,
Date(Min(debut)) as MinDate,
Date(Max(fin)) as MaxDate
Resident TempTable
Group By id, situation, reclamation,Flag;
DROP Tables Table, TempTable;
Bridge:
inner join IntervalMatch("date",id)
Load distinct
MinDate,
MaxDate,id Resident FinalTable;
The result :
Hope this helps,
Omar,
Dear Omar,
Thanks for showing the real example. I just studiedIntervalMatchthen seen your example and convienced that Interval Match is the solution to reduce the number of records.