Hello! Hope someone could help me with this problem.
I have a large dataset (~2mil records) that is updated periodically, and I need to keep track of the changes in this set. Whenever there's a new update, I'll receive the new data dump with ALL records, not just the updated ones. It looks like this:
Update | Period | Item | Value |
Update 1 | 2020-01 | A | 1000 |
| 2020-01 | B | 500 |
| 2020-01 | C | 4000 |
Update 2 | 2020-02 | A | 1000 |
| 2020-02 | B | 1000 |
| 2020-02 | C | 2000 |
Update 3 | 2020-03 | A | 1000 |
| 2020-03 | B | 1000 |
| 2020-03 | C | 3000 |
Update 4 | 2020-04 | A | 1000 |
| 2020-04 | B | 2500 |
| 2020-04 | C | 3000 |
| 2020-04 | D | 500 |
My question is, what is the best way to model the data to keep track of Value changes?
If I keep all historical records like the above example, I'll end up with ~8 millions records after just 4 updates.
But if I keep only the changes, or the latest record + changes if any, I cannot visualize the changes in a pivot table (the final requirement). Before() doesn't work, since the data is missing:
Looking forward to your ideas! If you need any clarifications, please feel free to ask.