Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
hello,
i have a huge table (3 billions rows) with the cost of product in each store per day.
the cost doesnt change frequently
I want to reduce the number of rows os this table using 2 columns of date interval (date_from and date_to) instead one row per day
original table:
date | store | material | cost |
01/01/2018 | a | 123 | 10 |
02/01/2018 | a | 123 | 10 |
03/01/2018 | a | 123 | 100 |
04/01/2018 | a | 123 | 10 |
05/01/2018 | a | 123 | 200 |
06/01/2018 | a | 123 | 200 |
result table:
store | material | cost | date_from | date_to |
a | 123 | 10 | 01/01/2018 | 02/01/2018 |
a | 123 | 100 | 03/01/2018 | 03/01/2018 |
a | 123 | 10 | 04/01/2018 | 04/01/2018 |
a | 123 | 200 | 05/01/2018 | 06/01/2018 |
i tried to use group by with min(date) and max(date) but it didnt work (see the case of cost 10 in the example above, I got from 01/01 from 04/01)
Im also trying something with one incremental auxiliar column using previous but without success
date | store | material | cost | aux |
01/01/2018 | a | 123 | 10 | 1 |
02/01/2018 | a | 123 | 10 | 1 |
03/01/2018 | a | 123 | 100 | 2 |
04/01/2018 | a | 123 | 10 | 3 |
05/01/2018 | a | 123 | 200 | 4 |
06/01/2018 | a | 123 | 200 | 4 |
does anyone can help me?
thanks in advance
Try this:
table:
Load date(date#(date,'DD/MM/YYYY')) as date, store, material, cost Inline [
date, store, material, cost
date, store, material, cost
01/01/2018, a, 123, 10
02/01/2018, a, 123, 10
03/01/2018, a, 123, 100
04/01/2018, a, 123, 10
05/01/2018, a, 123, 200
06/01/2018, a, 123, 200
07/01/2018, a, 123, 10
08/01/2018, a, 123, 10
09/01/2018, a, 123, 10
];
NoConcatenate
TempTable:
LOAD *,
If(material = Previous(material),
If(cost = Previous(cost), Peek('Flag'), RangeSum(Peek('Flag'), 1)), 1) as Flag
Resident table
Order By store, material;
Drop Table table;
finalTable:
NoConcatenate
LOAD store,
material,
cost,
Flag,
Date(Min(date)) as MinDate,
Date(Max(date)) as MaxDate
Resident TempTable
Group By store, material, cost,Flag;
drop Field Flag;
drop Table TempTable;
Result:
Try this,
Data:
Load *,If(Previous(cost) = cost,1,0) as flag inline [
date, store, material, cost
01/01/2018, a, 123, 10
02/01/2018, a, 123, 10
03/01/2018, a, 123, 100
04/01/2018, a, 123, 10
05/01/2018, a, 123, 200
06/01/2018, a, 123, 200
];
Data1:
Load *,if(Previous(flag)=1 and flag=0,1,flag) as flag1
Resident Data order by date desc;
Drop table Data;
Load store as s, material as m, cost as c,Date(min(date)) as startdate, Date(max(date)) as enddate
Resident Data1
group by store,material,cost,flag1;
Regards,
Kaushik Solanki
thanks for the answer
but if you have another cost 10 after 200, it wont work
date, store, material, cost
01/01/2018, a, 123, 10
02/01/2018, a, 123, 10
03/01/2018, a, 123, 100
04/01/2018, a, 123, 10
05/01/2018, a, 123, 200
06/01/2018, a, 123, 200
07/01/2018, a, 123, 10
08/01/2018, a, 123, 10
09/01/2018, a, 123, 10
];
im trying something with autonumber
Try this:
table:
Load date(date#(date,'DD/MM/YYYY')) as date, store, material, cost Inline [
date, store, material, cost
date, store, material, cost
01/01/2018, a, 123, 10
02/01/2018, a, 123, 10
03/01/2018, a, 123, 100
04/01/2018, a, 123, 10
05/01/2018, a, 123, 200
06/01/2018, a, 123, 200
07/01/2018, a, 123, 10
08/01/2018, a, 123, 10
09/01/2018, a, 123, 10
];
NoConcatenate
TempTable:
LOAD *,
If(material = Previous(material),
If(cost = Previous(cost), Peek('Flag'), RangeSum(Peek('Flag'), 1)), 1) as Flag
Resident table
Order By store, material;
Drop Table table;
finalTable:
NoConcatenate
LOAD store,
material,
cost,
Flag,
Date(Min(date)) as MinDate,
Date(Max(date)) as MaxDate
Resident TempTable
Group By store, material, cost,Flag;
drop Field Flag;
drop Table TempTable;
Result:
life saver!
thanks a lot