Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I am still reworking an old piece of code. In short, it does this:
- There is a target_value for every day, for each of the two shifts, for each of the areas
-> there are about 30-40 values every day
- That table is joined, via the date, to one Autogenerate_table with minute_intervals from 5:20 am to 10:30 pm
-> Then - here comes the big one - a keyfield is loaded that is made up of the date and any of these minute_intervals - so there are
a bit over 1.000 keyfields for one day. That blows up the table to millions and millions of records of course.
=> Is there any way of doing this more efficiently (without touching the key first, it is assumedly required on a minute_basis)?
Thanks a lot!
Best regards,
DataNibbler
Afaik, you can use two separate fields.
This works:
TableA:
load id,
date_pk,
minute_pk
from tableA.qvd (qvd);
TableB:
left join(TableA)
load date_column as date_pk,
minute_column as minute_pk,
target_value
from TableB.qvd (qvd);
eg:
TableA:
[id,date_pk,minute_pk
1,22,05
1,22,06
2,22,06
2,22,07];
TableB:
[date_column, minute_column, target_value
21,05,10
21,06,11
21,07,10.5
22,05,15
22,06,16
22,07,16
22,08,15];
Output would still be
[id,date_pk,minute_pk, target_value
1,22,05,15
1,22,06,16
2,22,06,16
2,22,07,16];
Hi DataNibbler,
I think the best would be to avoid such a key-field. For joining your target-table you didn't need these key only date is enough. Are there joins to other tables? Then it could be a solution to create a bigger fact-table per joining and/or mapping and afterwards delete this key-field and use instead two fields "date" and "time" in your data-model.
- Marcus
Hi Marcus,
I assume that it has been a requirement that we break down our target_value by minute so that we have all the possibilities - though we probably won't be able to calculate and display this every minute - so, yes, I do need that keyfield.
(those are the target_data which I then have to link to the current_data to show "where we are" production-wise)
Splitting such a thing in two saves a lot, I know, but I cannot use two separate fields as key without combining them, can I?
Hi,
Am not clear on why a combined Date&minute key won't work. Will it be possible for you to paste the code here?
Hi,
well, this is the critical question - it's probably me who is unclear about this.
AfaIk I cannot use two fields as key as long as they are separate, can I? This assumption is why I concatenate the date and the time (which looks like this)
>>> LOAD
..
num(Date_PK) & '_' & Interval_Min as Keyfield,
...
RESIDENT ...
; <<<
Of course, if I could leave those two fields separate and still use their combination(s) as key, that would save me a few million records. Would that work?
Afaik, you can use two separate fields.
This works:
TableA:
load id,
date_pk,
minute_pk
from tableA.qvd (qvd);
TableB:
left join(TableA)
load date_column as date_pk,
minute_column as minute_pk,
target_value
from TableB.qvd (qvd);
eg:
TableA:
[id,date_pk,minute_pk
1,22,05
1,22,06
2,22,06
2,22,07];
TableB:
[date_column, minute_column, target_value
21,05,10
21,06,11
21,07,10.5
22,05,15
22,06,16
22,07,16
22,08,15];
Output would still be
[id,date_pk,minute_pk, target_value
1,22,05,15
1,22,06,16
2,22,06,16
2,22,07,16];
Hi,
well, that would be cool. Just by splitting that one high-cardinality field into two, I will save some mio records, and hopefully some processing time.
Thanks a lot!