datanibbler
Champion

High-cardinality keyfield - any way to improve?

Hi,

I am still reworking an old piece of code. In short, it does this:

- There is a target_value for every day, for each of the two shifts, for each of the areas
   -> there are about 30-40 values every day

- That table is joined, via the date, to an Autogenerate table with minute intervals from 5:20 am to 10:30 pm

-> Then - here comes the big one - a key field is loaded that is made up of the date combined with each of these minute intervals, so there are a bit over 1,000 key values for one day. That blows up the table to millions and millions of records, of course.

=> Is there any way of doing this more efficiently (leaving the key itself untouched for now; it is presumably required on a minute basis)?
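
For illustration, the relevant part of the script currently does something along these lines - just a simplified sketch, the table and field names are placeholders rather than the real ones, and the minute table is shown as a plain cross-join:

// Simplified sketch - real table and field names differ
MinuteIntervals:
LOAD
    Time(MakeTime(5,20) + (IterNo()-1)/1440, 'hh:mm') as Interval_Min
AUTOGENERATE 1
WHILE MakeTime(5,20) + (IterNo()-1)/1440 <= MakeTime(22,30);

Targets:
LOAD Date_PK, Shift, Area, Target_Value
FROM Targets.qvd (qvd);

// join without a common field -> every target row gets every minute interval
Join (Targets)
LOAD Interval_Min
RESIDENT MinuteIntervals;
DROP TABLE MinuteIntervals;

// the composite key is then built on top of the inflated table
Facts:
LOAD *,
     num(Date_PK) & '_' & Interval_Min as Keyfield
RESIDENT Targets;
DROP TABLE Targets;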

Thanks a lot!

Best regards,

DataNibbler


6 Replies
marcus_sommer

Hi DataNibbler,

I think the best would be to avoid such a key field. For joining your target table you don't need this key; the date alone is enough. Are there joins to other tables? Then it could be a solution to build a bigger fact table via joining and/or mapping, and afterwards to delete this key field and use two fields, "date" and "time", in your data model instead.
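
As a rough sketch - the table and field names below are only placeholders:

// Sketch only: join the target-table into the fact-table via the date ...
Facts:
LOAD Keyfield, Date, Time, SomeFactValue
FROM Facts.qvd (qvd);

Left Join (Facts)
LOAD Date, Target_Value
FROM Targets.qvd (qvd);

// ... and afterwards drop the expensive key-field; Date and Time remain
// as two separate, small fields in the data-model
DROP FIELD Keyfield;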

- Marcus

datanibbler
Champion
Author

Hi Marcus,

I assume that it has been a requirement that we break down our target_value by minute so that we have all the possibilities - though we probably won't be able to calculate and display this every minute - so, yes, I do need that keyfield.

(those are the target_data which I then have to link to the current_data to show "where we are" production-wise)

Splitting such a thing in two saves a lot, I know, but I cannot use two separate fields as key without combining them, can I?

Not applicable

Hi,

I am not clear on why a combined date & minute key won't work. Would it be possible for you to paste the code here?

datanibbler
Champion
Author

Hi,

well, this is the critical question - it's probably me who is unclear about this.

Afaik I cannot use two fields as a key as long as they are separate, can I? That assumption is why I concatenate the date and the time, which looks like this:

>>> LOAD
       ..
       num(Date_PK) & '_' & Interval_Min as Keyfield,
       ...
RESIDENT ...; <<<

Of course, if I could leave those two fields separate and still use their combination(s) as key, that would save me a few million records. Would that work?

Not applicable (Accepted Solution)

Afaik, you can use two separate fields.

This works:

TableA:
load id,
     date_pk,
     minute_pk
from tableA.qvd (qvd);

TableB:
left join(TableA)
load date_column as date_pk,
     minute_column as minute_pk,
     target_value
from TableB.qvd (qvd);

eg:

TableA:
[id, date_pk, minute_pk
1, 22, 05
1, 22, 06
2, 22, 06
2, 22, 07];

TableB:
[date_column, minute_column, target_value
21, 05, 10
21, 06, 11
21, 07, 10.5
22, 05, 15
22, 06, 16
22, 07, 16
22, 08, 15];

Output would still be:

[id, date_pk, minute_pk, target_value
1, 22, 05, 15
1, 22, 06, 16
2, 22, 06, 16
2, 22, 07, 16];
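
Applied to your case it could look like this - assuming your fact table keeps Date_PK and Interval_Min as two separate fields (the column names of the target table are just placeholders):

// The left join matches on both common fields, Date_PK and Interval_Min,
// so the long concatenated Keyfield never has to be built
Left Join (Facts)
LOAD Target_Date   as Date_PK,
     Target_Minute as Interval_Min,
     Target_Value
FROM Targets.qvd (qvd);

Qlik joins on all identically named fields, so the two small fields together act as the key.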

datanibbler
Champion
Author


Hi,

well, that would be cool. Just by splitting that one high-cardinality field into two, I will save a few million records, and hopefully some processing time.

Thanks a lot!