bgerchikov
Partner - Creator III

Concatenate big tables

Hello,

I'm trying to concatenate two very big tables to implement a star schema in the data model. Having these tables linked makes chart performance very slow.

The first one has 50 fields and 440 million records.

The second one has about 160 fields (80% of them are flags) and about 100 million records.

The problem is that the Concatenate operation uses only one core, so this process might take a couple of days to complete.

Has anybody run into this problem?

Thanks!

Boris
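
For context, the kind of script in question might look roughly like this. This is a hypothetical sketch — the original script was not posted, and the file and table names are placeholders:

```
// Hypothetical sketch of a forced concatenation of two large fact tables.
// File and table names are placeholders, not the original script.
Facts:
LOAD * FROM Fact1.qvd (qvd);      // ~440M rows, 50 fields

Concatenate (Facts)
LOAD * FROM Fact2.qvd (qvd);      // ~100M rows, ~160 fields (mostly flags)
```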

16 Replies
fkeuroglian
Partner - Master

Hi Boris,

Are all the fields in the two tables different, or do you have any fields in common?

One option could be to concatenate only once and then load the result from a QVD.

bgerchikov
Partner - Creator III
Author

Hi Fernando,

Both tables have a common key field used to link them. All other fields are different.

Loading from QVD has a similar cost and doesn't resolve the problem.

Thanks!

fkeuroglian
Partner - Master

Ok, perfect.

How many CPUs do you have?

How much RAM does the server have?

Do you have enough disk space?

rbecher
MVP

Hi Boris,

Can you post your script? Is it an un-optimized load from QVD due to some transformations or additional functions you use? Usually, an optimized QVD load should work smoothly, as long as you have enough memory available.

- Ralf

Astrato.io Head of R&D
bgerchikov
Partner - Creator III
Author

Hi Ralf,

I cannot post the script, but I can say that the QVD load is optimized. The problem is that the Concatenate operation uses only one thread while the server has 16 CPUs. It takes approximately the same time either for tables preprocessed in the same script or for tables loaded from QVD.

Thanks!

rbecher
MVP

It's still not clear to me what happens in your script. However, sometimes a load from resident is slower than from QVD. So, it could help to store a processed resident table (especially when loaded distinct) into a QVD, load it again from there, and then concatenate it.
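
In load script form, that suggestion could be sketched like this — table and file names are assumptions, and it relies on both QVD loads staying optimized (no transformations, no Where clause beyond a simple Exists()):

```
// Sketch: persist the processed resident table, then reload it optimized.
STORE BigTable INTO BigTable.qvd (qvd);
DROP TABLE BigTable;

Facts:
LOAD * FROM BigTable.qvd (qvd);      // optimized: no transformations

Concatenate (Facts)
LOAD * FROM OtherTable.qvd (qvd);    // keep this load transformation-free too
```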

Astrato.io Head of R&D
bgerchikov
Partner - Creator III
Author

Ralf,

I'm trying to concatenate two large tables. I have tried this in several ways:

Both tables resident

One resident, the second loaded from QVD (optimized load)

Both tables loaded from QVD (optimized load)

The problem is that this process is extremely slow because only one CPU thread is used. When I join these tables, QV uses several cores. It looks like the Concatenate operation has the same limitation as Group By or Inner Join. So, the question is: has anybody mitigated a similar issue in some way?

Thanks!  

fkeuroglian
Partner - Master

Boris, if you think it is a memory or CPU problem, the solution would be to add more RAM and a better CPU.

Not applicable

What is wrong with the standard outer join in this case?

"Both tables have common key field used to link them. All other fields are different."

If your data are solid, it should work.

Table_test:
Load * Inline
[key, testfield, flag1, flag2, flag3
A, 125, 1,0,1
B, 123, 1,0,1
C, 155, 0,1,0
D, 110, 0,0,1];

Join
//Concatenate
Table_test2:
Load * Inline
[key, testfield2, flag4, flag5, flag6
A, 125, 0,1,0
B, What, 1,0,1
C, Why, 0,1,0
D, How, 1,1,1];

Then you have:

[screenshot: the joined result — four rows, each key carrying all eight fields on one row]

With concatenate activated instead you have:

[screenshot: the concatenated result — eight rows, with nulls in the fields that come from the other table]

So why do you need the concatenate version?
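
One reason Concatenate is sometimes preferred over Join is repeated key values: a join multiplies matching rows, while a concatenation simply stacks them. A minimal sketch with made-up data (not from the thread):

```
// Key A appears twice in T1 and three times in T2.
// Join on key would fan out to 2 x 3 = 6 rows for A;
// Concatenate just stacks them: 2 + 3 = 5 rows, no multiplication.
T1:
Load * Inline
[key, amount
A, 10
A, 20];

Concatenate (T1)    // replace with Join to see the fan-out
Load * Inline
[key, qty
A, 1
A, 2
A, 3];
```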