Hi experts,
I am loading one table (circa 600,000,000 rows, 20 columns)... It is going pretty quickly. But I need to add a new column to it from another table, which has about 5,000,000 rows.
It is very slow if I use a left join. What can make it faster?
Thank you
Jasmina
Personally I would use mapping, like the others already suggested. It avoids all the risks that can occur with joins (adding or removing records depending on the join type) and it will surely be faster than a join. Further, it provides the possibility to define a default value if there is no match, and also to do further calculations at once, unlike a join, which would require another load for any further checks/calculations. I would also look at whether the mapping table might be reduced to a smaller number of records with some kind of advanced filtering. This may look like:
TableForExists:
Load article From ArticleRawdata Where category = 'very important';
/* it's just an example - it could also be some generic approach like your recent one (the one with the concatenating) which creates the dates from the current year */
// keep only mapping records whose key occurs in TableForExists
ExtractMappingData:
Load lookup, return From SecondTable Where Exists(article, lookup);
Map:
Mapping Load * Resident ExtractMappingData;
Drop Tables ExtractMappingData, TableForExists;
It may look like an expensive effort, but nothing works better to optimize load performance with large datasets, short of incremental approaches. In your case, with such large tables, I would try to implement incremental loading first, and if the load times then still need reducing, the approach mentioned above could also be implemented on top.
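As a rough illustration of that incremental pattern (a minimal sketch only - the source name BigSource.csv, the field LastModified, the key column1 and the variable vLastExecTime are assumptions for this example, not from your script):

// vLastExecTime is assumed to hold the timestamp stored at the end of the previous run
// 1) fetch only the records added or changed since the last run
BigTable:
Load * From BigSource.csv (txt) Where LastModified >= '$(vLastExecTime)';

// 2) append the unchanged history from the previous run's QVD,
//    skipping any key that was just reloaded
Concatenate (BigTable)
Load * From BigTable.qvd (qvd) Where Not Exists(column1);

// 3) persist the merged result for the next run
Store BigTable Into BigTable.qvd (qvd);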
- Marcus
I think you could do the second step within the first one, like:
If(column2 = 0, ApplyMap('Map', column1, column2), column2)
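For context, a minimal sketch of how that could sit in the load itself (assuming the mapping table 'Map' from the script above and the table/field names used elsewhere in this thread):

// one pass over the big table: keep column2 where it already has a value,
// otherwise fall back to the mapped value; the third ApplyMap argument
// is returned when column1 has no match in the map
Table1:
Load
    column1,
    If(column2 = 0, ApplyMap('Map', column1, column2), column2) as column2
From table1;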
- Marcus
Can't you add the new column directly into the fact table as well?
At this moment, no.
In fact, I have something like this in the big table (more than half a billion rows):
column1 column2
1 0
2 0
3 0
4 40
5 60
In the other table I have
column1 column2
1 30
2 20
3 20
This column2 from the second table is of great importance to me; I need that data... but how?
I was thinking of something like:
table1:
Load * From table1;

Left Join (table1)
Load column1,
     column2 as different_column
From table2;
From this I will get a new column, and I would have the data. I need it in this table1, because table1 is connected to other data that I need.
Jasmina
Have you looked at using ApplyMap? Here is an article by Henric Cronström explaining why it is better than joining if you just want to get a single value from a table.
Hope this helps.
Regards,
Mauritz
Since both tables have the same columns, simply concatenate the two:
Load column1, column2 From T1;
Concatenate
Load column1, column2 From T2;
Or you can use the ApplyMap approach.
If I use concatenate and get this:
column1 column2 column3 column4
1 0 A B
2 0 A B
3 0 A B
1 20 null null
2 30 null null
3 20 null null
then how can I make this table:
column1 column2 column3 column4
1 20 A B
2 30 A B
3 20 A B
Jasmina
P.S. Now I will look into ApplyMap.
If column1 is a unique key, you will get the requested table by having:
1) column1 as a dimension
2) column2 as a measure with the expression Sum(column2)
3) column3 as a measure with the expression Concat(column3) or MaxString(column3) or MinString(column3)
4) Same as for column3 but with column4 instead.
Below should explain the ApplyMap principle.
Table2:
Mapping
LOAD * INLINE [
column1,column2
1,30
2,20
3,20
];
Table1:
LOAD * INLINE [
column1,column2
1,0
2,0
3,0
];
Rename Table Table1 to Temp;

Table1:
NoConcatenate
LOAD
    column1,
    // third parameter = default when column1 has no match in the map
    ApplyMap('Table2', column1, Null()) as column2
Resident Temp;

Drop Table Temp;
In your case you won't have to do the rename and drop as you'll already have the table. You can obviously use logic to only use ApplyMap if the original table didn't have a value, or if it was 0, etc.
Regards,
Mauritz
The columns that I presented are not measures, just dimensions... so I cannot use Sum().
Then you can have a preceding LOAD with a GROUP BY clause to contract the table into what you want. It doesn't matter whether the fields are dimensions or measures in the load script. What you want to do is get rid of the null values and merge the rows that belong together...
But you might have to do a RESIDENT on the big concatenated table so I don't know if the performance or memory consumption will be good... ApplyMap might work better for you here. You'll have to give it a try.
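A minimal sketch of that contraction as a resident load (the table name Combined for the concatenated table is an assumption; column1 is assumed to be the key, Sum() picks up the single non-zero number and MaxString() the single non-null text per key):

// collapse the concatenated rows onto one row per key
Merged:
NoConcatenate
Load
    column1,
    Sum(column2)       as column2,
    MaxString(column3) as column3,
    MaxString(column4) as column4
Resident Combined
Group By column1;

Drop Table Combined;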