vikasshana
Creator II

How to retrieve more than one record using ApplyMap?

Hi,

I need your help retrieving more than one record with ApplyMap. Below is the sample data I have.

Table1:

ID  Task Number
1   123
1   456

Table2:

ID  Sales  Cost
1   150    100
2   200    150
3   250    200

Code:

Dim_Table:
Mapping Load
    ID,
    [Task Number]
Resident Table1;

Drop Table Table1;

Fact_Table:
Load
    ID,
    Sales,
    Cost,
    ApplyMap('Dim_Table', ID, Null()) as Task_Details
Resident Table2;

Drop Table Table2;

Actual Output:

ID  Sales  Cost  Task_Details
1   150    100   123

Expected Output:

ID  Sales  Cost  Task_Details
1   150    100   123
1   150    100   456

I can achieve this with a left join, but I have millions of records in my main table, so when I do the left join the app crashes and the reload never completes.

Any suggestions?

Regards,

Vikas

6 Replies
Or
MVP

What you are describing is not a use case for a mapping load, and you are correct in using a left join instead. Even with millions of rows, I don't see why this would cause the app to crash; I regularly do this with tens of millions of rows, and occasionally with hundreds of millions, with no issues. It sounds like you either need more RAM (if the crash is caused by insufficient resources), or your field names for the join aren't perfectly identical, in which case the left join becomes a Cartesian product and could indeed exhaust resources, hang, or crash.
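
For reference, the left-join pattern on the sample tables from the question would look roughly like this (a sketch; it assumes Table1 has not been dropped yet):

Fact_Table:
Load ID, Sales, Cost
Resident Table2;

// The join key must be named identically on both sides; any other
// field name the two tables share also becomes part of the key,
// and a missing common key turns the join into a Cartesian product.
Left Join (Fact_Table)
Load ID, [Task Number] as Task_Details
Resident Table1;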

vikasshana
Creator II
Author

Thanks for that; the alternative is to use a left join. I tried debugging with 1 million records using the left join and it works fine, but when I do a full reload the app takes around 12-13 hours and then fails. The fields I'm using to join are identical, and I don't find any issues while debugging.

Or
MVP

I'm not really sure how to further assist you with this, since it sounds like a localized issue.

I ran the following on my laptop:

Load RowNo() as Field1, Rand() as Field2
Autogenerate(10000000);
Left join
Load RowNo() as Field1, 2*Rand() as Field3
Autogenerate(10000000);

 

And it finished in about 35 seconds, after generating 10 million rows twice and left joining the results. Unless your resident tables are huge and/or you're working with very little RAM, I can't figure out why something would work correctly with a small dataset but not with a medium one.

marcus_sommer

It could be done with a mapping, too. For example with something like this:

Code:

Dim_Table:
Mapping Load
    ID,
    Concat([Task Number], '|')
Resident Table1
Group By ID;

Drop Table Table1;

Fact_Table:
Load
    ID,
    Sales,
    Cost,
    // SubField() without an index argument splits the mapped string
    // into one row per task number
    SubField(ApplyMap('Dim_Table', ID, Null()), '|') as Task_Details
Resident Table2;

Drop Table Table2;
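
With the sample data above, the mapping row for ID 1 becomes '123|456'; SubField(..., '|') then expands the single fact row for ID 1 into two rows, which is exactly the expected output, without any join.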

 

Even with a larger data set and millions of records I would expect a run-time of just a few minutes.

- Marcus

vikasshana
Creator II
Author

Thanks for that; I tried the sample code and it works. But when I implement the same in the main code, it throws the error below, since I'm extracting 80 million records and using the Concat function. Even when I debug with just 10 records, it still shows the same error.

[screenshot of the error message]

 

marcus_sommer

A string aggregation over 80 M records is surely a heavy transformation. With an aggregation rate of around 50% it would probably result in a table with 40 M records and, more importantly, two fields with 40 M distinct field values each, one of them a string. Such a table alone may cost around 20 GB of RAM, and splitting it in the next step with a SubField() loop also needs some resources. From this point of view it's simply a question of how many resources are available in your environment.

Independent of which kind of solution is implemented in the end, I would suggest applying an incremental logic with at least three layers, maybe more; see the sketch below. With it you can keep the intended way of creating the final table, regardless of whether you use such a mapping or another approach. But I also suggest rethinking this data model carefully, because my impression is that it's not really suitable: your final table duplicates the sales and cost values, which will probably cause other problems (solvable, of course, but again with additional effort and performance costs).
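
A minimal sketch of that incremental idea, reduced to a single layer, might look like this (Source.qvd, Fact.qvd, vLastReload and ModifiedDate are placeholder names to adapt; a real implementation would split extract, transform and data-model layers):

// Load only new or changed rows from the source extract
Fact:
Load ID, Sales, Cost, ModifiedDate
From [lib://Data/Source.qvd] (qvd)
Where ModifiedDate >= '$(vLastReload)';

// Append the unchanged history from the previous run
Concatenate (Fact)
Load ID, Sales, Cost, ModifiedDate
From [lib://Data/Fact.qvd] (qvd)
Where not Exists(ID);

// Persist the combined table for the next incremental run
Store Fact into [lib://Data/Fact.qvd] (qvd);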

- Marcus