tMap and Lookups with 20M+ records

Anonymous · ‎2019-09-14

Hello,

I was trying to convert Informatica mappings to Talend.

Here are the table stats:

1. Lookup-1: 28M (PostgreSQL Input with SQL join & Filter) - Cursor Size - 1M

Store on Disk
Load Once
First match

2. Lookup-2: 35M (PostgreSQL Input with SQL Filter) - Cursor Size - 1M

Store on Disk
Load Once
First match

Lookups in Parallel

3. Main Table: 27M (PostgreSQL Joins with multiple tables and Date Filters)

The max memory setting I provided was 8 GB.

The lookups load fine, but once the job starts reading the main table it slows down, and after about an hour of running the whole process fails with a Java heap memory error.

I'm not sure what else to look at to make this work. This is just one small mapping; further down the line there are much more complex mappings with huge data volumes.

Does parallelization help?

Does multi-thread execution help? If yes, what should the buffer unit size be set to?

Or would custom batch processing, handling 5M records at a time, help?

Please advise. Thanks.

fdenis · ‎2019-09-16

Hi,
The Java heap memory error comes from the amount of memory allowed to the Java process.
So, depending on your job, you can:
- increase the -Xmx parameter
- split the process
- use "Reload at each row" on the tMap lookup
- …
good luck
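
To make the "split the process" option a bit more concrete, here is a rough sketch in plain Java of how the main table could be cut into key ranges of 5M rows each. It is only an illustration: the class `BatchRanges`, the column name `id`, and the assumption that the main table has a dense numeric key running from 1 to 27,000,000 are all made up for the example and are not taken from the job above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an inclusive numeric key range into fixed-size batches.
class BatchRanges {

    // One batch, covering keys from..to (both inclusive).
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    // Returns consecutive, non-overlapping ranges that together cover minKey..maxKey.
    static List<Range> split(long minKey, long maxKey, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = minKey; start <= maxKey; start += batchSize) {
            // The last batch is cut short so it never runs past maxKey.
            long end = Math.min(start + batchSize - 1, maxKey);
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed for illustration: a dense numeric key "id" from 1 to 27,000,000.
        for (Range r : split(1L, 27_000_000L, 5_000_000L)) {
            System.out.println("WHERE id BETWEEN " + r.from + " AND " + r.to);
        }
    }
}
```

Each range would then drive one run of the main query, for example through context variables in the input component's SQL. Note that batching only the main flow does not shrink the lookups: to reduce memory, the lookup queries would need to be filtered to the keys of the current batch as well.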

Anonymous · ‎2019-09-18

Thanks @fdenis.

If I use "Reload at each row", as far as I know this increases the overall execution time.

fdenis · ‎2019-09-19

It depends on how your job is built and how your lookup is filtered.
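
The trade-off can be pictured with a small simulation in plain Java. This is not how tMap is implemented; it is a sketch under simplifying assumptions, where `fetchOne` is a hypothetical stand-in for a lookup query and the class `LookupModes` only counts how many queries are issued and how many lookup rows are held in memory at once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of two lookup strategies; not Talend's actual implementation.
class LookupModes {

    // Simulated cost of a run: queries issued and peak number of lookup rows held at once.
    static final class Cost {
        final long queries;
        final long peakRowsHeld;

        Cost(long queries, long peakRowsHeld) {
            this.queries = queries;
            this.peakRowsHeld = peakRowsHeld;
        }
    }

    // Stand-in for reading one lookup row; in a real job this would be a database query.
    static String fetchOne(long key) {
        return "value-" + key;
    }

    // "Load once": one bulk read, then every main row is matched against the in-memory map.
    static Cost loadOnce(long lookupRows, long[] mainKeys) {
        Map<Long, String> cache = new HashMap<>();
        for (long k = 1; k <= lookupRows; k++) {
            cache.put(k, fetchOne(k));
        }
        for (long key : mainKeys) {
            cache.get(key); // in-memory match, no extra query
        }
        return new Cost(1, cache.size());
    }

    // "Reload at each row": one filtered query per main row, nothing kept between rows.
    static Cost reloadEachRow(long[] mainKeys) {
        long queries = 0;
        for (long key : mainKeys) {
            fetchOne(key);
            queries++;
        }
        return new Cost(queries, mainKeys.length == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long[] mainKeys = {1, 2, 3};
        Cost once = loadOnce(1000, mainKeys);
        Cost each = reloadEachRow(mainKeys);
        System.out.println("load once:   queries=" + once.queries + ", rows held=" + once.peakRowsHeld);
        System.out.println("reload each: queries=" + each.queries + ", rows held=" + each.peakRowsHeld);
    }
}
```

In this model, loading once costs memory proportional to the lookup size, while reloading costs one query per main row. With a main flow of 27M rows the second option means 27M lookup queries, so it tends to pay off only when the main flow is small or the per-row query is well filtered and indexed.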

Talend Data Integration

v7.x