Skip to main content
Announcements
Qlik Introduces a New Era of Visualization! READ ALL ABOUT IT
cancel
Showing results for 
Search instead for 
Did you mean: 
haiderpasha
Contributor III
Contributor III

AWS Aurora PostgreSQL to S3 - Source_lookup performance issues

Hello Team

 

We are using AWS Aurora PostgreSQL as our source and have incorporated source_lookup into the task to enhance the table structure. However, its performance is not as expected. The table on which the source lookup is configured has 600K records, and the lookup table also contains 600K records. Out target is S3.

source lookup condition: source_lookup('NO_CACHING','public','custacct','cust_acct_bs_nbr','cust_acct_id=?',$cust_acct_shipto)

It seems the processing speed is around 10K per 5mins. You can check the same on the attached screenshot.

When we tested the same configuration with an AS400 DB2 source, the performance was markedly better. IT was around 1000 rows per second.

Thank you!

Haider

 

Labels (1)
4 Replies
Heinvandenheuvel
Specialist II
Specialist II

110000/ in 47+ minutes is just under 40 rows/sec.

>> When we tested the same configuration with an AS400 DB2 source, the performance was markedly better. IT was around 1000 rows per second.

To me that makes it clear that the issue is with querying  AWS, not a replicate issue.

Actually I think the issue is the solution design.

source_lookup - notably NO-cache - are singleton selects, which are NOT bundled, and handled sequentialy. Replicate sends a request for each lookup and waits for the answer. What is the 'ping' time for the  AWS database?

40 Row/sec = 25 ms round-trip. - 12 ms to ask, 1 ms to process, 12 ms to return the answer. T

That is the best any reasonable designer can expect, maybe you can get 10ms web latency (I doubt it) and go to 50 rows/sec, but never 1000 rows/sec and more likely you'll get 30ms latency some days and drop to 15 row/sec.

If this is full-load all the time, then you'd better switch to a VIEW to incorporate the other table data.

Or just forget the current design, replicate both tables, and create a view on the target. Bingo.

Hein.

haiderpasha
Contributor III
Contributor III
Author

Thank you for the response.

>> If this is full-load all the time, then you'd better switch to a VIEW to incorporate the other table data.

It's not going to be a full load all the time. After the full load is done, the incremental load will take over. Could you please elaborate on the "switch to a VIEW to incorporate the other table data" aspect? 

>> Or just forget the current design, replicate both tables, and create a view on the target.

Our target is S3, so the possibility of creating a view doesn't exist.

 

haiderpasha
Contributor III
Contributor III
Author

When the source is AS400 DB2 source then the source_lookup performance is as expected.

Heinvandenheuvel
Specialist II
Specialist II

>>> When the source is AS400 DB2 source then the source_lookup performance is as expected.

Yes, obviously. Because the source DB is closer by, possibly in the same room. The impact of the speed of light will be minimal for LAN access. But for a web server it becomes the limiting factor. It's called science.