SQL script returns more data than I can store in RAM
Hi!
I run a large SQL script in tPostgresqlInput that returns more than 100 GB of data, while the server I'm using has only 16 GB of RAM.
There is no way to cache all of the data in memory, so tuning Xms/Xmx is not a solution. Could you please advise me on what I can do in this situation? Maybe I can fetch the data from PostgreSQL partially (but keep in mind that the SQL script is rather complex, as it builds a data mart table), or spill the selected data to the server's HDD?
Thanks!
Hi Sabrina! Yes, it is another DB (PostgreSQL) on another server. There is no direct connection between the source and target databases, but both of them are connected to the ETL server.
As TRF said, you need to break this problem down. Even if your SQL query is really complicated, it can still be filtered with a WHERE clause. A simple way to tackle this is to iterate over multiple calls of the query, changing the WHERE clause each time. This breaks the memory load into manageable chunks without the overhead of creating a load of files. You will need to think about what to filter on and whether it lets you break the data into chunk sizes you can work with, though. A rough sketch of the idea is below.
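Here is a minimal sketch of that pattern in plain JDBC (Talend generates Java under the hood, so the same idea maps onto looping a subjob and feeding a context variable into the WHERE clause of tPostgresqlInput). The connection URL, credentials, table name and the created_date column are just placeholders for your own query and filter key, not anything from your job.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedExtract {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your own.
        String url = "jdbc:postgresql://source-host:5432/sourcedb";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret")) {
            // Wrap the complex data-mart query and filter it by a key range.
            // "created_date" is a placeholder column; pick whatever key splits
            // the result into chunks a 16 GB server can hold.
            String sql = "SELECT * FROM ( /* your complex data-mart query */ "
                       + "SELECT * FROM big_source_table ) q "
                       + "WHERE q.created_date >= ? AND q.created_date < ?";

            String[][] ranges = {
                {"2023-01-01", "2023-02-01"},
                {"2023-02-01", "2023-03-01"},
                // ... one entry per chunk
            };

            for (String[] range : ranges) {
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setDate(1, java.sql.Date.valueOf(range[0]));
                    ps.setDate(2, java.sql.Date.valueOf(range[1]));
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // Push each row onward to the target DB /
                            // the next component here.
                        }
                    }
                }
            }
        }
    }
}
```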
TRF, rhall_2.0
thanks for your answers. I was hoping there would be a way to fetch data from the source RDBMS server in batches without any additional rework of my jobs.
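For anyone who finds this thread later: fetching in batches without reworking the query is essentially what cursor-based fetching in JDBC does. The PostgreSQL driver will only stream rows through a server-side cursor when autocommit is off and a fetch size is set; otherwise it buffers the whole result set in memory. Below is a minimal sketch under those assumptions (connection details and table name are placeholders). If your version of tPostgresqlInput exposes a cursor / fetch size option in its Advanced settings, enabling it should achieve the same effect without changing the SQL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorStreamingExtract {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your own.
        String url = "jdbc:postgresql://source-host:5432/sourcedb";
        try (Connection conn = DriverManager.getConnection(url, "etl_user", "secret")) {
            // Autocommit must be off for the PostgreSQL JDBC driver to use a
            // server-side cursor instead of loading everything into memory.
            conn.setAutoCommit(false);

            try (Statement st = conn.createStatement()) {
                st.setFetchSize(10_000); // rows pulled per round trip to the server

                try (ResultSet rs = st.executeQuery(
                        "/* your complex data-mart query */ SELECT * FROM big_source_table")) {
                    while (rs.next()) {
                        // Only ~10,000 rows are held in memory at a time;
                        // write each row onward to the target DB here.
                    }
                }
            }
            conn.commit();
        }
    }
}
```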