Anonymous

Loading different CSV files into different databases by setting the database name dynamically from a context variable / the current file path

Hi,

    I need to load different CSV files into different databases, and the database to connect to should be set dynamically from the file path. I have the flow below:

tFileList --> tFileInputDelimited --> tJavaFlex --> tPostgresqlOutput

I am extracting the DB name and table name from the CSV file path and setting them as context variables in tJavaFlex, and then using those context variables in the tPostgresqlOutput database name and table name properties.

 

But I am getting

org.postgresql.util.PSQLException: ERROR: zero-length delimited identifier at or near """"

because the value arriving in the dbname or table property is null or an empty string (Postgres raises this error when it is handed an empty quoted identifier, i.e. ""), even though I can see the values coming through properly in the tJavaFlex logs.

I suspect the reason is that the tPostgresqlOutput connection details are initialized even before the flow starts, so the context variables are still empty at that point.
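
For reference, the main part of my tJavaFlex is roughly the following (a minimal sketch; the component name tFileList_1, the /data/<dbname>/<tablename>.csv layout, and the context variable names are only illustrative):

    // tJavaFlex main part -- sketch only; assumes files arrive as
    // /data/<dbname>/<tablename>.csv and that context.dbName and
    // context.tableName are defined as String context variables.
    String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    java.io.File f = new java.io.File(path);
    context.tableName = f.getName().replaceAll("\\.csv$", "");
    context.dbName = f.getParentFile().getName();
    // These print the expected values, so the parsing itself works:
    System.out.println("db=" + context.dbName + ", table=" + context.tableName);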

 

Can someone please help me with this?

Thanks in advance

Sree

13 Replies
Jesperrekuh
Specialist

Your network (wifi) is probably the performance killer... If you need high throughput, I suggest generating batch insert (INSERT INTO / T-SQL) files, uploading the file to your DB server first, and then processing it there.
If you use Talend, I'm not sure you can use/change the batch size... and make sure you trim your input data.
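
Something like this, roughly (a sketch only; the table name, the columns and the file name are made up):

    import java.io.PrintWriter;
    import java.util.Arrays;
    import java.util.List;

    public class BatchInsertFileDemo {
        // Writes one multi-statement SQL file you can copy to the DB server
        // and run there in a single round trip instead of row-by-row inserts.
        static void writeBatch(List<String[]> rows, String fileName) throws Exception {
            try (PrintWriter out = new PrintWriter(fileName)) {
                out.println("BEGIN;");
                for (String[] r : rows) {
                    out.printf("INSERT INTO my_table (col_a, col_b) VALUES ('%s', '%s');%n",
                            r[0].trim().replace("'", "''"),  // trim + escape quotes
                            r[1].trim().replace("'", "''"));
                }
                out.println("COMMIT;");
            }
        }

        public static void main(String[] args) throws Exception {
            List<String[]> rows = Arrays.asList(
                    new String[] { " alpha ", "beta" },
                    new String[] { "gamma", "d'elta" });
            writeBatch(rows, "batch_insert.sql");
        }
    }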

Complaints about performance issues are becoming more frequent, especially when the cloud is involved; upload speed is often not as fast as download speed. And if you work from home, I guarantee that HOME -> VPN -> Office -> Cloud is the serial killer.
Even cloud up/down speeds are managed and based on concurrent user load... so...
And what if your (cloud) database is replicated or audit logs are active...

Upload a 25 MB file and see what your average upload speed is... I think it's a very good indicator for your max throughput calculation over TCP/IP (e.g. if 25 MB takes 10 s, your ceiling is roughly 2.5 MB/s, about 20 Mbit/s). The next step is your DB.
Anonymous
Author

I am running it locally; both the DB server and the file repo are on the same system.

 

I tested the same 300K records with insert only and parallelism, and it finished in less than 4 seconds. So I can conclude that it is the "insert and update" without parallelism (I need to create tables dynamically, which is not available with parallelism) that makes it slower.

Jesperrekuh
Specialist

Oh, that sucks... maybe insert-only into temp tables first, then a T-SQL step with an EXISTS check against the target table.
This is how I do a lot of stuff to fill a DWH (Data Vault with hash keys)... the overhead of insert/update is way too much...
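
For example, something along these lines in a tPostgresqlRow once the staging table is filled (a sketch; stg_orders, orders and the id key are placeholders, and on Postgres it's plain SQL rather than T-SQL):

    // Merge query for a tPostgresqlRow after the insert-only staging load:
    // update rows that already exist in the target, insert the rest.
    String mergeSql =
          "UPDATE orders t SET col_a = s.col_a, col_b = s.col_b "
        + "FROM stg_orders s WHERE t.id = s.id; "
        + "INSERT INTO orders (id, col_a, col_b) "
        + "SELECT s.id, s.col_a, s.col_b FROM stg_orders s "
        + "WHERE NOT EXISTS (SELECT 1 FROM orders t WHERE t.id = s.id);";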
Anonymous
Author

Thank you very much for the input. I will try that.