Anonymous
Not applicable

Loading different CSV files into databases by setting the database name dynamically from a context variable / current file path

Hi,

I need to load different CSV files into different databases. The database to connect to should be set dynamically from the file path. I have the flow below:

tFileList --> tFileInputDelimited --> tJavaFlex --> tPostgresqlOutput

I am extracting the DB name and table name from the CSV file path, setting them into context variables in tJavaFlex, and then using those context variables in the tPostgresqlOutput database name and table name properties.
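Roughly, my tJavaFlex main code looks like this (a simplified sketch; the /<dbname>/<tablename>.csv folder layout and the context variable names are illustrative):

    // tJavaFlex main part (sketch). Assumes the tFileList is named
    // tFileList_1 and that dbName/tableName are String context vars.
    String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    java.io.File f = new java.io.File(path);
    context.tableName = f.getName().replaceAll("\\.csv$", "");
    context.dbName = f.getParentFile().getName();
    System.out.println("dbName=" + context.dbName + ", tableName=" + context.tableName);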

 

But I am getting 

org.postgresql.util.PSQLException: ERROR: zero-length delimited identifier at or near """"

because the value arrives as null or as an empty string in the dbname or table properties, although I can see the values coming through properly in the tJavaFlex logs.

The reason may be that the tPostgresqlOutput connection details are initialized even before the flow starts.

 

Could someone please help me with this?

Thanks in advance

Sree

13 Replies
Anonymous
Not applicable
Author

Any ideas or pointers?

Jesperrekuh
Specialist

You could look at the generated code to check what's happening, but your assumption is probably right.
Make a subjob where you pass the context vars; in the subjob you create the flexible connection.
Anonymous
Not applicable
Author

Thanks for your reply. Yes, when I created a subjob with the same database and table name, it works. But I want dynamic database/table names, which is only possible by iterating. When iterating, how can I create a subjob within that?

 

I have attached a screenshot of my workflow. Here, in the tJava component, I am setting the database/table name and the directory from which to load the CSVs.


Talend_Flow.JPG
Jesperrekuh
Specialist

First option:
After the tFileList > tRunJob ...
In the tRunJob you add context vars targetTable and fileNamePath, which you set dynamically in your (main) job.
Do all files have the same columns? I assume not... so let's assume you have customer and order files.

In the job run by the tRunJob you start with:
* tJava1 -> If (context.targetTable.equals("customer")) -> tFileInputDelimited -> tPostgresqlOutput
* tJava1 -> If (context.targetTable.equals("orders")) -> tFileInputDelimited -> tPostgresqlOutput

In every job you still need to define the file input and output mapping.
Note that .equals() is case-sensitive; use .equalsIgnoreCase() if you need a case-insensitive match. Each "Run if" condition contains just the boolean expression, as in the sketch below.
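A sketch of what goes into the condition box of each "Run if" trigger (targetTable being the context var described above):

    // "Run if" condition on the customer branch:
    context.targetTable.equals("customer")

    // "Run if" condition on the orders branch (case-insensitive variant):
    context.targetTable.equalsIgnoreCase("orders")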

Second option (preferred):
After the tFileList > tRunJob ... tick the "Use dynamic job" checkbox; a job is then triggered based on a context var holding the name of the job to trigger. Your job name should match the context var exactly.

Create a job for each file type.
In the tRunJob you still add the context vars targetTable and fileNamePath, which you set dynamically in your (main) job.
In every job you still need to define the file input and output mapping.
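To set the job name dynamically, a small tJava before the tRunJob could derive it from the current folder. This is just a sketch: it assumes the folder name identifies the file type, that the child jobs are named load_customer, load_orders, etc., and that context.jobName is the String context var the dynamic tRunJob reads:

    // tJava before the dynamic tRunJob (sketch).
    // CURRENT_FILEDIRECTORY is provided by tFileList_1.
    String dir = (String) globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY");
    context.jobName = "load_" + new java.io.File(dir).getName();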
Anonymous
Not applicable
Author

Hi Dijk,

The 2nd approach is interesting and I will try it out. But as I mentioned earlier, the CSVs are dynamic, and so are the jobs. So I can't create a job specific to each CSV, like if var.equals("Customer") or var.equals("Order"). From the folder path I am identifying the database and connecting to the specific tables. I don't have control over the CSVs or their contents. The jobs you mention would have to be created dynamically, and that is the challenge.

Jesperrekuh
Specialist

From what you are saying, it sounds like there are a couple of folders which indicate the target/output database table... Do the files within a subfolder always have the same structure and contain the same columns in the same order? If not, first create a process to determine which type each file is.

If there is no control... how do you know which file contains, say, customers? There should be something that indicates what type of data is present in your file: the folder, the filename, or its header.

If it is really, really, really dynamic, you need a regex to extract parts of the data from within the file (the header should indicate the type) and use this to determine the type of data, roughly as in the sketch below. But you still need to map your input data onto the output columns in the Postgres DB (unless you store one column that contains the content of one row from the file).
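Something along these lines in a tJava could do the detection (a rough sketch; the header columns customer_id and order_id are only examples of such an indicator, and tFileList_1 is assumed to be the name of your tFileList):

    // Read only the header line and derive the file type from it.
    String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    java.io.BufferedReader reader =
            new java.io.BufferedReader(new java.io.FileReader(path));
    String header = reader.readLine();
    reader.close();
    String type = "unknown"; // route unknowns to an error/quarantine flow
    if (header != null && header.matches("(?i).*\\bcustomer_id\\b.*")) {
        type = "customer";
    } else if (header != null && header.matches("(?i).*\\border_id\\b.*")) {
        type = "orders";
    }
    globalMap.put("fileType", type);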

Good luck!
Anonymous
Not applicable
Author

Yes, in fact it is really, really dynamic. I will know the database name and table name from the file path. Also, I am using the dynamic data type to map the columns, so I need not worry about the CSV columns and schema.

The main issue is setting the database connection before loading. As a simple flow, tFileList --> tFileInputDelimited --> tJavaFlex --> tPostgresqlOutput should work, where tJavaFlex sets the database name and table name. But that is not happening: even before tFileList starts loading, tPostgresqlOutput throws the error.

 

Jesperrekuh
Specialist

By setting the database connection, do you actually connect to multiple databases, or multiple tables, or both?
Maybe... just a 'wild' idea... you could generate INSERT INTO scripts and upload them in bulk?
Anonymous
Not applicable
Author

Yes, I am connecting to multiple databases, and to tables inside those databases, dynamically while loading the CSVs.

I tried a workaround: first I find the file/DB/table combinations, then I loop over that map and run the same flow I mentioned at the beginning, and it is working now.
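The lookup step is roughly the following (a simplified sketch; it assumes a <root>/<dbname>/<tablename>.csv layout, and context.rootDir is an illustrative context var pointing at the root folder):

    // tJava: build an ordered map of csvPath -> { dbName, tableName } and
    // stash it in globalMap so the following iteration can loop over it.
    java.util.Map<String, String[]> plan = new java.util.LinkedHashMap<String, String[]>();
    for (java.io.File db : new java.io.File(context.rootDir).listFiles()) {
        if (!db.isDirectory()) continue;
        for (java.io.File csv : db.listFiles()) {
            if (!csv.getName().endsWith(".csv")) continue;
            plan.put(csv.getAbsolutePath(),
                    new String[] { db.getName(), csv.getName().replaceAll("\\.csv$", "") });
        }
    }
    globalMap.put("loadPlan", plan);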

 

But now the problem is performance: it is taking more than an hour for 300K records. In tPostgresqlOutput, since I am using "Insert or update" and dynamic table creation (action on table), it is pretty slow. I think I need to find a way to segregate the records to be inserted from those to be updated and handle them separately.
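One idea I am considering (untested) is pushing the conflict handling down to Postgres itself with INSERT ... ON CONFLICT, which is available from PostgreSQL 9.5, e.g. built as a prepared statement for a tPostgresqlRow. The columns id and payload here are just placeholders; it assumes each table has a primary key to detect conflicts on:

    // Sketch of an upsert statement; "id" stands in for the table's
    // primary key column.
    String upsert =
        "INSERT INTO " + context.tableName + " (id, payload) VALUES (?, ?) " +
        "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload";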

 

If there is an easy approach or component that can be used to segregate them, please let me know.

 

Anyway, thanks a lot for all your inputs.