Anonymous
Not applicable

Loading different CSV files into databases by setting the database name dynamically from a context variable / current file path

Hi,

I need to load different CSV files into different databases. The database to connect to should be set dynamically from the file path. I have the flow below:

tFileList --> tFileInputDelimited --> tJavaFlex --> tPostgresqlOutput

I am extracting the DB name and table name from the CSV file path, setting them into context variables in tJavaFlex, and then using those context variables in the tPostgresqlOutput database name and table name properties.
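Roughly, my tJavaFlex main code looks like this (a simplified sketch; the /<dbname>/<tablename>.csv folder layout and the context variable names are illustrative):

    // tJavaFlex main part (sketch). Assumes the tFileList is named
    // tFileList_1 and that dbName/tableName are String context vars.
    String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    java.io.File f = new java.io.File(path);
    context.tableName = f.getName().replaceAll("\\.csv$", "");
    context.dbName = f.getParentFile().getName();
    System.out.println("dbName=" + context.dbName + ", tableName=" + context.tableName);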

 

But I am getting 

org.postgresql.util.PSQLException: ERROR: zero-length delimited identifier at or near """"

because the value arrives as null or as an empty string in the dbname or table properties, although I can see the values coming through properly in the tJavaFlex logs.

The reason may be that the tPostgresqlOutput connection details are initialized even before the flow starts.

 

Could someone please help me with this?

Thanks in advance

Sree

13 Replies
Anonymous
Not applicable
Author

Any ideas or pointers?

Jesperrekuh
Specialist

You could look at the generated code to check what's happening, but your assumption is probably right.
Make a subjob where you pass the context vars; in the subjob you create the flexible connection.
Anonymous
Not applicable
Author

Thanks for your reply. Yes, when I created a subjob with the same database and table name, it works. But I want dynamic database/table names, which is only possible by iterating. When iterating, how can I create a subjob within that?

 

I have attached a screenshot of my workflow. Here, in the tJava component, I am setting the database/table name and the directory from which to load the CSVs.


Talend_Flow.JPG
Jesperrekuh
Specialist

First option:
After the tFileList > tRunJob ...
In the tRunJob you add context vars targetTable and fileNamePath, which you set dynamically in your (main) job.
Do all files have the same columns? I assume not... so let's assume you have customer and order files.

In the job run by the tRunJob you start with:
* tJava1 -> If (context.targetTable.equals("customer")) -> tFileInputDelimited -> tPostgresqlOutput
* tJava1 -> If (context.targetTable.equals("orders")) -> tFileInputDelimited -> tPostgresqlOutput

In every job you still need to define the file input and output mapping.
Note that .equals() is case-sensitive; use .equalsIgnoreCase() if you need a case-insensitive match. Each "Run if" condition contains just the boolean expression, as in the sketch below.
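A sketch of what goes into the condition box of each "Run if" trigger (targetTable being the context var described above):

    // "Run if" condition on the customer branch:
    context.targetTable.equals("customer")

    // "Run if" condition on the orders branch (case-insensitive variant):
    context.targetTable.equalsIgnoreCase("orders")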

Second option (preferred):
After the tFileList > tRunJob ... tick the "Use dynamic job" checkbox; a job is then triggered based on a context var holding the name of the job to trigger. Your job name should match the context var exactly.

Create a job for each file type.
In the tRunJob you still add the context vars targetTable and fileNamePath, which you set dynamically in your (main) job.
In every job you still need to define the file input and output mapping.
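To set the job name dynamically, a small tJava before the tRunJob could derive it from the current folder. This is just a sketch: it assumes the folder name identifies the file type, that the child jobs are named load_customer, load_orders, etc., and that context.jobName is the String context var the dynamic tRunJob reads:

    // tJava before the dynamic tRunJob (sketch).
    // CURRENT_FILEDIRECTORY is provided by tFileList_1.
    String dir = (String) globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY");
    context.jobName = "load_" + new java.io.File(dir).getName();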
Anonymous
Not applicable
Author

Hi Dijk,

The 2nd approach is interesting and I will try it out. But as I mentioned earlier, the CSVs are dynamic, and so are the jobs. So I can't create a job specific to each CSV, like if var.equals("Customer") or var.equals("Order"). From the folder path I am identifying the database and connecting to the specific tables. I don't have control over the CSVs or their contents. The jobs you mention would have to be created dynamically, and that is the challenge.

Jesperrekuh
Specialist

From what you are saying, it sounds like there are a couple of folders which indicate the target/output database table... Do the files within a subfolder always have the same structure and contain the same columns in the same order? If not, first create a process to determine which type each file is.

If there is no control... how do you know which file contains, say, customers? There should be something that indicates what type of data is present in your file: the folder, the filename, or its header.

If it is really, really, really dynamic, you need a regex to extract parts of the data from within the file (the header should indicate the type) and use this to determine the type of data, roughly as in the sketch below. But you still need to map your input data onto the output columns in the Postgres DB (unless you store one column that contains the content of one row from the file).
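Something along these lines in a tJava could do the detection (a rough sketch; the header columns customer_id and order_id are only examples of such an indicator, and tFileList_1 is assumed to be the name of your tFileList):

    // Read only the header line and derive the file type from it.
    String path = (String) globalMap.get("tFileList_1_CURRENT_FILEPATH");
    java.io.BufferedReader reader =
            new java.io.BufferedReader(new java.io.FileReader(path));
    String header = reader.readLine();
    reader.close();
    String type = "unknown"; // route unknowns to an error/quarantine flow
    if (header != null && header.matches("(?i).*\\bcustomer_id\\b.*")) {
        type = "customer";
    } else if (header != null && header.matches("(?i).*\\border_id\\b.*")) {
        type = "orders";
    }
    globalMap.put("fileType", type);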

Good luck!
Anonymous
Not applicable
Author

Yes, in fact it is really, really dynamic. I will know the database name and table name from the file path. Also, I am using the dynamic data type to map the columns, so I need not worry about the CSV columns and schema.

The main issue is setting the database connection before loading. As a simple flow, tFileList --> tFileInputDelimited --> tJavaFlex --> tPostgresqlOutput should work, where tJavaFlex sets the database name and table name. But that is not happening: even before tFileList starts loading, tPostgresqlOutput throws the error.

 

Jesperrekuh
Specialist

By setting the database connection, do you actually connect to multiple databases, or multiple tables, or both?
Maybe... just a 'wild' idea... you could generate INSERT INTO scripts and upload them in bulk?
Anonymous
Not applicable
Author

Yes, I am connecting to multiple databases, and to tables inside those databases, dynamically while loading the CSVs.

I tried a workaround: first I find the file/DB/table combinations, then I loop over that map and run the same flow I mentioned at the beginning, and it is working now.
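The lookup step is roughly the following (a simplified sketch; it assumes a <root>/<dbname>/<tablename>.csv layout, and context.rootDir is an illustrative context var pointing at the root folder):

    // tJava: build an ordered map of csvPath -> { dbName, tableName } and
    // stash it in globalMap so the following iteration can loop over it.
    java.util.Map<String, String[]> plan = new java.util.LinkedHashMap<String, String[]>();
    for (java.io.File db : new java.io.File(context.rootDir).listFiles()) {
        if (!db.isDirectory()) continue;
        for (java.io.File csv : db.listFiles()) {
            if (!csv.getName().endsWith(".csv")) continue;
            plan.put(csv.getAbsolutePath(),
                    new String[] { db.getName(), csv.getName().replaceAll("\\.csv$", "") });
        }
    }
    globalMap.put("loadPlan", plan);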

 

But now the problem is performance: it is taking more than an hour for 300K records. In tPostgresqlOutput, since I am using "Insert or update" and dynamic table creation (action on table), it is pretty slow. I think I need to find a way to segregate the records to be inserted from those to be updated and handle them separately.
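One idea I am considering (untested) is pushing the conflict handling down to Postgres itself with INSERT ... ON CONFLICT, which is available from PostgreSQL 9.5, e.g. built as a prepared statement for a tPostgresqlRow. The columns id and payload here are just placeholders; it assumes each table has a primary key to detect conflicts on:

    // Sketch of an upsert statement; "id" stands in for the table's
    // primary key column.
    String upsert =
        "INSERT INTO " + context.tableName + " (id, payload) VALUES (?, ?) " +
        "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload";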

 

If there is an easy approach or component that can be used to segregate them, please let me know.

 

Anyway, thanks a lot for all your inputs.