jensej
Creator

Dynamic SQL query and Schema

Hi Guys!

I have 200+ Postgres databases that I work with. They all have the same schema and tables.

I often want to compare results from the different databases with each other.

To run the same SQL query on each database and compare the results, I have built a job that iterates over every connection and writes the result to a CSV along with the hostname.

For example, I query "select count(*) from employees" and in my Excel file I get

database1 100

database2 200

database3 99

database4 330

and so on...

This all works perfectly except for one thing.

In this case my query is hardcoded in my tDBInput together with the schema. If I now want to change the query to "select firstname,lastname from employees where firstname = 'Peter'", I have to open the job in Talend Studio, change the query, and also change the schema (since it now has one more column), then save the job and run it.

Is there some way to, for example, just paste a SQL query into a text file outside the job and then fill the tDBInput with the SQL from that file? If yes, how can I solve the problem with the schema?

My wish is to just start a batch job or something and get a popup where I can paste a SQL query. The job would then run using this SQL query and also somehow guess the schema for it.

5 Replies
Anonymous
Not applicable

OK, I think I understand where you are going here. I wrote a blog a while ago about a similar, though not absolutely the same, requirement. FYI, my intro was a somewhat flippant account of a couple of questions I had received from people who were pretty demanding of an easy solution that would require only a few components... so please don't assume I am talking about questions like this 😉

 

https://web.archive.org/web/20201204175952/https://www.talend.com/blog/2019/11/11/migrate-data-betwe...

 

The blog talks about the freedom of the Dynamic Schema (Enterprise Edition only). You would also need some functionality to supply your altered query. Since the queries in SQL components are essentially Java Strings, you can quite easily pass a query into the job via a context variable, by reading a file, from a database, etc. I believe a combination of providing the query at runtime and using a Dynamic Schema will help.
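
As a rough sketch of the runtime-query part (the file path and variable names here are my own illustrative choices, not from the blog): a tJava at the start of the job could read the SQL from a text file into a String context variable, and the tDBInput Query field would then contain nothing but that variable.

    // tJava at the start of the job. context.queryFile and
    // context.sqlQuery are assumed String context variables.
    String sql;
    try {
        sql = new String(
            java.nio.file.Files.readAllBytes(
                java.nio.file.Paths.get(context.queryFile)),
            java.nio.charset.StandardCharsets.UTF_8).trim();
    } catch (java.io.IOException e) {
        throw new RuntimeException("Could not read query file", e);
    }
    context.sqlQuery = sql;

With that in place, the tDBInput query box would simply hold context.sqlQuery, so you can swap the SQL without opening Talend Studio.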

jensej
Creator
Author

Hi @rhall

Interesting blog article; sadly I don't have the Enterprise Edition, so I won't be able to try your solution.

 

What if I just make a schema with a lot of text columns and then always make sure in my SQL that I convert everything to text?

 

I mean, I would have my tDBInput schema with 15 columns, all text.

 

Then all of these examples should work, right? "select count(*) from employees", "select firstname,lastname from employees where firstname = 'Peter'", and "select id::text, birthday::text from employees".

 

I'll have to try it.

Anonymous
Not applicable

That is an absolutely fine way of working here. The things to consider would be...

 

  1. Ensuring that there is no loss of precision when casting to a String and then back to the original data type.
  2. How your logic will be applied. It is possibly best handled in a Java routine, with your requirements for each table supplied as parameters identified by the sources/targets you are working with (see the sketch below).
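
As a minimal sketch of what such a routine might look like (the class name, method, and date format are my own assumptions, not a prescribed implementation): a helper that converts any value to text in one canonical way, so every database produces comparable strings.

    package routines;

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class TextCastUtil {

        // One fixed timestamp format so values compare identically
        // across all of the databases.
        private static final SimpleDateFormat TS =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        // Convert any column value to text: nulls become empty strings
        // so the CSV stays aligned, dates get one canonical format.
        public static String toText(Object value) {
            if (value == null) {
                return "";
            }
            if (value instanceof Date) {
                synchronized (TS) { // SimpleDateFormat is not thread-safe
                    return TS.format((Date) value);
                }
            }
            return String.valueOf(value);
        }
    }

You would call routines.TextCastUtil.toText(...) wherever a value needs to become text on the Talend side rather than in the SQL.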

 

One of the benefits of Talend is that you can build upon what is supplied with a bit of Java knowledge.

 

If you do this, I'd be interested in your results.

jensej
Creator
Author

Hi @Richard Hall

So I'm really close to achieving my goal with this job.

I just added a lot of text columns and made sure I convert everything in my SQL to text, and that works. The first problem I ran into was that I was missing my header (column names).

To solve this problem I added the column names to my SQL with a UNION ALL so that they appear on row 1. Then in the first step I fill my CSV with the column names, and afterwards I append the rest of my result to it.
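
For anyone trying the same trick, this is roughly the shape of the query (written as the Java String the job would hold, using the columns from the earlier example; the "ord" sort key is my own addition, since a plain UNION ALL does not guarantee that the header row arrives first):

    // Header row and data rows combined; the subquery sorts on the
    // literal "ord" key and then drops it from the output.
    String sql =
        "SELECT c1, c2 FROM ("
      + "  SELECT 0 AS ord, 'firstname' AS c1, 'lastname' AS c2"
      + "  UNION ALL"
      + "  SELECT 1, firstname::text, lastname::text"
      + "  FROM employees WHERE firstname = 'Peter'"
      + ") t ORDER BY ord";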

 

The next step I want to solve is cleaning up the empty trailing columns. Right now the CSV looks like this when I don't use all of the columns:

id,lastname,firstname,,,,,,,,,,,,

1,peters,mike,,,,,,,,,,,,

 

I want to get rid of the ,,,,,,,,,,,,

Anonymous
Not applicable

If you are writing to a CSV file with this many columns, it will always add the separators, since in many situations you may have data like this...

 

1,peters,mike,,,,,12,,true,,,,

 

In the case above, if you simply removed the separators, the data would not be aligned properly. This scenario isn't handled out of the box, but you can build it yourself. All it would take is some logic to prepare your row in code: use a tJavaFlex to build your row according to some logic in Java, then use a tFileOutputRaw component to output the computed row as a single column.
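
A rough sketch of what the tJavaFlex main-part code could look like (the column list and the row1/row2 flow names are illustrative assumptions): the idea is to drop only the trailing empties, so interior nulls keep their separators and alignment survives.

    // Collect the text columns, find the last populated one,
    // and build the output line ourselves.
    String[] cols = {
        row1.id, row1.lastname, row1.firstname // ... remaining text columns
    };

    int last = cols.length - 1;
    while (last >= 0 && (cols[last] == null || cols[last].isEmpty())) {
        last--; // strip only TRAILING empty columns
    }

    StringBuilder sb = new StringBuilder();
    for (int i = 0; i <= last; i++) {
        if (i > 0) {
            sb.append(','); // interior nulls keep their separators
        }
        sb.append(cols[i] == null ? "" : cols[i]);
    }

    row2.content = sb.toString() + "\n"; // one column for tFileOutputRaw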

 

https://help.talend.com/r/en-US/8.0/raw/tfileoutputraw