Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have a rather complex case and I don't know how to proceed. My process should be as follows : for example, I have 50 rows (3 columns : id, date, indicator) to insert in the database.
I have to insert row by row, run an SQL query that checks if the date value already exists, then update the row (indicator column : 0 if the date does not exist : 1 if it does). And I repeat 50 times because there are 50 rows.
Demonstration of what I want :
Insert row 1 : 1; 11/17/22; 0
Insert row 2 : 2; 17/11/22; 1
Insert row 3 : 3; 17/11/22; 1
Insert row 4 : 4; 18/11/22; 0
Insert row 5 : 5; 19/11/22; 0
Insert row 6 : 6; 20/11/22; 0
Insert row 7 : 7; 19/11/22; 1
Insert row 8 : 8; 19/11/22; 1
...
How to do this and maybe the most optimized method because I imagine that doing this process iteratively 50 times is costly in performance.
Or another idea is to insert all the lines once, which gives :
Demonstration of what I currently have :
Insert row 1 : 1; 17/11/22; 0
Insert row 2 : 2; 17/11/22; 0
Insert row 3 : 3; 17/11/22; 0
Insert row 4 : 4; 18/11/22; 0
Insert row 5 : 5; 19/11/22; 0
Insert row 6 : 6; 20/11/22; 0
Insert row 7 : 7; 19/11/22; 0
Insert row 8 : 8; 19/11/22; 0
...
Then create a code or using components, a technique to read the date of each line and check if this date is present in a previous line. In this case, set the indicator to 1 otherwise leave it at 0.
There is an easy way to do this. First you need to sort your data however you want it ordered. Then put it through a tMap. In a tMap variable, add a variable which uses the routines.Numeric.sequence(null, 0, 0) method. Replace null with something like "DateCount" and replace the last 0 with 1. What this will do is create a count for every date you get. So you can use this count the number of dates or just see if it is greater than 0.
Use the variable as an output value.
If your variable is called "DateCountVar", you would add it to your output column with a bit of code like this....
Var.DateCountVar>0? 1 : 0
That means, if it is greater than 0, set it to 1 otherwise set it to 0.
Now this won't be the most memory efficient way of doing this, but it will work and I doubt you will get too many issues doing it this way.
Hello,
Talend CDC feature can capture the data changes in the source table, and then you are able to synchronize these change data to target application.
Please refer to this online documentation: TalendHelpCenter: Change Data Capture (CDC) and let us know if it is what you are looking for.
Best regards
Sabrina
Hello @Stéphane Barbezier ,
A little bit out of the box thinking but during insert we could try and run a sql query to look up the value of the column. In case your database support it you could use the Advanced Settings ->Additional Columns field and define an SQL Expression.
https://help.talend.com/r/en-US/8.0/mysql/tmysqloutput-tmap-trowgenerator-trowgenerator-inserting-column-and-altering-data-using-tmysqloutput-standard-component-this
So at the end of the day your generated query would be:
INSERT INTO table (col1, col2, col3) VALUES ( ? , ? , (CASE WHEN (SELECT count(*) FROM table WHERE date = ? ) = 0 THEN 0 ELSE 1 END) );
So by using SQL Expression to Replace Date we could do a lookup.
Most likely this won't be compatible with Batch Mode.
Regards,
Balázs
Hello @Xiaodi Shi and @Balazs Gunics ,
I admit that I don't really understand your solutions and whether they really apply to what I'm looking for. I will show you precisely the concrete case I am trying to solve. Here is the process :
I have an SQL query
select ID_HIST
from HIST690 h
where h.HIST_DATCPT = '03/12/21';
The determining field is the date. If the query doesn't return any result then I set the value of a column to 0 otherwise to 1.
Here are the two columns and the expected result of the process :
Unfortunately, this is what I get :
This is what I did in Talend :
tDBInput
tMap
And so I get the impression that it finds 0 for a given date, but that this value is stored for the same date even if it comes after. I don't think the process I did in Talend is what I want, which is :
1) Runs the query a first time and depending on my condition, returns me 0 or 1, then inserts it into the database.
2) Run the query a second time (while considering the previous insertion) and depending on my condition, return 0 or 1.
And this process for as many rows as I have to process.
This would be a process of the type: search the database, then insert iteratively, whereas in my process, it is search everything at once, then insert at once.
How do I make this possible?
Or imagine a solution that returns 1 when the same date is found for the second and subsequent times (except for the first time). I'm trying to think of what I can do, but I admit I'm at my limits of Talend technical knowledge to apply this.
Make sure that you're not using Batched output. So every record you insert to the database will be visible instantly.
In the tMap try to change the Lookup Model:
What this will do is:
For the incoming row it puts the row2.newColumn value to the globalMap.
Then it triggers the lookup flow and reloads all the data. (If this is coming from your database then it will call the DBInput)
On this input you could use a where HIST_DATCPT = '" + (String)globalMap.get("hist_datcpt") +" (this would minimalize the number of records we receive)
You could also make that a query that returns the key , value:
" SELECT '" + (String)globalMap.get("hist_datcpt") +" ' as hist_datctp,
CASE WHEN count(*) = 0 THEN 0 ELSE 1 END as value
from HIST690 h
where h.HIST_DATCPT = '03/12/21';"
And then join this. Basically this way for each input you execute a lookup query.
Hope this helps.
There is an easy way to do this. First you need to sort your data however you want it ordered. Then put it through a tMap. In a tMap variable, add a variable which uses the routines.Numeric.sequence(null, 0, 0) method. Replace null with something like "DateCount" and replace the last 0 with 1. What this will do is create a count for every date you get. So you can use this count the number of dates or just see if it is greater than 0.
Use the variable as an output value.
If your variable is called "DateCountVar", you would add it to your output column with a bit of code like this....
Var.DateCountVar>0? 1 : 0
That means, if it is greater than 0, set it to 1 otherwise set it to 0.
Now this won't be the most memory efficient way of doing this, but it will work and I doubt you will get too many issues doing it this way.
Hello @Balazs Gunics ,
Thank you very much for this example. I have tried the different methods and the processing time of my job is getting longer and the result is not what I expected. I did as in your screenshot.
But thanks for the discovery of this globalMap Key feature, I didn't know it because I never needed to use it. I will read up on it 😃
Hello @Richard Hall ,
Thank you very much for this very effective solution. I have read a bit of the documentation on Numeric.sequence(). I have adapted the solution to my case and it works very well.
I had 3 different dates (02/05/2022, 03/05/2022 and 05/05/2022) in a field of my database. In the case below, I removed the rows with 05/05/2022 before running the job.
So, I don't know why, but it's with the last two arguments in 0 that it increments and my result is the expected one. I tried with 1 too, but it is not the case.
I'd need to see the whole job to be able to work this out. Your code looks OK, so this could be because of some other issue.