Hi,
As described in the subject, I am having an issue with the tBufferOutput/Input components. I have a table in a MariaDB database that contains a column of the BINARY datatype (this column holds a hash value calculated with SHA-256). For this example I used a single row so the difference would be easy to notice.
Here is a screenshot of my job:
Following are screenshots of the schemas of the job components:
And finally, here is the result (testDataHash1 is the source table and testDataHash2 is the destination table; both have the same definition: a single BINARY column with length 64).
This is a simplified scenario to demonstrate the issue I am facing. My actual jobs are complex parent-child jobs where I need to propagate data from the child to the parent job, including the DataHash column, which I later use in a tMap component as a lookup; the lookup fails because the tBuffer components change the values.
There is no workaround in my job logic (in terms of not using tBufferOutput/Input), so I would appreciate any advice on how to tackle this issue!
Thanks!
For example, you can create a private static List<byte[]> in a routine.
In the child job, use a tJavaFlex: in the begin part you instantiate the list, in the main part you add the byte[] of the current row to the list, and in the end part you set the private variable with the list you have filled.
In the parent job, you use another tJavaFlex with a foreach clause inside to send the list to a flow.
Here, for example, I use a List<String> and globalVar to put/get the list:
You just have to transpose it to List<byte[]> and use the getter/setter of a private variable in a routine instead of globalVar; see the sketch below.
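For illustration, here is a minimal sketch of such a routine; the class name HashBuffer and its method names are placeholders I made up, not Talend built-ins:

    package routines;

    import java.util.ArrayList;
    import java.util.List;

    public class HashBuffer {
        // Private static list that survives between the child and the parent job
        // (they share one JVM as long as the child is not run as an independent process).
        private static List<byte[]> rows = new ArrayList<byte[]>();

        // Called from the end part of the child job's tJavaFlex.
        public static void setRows(List<byte[]> filled) {
            rows = filled;
        }

        // Called from the parent job to read the collected hashes back.
        public static List<byte[]> getRows() {
            return rows;
        }
    }

And the matching tJavaFlex code in the child job, assuming the incoming connection is named row1 and carries the DataHash column (again, a sketch to adapt, not tested against your job):

    // Begin part: instantiate a fresh list for this run
    java.util.List<byte[]> buffer = new java.util.ArrayList<byte[]>();

    // Main part: add the byte[] of the current row to the list
    buffer.add(row1.DataHash);

    // End part: hand the filled list over to the routine
    routines.HashBuffer.setRows(buffer);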
Hi, it seems you have an encoding problem.
Check the encoding used in the advanced parameters of the tBuffer components.
Send me love and Kudos
Thank you for your answer but I'm afraid that is not the case.
I forgot to mention it in my question, but I have already tried executing the job with different encodings selected in the advanced settings of the tBufferInput component (UTF-8, ISO-8859-15, ASCII, Cp1252, UTF-16, UTF-32, and some others that I thought might make a difference). All I got was multiple different values for the DataHash column; none of them matches the original.
If you have some other suggestions please share them with me.
Thanks.
You can add additional JDBC parameters in your tDB components to force the encoding; see the example below.
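For the MySQL/MariaDB JDBC drivers, something like the following is commonly placed in the "Additional JDBC Parameters" field of the tDBInput/tDBOutput components (parameter names vary between driver versions, so please verify against your driver's documentation):

    useUnicode=true&characterEncoding=UTF-8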
I don't think that the encoding would affect a binary column - but I tried this as you suggested and it does not fix the problem.
I have also tested this without the tBuffer components (just passing the row from tDBInput to tDBOutput) with different encodings in the additional JDBC parameters (I used latin1 and UTF8), and it did not affect the value at all. So I think the encoding between source and destination is not the problem; the problem is with the tBuffer components (and, again, not with their encoding, because changing it did nothing to fix the issue).
Update: I also tried forcing the encoding in the source as latin1 and mapping it to a destination with UTF8 (without the tBuffer components); the value of DataHash stayed the same.
Have you tried other components, like tHashInput and tHashOutput, to replace the tBuffer components?
The tHashInput and tHashOutput components work perfectly well with hashed values, and I do use them in my jobs. But they are not useful in this situation, because I need to propagate data from a child job to a parent job (what is shown in the screenshots in my question is just a simple example to demonstrate the issue, not my actual job), and as far as I know I can't achieve that with the tHash components; that is why I am using the tBuffer components in the first place.
Hi, you could use this:
https://help.talend.com/r/i6eFKBuNsRD2KzBCYnXHhw/4jPcdaVw7eaDvMyLDjdYfQ
Instead of a String, you declare a byte[]; then you can get or set the byte[] anywhere in your job.
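A minimal sketch of that idea, with a made-up routine name HashHolder (not a Talend built-in):

    package routines;

    public class HashHolder {
        // Holds a single value for the whole job run; every setValue
        // overwrites the previous one, so this carries one value at a time.
        private static byte[] value;

        public static void setValue(byte[] v) {
            value = v;
        }

        public static byte[] getValue() {
            return value;
        }
    }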
This would probably work if I wanted to pass a single value, because with multiple rows the variable would be overwritten on every row, and in the parent job I would get only the last of millions of rows. Maybe my example job confused you: in the example I am moving a single row (a single value), but I am not actually working with a single value; I will have millions of rows in the DataHash column.
Thanks a lot for your time, I really appreciate your input!
So you can use the method with a private variable and getter/setter, but with a List<byte[]> instead of a byte[]; see the sketch below.
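On the parent side, a tJavaFlex can then replay the list as a flow. A sketch, assuming the HashBuffer routine from above and an outgoing connection named row2 with a single byte[] column DataHash (both names are assumptions):

    // Begin part: open a loop over the list filled by the child job
    for (byte[] hash : routines.HashBuffer.getRows()) {

    // Main part: emit one row per stored hash
        row2.DataHash = hash;

    // End part: close the loop
    }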