[resolved] tFilterRow Advanced Mode Regex match fails?
Hi All,
I have a simple job that iterates through a directory of files. I want to filter the files whose names match a list of regular expressions, and report on both the files that match and those that don't. (Those that match will continue for further processing.)
I have:
tFileList -> tFileProperties -> tFilterRow -> tLogRow (filter)
                                    |
                                    ------- tLogRow (reject)
In the configuration of the tFilterRow, I have specified "advanced mode" and used the statement:
input_row.basename.matches("^\\d+")
as my initial test is simply to identify files beginning with any sequence of digits.
Currently all rows route through the reject log.
I have tested the same regex statement with tExtractRegexFields and it worked.
Does tFilterRow NOT support regex in this way?
Thanks
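A likely explanation, sketched in plain Java: String.matches() only returns true when the whole string matches the pattern, so "^\\d+" accepts a name only if it consists of digits and nothing else. Matcher.find() looks for the pattern anywhere in the string, which is closer to what an extraction component would need. The class and method names below are made up for illustration, and the sketch assumes basename is a non-null String column.

```java
import java.util.regex.Pattern;

class LeadingDigitsCheck {
    // Compiled once; with find(), the ^ anchor pins the digits to the start of the name.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\d+");

    // String.matches() requires the ENTIRE string to match the pattern.
    static boolean wholeNameMatches(String basename) {
        return basename.matches("^\\d+");
    }

    // Matcher.find() succeeds as soon as some part of the string matches.
    static boolean startsWithDigits(String basename) {
        return LEADING_DIGITS.matcher(basename).find();
    }

    public static void main(String[] args) {
        String name = "12345_sometitle.jpg";
        System.out.println(wholeNameMatches(name));   // false: "_sometitle.jpg" is left over
        System.out.println(startsWithDigits(name));   // true
        System.out.println(name.matches("^\\d+.*"));  // true: ".*" consumes the rest
    }
}
```

If that is the cause, an advanced-mode expression such as input_row.basename.matches("^\\d+.*") should let names that begin with digits through to the filter output.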
Hi Souha,
The tMemorizeRows component is from Talend Open Studio for Data Integration and not from the paid / enterprise ...
Which version of Talend are you using?
Another way could be to save the value of the relevant column to a context variable and then compare that value with the current row. After the comparison, save the current row's value back to the context variable so it is available for the next row.
Vaibhav
Hi,
So far, tFilterRow doesn't support regex. Could you please describe your case with an example, including input and expected output values, so that we can see if there is an alternative solution?
Best regards
Sabrina
Anyway, my use case is essentially an ETL migration from a filesystem and database through to a new target system. The filesystem contains binary files, the database contains metadata. A primary key (id) number links the file to the database record.
I want to iterate through the file-system to identify and join the files to the metadata, then transform them into a new format for import into a new system.
I want to report on files and folders that were migrated, and on those that failed because they were unidentifiable, etc.
I have a file-system containing about 30 TB of various image files and folders.
Examples would be:
12345_sometitle.jpg
46602_shot.psd
Latest_3498912.gif
Some_3452_file.bmp
fail_example.jpg
As I iterate through the file-system, I want to evaluate both files and folders against a regular expression that is designed to pick up an ID number that may or may not be in the name of the file/folder. There may be multiple regexes to support different criteria.
If one of the criteria matches, I want to extract the ID, then continue processing -- I will do a join to another data source to identify more metadata.
Ultimately I will write an XML file next to the binary file in a new directory, where it will be loaded into the new system.
I have been playing around with Talend for a few days to evaluate whether it will be a useful tool or not... I am new to Talend, but have a Java background.
Thanks,
Rob
Hi Rob,
Based on the description above, I understand that the major challenge is extracting the ID, which may or may not be present in the file or folder name, and that you have firm business rules for how to obtain it. Once the ID is extracted, the rest of the process is simple for you, i.e. an inner join in tMap to match the file ID against the database...
To extract the ID from the file system, I would recommend using tJavaRow with multiple if clauses, one per regular expression, implemented in Java. As you have a Java background, this will not be difficult for you.
Once you have extracted the ID from the file system, you can use tMap to join with the database and continue with your further processing.
Please let me know if this helps and whether I have understood your requirement correctly.
Thanks
Vaibhav
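A minimal sketch of the multiple-regex ID extraction suggested above, written as standalone Java so it can be tried outside Talend. The three patterns are assumptions derived only from the example file names earlier in the thread, and the class and method names are hypothetical. Inside a tJavaRow, the same loop would read from an input_row column and assign the result to an output_row column.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileIdExtractor {
    // Hypothetical rules, tried in order; the first pattern that matches wins.
    private static final List<Pattern> ID_PATTERNS = Arrays.asList(
        Pattern.compile("^(\\d+)_"),          // e.g. 12345_sometitle.jpg
        Pattern.compile("_(\\d+)\\.[^.]+$"),  // e.g. Latest_3498912.gif
        Pattern.compile("_(\\d+)_")           // e.g. Some_3452_file.bmp
    );

    // Returns the first ID found, or null when no rule applies (a reject row).
    static String extractId(String name) {
        for (Pattern pattern : ID_PATTERNS) {
            Matcher matcher = pattern.matcher(name);
            if (matcher.find()) {
                return matcher.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] names = {
            "12345_sometitle.jpg", "46602_shot.psd", "Latest_3498912.gif",
            "Some_3452_file.bmp", "fail_example.jpg"
        };
        for (String name : names) {
            System.out.println(name + " -> " + extractId(name));
        }
    }
}
```

Rows where the result is null can be routed to the reject report, and the rest can go on to the tMap join.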
Hello, I need your help please. I have a text file containing lines of 250 characters, like bank statements. I need to read the file block by block. For example:
I would like to read the file in parts: for each line starting with "08", I take the following lines that start with "05" until I reach the next "08" line, and so on. Do you have any ideas, please?
Hi Souha,
This is an international forum and English is the language we use. Posting in English will allow you to get more visibility and more help. Thanks for your understanding!
Best regards
Sabrina
Hi everyone,
I need your help please. I have a positional text file where each line has 250 characters, like this:
083005600V300026EUR2 0026000eeeeee270614 270614VIREMENT eeeeeeee YCI5 0671 05067120
0530056 00026EUR2 0026000eeeeee270614 NPYXXXXX
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeeeeee
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeee
0530056 00026EUR2 0026000eeeeee270614 eeeeeeeeeeeeeeeeeeeeeee
083005600V300026EUR2 0026000eeeeee270614 270614VIREMENT eeeeeeee YCI5 0671 05067120
I would like to read my file like this: if a line starts with "08", I have to check the next line. If that line starts with "05", a message box containing "NPYXXXXX " should appear; otherwise a message box containing "VIREMENT " should appear.
Hi Souha,
Read your input file...
- Use tJavaRow
- Use string handling (a left/substring operation) to put the first 2 characters of the first column into a variable
- Compare that variable with 08 and 05 using an if/else clause
- Execute whatever code you want...
I would not recommend using a message box; use System.out.println() instead. Otherwise you will get hundreds of message boxes on screen and will not be able to tell what is happening.
Thanks
Vaibhav
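The prefix check described above can be sketched in standalone Java as follows. This is an illustration under assumptions: the whole file is already loaded into a list of lines, the class and method names are hypothetical, and the two message strings are taken literally from the question instead of being cut out of the 250-character record, because the field positions are not given. In a Talend job, tJavaRow sees one row at a time, so the look-ahead would have to be replaced by remembering the previous row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordScanner {
    // For every "08" line, report whether the line right after it is a "05" line.
    static List<String> messagesFor(List<String> lines) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // First two characters identify the record type; skip everything but "08".
            if (!lines.get(i).startsWith("08")) {
                continue;
            }
            boolean followedByDetail =
                i + 1 < lines.size() && lines.get(i + 1).startsWith("05");
            messages.add(followedByDetail ? "NPYXXXXX" : "VIREMENT");
        }
        return messages;
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the 250-character records.
        List<String> lines = Arrays.asList(
            "08 header A", "05 detail 1", "05 detail 2", "08 header B");
        // Printing to the console instead of opening one message box per record.
        for (String message : messagesFor(lines)) {
            System.out.println(message);
        }
    }
}
```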