
Transform an RDI file into an XML file
Hi everybody,
I'm a total newbie with Talend and I'm trying to transform an RDI file into an XML file.
What is an RDI file? It's a text file coming from SAP (RDI = Raw Data Interface).
Its structure uses fixed-width columns; here is a sample:
H046A011000000653776FZSCRIPT_FACTURE PRINTER
DMAIN XX B_TITLE 002MR
DMAIN B_LASTNAME 003DOE
DMAIN B_FIRSTNAME 004John
DMAIN B_STREET 0231 RUE DE LA TOUR EIFFEL
DMAIN B_POSTCODE 00575000
DMAIN B_CITY 005PARIS
H046A011000000653776FZSCRIPT_FACTURE PRINTER
DMAIN XX B_TITLE 002MME
DMAIN B_LASTNAME 003DOE
DMAIN B_FIRSTNAME 004Jane
DMAIN B_STREET 0233 RUE DES CHAMPS ELYSEES
DMAIN B_POSTCODE 00575000
DMAIN B_CITY 005PARIS
Each line has this structure:
- prefix
- field
- size
- value
Furthermore, a single RDI file can contain more than one record.
The XML output I would like looks like this:
<Documents>
<Document>
<title>MR</title>
<lastname>DOE</lastname>
<firstname>John</firstname>
</Document>
<Document>
<title>MME</title>
<lastname>DOE</lastname>
<firstname>Jane</firstname>
</Document>
</Documents>
I've used the tFileList, tFileInputPositional, tFilterRow, tMap and tFileOutputXML components.
In tFilterRow, I've added these conditions:
- field = B_TITLE or field = B_LASTNAME or field = B_FIRSTNAME
Then in tMap I've mapped the field names:
- B_TITLE with title
- B_LASTNAME with lastname
- B_FIRSTNAME with firstname
Unfortunately I did not succeed in naming the output XML fields.
This is what I get:
<Documents>
<Document>
<field>B_TITLE</field>
<value>MR</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>
<Document>
<field>B_FIRSTNAME</field>
<value>John</value>
</Document>
<Document>
<field>B_TITLE</field>
<value>MME</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>
<Document>
<field>B_FIRSTNAME</field>
<value>Jane</value>
</Document>
</Documents>
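The gap between the two outputs is a grouping step: the field/value rows have to be collected into one Document per record, with the field name turned into the element name. A minimal Java sketch of that idea follows. It is not a Talend job; the class RdiPivot and its methods are hypothetical, and it assumes that every record starts with a B_TITLE row, as in the sample. Values are not XML-escaped here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RdiPivot {
    // RDI field name -> wanted XML element name (taken from the mapping above).
    static final Map<String, String> NAMES = Map.of(
        "B_TITLE", "title",
        "B_LASTNAME", "lastname",
        "B_FIRSTNAME", "firstname");

    // rows: {field, value} pairs in file order.
    // Assumption: a new record starts at each B_TITLE row.
    static List<Map<String, String>> group(List<String[]> rows) {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> current = null;
        for (String[] row : rows) {
            String name = NAMES.get(row[0]);
            if (name == null) {
                continue; // field we do not want in the output
            }
            if (current == null || row[0].equals("B_TITLE")) {
                current = new LinkedHashMap<>(); // keeps insertion order
                docs.add(current);
            }
            current.put(name, row[1]);
        }
        return docs;
    }

    // Writes one <Document> per grouped record (no XML escaping in this sketch).
    static String toXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<Documents>\n");
        for (Map<String, String> doc : docs) {
            sb.append("<Document>\n");
            doc.forEach((k, v) ->
                sb.append("<").append(k).append(">").append(v)
                  .append("</").append(k).append(">\n"));
            sb.append("</Document>\n");
        }
        return sb.append("</Documents>").toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"B_TITLE", "MR"},
            new String[] {"B_LASTNAME", "DOE"},
            new String[] {"B_FIRSTNAME", "John"},
            new String[] {"B_TITLE", "MME"},
            new String[] {"B_LASTNAME", "DOE"},
            new String[] {"B_FIRSTNAME", "Jane"});
        System.out.println(toXml(group(rows)));
    }
}
```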
Thank you for your help !
Nicolas
Accepted Solutions

Hi,
Of course!
So first, I used a tFilterRow to keep only the fields that interest me.
In advanced mode, I put this code (don't forget the space after each field name):
input_row.newColumn.contains("H046A")
||input_row.newColumn.contains("B_TITLE ")
||input_row.newColumn.contains("B_LASTNAME ")
||input_row.newColumn.contains("B_FIRSTNAME ")
||input_row.newColumn.contains("B_STREET ")
||input_row.newColumn.contains("B_POSTCODE ")
||input_row.newColumn.contains("B_CITY ")
||input_row.newColumn.contains("CRDI-CONTROL %%LINES-END FIN_DOCUMENT")
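Outside Talend, the same filter can be sketched as a plain Java predicate, which makes it easy to check why the trailing space matters. The class RdiLineFilter and the field names B_TITLE_LONG and B_COUNTRY in the demo are hypothetical, used only to show a near-miss and a rejected line.

```java
import java.util.List;
import java.util.stream.Collectors;

class RdiLineFilter {
    // Markers kept by the filter. The trailing space stops "B_TITLE " from
    // matching a longer field name that merely starts with the same text.
    static final List<String> MARKERS = List.of(
        "H046A",
        "B_TITLE ",
        "B_LASTNAME ",
        "B_FIRSTNAME ",
        "B_STREET ",
        "B_POSTCODE ",
        "B_CITY ",
        "CRDI-CONTROL %%LINES-END FIN_DOCUMENT");

    // True when the line contains at least one marker.
    static boolean keep(String line) {
        return MARKERS.stream().anyMatch(line::contains);
    }

    static List<String> filter(List<String> lines) {
        return lines.stream()
            .filter(RdiLineFilter::keep)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "H046A011000000653776FZSCRIPT_FACTURE PRINTER",
            "DMAIN B_TITLE 002MR",
            "DMAIN B_TITLE_LONG 002XX", // hypothetical field, rejected
            "DMAIN B_COUNTRY 002FR");   // hypothetical field, rejected
        filter(lines).forEach(System.out::println);
    }
}
```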
Then I used a tReplace component with the following regexp :
Pattern : "H046A.*"
Replaced with : "<Document>"
Description : H046A is a tag that identifies the beginning of a document inside a whole batch rdi spool
Pattern : "CRDI-CONTROL %%LINES-END FIN_DOCUMENT TEXT ST FR"
Replaced with : "</Document>"
Description : this pattern identifies the ending of a document inside a whole batch rdi spool
Pattern : "DMAIN.{36}(.{131}).{3}(.*)"
Replaced with : "\t<$1>$2</$1>"
Description : this pattern extracts the name and the value of a field from the rdi spool, according to the following structure :
- column 1 to 41 (41 columns) : field prefix with DMAIN and 36 spaces
- column 42 to 172 (131 columns) : field name with spaces after
- column 173 to 175 (3 columns) : field value size
- columns 176 to the end of line : field value
Thus we retrieve the field name in $1 and the field value in $2
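To see this pattern at work outside tReplace, here is a small Java sketch. The class RdiFieldRegex and its helper sampleLine are hypothetical: sampleLine only rebuilds a line in the column layout described above (and assumes the 3-digit size is the length of the value, as the sample suggests). As a variation, the sketch trims the captured field name directly instead of removing the padding in a later replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RdiFieldRegex {
    // Same pattern as in the tReplace step: prefix, field name, size, value.
    static final Pattern FIELD = Pattern.compile("DMAIN.{36}(.{131}).{3}(.*)");

    // Builds a test line in the described fixed-width layout.
    // Assumption: the size column holds the length of the value.
    static String sampleLine(String name, String value) {
        return "DMAIN" + " ".repeat(36)            // columns 1 to 41
            + String.format("%-131s", name)        // columns 42 to 172
            + String.format("%03d", value.length()) // columns 173 to 175
            + value;                                // column 176 onwards
    }

    // Turns a DMAIN line into an XML element; other lines are returned as-is.
    static String toXml(String line) {
        Matcher m = FIELD.matcher(line);
        if (!m.matches()) {
            return line;
        }
        String name = m.group(1).trim(); // drop the padding after the name
        return "\t<" + name + ">" + m.group(2) + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleLine("B_LASTNAME", "DOE")));
    }
}
```

Note that the value is copied as-is, so characters such as & or < in a value would still need XML escaping.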
Pattern : " "
Replaced with : ""
Description : in order to delete spaces in field name
Pattern : " >"
Replaced with : ">"
Description : in order to delete some remaining spaces
Pattern : "B_TITLE"
Replaced with : "title"
Description : in order to rename the field (you can do the same for B_LASTNAME, B_FIRSTNAME, ...)
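With many fields, the renaming step can also be driven by a lookup table rather than one replacement per field. The following Java sketch shows that idea; the class RdiTagRename is hypothetical and the target names are the ones asked for in the question. It matches the full tag, including the angle brackets, so a value that happens to contain the text B_TITLE is left alone.

```java
import java.util.Map;

class RdiTagRename {
    // RDI field name -> XML element name (targets taken from the question).
    static final Map<String, String> NAMES = Map.of(
        "B_TITLE", "title",
        "B_LASTNAME", "lastname",
        "B_FIRSTNAME", "firstname");

    // Renames opening and closing tags; text between the tags is not touched.
    static String rename(String xmlLine) {
        String out = xmlLine;
        for (Map.Entry<String, String> e : NAMES.entrySet()) {
            out = out
                .replace("<" + e.getKey() + ">", "<" + e.getValue() + ">")
                .replace("</" + e.getKey() + ">", "</" + e.getValue() + ">");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rename("\t<B_TITLE>MR</B_TITLE>"));
        System.out.println(rename("\t<B_FIRSTNAME>John</B_FIRSTNAME>"));
    }
}
```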
And that's it.
I hope it's understandable.
If there is a better or easier solution, I'm interested !
Regards,

Thanks for your reply, François.
Finally, I've used a tFileInputDelimited and a tReplace with regexp to retrieve fields.
Regards

Hello @nhalicka,
Would you mind sharing the regexp you used in the tReplace component on the forum?
We would appreciate it a lot.
Best regards
Sabrina

