Anonymous
Not applicable

Transform an RDI file into an XML file

Hi everybody,

 

I'm a total newbie with Talend and I'm trying to transform an RDI file into an XML file.

What is an RDI file? It's a text file coming from SAP (RDI = Raw Data Interface).

It has a fixed-column structure. Here is a sample:

 

H046A011000000653776FZSCRIPT_FACTURE PRINTER 

DMAIN    XX        B_TITLE             002MR
DMAIN              B_LASTNAME          003DOE
DMAIN              B_FIRSTNAME         004John
DMAIN              B_STREET            0231 RUE DE LA TOUR EIFFEL
DMAIN              B_POSTCODE          00575000
DMAIN              B_CITY              005PARIS

H046A011000000653776FZSCRIPT_FACTURE PRINTER 

DMAIN    XX        B_TITLE             002MME
DMAIN              B_LASTNAME          003DOE
DMAIN              B_FIRSTNAME         004Jane
DMAIN              B_STREET            0233 RUE DES CHAMPS ELYSEES
DMAIN              B_POSTCODE          00575000
DMAIN              B_CITY              005PARIS

 

Each data line has a structure like this:

- prefix

- field

- size

- value

Furthermore, a single RDI file can contain more than one record.
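
To make the prefix / field / size / value structure concrete, here is a minimal Java sketch that splits one data line from the sample into its four parts. The class name `RdiLineParser`, the holder `RdiField` and the whitespace-tolerant pattern are assumptions made for illustration (the sample above may not preserve the real column widths), so treat it as a sketch rather than a reference parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RdiLineParser {
    /** Hypothetical holder for the four parts of one data line. */
    static final class RdiField {
        final String prefix;
        final String name;
        final int size;
        final String value;

        RdiField(String prefix, String name, int size, String value) {
            this.prefix = prefix;
            this.name = name;
            this.size = size;
            this.value = value;
        }
    }

    // Assumed layout: DMAIN prefix, optional filler, field name, spaces,
    // a 3-digit size, then the value up to the end of the line.
    static final Pattern DATA_LINE =
            Pattern.compile("(DMAIN).*?\\b([A-Z][A-Z0-9_]*)\\s+(\\d{3})(.*)");

    /** Returns the parsed field, or null when the line is not a DMAIN data line. */
    static RdiField parse(String line) {
        Matcher m = DATA_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new RdiField(m.group(1), m.group(2),
                Integer.parseInt(m.group(3)), m.group(4));
    }

    public static void main(String[] args) {
        RdiField f = parse("DMAIN              B_STREET            0231 RUE DE LA TOUR EIFFEL");
        System.out.println(f.name + " / " + f.size + " / " + f.value);
    }
}
```

Header lines (the ones starting with H046A) do not match the pattern, so `parse` returns null for them and they can be used as record separators.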

 

The XML output I would like looks like this:

 

<Documents>
<Document>
<title>MR</title>
<lastname>DOE</lastname>
<firstname>John</firstname>
</Document>
<Document>
<title>MME</title>
<lastname>DOE</lastname>
<firstname>Jane</firstname>
</Document>
</Documents>

 

I've used tFileList, tFileInputPositional, tFilterRow, tMap and tFileOutputXML components.

In tFilterRow, I added these conditions:

- field = B_TITLE or field = B_LASTNAME or field = B_FIRSTNAME

Then in tMap I mapped the field names:

- B_TITLE with title

- B_LASTNAME with lastname

- B_FIRSTNAME with firstname

 

Unfortunately, I did not succeed in naming the output XML fields.

This is what I get:

 

<Documents>
<Document>
<field>B_TITLE</field>
<value>MR</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>

<Document>
<field>B_FIRSTNAME</field>
<value>John</value>
</Document>

<Document>
<field>B_TITLE</field>
<value>MME</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>

<Document>
<field>B_FIRSTNAME</field>
<value>Jane</value>
</Document>

</Documents>

 

Thank you for your help !

 

Nicolas


Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

Of course !

 

So first, I used a tFilterRow to keep only the lines that interest me.

In advanced mode, I put this code (don't forget the space after each field name):

 

input_row.newColumn.contains("H046A")
||input_row.newColumn.contains("B_TITLE ")
||input_row.newColumn.contains("B_LASTNAME ")
||input_row.newColumn.contains("B_FIRSTNAME ")
||input_row.newColumn.contains("B_STREET ")
||input_row.newColumn.contains("B_POSTCODE ")
||input_row.newColumn.contains("B_CITY ")
||input_row.newColumn.contains("CRDI-CONTROL %%LINES-END FIN_DOCUMENT")

 

Then I used a tReplace component with the following regexps, in this order:

 

Pattern : "H046A.*"

Replaced with : "<Document>"

Description : H046A is a tag that identifies the beginning of a document inside a whole batch rdi spool

 

Pattern : "CRDI-CONTROL %%LINES-END FIN_DOCUMENT TEXT ST FR"

Replaced with : "</Document>"

Description : this pattern identifies the ending of a document inside a whole batch rdi spool

 

Pattern : "DMAIN.{36}(.{131}).{3}(.*)"

Replaced with : "\t<$1>$2</$1>"

Description : this pattern extracts the name and the value of a field inside the rdi spool according to the following structure :

- column 1 to 41 (41 columns) : field prefix with DMAIN and 36 spaces 

- column 42 to 172 (131 columns) : field name with spaces after

- column 173 to 175 (3 columns) : field value size

- columns 176 to the end of line : field value

This way we retrieve the field name in $1 and the field value in $2.
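
As an illustration outside Talend, the same pattern can be tried with plain `java.util.regex`. This sketch assumes the column widths listed above and builds a synthetic line with them (`buildLine`, `toTag` and the class name are hypothetical helpers, and whether tReplace applies the pattern exactly like `String.replaceAll` is also an assumption).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RdiFixedWidthDemo {
    // Same pattern as in the tReplace step above.
    static final Pattern FIELD = Pattern.compile("DMAIN.{36}(.{131}).{3}(.*)");

    // Builds a synthetic data line: 5 + 36 columns of prefix, 131 columns of
    // field name padded with spaces, 3 columns of size, then the value.
    static String buildLine(String name, String value) {
        return "DMAIN" + String.format("%36s", "")
                + String.format("%-131s", name)
                + String.format("%03d", value.length())
                + value;
    }

    // Applies the replacement; the field name keeps its padding at this stage.
    static String toTag(String line) {
        return line.replaceAll(FIELD.pattern(), "\t<$1>$2</$1>");
    }

    public static void main(String[] args) {
        String line = buildLine("B_TITLE", "MR");
        Matcher m = FIELD.matcher(line);
        if (m.matches()) {
            // trim() is only for display; $1 really contains 131 characters.
            System.out.println("$1 = [" + m.group(1).trim() + "], $2 = [" + m.group(2) + "]");
        }
    }
}
```

Because $1 captures the full 131 columns, the generated tag still contains the padding spaces after the field name, which is why further replacements are needed afterwards.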

 

Pattern : "  "

Replaced with : ""

Description : removes the padding spaces after the field name, two at a time

 

Pattern : " >"

Replaced with : ">"

Description : removes the single space that can remain just before ">"

 

Pattern : "B_TITLE"

Replaced with : "title"

Description : in order to rename field (you can do the same for B_LASTNAME, B_FIRSTNAME, ...)
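
The three clean-up replacements can be checked in plain Java as well. This is a sketch under the assumption that they behave like chained `String.replaceAll` calls; the class and method names are hypothetical. It uses B_LASTNAME because its padding (121 spaces) is odd, which shows why the " >" step is needed.

```java
class RdiTagCleanup {
    // Builds the padded tag as produced by the extraction step:
    // the field name is followed by spaces up to 131 columns.
    static String paddedTag(String name, String value) {
        String padded = String.format("%-131s", name);
        return "\t<" + padded + ">" + value + "</" + padded + ">";
    }

    static String cleanup(String tag) {
        return tag
                .replaceAll("  ", "")                   // remove spaces two at a time
                .replaceAll(" >", ">")                  // remove the one space left when the padding is odd
                .replaceAll("B_TITLE", "title")         // rename fields
                .replaceAll("B_LASTNAME", "lastname");
    }

    public static void main(String[] args) {
        System.out.println(cleanup(paddedTag("B_TITLE", "MR")));
        System.out.println(cleanup(paddedTag("B_LASTNAME", "DOE")));
    }
}
```

One thing to keep in mind: the two-space replacement is applied to the whole line, so a field value that itself contains two consecutive spaces would also lose them.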

 

And that's it.

I hope it's understandable.

If there is a better or easier solution, I'm interested !

 

Regards,


4 Replies
fdenis
Master

You have to use tDenormalize to get one row per document.
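
To illustrate what "one row per document" means, here is a plain Java sketch of the grouping idea. It is not tDenormalize itself: it assumes the input rows are {field, value} pairs and that a row whose field is the hypothetical marker "H" starts a new document. All names are made up for the example, and the values are written without XML escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RdiPivotSketch {
    static final String HEADER = "H"; // hypothetical marker for a header line

    // Rows taken from the sample in the question, as {field, value} pairs.
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {HEADER, ""});
        rows.add(new String[] {"B_TITLE", "MR"});
        rows.add(new String[] {"B_LASTNAME", "DOE"});
        rows.add(new String[] {"B_FIRSTNAME", "John"});
        rows.add(new String[] {HEADER, ""});
        rows.add(new String[] {"B_TITLE", "MME"});
        rows.add(new String[] {"B_LASTNAME", "DOE"});
        rows.add(new String[] {"B_FIRSTNAME", "Jane"});
        return rows;
    }

    // Groups the field/value rows into one map per document.
    static List<Map<String, String>> pivot(List<String[]> rows) {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> current = null;
        for (String[] row : rows) {
            if (HEADER.equals(row[0])) {
                current = new LinkedHashMap<>();
                docs.add(current);
            } else if (current != null) {
                current.put(row[0], row[1]);
            }
        }
        return docs;
    }

    // Writes one <Document> per map; tagNames maps RDI field names to XML tags.
    static String toXml(List<Map<String, String>> docs, Map<String, String> tagNames) {
        StringBuilder sb = new StringBuilder("<Documents>\n");
        for (Map<String, String> doc : docs) {
            sb.append("<Document>\n");
            for (Map.Entry<String, String> e : tagNames.entrySet()) {
                String value = doc.get(e.getKey());
                if (value != null) {
                    sb.append('<').append(e.getValue()).append('>').append(value)
                      .append("</").append(e.getValue()).append(">\n");
                }
            }
            sb.append("</Document>\n");
        }
        return sb.append("</Documents>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> tagNames = new LinkedHashMap<>();
        tagNames.put("B_TITLE", "title");
        tagNames.put("B_LASTNAME", "lastname");
        tagNames.put("B_FIRSTNAME", "firstname");
        System.out.println(toXml(pivot(sampleRows()), tagNames));
    }
}
```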
Anonymous
Not applicable
Author

Thanks for your reply, François.

 

In the end, I used a tFileInputDelimited and a tReplace with regexps to retrieve the fields.

 

Regards

Anonymous
Not applicable
Author

Hello @nhalicka,

Would you mind sharing the regexps from your tReplace component on the forum?

We would appreciate it a lot.

Best regards

Sabrina
