
Transform an RDI file into an XML file

Hi everybody,

 

I'm a total newbie with Talend and I'm trying to transform an RDI file into an XML file.

What is an RDI file? It's a text file coming from SAP (RDI = Raw Data Interface).

It has a fixed-width column structure. Here is a sample:

 

H046A011000000653776FZSCRIPT_FACTURE PRINTER 

DMAIN    XX        B_TITLE             002MR
DMAIN              B_LASTNAME          003DOE
DMAIN              B_FIRSTNAME         004John
DMAIN              B_STREET            0231 RUE DE LA TOUR EIFFEL
DMAIN              B_POSTCODE          00575000
DMAIN              B_CITY              005PARIS

H046A011000000653776FZSCRIPT_FACTURE PRINTER 

DMAIN    XX        B_TITLE             002MME
DMAIN              B_LASTNAME          003DOE
DMAIN              B_FIRSTNAME         004Jane
DMAIN              B_STREET            0233 RUE DES CHAMPS ELYSEES
DMAIN              B_POSTCODE          00575000
DMAIN              B_CITY              005PARIS

 

You can see a structure like this :

- prefix

- field

- size

- value

Furthermore, a single RDI file can contain more than one record.
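To make the prefix / field / size / value breakdown concrete, here is a minimal Java sketch. It assumes the whitespace-collapsed layout shown in the sample above (a real RDI file is fixed-width, so the offsets would differ); the class name, method name and regex are illustrative only, not part of Talend. The value is taken as the rest of the line rather than cut to the declared size.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RdiLineSketch {
    // Assumed layout: prefix, optional flag (e.g. "XX"), field name,
    // 3-digit size, then the value up to the end of the line.
    private static final Pattern DATA_LINE =
        Pattern.compile("^(\\S+)\\s+(?:\\S+\\s+)?(\\S+)\\s+(\\d{3})(.*)$");

    /** Returns {prefix, field, size, value}, or null if the line is not a data line. */
    static String[] parse(String line) {
        Matcher m = DATA_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("DMAIN    XX        B_TITLE             002MR");
        System.out.println(String.join(" | ", parts)); // DMAIN | B_TITLE | 002 | MR
    }
}
```

Header lines such as the one starting with H046A do not match this pattern, so `parse` returns null for them.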

 

Now, I would like the XML output to look like this:

 

<Documents>
<Document>
<title>MR</title>
<lastname>DOE</lastname>
<firstname>John</firstname>
</Document>
<Document>
<title>MME</title>
<lastname>DOE</lastname>
<firstname>Jane</firstname>
</Document>
</Documents>

 

I've used tFileList, tFileInputPositional, tFilterRow, tMap and tFileOutputXML components.

In tFilterRow, I've added these conditions:

- field = B_TITLE or field = B_LASTNAME or field = B_FIRSTNAME

Then, in tMap, I've mapped the field names:

- B_TITLE with title

- B_LASTNAME with lastname

- B_FIRSTNAME with firstname

 

Unfortunately, I did not succeed in naming the output XML fields.

This is what I get:

 

<Documents>
<Document>
<field>B_TITLE</field>
<value>MR</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>

<Document>
<field>B_FIRSTNAME</field>
<value>John</value>
</Document>

<Document>
<field>B_TITLE</field>
<value>MME</value>
</Document>
<Document>
<field>B_LASTNAME</field>
<value>DOE</value>
</Document>

<Document>
<field>B_FIRSTNAME</field>
<value>Jane</value>
</Document>

</Documents>

 

Thank you for your help !

 

Nicolas


Accepted Solutions
Anonymous
Not applicable
Author

Hi,

 

Of course !

 

So first, I used a tFilterRow to keep only the fields that interest me.

In advanced mode, I put this code (don't forget the space after each field name):

 

input_row.newColumn.contains("H046A")
||input_row.newColumn.contains("B_TITLE ")
||input_row.newColumn.contains("B_LASTNAME ")
||input_row.newColumn.contains("B_FIRSTNAME ")
||input_row.newColumn.contains("B_STREET ")
||input_row.newColumn.contains("B_POSTCODE ")
||input_row.newColumn.contains("B_CITY ")
||input_row.newColumn.contains("CRDI-CONTROL %%LINES-END FIN_DOCUMENT")
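A small Java sketch of why the trailing space matters, using plain `String.contains` as in the expression above. The field `B_CITY_CODE` is hypothetical and only serves to show a name that shares a prefix with a wanted field.

```java
class TrailingSpaceSketch {
    /** Same kind of test as in the tFilterRow expression: a plain substring check. */
    static boolean keep(String line, String marker) {
        return line.contains(marker);
    }

    public static void main(String[] args) {
        String wanted = "DMAIN              B_CITY              005PARIS";
        // Hypothetical field whose name starts with a wanted field name.
        String other = "DMAIN              B_CITY_CODE         00275";

        System.out.println(keep(wanted, "B_CITY ")); // true
        System.out.println(keep(other, "B_CITY "));  // false: the space rules it out
        System.out.println(keep(other, "B_CITY"));   // true: without the space it slips through
    }
}
```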

 

Then I used a tReplace component with the following regexp :

 

Pattern : "H046A.*"

Replaced with : "<Document>"

Description : H046A is a tag that identifies the beginning of a document inside a whole batch rdi spool

 

Pattern : "CRDI-CONTROL %%LINES-END FIN_DOCUMENT TEXT ST FR"

Replaced with : "</Document>"

Description : this pattern identifies the ending of a document inside a whole batch rdi spool
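As a rough illustration of these two replacements outside Talend, here is a Java sketch that assumes tReplace in regex mode behaves like `String.replaceAll`. The class and method names are made up for the example.

```java
class DocumentMarkerSketch {
    /** Turns the start and end marker lines into opening and closing tags. */
    static String replaceMarkers(String line) {
        return line
            // The whole header line, from H046A to the end, becomes the opening tag.
            .replaceAll("H046A.*", "<Document>")
            // The end-of-document control line becomes the closing tag.
            .replaceAll("CRDI-CONTROL %%LINES-END FIN_DOCUMENT TEXT ST FR", "</Document>");
    }

    public static void main(String[] args) {
        System.out.println(replaceMarkers("H046A011000000653776FZSCRIPT_FACTURE PRINTER "));
        System.out.println(replaceMarkers("CRDI-CONTROL %%LINES-END FIN_DOCUMENT TEXT ST FR"));
    }
}
```

Lines that contain neither marker pass through unchanged.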

 

Pattern : "DMAIN.{36}(.{131}).{3}(.*)"

Replaced with : "\t<$1>$2</$1>"

Description : this pattern extracts the name and the value of a field inside the rdi spool according to the following structure :

- column 1 to 41 (41 columns) : field prefix with DMAIN and 36 spaces 

- column 42 to 172 (131 columns) : field name with spaces after

- column 173 to 175 (3 columns) : field value size

- columns 176 to the end of line : field value

Thus we retrieve the field name in $1 and the field value in $2
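To see what this pattern produces, here is a Java sketch that builds a line with the column widths described above and applies the same pattern and replacement with `String.replaceAll`. The helper `rdiLine` is an assumption made for the example; it only pads the parts to 41, 131 and 3 columns.

```java
class FieldRegexSketch {
    /** Builds a fixed-width line: DMAIN + 36 spaces, name padded to 131, 3-digit size, value. */
    static String rdiLine(String field, String value) {
        return String.format("%-41s%-131s%03d%s", "DMAIN", field, value.length(), value);
    }

    /** Applies the pattern from the post: $1 is the padded field name, $2 the value. */
    static String toElement(String line) {
        return line.replaceAll("DMAIN.{36}(.{131}).{3}(.*)", "\t<$1>$2</$1>");
    }

    public static void main(String[] args) {
        String line = rdiLine("B_TITLE", "MR");
        String element = toElement(line);
        // The field name still carries its padding at this stage.
        System.out.println(line.length());
        System.out.println(element.length());
    }
}
```

The tag names still contain the trailing spaces of the 131-column field, which is why further cleanup replacements are needed afterwards.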

 

Pattern : "  "

Replaced with : ""

Description : in order to delete spaces in field name

 

Pattern : " >"

Replaced with : ">"

Description : in order to delete some remaining spaces

 

Pattern : "B_TITLE"

Replaced with : "title"

Description : in order to rename field (you can do the same for B_LASTNAME, B_FIRSTNAME, ...)
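The three cleanup steps can be sketched in Java as follows, again assuming tReplace behaves like `String.replaceAll`. The `padded` helper and the rename targets (`title`, `lastname`, `firstname`, taken from the XML asked for in the question) are assumptions for the example. It also shows why the " >" step is needed: padding of odd length leaves one space behind after the double spaces are removed.

```java
class CleanupSketch {
    /** Builds an element as it looks right after the DMAIN pattern: name padded to 131 columns. */
    static String padded(String field, String value) {
        String name = String.format("%-131s", field);
        return "\t<" + name + ">" + value + "</" + name + ">";
    }

    static String cleanup(String element) {
        return element
            .replaceAll("  ", "")   // remove padding two spaces at a time
            .replaceAll(" >", ">")  // remove the single space left when the padding length is odd
            .replaceAll("B_TITLE", "title")
            .replaceAll("B_LASTNAME", "lastname")
            .replaceAll("B_FIRSTNAME", "firstname");
    }

    public static void main(String[] args) {
        // B_TITLE leaves 124 padding spaces (even), B_LASTNAME leaves 121 (odd).
        System.out.println(cleanup(padded("B_TITLE", "MR")));
        System.out.println(cleanup(padded("B_LASTNAME", "DOE")));
    }
}
```

Note that the double-space replacement applies to the whole line, so a value containing two consecutive spaces would lose them as well.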

 

And that's it.

I hope it's understandable.

If there is a better or easier solution, I'm interested !

 

Regards,

Replies
fdenis
Master

You have to use tDenormalize to get one row per document.
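The idea of ending up with one row per document can be illustrated in plain Java. This is not how tDenormalize is implemented; it is only a sketch of the grouping, assuming the whitespace-collapsed sample lines from the question and a header line starting with H046A. Class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocumentGroupingSketch {
    // Assumed data line layout: DMAIN, optional flag, field name, 3-digit size, value.
    private static final Pattern DATA =
        Pattern.compile("^DMAIN\\s+(?:\\S+\\s+)?(\\S+)\\s+\\d{3}(.*)$");

    /** Collects the field/value pairs of each document into one map per document. */
    static List<Map<String, String>> group(List<String> lines) {
        List<Map<String, String>> documents = new ArrayList<>();
        Map<String, String> current = null;
        for (String line : lines) {
            if (line.startsWith("H046A")) {
                // A header line starts a new document.
                current = new LinkedHashMap<>();
                documents.add(current);
                continue;
            }
            Matcher m = DATA.matcher(line);
            if (current != null && m.matches()) {
                current.put(m.group(1), m.group(2));
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "H046A011000000653776FZSCRIPT_FACTURE PRINTER ",
            "DMAIN    XX        B_TITLE             002MR",
            "DMAIN              B_LASTNAME          003DOE",
            "H046A011000000653776FZSCRIPT_FACTURE PRINTER ",
            "DMAIN    XX        B_TITLE             002MME",
            "DMAIN              B_LASTNAME          003DOE");
        // Prints [{B_TITLE=MR, B_LASTNAME=DOE}, {B_TITLE=MME, B_LASTNAME=DOE}]
        System.out.println(group(lines));
    }
}
```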
Anonymous
Not applicable
Author

Thanks for your reply, François.

 

In the end, I used a tFileInputDelimited and a tReplace with regular expressions to retrieve the fields.

 

Regards

Anonymous
Not applicable
Author

Hello @nhalicka,

Would you mind sharing the regexp from your tReplace component on the forum?

We will appreciate it a lot.

Best regards

Sabrina
