vvazza10
Contributor III

tXMLMap takes a long time to process XML data for loading into Snowflake

I have an XML file (2 MB) that has various child elements (Employee, chemical, electrical, transport, mechanical), which I am trying to load into a Snowflake table. When the job was executed, tXMLMap seemed to take most of the time processing the data.

0695b00000EZuXvAAL.png

The tXMLMap step takes around 15-20 minutes to complete for a small set of data. The job design is shown below:

0695b00000EZuZNAA1.png

Please advise how to design a job that reads an XML file containing various sub-elements and attributes.

11 Replies
gjeremy1617088143

Hi, maybe you can split your job in two: first read the XML, transform it, and write the output to separate files.

Then read those files and load the data into Snowflake. This way you can also load all your data in parallel if you put each (read --> load) step in a separate job.
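The parallel load step of this two-stage design can be illustrated outside Talend. The following is only a minimal sketch, assuming the first stage has already written one CSV file per target table; `load_csv` is a hypothetical stand-in that just counts rows instead of writing to Snowflake, and the file names and columns are invented for the demo.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_csv(path):
    """Hypothetical stand-in for one 'read CSV --> load Snowflake' job.

    It only counts the data rows; a real loader would write them to a table.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return Path(path).stem, sum(1 for _ in reader)


def load_all(paths, max_workers=4):
    """Run one load per staged file concurrently and collect the row counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(load_csv, paths))


def demo():
    """Stage two tiny files (invented data) and load them in parallel."""
    staged = {
        "employee": [["id", "name"], ["1", "Ann"], ["2", "Bob"]],
        "chemical": [["id", "formula"], ["7", "H2O"]],
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, rows in staged.items():
            path = Path(tmp) / f"{name}.csv"
            with open(path, "w", newline="") as handle:
                csv.writer(handle).writerows(rows)
            paths.append(path)
        return load_all(paths)
```

Calling `demo()` returns a mapping from each staged file name to the number of data rows it loaded. The loads are independent of each other, which is what makes running them side by side safe.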


vvazza10
Contributor III
Author

Thanks for the response! Are you suggesting that I transform the XML, write it to a file in S3, and load it into Snowflake from there? Could you please tell me which components I can use?

gjeremy1617088143

0695b00000Ea1vfAAB.jpg
0695b00000Ea1vaAAB.jpg
0695b00000Ea1vQAAR.jpg

gjeremy1617088143

So first you write all your data to files, then you run a job that loads all of those outputs in parallel, each as a simple job (read CSV --> load Snowflake).

You could use the tExtractXMLField component, which is lighter than tXMLMap.

You could also increase your JVM maximum memory for better performance.

vvazza10
Contributor III
Author

Thank you! Let me try this approach. I still feel that tXMLMap is going to take time to process the data. Even in the approach I followed, the Snowflake load happened in seconds, but the tXMLMap processing took a good amount of time. I am wondering whether it could be because of the loop conditions set to read the XML.

vvazza10
Contributor III
Author

@guenneguez jeremy - I went with the approach you suggested. The XML read completed in 0.64 seconds, but tXMLMap took 20 minutes to write out to delimited files. What is going wrong here?

gjeremy1617088143

So, another way: use one tExtractXMLField for each output instead of a single tXMLMap, and run them in parallel.

 

vvazza10
Contributor III
Author

@guenneguez jeremy - A small correction. The first approach, tFileInputXML -> tXMLMap -> tFileOutputDelimited followed by parallel runs of tFileInputDelimited -> tSnowflakeOutput, took 7 minutes to complete.

Let me try using the tExtractXMLField component and check how long it takes.

vvazza10
Contributor III
Author

@guenneguez jeremy @Shicong Hong Jeremy, in your approach, won't the same XML file be read multiple times, once for each tExtractXMLField in the design? I would like the XML to be read once and written to multiple tables. How can this be achieved?
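The read-once requirement can be illustrated outside Talend. This is a minimal sketch, not a Talend job design: it streams the document a single time and routes each record to a bucket named after its element, one bucket per target table. The sample XML, the `plant` root, and the child fields are assumptions, since the real file layout is not shown in the thread; only the five element names come from the original question.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample data; the actual file structure is not shown in the thread.
SAMPLE = """<plant>
  <Employee id="1"><name>Ann</name></Employee>
  <chemical id="7"><formula>H2O</formula></chemical>
  <Employee id="2"><name>Bob</name></Employee>
</plant>"""

# Element names taken from the question; each maps to one output table.
TARGETS = {"Employee", "chemical", "electrical", "transport", "mechanical"}


def split_by_element(source, targets=TARGETS):
    """Parse the XML once and group flattened records by element name."""
    rows = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in targets:
            row = dict(elem.attrib)  # attributes become columns
            for child in elem:       # direct children become columns too
                row[child.tag] = (child.text or "").strip()
            rows[elem.tag].append(row)
            elem.clear()             # release the record's subtree
    return dict(rows)


if __name__ == "__main__":
    grouped = split_by_element(io.BytesIO(SAMPLE.encode("utf-8")))
    for table, records in grouped.items():
        print(table, records)
```

Each bucket could then be written to its own delimited file or table, so the source file is parsed once no matter how many outputs there are.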