Solved: Re: Using HTML tag on xml file from txmlmap - Page 3 - Qlik Community

Anonymous · ‎2018-02-14

Hello, I have an XML file with many rows, and in my tXmlMap I need one row that contains HTML.

In this row, I use the html tag in my tXmlMap, but it stops reading at the first line and Talend gives me this error:

ORA-01400: Cannot insert NULL into ("DB"."table"."column")

But with another XML file that has many html rows, it works. For example, if I press Enter after my "Hello" to make a new line, it works.

Edit: I tried to use this

StringHandling.EREPLACE(row2.html,"</p>","</p><br>")

but it changed nothing.

Anonymous · ‎2018-02-16

I don't understand what you are saying, as there is ambiguity in what you're telling me and no examples are given. This is likely because of a language barrier (...and your English is much better than my ...any other language 🙂). So, let's try this. I'll write out what I think you are asking, and would you post exactly the XML you are trying and failing with? You say it doesn't work with ONE tag, but I believe every piece of XML you have sent has had TWO.

Do you want the text between the <html> tags with NO html code OR do you want all of the text and code?

Anonymous · ‎2018-02-16

Yeah, sorry, I'm very bad at explaining hard problems.

So, it doesn't work with only one tag, as in this XML file, but it works with 2 or more.

I'm using this XPath query

"my:DATA/my:Description/html"

I just want the data between the html tags, because the content can include tags such as <table>, and I insert all of the data into an Oracle DB.

Besoin.xml

Anonymous · ‎2018-02-16

OK, this is because the XML parser assumes that the <html> and <p> tags are part of the XML. Since there is only one <p>, it sees the XML as broken. This is actually badly formatted XML. XML or HTML within XML should be held in a <![CDATA[ ... ]]> section or maybe encoded as base64.
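To illustrate the point about unmatched tags and CDATA, here is a minimal sketch in plain Java (outside Talend), using the JDK's built-in DOM parser. The class name, method name, and sample strings are made up for the example and are not taken from the poster's file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CdataDemo {

    // Returns the text content of the root element,
    // or null when the input is not well-formed XML.
    static String parseRootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical samples: one unclosed <p>, once raw and once inside CDATA.
        String broken = "<Description><html><p>Hello</html></Description>";
        String wrapped = "<Description><![CDATA[<html><p>Hello</html>]]></Description>";

        System.out.println(parseRootText(broken));   // null: the parser rejects the unclosed <p>
        System.out.println(parseRootText(wrapped));  // <html><p>Hello</html>
    }
}
```

Inside the CDATA section the parser treats the HTML as plain text, so it no longer matters whether the tags are balanced.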

Do you have any control over the content of these XML files? If so, ensure that ALL opening tags have closing tags. If not, read the file in as a String, search for <p> and </p>, then remove them. After that, convert the String to XML using a tConvertType component and process it using the XPath you have.

I know this sounds like a pain, but XML parsers are quite strict with this. Microsoft are bending the rules again.....

Anonymous · ‎2018-02-16

So yeah, it's badly formatted. I can't control the content because I'm using tSoap and get the file from Microsoft SharePoint...

How can I remove the <p> tags?

Because I have tried to use these in my tXMLMap:

StringHandling.EREPLACE(row2.html,"<p>","<p></p><p>")

and

StringHandling.EREPLACE(row2.html,"</p>","<br/></p>")

But it's still the same problem and it doesn't show the html.

Anonymous · ‎2018-02-16

OK. I have a bit of a hack you can use. It is relatively convoluted, but it works. This is how you do it.

1) Read the data in as a String using a tFileInputRaw component.

2) The schema of the tFileInputRaw component will be a column called "content" of type String. Connect this to a tConvertType and convert it to a String.

3) Connect a tJavaFlex and use the code below.....

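row11.content = row12.content
    .replaceAll("<p>", "")                      // drop opening <p> tags
    .replaceAll("</p>", "")                     // drop closing </p> tags
    .replaceAll("\\<\\?xml(.+?)\\?\\>", "")     // drop the <?xml ...?> declaration
    .replaceAll("\\<\\?mso(.+?)\\?\\>", "")     // drop <?mso...?> processing instructions
    .replaceAll("\\s{2,}", "")                  // drop runs of two or more whitespace characters
    .replaceAll("[^\\x20-\\x7e]", "")           // drop everything outside printable ASCII
    .trim();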

This removes all of the rubbish that Talend does not like and creates a reasonably well formatted piece of XML. It also removes all <p> and </p> tags.

4) Connect this output to a tConvertType and convert the String to a Document.

5) Connect to a tExtractXMLField and use the XPaths you used before.

This works. I've tried it with your sample file with and without unmatched tags.
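For anyone who wants to try the idea outside a Talend job, the steps above can be sketched in plain Java roughly as follows. This is only an approximation of what the components do: the class and method names are invented, the sample XML is made up, and it leaves out the "my:" namespace prefix from the real XPath so that no namespace setup is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CleanAndExtract {

    // Same replaceAll chain as the tJavaFlex code (step 3).
    static String clean(String raw) {
        return raw
            .replaceAll("<p>", "")
            .replaceAll("</p>", "")
            .replaceAll("\\<\\?xml(.+?)\\?\\>", "")
            .replaceAll("\\<\\?mso(.+?)\\?\\>", "")
            .replaceAll("\\s{2,}", "")
            .replaceAll("[^\\x20-\\x7e]", "")
            .trim();
    }

    // Steps 4 and 5: parse the cleaned String as a Document, then evaluate an XPath.
    static String extract(String raw, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(clean(raw))));
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input with a single unclosed <p>, standing in for the real file.
        String raw = "<?xml version=\"1.0\"?>\n"
            + "<DATA><Description><html><p>Hello</html></Description></DATA>";
        System.out.println(extract(raw, "/DATA/Description/html")); // prints: Hello
    }
}
```

One thing to be aware of: the last pattern strips every character outside printable ASCII, so accented letters and line breaks in the data are removed as well.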

Anonymous · ‎2018-02-16

I'm sorry, I'm lost.

Do you have a screenshot of the job you created? ^^'

Anonymous · ‎2018-02-16

The file is read by the tFileInputRaw. The code goes in the tJavaFlex. The rest is explained by my last post.

Anonymous · ‎2018-02-16

I got an error on the tJavaFlex:

Exception in component tJavaFlex_1
java.lang.NullPointerException

Anonymous · ‎2018-02-16

Your tConvertType is the problem. Either tick "Auto Cast" (catches me out all the time) or configure the manual cast.
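If the cast is the cause, the value reaching the tJavaFlex is presumably null, and calling replaceAll on it throws the NullPointerException. A small guard like the sketch below can make that easier to spot. The helper name and the message are hypothetical, and this is plain Java rather than anything Talend provides.

```java
class ContentGuard {

    // Hypothetical helper: fails with a descriptive message instead of a bare
    // NullPointerException when the upstream component delivered no value.
    static String requireContent(Object content) {
        if (content == null) {
            throw new IllegalStateException(
                "content is null: check the tConvertType cast settings upstream");
        }
        return content.toString();
    }

    public static void main(String[] args) {
        System.out.println(requireContent("<DATA/>"));
        try {
            requireContent(null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the job, the same check could wrap row12.content before the replaceAll chain runs.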

Anonymous · ‎2018-02-16

Did this work for you?
