Anonymous
Not applicable

Using an HTML tag in an XML file with tXMLMap

Hello, I have an XML file with many rows, and in my tXMLMap I need one row that contains HTML.

0683p000009LscJ.png

For this row I use the html tag in my tXMLMap, but it stops reading at the first line and Talend gives me this error:

ORA-01400: Cannot insert NULL into ("DB"."table"."column")

But with other XML files that contain several HTML lines, it works. For example, if I press Enter after my <p> Hello, </p> to start a new line, it works.

 

Edit: I tried to use this

StringHandling.EREPLACE(row2.html,"</p>","</p><br>") 

but it made no difference.

 

 

42 Replies
Anonymous
Not applicable
Author

I don't fully understand what you are describing, as there is some ambiguity and no examples are given. This is likely because of a language barrier (....and your English is much better than my ....any other language 🙂 ). So, let's try this. I'll write out what I think you are asking, and could you post exactly the XML you are trying and failing with? You say it doesn't work with ONE <p>, but I believe every piece of XML you have sent has had TWO <p>.

 

Do you want the text between the <html> tags with NO html code OR do you want all of the text and code?

 

Anonymous
Not applicable
Author

Yeah, sorry, I'm very bad at explaining hard problems.

 

So, it doesn't work with only one <p> tag, as in this XML file, but it works with two or more.

I'm using this XPath query

"my:DATA/my:Description/html"

I just want the data between the html tags, because there can be a <span> inside a <p> tag or a <table>, and I insert all of the data into an Oracle DB.

 


Besoin.xml
Anonymous
Not applicable
Author

OK, this is because the XML parser assumes that the <html> and <p> are part of the XML. Since there is only one <p>, with no matching </p>, it sees the XML as broken. This is actually badly formed XML. HTML (or XML) held within XML should be wrapped in a <![CDATA[ ... ]]> section or encoded, for example as base64.
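
To illustrate the point about unmatched tags and CDATA, here is a minimal sketch using the JDK's built-in DOM parser rather than Talend. The class name, the method name, and the tiny sample documents are made up for this example and are not taken from the attached file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CdataDemo {

    // Returns the text of the root element, or null if the input is not well-formed XML.
    static String parseRootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: <p> is never closed, so the parser treats the XML as broken.
        String unmatched = "<html><p>Hello, world</html>";
        // Same content inside CDATA: the parser treats <p> as plain text.
        String wrapped = "<html><![CDATA[<p>Hello, world]]></html>";

        System.out.println(parseRootText(unmatched)); // null
        System.out.println(parseRootText(wrapped));   // <p>Hello, world
    }
}
```

With CDATA the markup survives as text, which is why it is the usual way to carry HTML inside an XML element.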

 

Do you have any control over the content of these XML files? If so, ensure that ALL opening tags have closing tags. If not, read the file in as a String, search for <p> and </p>, and remove them. After that, convert the String to XML using a tConvertType component and process it using the XPath you have.

 

I know this sounds like a pain, but XML parsers are quite strict with this. Microsoft are bending the rules again.....

Anonymous
Not applicable
Author

So yes, it is badly formatted. I have no control over it because I'm using tSOAP and get the file from Microsoft SharePoint...

How can I remove the <p>?

I have already tried these in my tXMLMap:

StringHandling.EREPLACE(row2.html,"<p>","<p></p><p>")

and

StringHandling.EREPLACE(row2.html,"</p>","<br/></p>") 

But it's still the same problem and the HTML is not shown.

 

Anonymous
Not applicable
Author

OK. I have a bit of a hack you can use. It is relatively convoluted, but it works. This is how you do it.

 

1) Read the data in as a String using a tFileInputRaw component.

2) The schema of the tFileInputRaw component will be a column called "content" of type Object. Connect this to a tConvertType and convert it to a String.

3) Connect a tJavaFlex and use the code below.....

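row11.content = row12.content
    .replaceAll("<p>", "") // drop opening <p> tags
    .replaceAll("</p>", "") // drop closing </p> tags
    .replaceAll("\\<\\?xml(.+?)\\?\\>", "") // drop the <?xml ...?> declaration
    .replaceAll("\\<\\?mso(.+?)\\?\\>", "") // drop <?mso...?> instructions
    .replaceAll("\\s{2,}", "") // drop runs of 2+ whitespace characters
    .replaceAll("[^\\x20-\\x7e]", "") // drop non-printable and non-ASCII characters
    .trim();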

This removes all of the rubbish that Talend does not like and creates a reasonably well formatted piece of XML. It also removes all <p> and </p> tags.

4) Connect this output to a tConvertType and convert the String to a Document.

5) Connect to a tExtractXMLField and use the XPaths you used before.

 

This works. I've tried it with your sample file with and without unmatched <p> tags.
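
For anyone who wants to try the idea outside a Talend job, the steps above can be sketched in plain Java: clean the raw string with the same replaceAll chain as step 3, parse it into a Document, then run an XPath. This is only a sketch under assumptions. The class and method names are hypothetical, the sample XML is a simplified stand-in for the real file, and the namespace prefixes from the original XPath are left out so the example stays self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CleanAndExtract {

    // Same chain of replacements as the tJavaFlex code in step 3.
    static String clean(String raw) {
        return raw
            .replaceAll("<p>", "")
            .replaceAll("</p>", "")
            .replaceAll("\\<\\?xml(.+?)\\?\\>", "")
            .replaceAll("\\<\\?mso(.+?)\\?\\>", "")
            .replaceAll("\\s{2,}", "")
            .replaceAll("[^\\x20-\\x7e]", "")
            .trim();
    }

    // Cleans the raw text, parses it as XML and returns the string value of the XPath.
    static String extract(String raw, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(clean(raw))));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Cleaned content is still not valid XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified, made-up sample with one unmatched <p>, as in the failing case.
        String raw = "<?xml version=\"1.0\"?>\n<DATA>\n  <Description>\n"
            + "    <html><p>Hello,</html>\n  </Description>\n</DATA>";
        System.out.println(extract(raw, "/DATA/Description/html")); // Hello,
    }
}
```

Two limits of the regex chain are worth knowing: the plain "<p>" pattern does not match tags with attributes such as <p class="x">, and the last replacement also strips accented and other non-ASCII characters from the text.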

Anonymous
Not applicable
Author

I'm sorry, I'm lost.

Do you have a screenshot of the job you created? ^^'

Anonymous
Not applicable
Author

The file is read by the tFileInputRaw. The code goes in the tJavaFlex. The rest is explained by my last post.
0683p000009Ls7O.png

Anonymous
Not applicable
Author

0683p000009LshL.png

I get an error on the tJavaFlex:

Exception in component tJavaFlex_1
java.lang.NullPointerException
Anonymous
Not applicable
Author

Your tConvertType is the problem. Either tick "Auto Cast" (this catches me out all the time) or configure the manual cast. 
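
One plausible reading of the NullPointerException is that the content column reaches the tJavaFlex as null when the cast is not configured, so the first replaceAll call fails. A small guard makes that failure easier to diagnose. This is a hedged sketch: the class and method names are hypothetical, and only the <p> replacements are shown to keep it short.

```java
public class ContentGuard {

    // Fails with a readable message instead of a bare NullPointerException.
    static String cleanContent(Object content) {
        if (content == null) {
            throw new IllegalStateException(
                "content is null: check the tConvertType cast settings upstream");
        }
        return content.toString()
            .replaceAll("<p>", "")
            .replaceAll("</p>", "")
            .trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanContent("<a><p>x</p></a>")); // <a>x</a>
    }
}
```

Inside a job, the same null check could sit at the top of the tJavaFlex main code before the replaceAll chain.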

Anonymous
Not applicable
Author

Did this work for you?