Hello, I have an XML file with many rows, and in my tXMLMap I need one row that contains HTML.
For this row, I map the html tag in my tXMLMap, but it stops reading at the first line and Talend sends me this error:
ORA-01400: Cannot insert NULL into ("DB"."table"."column")
But with another XML file with many HTML rows, it works. For example, if I press Enter after my <p> Hello, </p> to make a new line, it works.
Edit: I tried to use this:
StringHandling.EREPLACE(row2.html,"</p>","</p><br>")
but nothing changed.
I don't understand what you are saying as there is ambiguity in what you're telling me and no examples given. This is likely because of a language barrier (....and your English is much better than my ....any other language 🙂 ). So, let's try this. I'll write out what I think you are asking and would you post exactly the XML you are trying and failing with. You say it doesn't work with ONE <p>, but I believe every piece of XML you have sent has had TWO <p>.
Do you want the text between the <html> tags with NO html code OR do you want all of the text and code?
Yeah, sorry, I'm very bad at explaining a hard problem.
So, it doesn't work with only one <p> tag, like in this XML file, but it works with 2 or more.
I'm using this XPath query
"my:DATA/my:Description/html"
I just want the data between the HTML code, because I can have a <span> inside a <p> tag, or a <table>, and I insert all the data into an Oracle DB.
OK, this is because the XML parser assumes that the <html> and <p> are part of the XML. Since there is only 1 <p>, it sees the XML as broken. This is actually badly formatted (not well-formed) XML. XML or HTML within XML should be held in a <![CDATA[ ... ]]> section or maybe encoded to Base64.
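To illustrate the point, here is a minimal sketch in plain Java (the standard JAXP DOM parser, outside Talend) that parses two hypothetical, simplified versions of the file: one with an unclosed <p> inside <html>, and one where the same HTML is wrapped in a CDATA section. The element names are simplified (no my: namespace) and the helper name descriptionText is made up for this example. The first document is rejected by the parser; the second parses, and the HTML comes back as ordinary text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CdataDemo {
    // Returns the text content of the first <Description> element,
    // or null when the parser rejects the document as not well-formed.
    static String descriptionText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("Description").item(0).getTextContent();
        } catch (SAXException e) {
            return null; // malformed, e.g. an unclosed <p> inside <html>
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, simplified samples (no my: namespace).
        String unclosed = "<DATA><Description><html><p>Hello,</html></Description></DATA>";
        String wrapped = "<DATA><Description><![CDATA[<html><p>Hello,</html>]]></Description></DATA>";
        System.out.println(descriptionText(unclosed)); // prints: null
        System.out.println(descriptionText(wrapped));  // prints: <html><p>Hello,</html>
    }
}
```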
Do you have any control over the content of these XML files? If so, ensure that ALL opening tags have closing tags. If not, read the file in as a String, search for <p> and </p>, then remove them. After that convert the String to XML using a tConvertType component and process using the XPath you have.
I know this sounds like a pain, but XML parsers are quite strict with this. Microsoft are bending the rules again.....
So yeah, it's badly formatted. I don't have access to the source because I'm using tSOAP to get the file from Microsoft SharePoint...
How can I remove the <p> tags?
I ask because I have tried using this in my tXMLMap:
StringHandling.EREPLACE(row2.html,"<p>","<p></p><p>")
and
StringHandling.EREPLACE(row2.html,"</p>","<br/></p>")
But it's still the same problem and it doesn't show the HTML.
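One way to see why those two replacements cannot fix the file: even applied to the raw text, neither of them changes how many <p> tags are left open. The sketch below is a crude check that only counts the literal strings <p> and </p>; the sample string and the helper names count and unclosedP are hypothetical, and Java's literal String.replace stands in for StringHandling.EREPLACE because the patterns contain no regex special characters.

```java
class TagBalance {
    // Counts non-overlapping occurrences of token in s.
    static int count(String s, String token) {
        int n = 0;
        for (int i = s.indexOf(token); i >= 0; i = s.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    // Number of literal "<p>" minus number of literal "</p>"; 0 means they pair up.
    static int unclosedP(String html) {
        return count(html, "<p>") - count(html, "</p>");
    }

    public static void main(String[] args) {
        String html = "<html><p>Hello,</html>";                       // one unclosed <p>
        String afterFirst = html.replace("<p>", "<p></p><p>");         // first attempt
        String afterSecond = afterFirst.replace("</p>", "<br/></p>");  // second attempt
        System.out.println(unclosedP(html));        // prints: 1
        System.out.println(unclosedP(afterFirst));  // prints: 1
        System.out.println(unclosedP(afterSecond)); // prints: 1
    }
}
```

The first replacement adds one <p> and one </p> for every existing <p>, and the second only inserts <br/> before each </p>, so the difference stays the same.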
OK. I have a bit of a hack you can use. It is relatively convoluted, but it works. This is how you do it.
1) Read the data in as a String using a tFileInputRaw component.
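2) The schema of the tFileInputRaw component will be a single column called "content" of type Object. Connect this to a tConvertType and convert it to a String.
3) Connect a tJavaFlex and put the code below in its Main code section (row12 is the incoming row and row11 the outgoing one; change the names to match your job).....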
row11.content = row12.content.replaceAll("<p>", "").replaceAll("</p>", "").replaceAll("\\<\\?xml(.+?)\\?\\>", "").replaceAll("\\<\\?mso(.+?)\\?\\>", "").replaceAll("\\s{2,}", "").replaceAll("[^\\x20-\\x7e]", "").trim();
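This removes all of the rubbish that Talend does not like (the <?xml ...?> declaration, the <?mso ...?> processing instructions, runs of whitespace and any non-printable or non-ASCII characters) and creates a reasonably well-formed piece of XML. It also removes all <p> and </p> tags.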
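To see what the chain does outside Talend, here is the same sequence of replaceAll calls wrapped in a standalone Java method, with a null check added (calling replaceAll on a null content value throws a NullPointerException). The class name RawXmlCleaner and the sample input, including the <?mso-application ...?> line, are hypothetical.

```java
class RawXmlCleaner {
    // Same replaceAll chain as the tJavaFlex line, wrapped in a null-safe method.
    static String clean(String content) {
        if (content == null) {
            return null;
        }
        return content.replaceAll("<p>", "")
                .replaceAll("</p>", "")
                .replaceAll("\\<\\?xml(.+?)\\?\\>", "")   // XML declaration <?xml ... ?>
                .replaceAll("\\<\\?mso(.+?)\\?\\>", "")   // Office processing instructions <?mso... ?>
                .replaceAll("\\s{2,}", "")               // any run of 2+ whitespace characters
                .replaceAll("[^\\x20-\\x7e]", "")        // everything outside printable ASCII, incl. a lone \n
                .trim();
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\"?>\n<?mso-application progid=\"InfoPath.Document\"?>\n"
                + "<DATA>\n  <Description><html><p>Hello,</html></Description>\n</DATA>";
        // prints: <DATA><Description><html>Hello,</html></Description></DATA>
        System.out.println(clean(raw));
    }
}
```

Be aware that the last two replacements are aggressive: \s{2,} also deletes double spaces inside the text (so "Hello,  world" becomes "Hello,world"), and [^\x20-\x7e] deletes every non-ASCII character, so accented letters such as é are lost.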
4) Connect this output to a tConvertType and convert the String to a Document.
5) Connect to a tExtractXMLField and use the XPaths you used before.
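For anyone who wants to check steps 4 and 5 outside Talend, this is a rough plain-Java equivalent using the standard JAXP DOM and XPath APIs: parse the cleaned String into a Document, then evaluate the same path. It is not what tExtractXMLField does internally. Assumptions: the namespace URI bound to my is made up (use the xmlns:my value from your own file), the sample document is hypothetical, and the path is wrapped in string() so that only the text inside <html> is returned, with nested tags such as <span> stripped.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DescriptionXPath {
    // Hypothetical URI for the "my" prefix; copy the real xmlns:my value from your file.
    static final String MY_NS = "urn:example:my";

    // Resolves only the "my" prefix used in the XPath.
    static final class MyNamespace implements NamespaceContext {
        @Override
        public String getNamespaceURI(String prefix) {
            return "my".equals(prefix) ? MY_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return MY_NS.equals(namespaceURI) ? "my" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return MY_NS.equals(namespaceURI)
                    ? Collections.singletonList("my").iterator()
                    : Collections.<String>emptyIterator();
        }
    }

    // Parses the cleaned String into a DOM Document and returns the text under <html>.
    static String extract(String cleanedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so my:DATA matches a namespaced element
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cleanedXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MyNamespace());
            // Evaluated from the document node; string() concatenates all descendant text,
            // so tags such as <span> are dropped and only their text is kept.
            return xpath.evaluate("string(my:DATA/my:Description/html)", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<my:DATA xmlns:my=\"urn:example:my\"><my:Description>"
                + "<html>Hello, <span>world</span></html></my:Description></my:DATA>";
        System.out.println(extract(xml)); // prints: Hello, world
    }
}
```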
This works. I've tried it with your sample file with and without unmatched <p> tags.
I'm sorry, I'm lost.
Do you have a screenshot of the job you have created? ^^'
The file is read by the tFileInputRaw. The code goes in the tJavaFlex. The rest is explained by my last post.
I got an error on the tJavaFlex:
Exception in component tJavaFlex_1 java.lang.NullPointerException
Your tConvertType is the problem. Either tick "Auto Cast" (catches me out all the time) or configure the manual cast.
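For context, a NullPointerException on that line most likely means row12.content arrived as null, so the first replaceAll call had nothing to act on. If you also want the Java side to be defensive, here is a hypothetical helper (not a Talend API) that turns an Object column value into a String and passes null through instead of dereferencing it. Handling byte[] and assuming UTF-8 are assumptions about what the upstream component might deliver.

```java
import java.nio.charset.StandardCharsets;

class ContentToString {
    // Hypothetical helper: converts an upstream "content" value (String, byte[] or null) to a String.
    static String asString(Object content) {
        if (content == null) {
            return null; // calling a method on null here is what raises the NullPointerException
        }
        if (content instanceof byte[]) {
            return new String((byte[]) content, StandardCharsets.UTF_8); // assumes a UTF-8 file
        }
        return content.toString();
    }

    public static void main(String[] args) {
        System.out.println(asString("<DATA/>"));                                  // prints: <DATA/>
        System.out.println(asString("<DATA/>".getBytes(StandardCharsets.UTF_8))); // prints: <DATA/>
        System.out.println(asString(null));                                       // prints: null
    }
}
```

In the tJavaFlex itself, the equivalent guard is a null check on row12.content before the replaceAll chain runs.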
Did this work for you?