Anonymous
Not applicable

XML file with carriage return

Hi there, I am trying to parse an XML file into a Postgres DB. Everything seems to work going from the tFileInputXML to the tMap and finally on to the tPostgresqlOutput. The only problem is that, because of the way my XML files are set up, not all of the details are taken in. For example, here is what the XML file contains:
<id>1234</id>
<name> John Smith</name>
<address> 123 Main Street
SomeTown
SomeCountry</address>
The problem is that only the id (1234), the name (John Smith) and the first part of the address (123 Main Street) get entered. Because of the carriage return, the rest of the address is missing. Is there any way to rectify this in Talend, or will all of the XML files need to be edited to remove the carriage returns?
10 Replies
Anonymous
Not applicable
Author

I think I have heard of this before with regard to Postgres DBs, though I may be wrong. Have you tried using a tLogRow after the tFileInputXML component to see if Talend is actually reading the full address? I suspect it will be. If it is not, this could be a Talend bug. What version are you using?
Anonymous
Not applicable
Author

Hi there, thanks for the quick response. I just set it up with a tLogRow component, and yes, it does seem to take the full address after the tFileInputXML component. However, it still puts the second line of the address onto a new line, like so:
1234|John Smith|123 Main Street
SomeTown
SomeCountry|

I am using Talend Open Studio v 5.1.1
Anonymous
Not applicable
Author

That is entirely expected (that it would keep the carriage return and/or line feed when displaying the record with the tLogRow). I suspect that this is an issue with either Postgres or the Postgres Talend component. Would it be a deal breaker to remove the carriage returns?
Anonymous
Not applicable
Author

Hi rhall_2.0, unfortunately yes, it would be a deal breaker to remove all the carriage returns by hand. I am dealing with around 20,000 separate XML files, and judging by manual checking, most of them have the problem and only a few do not. I cannot find a setting anywhere within Talend that allows it to recognize a \n, carriage return or line feed.
Does this mean I will need to implement another kind of solution in order to rectify this error?
Anonymous
Not applicable
Author

Maybe try this method. Create a routine with this method in it...
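public static String removeChar(int ascii, String value, String replaceVal) {
    // A null input yields null, so the method is safe on nullable columns
    String returnVal = null;

    if (value != null) {
        // Turn the ASCII code (e.g. 13 = carriage return, 10 = line feed) into a one-character String
        String target = String.valueOf((char) ascii);
        // replace() matches literally; replaceAll() would treat the character as a regular expression
        returnVal = value.replace(target, replaceVal);
    }
    return returnVal;
}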

You need to supply the ASCII code of each character that is causing you an issue. You can find them here.
If you call this method in a tXMLMap (for example) for every element that you get this problem with, it will edit the string as you process it. You will therefore not have to worry about doing it manually.
You will need to try a few things out, but I think this should work.
Note: I wrote this method in this post and it is not tested. You will need to test it first.
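Since the method is untested, one way to check it outside Talend is a small standalone class like the sketch below. The class name RemoveCharCheck and the sample address are made up for illustration, and this copy of the method uses String.replace so that the character is matched literally rather than as a regular expression.

```java
public class RemoveCharCheck {

    // Copy of the routine method, written with String.replace (literal match).
    public static String removeChar(int ascii, String value, String replaceVal) {
        if (value == null) {
            return null;
        }
        return value.replace(String.valueOf((char) ascii), replaceVal);
    }

    public static void main(String[] args) {
        // Hypothetical sample value with Windows-style line endings (CR + LF).
        String address = "123 Main Street\r\nSomeTown\r\nSomeCountry";

        // Removing only the carriage returns (13) leaves the line feeds (10) in place.
        String withoutCr = removeChar(13, address, "");
        System.out.println("Still contains a line feed: " + withoutCr.contains("\n"));

        // Removing both characters gives a single-line value.
        String singleLine = removeChar(10, withoutCr, " ");
        System.out.println(singleLine);
    }
}
```

If the first line printed reports that a line feed is still present, removing ASCII 13 alone will not be enough for that data.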
Anonymous
Not applicable
Author

Hi, thanks for the response again. Which component should I use to implement this piece of code: tJava, tJavaFlex or tJavaRow? And where in my job flow should I place it?
Anonymous
Not applicable
Author

Create a routine (under the "Code" section in your project tree) and add this method. If you create a routine called MyRoutine, then you can use this method anywhere you wish with the following code:
routines.MyRoutine.removeChar(13,row1.column1, "")

The above would replace carriage returns (13) in the column called "column1" from row1 with an empty String.
I would recommend using it in a tMap or tXMLMap for the columns/entities that need to use it. If you are dealing with a carriage return and a line feed you may need to edit the method to deal with both or call it twice with different ascii numbers.
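As a rough sketch of the "edit the method to deal with both" option, a second routine method could handle carriage returns and line feeds in one pass. The class name LineBreakRoutine and the method name replaceLineBreaks are hypothetical, and the regular expression is my own choice rather than anything Talend provides.

```java
import java.util.regex.Matcher;

public class LineBreakRoutine {

    // Replaces each CR+LF pair, lone CR (13) or lone LF (10) with the given replacement.
    public static String replaceLineBreaks(String value, String replacement) {
        if (value == null) {
            return null;
        }
        // "\r\n" is listed first so a Windows line ending counts as one break, not two.
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
        return value.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(replacement));
    }
}
```

Assuming the method were added to a routine called MyRoutine and the column were called address (both assumptions), the tMap expression would be routines.MyRoutine.replaceLineBreaks(row1.address, ", "), which keeps the address parts separated instead of running them together.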
Anonymous
Not applicable
Author

Hi, thanks again. I managed to work that out after a bit of googling around Talend routines. I applied it using just the carriage return value (13) and unfortunately it did not work that way. I am going to try editing the Java code to handle both a carriage return and a line feed, and will return if I run into further problems or still cannot retrieve the full data.
Anonymous
Not applicable
Author

As an experiment to find out what combination of characters you need to remove, you can try something like this....
routines.MyRoutine.removeChar(13,routines.MyRoutine.removeChar(10,row1.column1, ""), "")

The method returns a String so you can nest it inside another call to the same method. It will work from the inside out, so in the example above the ascii 10 chars are removed first, then the ascii 13 ones.
This may need some experimentation until you nail the characters you need to find and remove.
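To cut down on the guesswork, a small helper can list the control character codes that are actually present in a value. This is only a sketch: the class name ControlCharInspector and the method name controlCharCodes are made up for illustration, and it relies on the standard Character.isISOControl check.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlCharInspector {

    // Returns the numeric codes of all control characters in the value, in order of appearance.
    public static List<Integer> controlCharCodes(String value) {
        List<Integer> codes = new ArrayList<Integer>();
        if (value == null) {
            return codes;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Control characters include carriage return (13), line feed (10) and tab (9).
            if (Character.isISOControl(c)) {
                codes.add((int) c);
            }
        }
        return codes;
    }
}
```

Printing the result for a problem column, for example from a tJavaRow with System.out.println, shows which codes need to be passed to removeChar.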