
Anonymous
Not applicable
2015-07-22
08:30 AM
XML file with carriage return
Hi there, I am trying to parse an XML file into a Postgres database. Everything seems to work going from the tFileInputXML to the tMap and finally on to the tPostgresqlOutput. The only problem is that, because of the way my XML files are set up, not all the details are taken in. For example, here is what the XML file contains:
<id>1234</id>
<name> John Smith</name>
<address> 123 Main Street
SomeTown
SomeCountry</address>
The problem is that only the id (1234), the name (John Smith) and the first part of the address (123 Main Street) get entered. Because of the carriage return, the rest of the address is missing. Is there any way to rectify this in Talend, or will all of the XML files need to be edited to remove the carriage returns?
10 Replies

Anonymous
Not applicable
2015-07-22
09:24 AM
Author
I think I have heard of this before with regard to Postgres databases, though I may be wrong. Have you tried using a tLogRow after the tFileInputXML component to see if Talend is actually reading the full address? I suspect it will be. If it is not, this could be a Talend bug. What version are you using?

Anonymous
Not applicable
2015-07-22
10:09 AM
Author
Hi there, thanks for the quick response. I just set it up with a tLogRow component, and yes, it does seem to take the full address after the tFileInputXML component. However, it still puts the second line of the address onto a new line, like so:
1234|John Smith|123 Main Street
SomeTown
SomeCountry|
I am using Talend Open Studio v 5.1.1

Anonymous
Not applicable
2015-07-22
10:19 AM
Author
That is entirely expected (that it would keep the carriage return and/or line feed when displaying the record with the tLogRow). I suspect that this is an issue with either Postgres or the Postgres Talend component. Would it be a deal breaker to remove the carriage returns?
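To see which line-break characters actually survive XML parsing, here is a minimal sketch. It assumes the JDK's built-in DOM parser and a made-up <person> wrapper element; the thread does not show how tFileInputXML is configured, so treat it as an illustration only. The XML specification requires parsers to normalize CR LF pairs and lone carriage returns to a single line feed, so the parsed value may contain only line feeds (10) and no carriage returns (13).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch only; XmlLineEndingCheck is a hypothetical class name.
final class XmlLineEndingCheck {

    // Parses the XML and returns the text of the first <address> element.
    static String readAddress(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("address").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample with Windows-style line endings inside the element.
        String xml = "<person><address>123 Main Street\r\nSomeTown\r\nSomeCountry</address></person>";
        String address = readAddress(xml);
        System.out.println("contains CR (13): " + (address.indexOf('\r') >= 0));
        System.out.println("contains LF (10): " + (address.indexOf('\n') >= 0));
    }
}
```

If the first line prints false, any clean-up step would need to target character 10 rather than 13.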

Anonymous
Not applicable
2015-07-22
10:50 AM
Author
Hi rhall_2.0, unfortunately yes, removing all the carriage returns would be a deal breaker, as I am dealing with around 20,000 separate XML files. Judging by manual checking, most have the problem and only a few do not. I cannot find a setting anywhere within Talend that lets it recognize a \n, carriage return or line feed.
Does this mean I will need to implement another kind of solution to rectify this error?

Anonymous
Not applicable
2015-07-22
11:06 AM
Author
Maybe try this method. Create a routine with this method in it...
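// Returns value with every occurrence of the character whose code is 'ascii'
// replaced by replaceVal; returns null when value is null.
public static String removeChar(int ascii, String value, String replaceVal) {
    if (value == null) {
        return null;
    }
    // String.replace matches literally, so characters such as '.' or '$'
    // are not interpreted as regular-expression syntax.
    return value.replace(String.valueOf((char) ascii), replaceVal);
}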
You need to supply the ASCII code of the character that is causing the issue (for example 13 for a carriage return or 10 for a line feed). You can find them here.
If you call this method in a tXMLMap (for example) for every element that you get this problem with, it will edit the string as you process it. You will therefore not have to worry about doing it manually.
You will need to try a few things out, but I think this should work.
Note: I wrote this method in this post and it is not tested. You will need to test it first.
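To try the method outside Talend first, a minimal standalone sketch follows. The class name RemoveCharDemo and the sample address are made up for illustration, and this variant uses String.replace, which matches the character literally, rather than replaceAll.

```java
// Standalone sketch; RemoveCharDemo is a hypothetical class name.
final class RemoveCharDemo {

    // Replaces every occurrence of the character with the given code.
    // Returns null for null input.
    static String removeChar(int ascii, String value, String replaceVal) {
        if (value == null) {
            return null;
        }
        // String.replace treats both arguments literally (no regex).
        return value.replace(String.valueOf((char) ascii), replaceVal);
    }

    public static void main(String[] args) {
        // Made-up sample with line feeds (ASCII 10) between the address lines.
        String address = "123 Main Street\nSomeTown\nSomeCountry";
        System.out.println(removeChar(10, address, ", "));
        // prints: 123 Main Street, SomeTown, SomeCountry
    }
}
```

Passing a separator such as ", " instead of an empty String keeps the address parts readable once they are on one line.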

Anonymous
Not applicable
2015-07-23
05:32 AM
Author
Hi, thanks for the response again. Which component should I use to implement this piece of code: tJava, tJavaFlex or tJavaRow? Also, where in my job flow should I place it?

Anonymous
Not applicable
2015-07-23
05:51 AM
Author
Create a routine (under the "Code" section in your project tree) and add this method. If you create a routine called MyRoutine, then you can use this method anywhere you wish with the following code:
routines.MyRoutine.removeChar(13, row1.column1, "")
The above would replace carriage returns (13) in the column called "column1" from row1 with an empty String.
I would recommend using it in a tMap or tXMLMap for the columns/entities that need it. If you are dealing with both a carriage return and a line feed, you may need to edit the method to deal with both, or call it twice with different ASCII numbers.
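If both characters turn out to be involved, one way to edit the method to deal with both is sketched below. The names LineBreaks and replaceLineBreaks are hypothetical, and the regular expression is an assumption about which line endings matter (CR LF, lone CR, lone LF).

```java
import java.util.regex.Matcher;

// Sketch only; LineBreaks and replaceLineBreaks are hypothetical names.
final class LineBreaks {

    // Replaces each line break (CR LF, lone CR, or lone LF) with replaceVal.
    // Returns null for null input.
    static String replaceLineBreaks(String value, String replaceVal) {
        if (value == null) {
            return null;
        }
        // CR LF is listed first so a Windows line ending counts as one break.
        // quoteReplacement keeps '$' and '\' in replaceVal literal.
        return value.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(replaceVal));
    }

    public static void main(String[] args) {
        // Made-up sample mixing the two line-ending styles.
        String address = "123 Main Street\r\nSomeTown\nSomeCountry";
        System.out.println(replaceLineBreaks(address, " "));
        // prints: 123 Main Street SomeTown SomeCountry
    }
}
```

Added to the same routine, it could be called from a tMap expression along the lines of routines.MyRoutine.replaceLineBreaks(row1.column1, " "), assuming the routine and column names used earlier.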

Anonymous
Not applicable
2015-07-23
06:15 AM
Author
Hi, thanks again. I managed to work that out after a bit of googling around Talend routines. I applied it using just the carriage return value (13) and unfortunately it did not work that way. I am going to try editing the Java code to include both a carriage return and a line feed, and I will report back if I run into further problems retrieving the full data.

Anonymous
Not applicable
2015-07-23
06:24 AM
Author
As an experiment to find out what combination of characters you need to remove, you can try something like this:
routines.MyRoutine.removeChar(13, routines.MyRoutine.removeChar(10, row1.column1, ""), "")
The method returns a String, so you can nest it inside another call to the same method. It works from the inside out, so in the example above the ASCII 10 characters (line feeds) are removed first, then the ASCII 13 ones (carriage returns).
This may need some experimentation until you nail the characters you need to find and remove.
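To cut down on the guessing, a small helper can report which control characters a value actually contains. This is a sketch with hypothetical names (ControlCharReport, describeControlChars); it only looks at characters with codes below 32, which is an assumption about where the problem lies.

```java
// Sketch only; ControlCharReport and describeControlChars are hypothetical names.
final class ControlCharReport {

    // Returns one line per control character (code below 32) found in value,
    // giving its position and numeric code. Returns "" for null input.
    static String describeControlChars(String value) {
        if (value == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 32) {
                report.append("index ").append(i)
                      .append(": code ").append((int) c).append('\n');
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Made-up sample with line feeds between the address lines.
        String address = "123 Main Street\nSomeTown\nSomeCountry";
        System.out.print(describeControlChars(address));
        // prints:
        // index 15: code 10
        // index 24: code 10
    }
}
```

The codes it reports are the numbers to pass to removeChar.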