Just wanted to put a thought out there before I submit a feature request:
One of our newbies ran into trouble because the input file he was using in a job had trailing spaces after the values in a column defined in the schema as Integer, which caused every row to be rejected. It seems to me that all input components should automatically trim numeric columns before attempting to parse them to the target data type, regardless of the advanced trim settings.
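For anyone who wants to reproduce the problem outside of a job, the root cause is simply how Java parses integers; a single trailing space is enough to reject the value. A minimal, plain-Java illustration (not Talend-specific):

public class TrimDemo {
    public static void main(String[] args) {
        System.out.println(Integer.parseInt("123"));          // 123
        System.out.println(Integer.parseInt("123 ".trim()));  // 123 once the trailing space is removed
        System.out.println(Integer.parseInt("123 "));         // throws NumberFormatException: For input string: "123 "
    }
}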
Comments, please?
Or, at the very least, produce a clear exception during parsing that explains the reason for the rejection.
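Something along these lines in the parsing code would already help a lot. This is only a rough sketch; the helper name and the column-name argument are made up, not an existing Talend routine:

//hypothetical wrapper that reports which column and which raw value failed to parse
public static int parseIntWithContext(String raw, String columnName) {
    try {
        return Integer.parseInt(raw);
    } catch (NumberFormatException e) {
        throw new NumberFormatException("Column '" + columnName + "': cannot parse \"" + raw
                + "\" as Integer (check for leading/trailing whitespace)");
    }
}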
//tJavaRow workaround: copy every column from input_row to output_row,
//trimming String values along the way, using reflection so it works for any schema
java.lang.reflect.Field[] input_fields = input_row.getClass().getDeclaredFields();
//walk through the input columns
for (java.lang.reflect.Field field : input_fields) {
    //make the generated (private) fields readable so field.get() does not throw IllegalAccessException
    field.setAccessible(true);
    //skip the internal serialization helpers Talend adds to the row structs
    if (field.getName().equals("commonByteArrayLock") || field.getName().equals("commonByteArray")) {
        continue;
    }
    //read the current value of this input column
    Object col_value = field.get(input_row);
    //trim it only if it really is a String; other types are passed through untouched
    if (col_value instanceof String) {
        col_value = routines.StringHandling.TRIM((String) col_value);
    }
    //now populate the output schema: copy the value to the column with the same name
    java.lang.reflect.Field[] output_fields = output_row.getClass().getDeclaredFields();
    for (java.lang.reflect.Field output_field : output_fields) {
        output_field.setAccessible(true);
        if (field.getName().equals(output_field.getName())) {
            output_field.set(output_row, col_value);
        }
    }
}
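One thing worth noting about the reflection version: it matches input and output columns purely by name, so both schemas need identical column names, and a mismatch is silently skipped rather than reported. For a job with only a couple of affected columns, the explicit per-column version is simpler (the column names below are just placeholders, not from the original job):

//per-column equivalent of the same trim in a tJavaRow, with made-up column names
output_row.customer_id = input_row.customer_id == null ? null : input_row.customer_id.trim();
output_row.order_ref = input_row.order_ref == null ? null : input_row.order_ref.trim();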
Shong is right here... this would require changes to all the input components. A less invasive option might be to put the trim into the shared parse routine that the generated code already calls for numeric columns (parseTo_int in the ParserUtils routines, if I remember the class name correctly), so it only has to change in one place:
public static int parseTo_int(String s) {
    //trim surrounding whitespace before parsing so values like " 42 " are accepted
    if (s != null) {
        s = s.trim();
    }
    return Integer.parseInt(s);
}
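A quick sanity check for the trimming variant above (plain Java, nothing Talend-specific):

//values that currently cause the row to be rejected would parse cleanly
System.out.println(parseTo_int(" 42 "));  // prints 42 instead of throwing NumberFormatException
System.out.println(parseTo_int("7\t"));   // trim() removes tabs as well as spaces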