Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I have the following json file (tfileinputRaw input file )
Input :
[{"_comp":"1","Name":"ABC"} ,
{"_comp":"1","Name":"ZYX"} ,
{"_comp":"1","Name":"MNO"},
{"_comp":"1","Name":"YZX"}]
Output : Expected
{"_comp":"1","Name":"ABC"}
{"_comp":"1","Name":"ZYX"}
{"_comp":"1","Name":"MNO"}
{"_comp":"1","Name":"YZX"}
My requirement is
1. To remove the square brackets [] at the start and end of the file. (There can be square brackets inside the file at other places, it has to be left untouched; only if there is any square brackets at start and end of the file it has to be removed if there is no square brackets it can be left as it is)
2. To remove comma (,) after each record is done.
New records can be identified whenever {"_comp": is encountered.
So I plan the job to be
tfileinputRaw ==> tjava ==>tfileoutputRaw
in tjava logic planned => if there is , (comma) or [ (open square brackets) before {"_comp": then it has to be replaced with space else do no changes. Also remove ] (close square brackets) at EOF if present, if not present nothing else to be done.
Please help me how to implement this logic in tjava.
Hi,
Everything works fine except for a single scenario,
if there is a json format file with array inside an element; that array (square brackets) is also getting missed, in the below example
see the array [group_1_id ] and [group_2_id] are missed from the resulting json format which we have.
tjava_code used : output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll("} *,\\r\\n", "}\r\n");
Input :
[{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":["group_1_id"],"grp2":["group_2_id"]}}}]
Excepted :
{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":["group_1_id"],"grp2":["group_2_id"]}}}
Observed :
{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":"group_1_id","grp2":"group_2_id"}}}
Expect your inputs to get this problem resolved. Kindly assist.
Replace tFileInputRaw by tFileInputFullRow to read input file line by line.
Connect to tJavaRow with this line of code:
output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll(",$", "");
For the example I connect tJavaRow to tLogRow.
Here is the result:
Starting job test at 15:31 25/05/2017. [statistics] connecting to socket on port 3552 [statistics] connected {"_comp":"1","Name":"ABC"} {"_comp":"1","Name":"ZYX"} {"_comp":"1","Name":"MNO"} {"_comp":"1","Name":"YZX"} [statistics] disconnected Job test ended at 15:31 25/05/2017. [exit code=0]
Just wanted to know,
1. Will this code removes , (comma) where ever it encounters? Because, in my data file there can be comma's in other fields too. I need only to remove comma's while the script encounters {"_comp": . Will the below scenario be handled with this same code ?
2. Also since you give tFileInputFullRow as an input; if there is any \n encountered during json, will that then be affected by this tjava code and make any changes to the data file after \n
3. BTW, I am getting the below error, can you assist me to know where I am wrong.
output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll(",$", "");
If I give the row separator as "" instead of "\n" in tFileInputFullRow component nothing is working as you mention. This is what i mentioned in the point no. 2. I basically wanted the whole file to be treated as a single row and do the changes for the requirement mentioned above. Because at the input \n may be not given exactly at each record level.The solution you propose is not working when improper \n are appearing inside the input file for just a single record.
Hence i had to use the input component as tFileInputRaw during my initial Query. Can you please tell me how to manage the above scenario when treating the whole file in a single row?
Hence i wanted to search for the string ,{"_com": and if exists do the change as mentioned above. For this, \n does not matter and it replaces directly.
Can you think of any other idea to get my requirement fulfilled?
Was not explained as it in your 1st post.
Well, you can convert object issued from tFileInputRaw to string using tConvert.
After that, you can use regex (as shown in my 1st answer) to arrange the content as desired.
Here is the complete job with the tConvert and its schema:
As you can see, there is only 1 line from tFileInputRaw. It contains the complete file.
Last part of the tJavaRow is a little bit changed like this:
output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll("} *,", "}");
In the input file is like this:
[{"_comp":"1","Name":"ABC"} , {"_comp":"1","Name":"ZYX"} , {"_comp":"1","Name":"MNO"}, {"_comp":"1","Name":"YZX"}]
The result is:
Starting job test at 21:20 25/05/2017. [statistics] connecting to socket on port 3738 [statistics] connected {"_comp":"1","Name":"ABC"} {"_comp":"1","Name":"ZYX"} {"_comp":"1","Name":"MNO"} {"_comp":"1","Name":"YZX"} [statistics] disconnected Job test ended at 21:20 25/05/2017. [exit code=0]
In the input file is like this (with a splitted line):
[{"_comp":"1","Name":"ABC"} , {"_comp":"1","Name":"ZYX"} , {"_comp":"1","Name end of the name on a 2nd line":"MNO"}, {"_comp":"1","Name":"YZX"}]
The result is:
Starting job test at 21:31 25/05/2017. [statistics] connecting to socket on port 3422 [statistics] connected {"_comp":"1","Name":"ABC"} {"_comp":"1","Name":"ZYX"} {"_comp":"1","Name end of the name on a 2nd line":"MNO"} {"_comp":"1","Name":"YZX"} [statistics] disconnected Job test ended at 21:31 25/05/2017. [exit code=0]
Is it what you expect?
Just seen your PM after my last post, change tJavaRow like this
output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll("} *,\\r\\n", "}\r\n");
Hi,
Everything works fine except for a single scenario,
if there is a json format file with array inside an element; that array (square brackets) is also getting missed, in the below example
see the array [group_1_id ] and [group_2_id] are missed from the resulting json format which we have.
tjava_code used : output_row.line = input_row.line.replaceAll("[\\[\\]]", "").replaceAll("} *,\\r\\n", "}\r\n");
Input :
[{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":["group_1_id"],"grp2":["group_2_id"]}}}]
Excepted :
{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":["group_1_id"],"grp2":["group_2_id"]}}}
Observed :
{"_com":"com_id1","_time1":{"gmt":"2017-04-04T18:46:46.392Z","ist":"2017-04-04T04:45:09.776Z"},"_time2":{"creator":"creator1","groups":{"grp1":"group_1_id","grp2":"group_2_id"}}}
Expect your inputs to get this problem resolved. Kindly assist.
Not exactly the same case, but with a little change to the tJavaRow gives what you ask for:
output_row.line = input_row.line.replaceAll("^\\[", "").replaceAll("\\]$", "").replaceAll("} *,\\r\\n", "}\r\n");
The last replaceAll is not mandatory with your sample as there is only 1 line, but if you have more than 1, it should be usefull.