My input log file looks like this:
2017-05-09 10:18:52.743 INFO (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1] webapp=/solr path=/update params={}{} 0 66
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
I am using the tFileInputRegex component.
The regex used to parse the file is:
"^"+ "([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})"+" "+ "([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})"+" "+ "(.*?)"+" "+ "\\((.*)\\)"+" "+ "\\[(.*)\\]"+" "+ "(.*)"
I am getting only partial output, as shown below:
.----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------.
| tLogRow_1 |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+------------------------------------------=|
|Date |Time |Log_Level|App_Thread |Collection |Message |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+------------------------------------------=|
|2017-05-09|10:18:52.743|INFO |qtp1543727556-22| x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1 | webapp=/solr path=/update params={}{} 0 66|
|2017-05-09|10:18:52.745|ERROR |qtp1543727556-22| x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1|unknown field 'sentence' |
'----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------'
But I want tFileInputRegex to ignore the row separator ("\n") when parsing the above input file, so that the rest of the error message for the second entry (the lines that follow it) is included in the last column. Please suggest a solution, if there is one.
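For anyone who wants to reproduce this outside Talend, here is a minimal Java sketch that applies the regex from the question to one line at a time, which is roughly what a line-by-line reader does. The class and method names (RegexOnSingleLine, parse) are made up for illustration, the sample lines are copied from the log above, and this is not a description of how tFileInputRegex works internally.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the question's regex to single lines.
class RegexOnSingleLine {
    // Pattern copied from the question.
    static final Pattern LOG_LINE = Pattern.compile(
        "^" + "([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})" + " "
            + "([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})" + " "
            + "(.*?)" + " " + "\\((.*)\\)" + " " + "\\[(.*)\\]" + " " + "(.*)");

    // Returns the six captured columns, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1);
        }
        return cols;
    }

    public static void main(String[] args) {
        String entry = "2017-05-09 10:18:52.743 INFO (qtp1543727556-22) [ x:UIMATestCollection1] "
            + "o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1] "
            + "webapp=/solr path=/update params={}{} 0 66";
        String traceLine = "\tat org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)";

        // A normal entry yields six columns.
        System.out.println(Arrays.toString(parse(entry)));
        // A stack-trace continuation line does not start with a date, so it is not matched.
        System.out.println(parse(traceLine) == null);
    }
}
```

The continuation lines never match the pattern on their own, which is consistent with the stack trace missing from the Message column.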
Hello
tFileInputRegex reads the file line by line, and each line is parsed with the regex. As a workaround, read the whole file content as a string, replace every newline followed by "at" with a special character, and write the string to a temporary file before parsing it with the regex. After parsing the file, replace the special characters with newline + "at" again if needed, for example:
tFileInputRaw--main--tJavaRow1--main--tFileOutputDelimited
|
onsubjobok
|
tFileInputRegex--main--tJavaRow2--main--tLogRow
tFileInputRegex: reads the new file generated by tFileOutputDelimited.
on tJavaRow1:
output_row.content = (input_row.content.toString()).replaceAll("\r\n at","@");
on tJavaRow2:
output_row.Date=input_row.Date;
//...other columns....
output_row.Message=input_row.Message.replaceAll("@","\r\n at");
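The two tJavaRow steps can also be tried outside Talend. Below is a minimal plain-Java sketch of the same idea, under a few assumptions: the class and method names (StackTraceFolding, fold, unfold) are hypothetical, "@" is assumed not to occur anywhere else in the log, and the pattern accepts both "\n" and "\r\n" followed by spaces or tabs, since the exact line endings of the file are not known. Unlike the snippets above, it also consumes the space after "at".

```java
// Hypothetical helper, not part of Talend: mimics the two tJavaRow steps in plain Java.
class StackTraceFolding {
    // Placeholder assumed not to occur anywhere else in the log.
    static final String MARKER = "@";

    // tJavaRow1 step: replace "line break + indentation + at " with the marker,
    // so an entry and its stack trace become one physical line.
    static String fold(String content) {
        return content.replaceAll("\\r?\\n[ \\t]+at ", MARKER);
    }

    // tJavaRow2 step: restore each marker as a new, tab-indented "at " line.
    static String unfold(String message) {
        return message.replace(MARKER, "\n\tat ");
    }

    public static void main(String[] args) {
        String raw =
            "2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] "
            + "o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: "
            + "ERROR: [doc=1] unknown field 'sentence'\n"
            + "\tat org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)\n"
            + "\tat org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)";

        String folded = fold(raw);
        System.out.println(folded);          // one line, "@" where the breaks were
        System.out.println(unfold(folded));  // stack trace lines restored
    }
}
```

If "@" can appear in real log messages, a rarer placeholder would be a safer choice for MARKER.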
Regards
Shong
Thanks for your support. It really helped a lot.
I am working on it, but I got stuck on one small error.
tFileInputRaw--main--tJavaRow1--main--tFileOutputDelimited
This is my tJavaRow1:
output_row.content = (input_row.content.toString()).replaceAll("\n\tat","@");
Below is my input file:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
The output in tFileOutputDelimited is:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
@ org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
@ org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
If I add a tJavaRow2 with replaceAll("\n@","@"), it does not work; I get the same output as above.
The tLogRow output is:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
@ org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
@ org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
[statistics] disconnected
Now I want to remove the "\n" before each "@" in my output file.
My expected output is:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)@ org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)@ org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)@ org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)@ org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
If I paste the same multi-line text into Eclipse and use val btring = a.replaceAll("\n@", "@"); in Scala, the output comes out on a single line.
Can you please suggest something?
Thanks in advance.
Thanks for the reply and support. I tried your tJavaRow code:
output_row.content = (input_row.content.toString()).replaceAll("\r\n at","@");
It does not make any change.
Maybe that is because the line breaks before "at" in my input file are not exactly "\r\n at".
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
With your tJavaRow code I get the same output as the input (after executing):
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
	at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
So after trying yours, I changed it to:
output_row.content = (input_row.content.toString()).replaceAll("\n\tat","@");
which gives:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
@ org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
@ org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
Now I want to get the above output on a single line.
For that I used a tJavaRow2 with:
output_row.content = (input_row.content.toString()).replaceAll("\n@","@");
But I still get the same output with no changes, which means the "\n" is not removed:
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
@ org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
@ org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
@ org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
I have shared the exported Talend job (archive file to import) and the input file at the links below.
Can you please check them if possible?
https://drive.google.com/open?id=0B-hwVI6s7kodd0dWWFUtVWZHRTg
https://drive.google.com/open?id=0B-hwVI6s7kodSlVSMXNKbmNYeDg
Try changing "\n" to "\\n", as "\" is a special character in regex.
output_row.content = (input_row.content.toString()).replaceAll("\\n@","@");
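For testing this step outside Talend, here is a small plain-Java sketch. The class and method names (LineBreakBeforeMarker, joinMarkerLines) are made up for illustration. Since the line endings of the real file are not known, and a pattern that only removes "\n" can leave a "\r" behind on Windows-style files, the sketch assumes it is acceptable to use the regex line-break matcher \R (Java 8 and later) instead of "\\n".

```java
// Hypothetical demo class: joins lines that start with the "@" placeholder.
class LineBreakBeforeMarker {
    // Removes any line break (\n, \r\n or a lone \r) that directly precedes "@".
    // \R is the regex "any line break" matcher, available since Java 8.
    static String joinMarkerLines(String content) {
        return content.replaceAll("\\R@", "@");
    }

    public static void main(String[] args) {
        String multiLine =
            "ERROR: [doc=1] unknown field 'sentence'\n"
            + "@ org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)\r\n"
            + "@ org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)";

        // Both the "\n" and the "\r\n" before "@" are removed, giving one line.
        System.out.println(joinMarkerLines(multiLine));
    }
}
```

Line breaks that are not followed by "@" are left alone, so separate log entries stay on separate lines.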
Thanks a lot, it worked for me.
Thanks for the support.