WSyahirah21
Creator

Fetch and store the result of log.info() in a MySQL database using Talend Studio

Currently, my Spark job outputs its result through the log.info() function. What I am trying to do now is fetch that output and store it in a MySQL database using Talend components.

 

Result from spark:

INFO:spark2:Source rows = 2692687

INFO:jspark2:Destination rows = 2692687

 

My idea is to use this workflow to achieve the objective:

tSystem (submit Spark job) >> tFileInputDelimited (fetch the printed result from Spark) >> <some MySQL component to store the result in MySQL>

 

Is it possible to do it this way?

2 Replies
Anonymous
Not applicable

Hello,

 

You can read and parse the logs produced by Spark using tFileInputDelimited -> tFilterRow -> tDBOutput (MySQL).
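To illustrate the parsing step, here is a minimal Java sketch (for example for a tJava or a routine) that pulls the row counts out of log lines shaped like the two shown in the question. The class name SparkLogParser and the method parseRowCounts are hypothetical, and the regular expression assumes the lines always look like "INFO:<logger>:Source rows = <number>"; adjust it if your log format differs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: extracts row counts from Spark log text.
public class SparkLogParser {

    // Assumed line format: "INFO:<logger>:Source rows = 2692687"
    private static final Pattern ROW_COUNT =
            Pattern.compile("INFO:[^:]+:(Source|Destination) rows = (\\d+)");

    // Returns a map such as {Source=..., Destination=...}; lines that do not match are ignored.
    public static Map<String, Long> parseRowCounts(String logText) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : logText.split("\\R")) {
            Matcher m = ROW_COUNT.matcher(line.trim());
            if (m.matches()) {
                counts.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String log = "INFO:spark2:Source rows = 2692687\n"
                   + "\n"
                   + "INFO:jspark2:Destination rows = 2692687\n";
        // prints {Source=2692687, Destination=2692687}
        System.out.println(parseRowCounts(log));
    }
}
```

The two values could then be mapped to columns and written with a tDBOutput (MySQL) component.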

 

Another option would be to use the outputLine of tSystem if the job you call produces console logs.

 

 

Log4j messages themselves could also be forwarded to a database. For that, one would use the JDBCAppender for log4j.

In the case of Log4j2, the config I used (for a POC!) was:

 

  <JDBC name="dbAppender" tableName="log4j2.all_log" connectionSource="PoolingDriver">
    <DriverManager connectionString="jdbc:postgresql://localhost:5432/postgres"
                   driverClassName="org.postgresql.Driver" username="postgres" password=":)" />

    <Column name="moment" isEventTimestamp="true" />
    <Column name="origin" isUnicode="false" pattern="%replace{%msg}{(.+) - (.*)}{$1}" />
    <Column name="message" isUnicode="false" pattern="%replace{%msg}{(.+) - (.*)}{$2}" />
    <Column name="mdc" isUnicode="false" pattern="%X" />
    <Column name="level" isUnicode="false" pattern="%level" />

    <Filters>
      <MapFilter onMatch="NEUTRAL" onMismatch="ACCEPT">
        <KeyValuePair key="_pid" value="0"/>
      </MapFilter>
      <RegexFilter regex="^(connectionStatsLogs|talendStats_|talendMeter_|talendLogs_|tLogRow_).+" onMatch="DENY" onMismatch="ACCEPT"/>
      <RegexFilter regex=".+ - Parameters:.+" onMatch="DENY" onMismatch="ACCEPT"/>
    </Filters>
  </JDBC>

This config is not for production use as it creates a lot of new connections.

 

WSyahirah21
Creator
Author

Hi,

I have now found an alternative way to fetch the string output:

tSystem_1 (print the output) -> tJava (get the output and bring to next component)

 

tSystem_1 output:

('source count:', '1000')

('destination count:', '100')

 

tJava code:

String output=((String)globalMap.get("tSystem_1_OUTPUT"));

System.out.println("Printing the error code 1 : "+StringUtils.substringBetween(output,"source count:", "destination count:"));

 

tJava output:

Printing the error code 1 : ', '1000')

('

 

However, tJava returns everything between the strings "source count:" and "destination count:", so the result includes the quotes, the brackets and the line break. The expected result is just the value 1000.

 

How do I change this Java code to fetch the correct output?
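
One possible approach, sketched under the assumption that the tSystem_1 output always has the tuple format shown above, is to match only the digits with a regular expression instead of using substringBetween. The class name CountExtractor and the method extractCount are hypothetical names chosen for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the number out of lines like ('source count:', '1000').
public class CountExtractor {

    // Returns the digits that follow the quoted label, or null if the label is not found.
    public static String extractCount(String output, String label) {
        if (output == null) {
            return null;
        }
        // Assumed format: '<label>', '<digits>' with optional whitespace around the comma.
        Pattern p = Pattern.compile("'" + Pattern.quote(label) + "'\\s*,\\s*'(\\d+)'");
        Matcher m = p.matcher(output);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String output = "('source count:', '1000')\n\n('destination count:', '100')\n";
        System.out.println(extractCount(output, "source count:"));      // prints 1000
        System.out.println(extractCount(output, "destination count:")); // prints 100
    }
}
```

In the tJava, the sample string would be replaced by the value read with (String) globalMap.get("tSystem_1_OUTPUT"), and the same Pattern and Matcher lines could be pasted inline if a separate routine is not wanted.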