Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a job that uses a source table that holds configuration data. It flows to an iterate then tjavarow to set some connection information. The truncate uses dynamic vars from the configuration table as well as the source and output components. The dboutput sets the parallel also in this way and uses the dynamic schema. This all works just fine however I have lost the ability to see how long each iteration takes for the dbinput and row count / seconds. that would be displayed in the TAC if it was all individual source and destinations.
I want to get the time taken for the insert and number of rows for each iteration. I am using parallel iterations and parallelism on the dbinsert for performance. Some tables take 1min others take 20min.
I can't figure out how to get the ((Integer)globalMap.get("tDBOutput_1_NB_LINE_INSERTED")) after each individual iteration. I can get it after the subjob but it is only the last one. Any ideas how to get this.
I would settle for the screen output but eventually want to log to a file and then email that file.
@Bolter44, Would you mind trying the scenario as below with help of global variables in loop based on your iterations:
I have parallel iterations however I changed to no parallel and added this to the tjava.
/* setup for logging */
row3.SourceDB = (String)globalMap.get("RefreshList.SOURCE_DB");
row3.SourcOwner = (String)globalMap.get("RefreshList.SOURCE_OWNER");
row3.SourceObject = (String)globalMap.get("RefreshList.SOURCE_TABLE");
row3.Parallelism = (Integer)globalMap.get("RefreshList.DEST_PARALLEL");
row3.ProcessedRows = ((Integer)globalMap.get("tDBOutput_1_NB_LINE_INSERTED"));
The first four are always populated correctly. The tDBOutput_1_NB_LINE_INSERTED only got one value through the iteration.
[statistics] connecting to socket on port 3587
statistics] connected
PROD|PAYROLL|PERDTOT|6|6030007|
ODS|GENERAL|GOBEACC|2|6030007|
UNCP|WA_CONF|TRANSACTION|2|6030007|
PROD|PAYROLL|PERETOT|2|6030007|
[statistics] disconnected
Hello,To begin, if you use the insert in parallel at the DB level, you will not be able to retrieve the insert number, because it uses multi threading and it is not able to set the exact number due to the random operation.
There is an approach that will allow you to recover this value.
I see that you are truncate the table before insertion, which is practical and makes things easier.
You can leave your design and just make a little addition. After insertion you can use these two components below:
The query For SQL Server: https://dataedo.com/kb/query/sql-server
select schema_name(tab.schema_id) + '.' + tab.name as [table],
sum(part.rows) as [rows]
from sys.tables tab
inner join sys.partitions part
on tab.object_id = part.object_id
where part.index_id IN (1, 0) -- 0 - table without PK, 1 table with PK
group by schema_name(tab.schema_id) + '.' + tab.name
order by sum(part.rows) desc
The first allows to delay a little the time to insert all the data, you can put 5 seconds or more if you wish.
The second allows you to query the database and retrieve the number of records. being that you are truncate the table, you will have correct values. You store this in a global variable that you can use later.
If by chance, there is already data in your table, you can execute the same query before the insertion and after the insertion. Then you make a difference to get the exact number.
especially do not hesitate if you need more explanation
I could not get it to work with the parallel set in the tDBOutput. So I changed the iterations to 8 so it can move through the list.
In the first tJava I added:
// Start time for each iteration
globalMap.put("IterateStart_"+((Integer)globalMap.get("tFlowToIterate_4_CURRENT_ITERATION")) , TalendDate.getCurrentDate());
to dynamically increment a global var that sets the start time for each iteration in parallel.
Then on the tDBoutput I used the On Component OK -> tJava I added in second tJava: LogRow.StartTime = (Date)(globalMap.get("IterateStart_"+((Integer)globalMap.get("tFlowToIterate_4_CURRENT_ITERATION")))); to get the current start for the iteration.
LogRow.EndTime = TalendDate.getCurrentDate(); LogRow.RunTime = TalendDate.diffDate(LogRow.EndTime,LogRow.StartTime,"SS");
to get the number of minuts it ran for. It will show 0 for less than 1 but that is OK.
The enitre loging code lools like: /* setup for logging */ LogRow.SourceDB = (String)globalMap.get("RefreshList.SOURCE_DB");LogRow.SourceOwner = (String)globalMap.get("RefreshList.SOURCE_OWNER");LogRow.SourceObject = (String)globalMap.get("RefreshList.SOURCE_TABLE");LogRow.Parallelism = (Integer)globalMap.get("RefreshList.DEST_PARALLEL");LogRow.ProcessedRows = ((Integer)globalMap.get("tDBOutput_1_NB_LINE"));LogRow.StartTime = (Date)(globalMap.get("IterateStart_"+((Integer)globalMap.get("tFlowToIterate_4_CURRENT_ITERATION"))));LogRow.EndTime = TalendDate.getCurrentDate();LogRow.RunTime = TalendDate.diffDate(LogRow.EndTime,LogRow.StartTime,"SS"); LogRow.Errors = ((String)globalMap.get("tDBOutput_1_ERROR_MESSAGE"));
Then made a schema to a tlogrow but will be used log to a file.
It is working. i wish talend would figgure a nice way to get some stats when you are using parallel on the dboutput and in iterations. Thanks to the other posters on this to give me some ideas to get it done.
Hope this may help others too.
Did you try my solution with the query ?