fbehfar
Contributor

tOracleTableTransfer different parameters

Hi,
I'm using tOracleTableTransfer to load ~90 big tables and noticed different speeds for 2 tables with similar row counts; it turned out the one that took longer was simply much bigger in data volume!
I tried playing with the "Log interval in seconds", "Source select fetch size" and "Insert Batch size" parameters and hoped for the best, but the load is still taking a very long time (5 hours for about 90 tables with an average of 50K rows per table).
Does anyone have more in-depth knowledge of how this custom-made component works?
I have even split my table list into relatively large and small tables and set "Source select fetch size" and "Insert Batch size" to 50,000 for the small ones and 500,000 for the large ones, but it didn't help with the speed.
Any other ideas on how to speed up the load and improve the performance?
Any help is appreciated!
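For readers tuning the same parameters, here is a minimal sketch of what these two settings most likely control at the JDBC level, assuming the component uses plain JDBC underneath (table and column names here are made up):

import java.sql.*;

class FetchBatchSketch {
    // Hypothetical copy loop; source and target connections are assumed open.
    static void copy(Connection source, Connection target) throws SQLException {
        PreparedStatement select = source.prepareStatement(
                "SELECT col1, col2 FROM source_table");
        select.setFetchSize(50000);           // "Source select fetch size": rows per network round trip
        ResultSet rs = select.executeQuery();
        PreparedStatement insert = target.prepareStatement(
                "INSERT INTO target_table (col1, col2) VALUES (?, ?)");
        int pending = 0;
        while (rs.next()) {
            insert.setObject(1, rs.getObject(1));
            insert.setObject(2, rs.getObject(2));
            insert.addBatch();
            if (++pending % 10000 == 0) {     // "Insert Batch size": rows per executeBatch()
                insert.executeBatch();
            }
        }
        insert.executeBatch();                // flush the remaining rows
    }
}

If that mapping holds, the fetch size only reduces network round trips on the read side, so its returns usually diminish well below the 500,000 tried above.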
21 Replies
fbehfar
Contributor
Author

Thank you SOOOOOO MUCH jlolling again!
The log is VERY clean and neat, and having the option for committing is a big plus for this project of ours.
I will keep testing the component and monitoring the performance with and without the commit, and will report back on which works better for us.
BTW, I changed my "Source select fetch size" to 25K and "Insert Batch size" to 10K and I'm seeing better performance; I will keep playing with those numbers to see what works best!
One other good thing I noticed about this component is that it does not truncate the 6 fractional digits in timestamps (even though Java's plain Date type is limited to milliseconds and would truncate the last 3 digits).
Thanks!
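On the timestamp point: java.sql.Timestamp carries full fractional seconds, while anything routed through plain epoch milliseconds (java.util.Date) is cut to 3 digits. A standalone illustration, not the component's actual code:

import java.sql.Timestamp;
import java.util.Date;

public class TimestampPrecision {
    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2016-10-11 11:32:00.123456"); // 6 fractional digits
        System.out.println(ts);                          // 2016-10-11 11:32:00.123456
        Date d = new Date(ts.getTime());                 // epoch millis: microseconds are lost
        System.out.println(new Timestamp(d.getTime()));  // 2016-10-11 11:32:00.123
    }
}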
Anonymous
Not applicable

You are welcome!
Thanks for getting me updated.
fbehfar
Contributor
Author

jLolling, I'm testing the latest component version to check the performance. What I have been doing is giving the component different values for "Source select fetch size" and "Insert Batch size".
One thing I noticed is that if I run the component multiple times with the same values for "Source select fetch size" and "Insert Batch size" (same table, same records), I get different timings. My table has 21 columns and 46,000 records; one run took 3510 milliseconds, the next 6208 milliseconds.
On average the best speed I got for this table was with "Source select fetch size" = 50,000 and "Insert Batch size" = 50,000, but it varies each time I try.
Is there something I'm doing wrong, or a step I'm missing? I even run the garbage collector between each of my runs!
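As an aside, run-to-run noise like this is easier to judge against a median over several runs after a couple of discarded warm-ups. A minimal sketch, where runTransfer is a hypothetical stand-in for one execution of the job:

import java.util.ArrayList;
import java.util.List;

class TimingHarness {
    static void benchmark(Runnable runTransfer, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) runTransfer.run();   // discard warm-up runs (JIT, caches)
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            runTransfer.run();
            millis.add((System.nanoTime() - t0) / 1_000_000);  // elapsed time in ms
        }
        millis.sort(null);
        System.out.println("median ms: " + millis.get(millis.size() / 2)); // robust to outliers
    }
}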
fbehfar
Contributor
Author

Here is the screenshot of my job and my settings.

[screenshot: job layout and tOracleTableTransfer settings]
Anonymous
Not applicable

The time differences are not caused by the component but by caching or load situations in the database. The component's code always does the work in the same way, so there is no possible root cause there for the differing durations.
The garbage collection is actually not necessary: there is a huge amount of time (from the JVM's perspective) between the two runs, and the GC has already done whatever was needed.
I suggest you check the backup mode (options: "Backup in file" + "Only backup in file, no inserts into target table"), in which you only write the file and not the table, and see whether this read-only run also varies in duration and how long it takes.
You are also causing a huge transaction size, and that requires a lot of space in the SGA.
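To illustrate the transaction-size point: committing at intervals keeps a single open transaction from growing without bound. A hedged JDBC sketch (not the component's actual code; the batch and commit intervals are illustrative):

import java.sql.*;

class IntervalCommitSketch {
    // Hypothetical: stream rows while committing every commitInterval rows,
    // so the open transaction (and its footprint in the database) stays bounded.
    static void copyWithIntervalCommit(ResultSet rs, PreparedStatement insert,
            Connection target, int batchSize, int commitInterval) throws SQLException {
        target.setAutoCommit(false);
        int rows = 0;
        while (rs.next()) {
            insert.setObject(1, rs.getObject(1));   // map further columns as needed
            insert.addBatch();
            if (++rows % batchSize == 0) insert.executeBatch();
            if (rows % commitInterval == 0) target.commit(); // bound the transaction size
        }
        insert.executeBatch();
        target.commit();
    }
}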
fbehfar
Contributor
Author

I applied your suggestion to check the backup-in-file mode without inserting into the target table, and ran the job a few times for the same table with the same settings. These are the durations in milliseconds:
25693, 2574, 2558, 2480, 3120, 2808, 14118, 3417, 14758, 3307, 2479, 2583
If it only depends on the DB side and not the component, then I think we are good. It was just an observation I had, and I wanted to make sure I'm using this component correctly!
Thanks!
fbehfar
Contributor
Author

Also, we tested the component with MSSQL as the source and were not able to run the job. This is the error we got:
: sqlrunner.datamodel.SQLDataModel - loadSchemas (schemas) for catalog: Infra failed: The server principal "username" is not able to access the database "Infra" under the current security context.
ERROR: Write failed in line number 35 message:Invalid column type
java.sql.SQLException: Invalid column type
.....
ERROR: Read has been failed. Stop write in table.
tOracleTableTransfer_1 read:23641 inserted:35
disconnected
Nothing was written into the DB, even though the log says 35 rows were inserted.
We tested the connection separately using tMSSQLInput, and it connects and returns results just fine, but when we use tOracleTableTransfer we get the above error.
We tried the "Self defined source query" and, separately, the "source table where clause"; both cases failed with that error.
SELECT
  MRN,
  col1,
  col2,
  col3,
  col4
FROM SchemaName.TableNameMSSQL
WHERE SchemaName.TableNameMSSQL.col2 IS NOT NULL
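One way to narrow down an "Invalid column type" like this is to run the same query over plain JDBC and print the type each column reports; a hedged diagnostic sketch (URL and credentials are placeholders, and the Microsoft JDBC driver is assumed to be on the classpath):

import java.sql.*;

public class ColumnTypeDump {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://host:1433;databaseName=Infra", "username", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                "SELECT MRN, col1, col2, col3, col4 FROM SchemaName.TableNameMSSQL"
                + " WHERE SchemaName.TableNameMSSQL.col2 IS NOT NULL")) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                // java.sql.Types code plus the driver's own name for each column
                System.out.println(md.getColumnName(i) + ": "
                        + md.getColumnType(i) + " (" + md.getColumnTypeName(i) + ")");
            }
        }
    }
}

If a column comes back as VARCHAR here, the mismatch originates in the driver or view definition rather than in the component.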
fbehfar
Contributor
Author

Any update on the source connection issue?
Also, I was wondering whether we could use a flat file as the source instead of a DB connection. I believe the process does that already, but it doesn't have an option to select a file as your source data.
Thanks!
Anonymous
Not applicable

Could you please switch on the debug mode?
I agree with you, the log output is confusing and needs a review.
fbehfar
Contributor
Author

jLolling,
This is the error we get. One note: LAST_SERVICE_DATE is of type date, and there is no way to force tOracleTableTransfer to recognize it as a date, since the component does not read the schema.
We believe the date type is being treated/processed as a varchar type! (See the sketch after the log below.)
Starting job tOracleTabletransfer at 11:32 10/11/2016.
 
connecting to socket on port 3817
connected
: sqlrunner.datamodel.SQLDataModel - loadSchemas (schemas) for catalog: Infra failed: The server principal "sqlaaccusr.prompt.pr" is not able to access the database "Infra" under the current security context.
DEBUG: createSourceSelectStatement SQL:
SELECT
mrn as MEMBER_ID,
Carepath_Code,
Carepath_Display_Name as Carepath_Name,
Last_Service_Date,
active
FROM  AACC.vwGetCarepaths
where AACC.vwGetCarepaths.Status_Message_Patient is not null and MRN is not null
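If the driver really does report LAST_SERVICE_DATE as a character type, one workaround to try in the "Self defined source query" field is an explicit conversion so the column arrives typed as a datetime. A sketch of the adjusted select, assuming SQL Server's CONVERT (untested against this view):

SELECT
  mrn AS MEMBER_ID,
  Carepath_Code,
  Carepath_Display_Name AS Carepath_Name,
  CONVERT(datetime, Last_Service_Date) AS Last_Service_Date,
  active
FROM AACC.vwGetCarepaths
WHERE AACC.vwGetCarepaths.Status_Message_Patient IS NOT NULL AND MRN IS NOT NULL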
 
