Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am using Talend bigdata Platform 6.3 Studio. I am using tSql component to execute a hive query in cluster. Please find below query
SELECT DISTINCT inVistaNonVistaComb.*,inLeasingProxy.LogicalFeedType as LPLogicalFeedType FROM inVistaNonVistaComb LEFT OUTER JOIN inLeasingProxy ON UPPER(inLeasingProxy.LogicalFeedType) = UPPER(inVistaNonVistaComb.Logical_Feed_Type) AND UPPER(inLeasingProxy.SalesOrgRegion) = UPPER(inVistaNonVistaComb.Sales_Org_Region) AND (UPPER(inVistaNonVistaComb.Sales_Org_Sub_Region) = UPPER(inLeasingProxy.SalesOrgSubRegion) OR UPPER(inLeasingProxy.SalesOrgSubRegion)='ALL') AND ( inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Sales_Rep_Code_Mis OR inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.CBN OR inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Transaction_Subtype OR inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Ice_Holder5 ) AND (inVistaNonVistaComb.Adjusted_Primary_Transaction_Date BETWEEN inLeasingProxy.EffectiveStartDate AND inLeasingProxy.EffectiveEndDate)
I am doing a left outer join with a reference table called
inLeasingProxy
This is a small file with only 2MB size. However the transaction file called
inVistaNonVistaComb will have a size 1 GB - 2GB
The problem here is that, for some transaction files(1GB), This query will execute very fast (10 minutes). But for some trasaction files(1GB) , this query will run for 2 hours.
How can i tune this query? Please help
I don't think talend big data expert will be able to help.
I have identified the issue.
Here the OR condition in the query is causing the problem.
WorkAround
I splitted the query into four queries.The individual OR condition is placed in seperate query. Each query is pasted in seperate tSQLRow.
The TsqlRows are connected in series.
Ex:
SELECT DISTINCT inVistaNonVistaComb.*,inLeasingProxy.LogicalFeedType as LPLogicalFeedType FROM inVistaNonVistaComb LEFT OUTER JOIN inLeasingProxy ON UPPER(inLeasingProxy.LogicalFeedType) = UPPER(inVistaNonVistaComb.Logical_Feed_Type) AND UPPER(inLeasingProxy.SalesOrgRegion) = UPPER(inVistaNonVistaComb.Sales_Org_Region) AND (UPPER(inVistaNonVistaComb.Sales_Org_Sub_Region) = UPPER(inLeasingProxy.SalesOrgSubRegion) OR UPPER(inLeasingProxy.SalesOrgSubRegion)='ALL') AND ( inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Sales_Rep_Code_Mis ) AND (inVistaNonVistaComb.Adjusted_Primary_Transaction_Date BETWEEN inLeasingProxy.EffectiveStartDate AND inLeasingProxy.EffectiveEndDate)
Hello,
We have redirected your issue to our Bigdata expert and then come back to you as soon as we can.
Thanks for your time.
Best regards
Sabrina
I don't think talend big data expert will be able to help.
I have identified the issue.
Here the OR condition in the query is causing the problem.
WorkAround
I splitted the query into four queries.The individual OR condition is placed in seperate query. Each query is pasted in seperate tSQLRow.
The TsqlRows are connected in series.
Ex:
SELECT DISTINCT inVistaNonVistaComb.*,inLeasingProxy.LogicalFeedType as LPLogicalFeedType FROM inVistaNonVistaComb LEFT OUTER JOIN inLeasingProxy ON UPPER(inLeasingProxy.LogicalFeedType) = UPPER(inVistaNonVistaComb.Logical_Feed_Type) AND UPPER(inLeasingProxy.SalesOrgRegion) = UPPER(inVistaNonVistaComb.Sales_Org_Region) AND (UPPER(inVistaNonVistaComb.Sales_Org_Sub_Region) = UPPER(inLeasingProxy.SalesOrgSubRegion) OR UPPER(inLeasingProxy.SalesOrgSubRegion)='ALL') AND ( inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Sales_Rep_Code_Mis ) AND (inVistaNonVistaComb.Adjusted_Primary_Transaction_Date BETWEEN inLeasingProxy.EffectiveStartDate AND inLeasingProxy.EffectiveEndDate)