PawelM
Contributor II

Poor Implementation of org.talend.components.snowflake.runtime.SnowflakeSourceOrSink/tSnowflakeOutput

 

Hi 

Is there anyone actually using a custom tSnowflakeOutput component?

Because as it stands, relying on Snowflake support is no longer viable — the support process is stuck in an infinite AI-loop with zero intelligence. And apparently, for Qlik the most important thing is color-matching logos, not actual functionality...

Anyway — back to the real issue.

Performance Degradation in tSnowflakeOutput

We’re dealing with large volumes of data across 50+ schemas, with schema-swapping and backups, and performance is key.

In our setup, tSnowflakeOutput is often used to insert or update single records.
Despite that — on every single execution, the component internally calls:

 

metaData.getPrimaryKeys(this.getCatalog(connProps), this.getDbSchema(connProps), tableName);

This happens even when the schema and primary key are already manually defined in the component.

This logic is embedded in:

 

protected Schema getSchema(...) in SnowflakeTableProperties.java → which calls getSchema() in SnowflakeSourceOrSink
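Until the component itself is fixed, a caller-side workaround is to memoize the result of the expensive metadata call per table, so the round trip happens at most once per (catalog, schema, table). A minimal sketch, assuming nothing about the Talend runtime (the `PrimaryKeyCache` class and its `loader` callback are hypothetical stand-ins):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache around the expensive DatabaseMetaData.getPrimaryKeys() round trip.
class PrimaryKeyCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // 'loader' stands in for the real metadata call against Snowflake.
    List<String> primaryKeys(String catalog, String schema, String table,
                             Function<String, List<String>> loader) {
        String key = catalog + "." + schema + "." + table;
        // computeIfAbsent runs the loader at most once per key.
        return cache.computeIfAbsent(key, loader);
    }
}
```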

Impact:

  • Every single tSnowflakeOutput call triggers metaData.getPrimaryKeys(...)

  • On environments with many schemas/databases, this call takes up to 10 minutes

  • This means that updating a single record can cost 6+ minutes of runtime, repeated per record, per job execution

  • Multiplied across jobs and flows, this creates massive runtime bloat
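To put rough numbers on that bloat (the figures below are illustrative, not measured): when the metadata call costs minutes while the actual insert or update takes about a second, the overhead dominates total runtime almost completely.

```java
// Illustrative cost model: fixed metadata overhead repeated on every job run.
class OverheadEstimate {
    static long totalSeconds(int jobRuns, long metadataSeconds, long workSeconds) {
        return (long) jobRuns * (metadataSeconds + workSeconds);
    }

    public static void main(String[] args) {
        // e.g. 100 runs, ~6 min (360 s) metadata call, ~1 s of useful work each
        System.out.println(totalSeconds(100, 360, 1) + " s total for "
                + (100 * 1) + " s of useful work");
        // prints: 36100 s total for 100 s of useful work
    }
}
```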


This can be fixed:

Introduce the following advanced UI options in tSnowflakeOutput:

  • includePrimaryKeys (checkbox): default to true for backward compatibility

  • catalogFilterLevel (combo box: "catalog", "schema", "table"): determines the scope of metadata calls

Update the schema loading logic to only call getPrimaryKeys when needed.
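A sketch of how those two proposed options could gate the metadata call (the names follow the suggestion above; none of this exists in tSnowflakeOutput today):

```java
// Sketch of the two proposed options; neither exists in tSnowflakeOutput today.
class MetadataOptions {
    enum FilterLevel { CATALOG, SCHEMA, TABLE }

    private final boolean includePrimaryKeys; // proposed checkbox, default true
    private final FilterLevel filterLevel;    // proposed combo box

    MetadataOptions(boolean includePrimaryKeys, FilterLevel filterLevel) {
        this.includePrimaryKeys = includePrimaryKeys;
        this.filterLevel = filterLevel;
    }

    // Fetch only when keys are wanted and not already defined in the component.
    boolean shouldFetchPrimaryKeys(boolean pkDefinedInComponent) {
        return includePrimaryKeys && !pkDefinedInComponent;
    }

    FilterLevel getFilterLevel() {
        return filterLevel;
    }
}
```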


It could be fixed, for example, like this:

@Override
public Schema retrieveSchema() throws IOException {
    try (SandboxedInstance sandboxedInstance = SnowflakeDefinition.getSandboxedInstance(
            "org.talend.components.snowflake.runtime.SnowflakeSourceOrSink", true)) {
        SnowflakeRuntimeSourceOrSink ss = (SnowflakeRuntimeSourceOrSink) sandboxedInstance.getInstance();
        ss.initialize(null, (Properties) this);
        boolean includePrimaryKeys = this.getIncludePrimaryKeys();  // GUI checkbox
        String catalogLevel = this.getCatalogFilterLevel();         // "catalog", "schema", or "table"
        return ss.getEndpointSchemaWithOptions(
                null,
                this.tableName.getValue(),
                includePrimaryKeys,
                catalogLevel);
    }
}

This small enhancement would drastically reduce runtime and avoid unnecessary metadata roundtrips when working with single-record operations.

Let me know if this can be considered in an upcoming release — or if we should fork and patch it ourselves...

Best regards,

2 Replies
Denis_Segard
Support

Hello,

Could you try using this JDBC parameter:

enablePatternSearch=false

See https://docs.snowflake.com/en/developer-guide/jdbc/jdbc-parameters#enablepatternsearch

The parameter appears to work with Snowflake JDBC 3.22.0.
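For reference, the parameter can be passed either in the JDBC URL or as a connection property; a minimal sketch (the account name is a placeholder):

```java
import java.util.Properties;

// Two equivalent ways to pass the Snowflake JDBC parameter; the account name is a placeholder.
class SnowflakeConnSettings {
    static String urlWithPatternSearchDisabled(String account) {
        // Appended directly to the JDBC URL...
        return "jdbc:snowflake://" + account + ".snowflakecomputing.com/?enablePatternSearch=false";
    }

    static Properties propsWithPatternSearchDisabled() {
        Properties props = new Properties();
        // ...or set as a connection property before DriverManager.getConnection(url, props).
        props.setProperty("enablePatternSearch", "false");
        return props;
    }
}
```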

Kind regards
Denis

PawelM
Contributor II
Author

Hello, 
Before I made my post, I had already tested different JDBC parameters.

Without enablePatternSearch:
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "XXX_DEV_XXX"

With enablePatternSearch:
show /* JDBC:DatabaseMetaData.getPrimaryKeys() */ primary keys in database "XXX_DEV_XXX"

The driver issues the same database-wide query either way, and the same large delay occurs whether enablePatternSearch is set or not.

So, no, it is not working!
There are more than 50 schemas under that database, and the number will grow. (For now, it has been raised to our global international organisation that we will have to look for another ETL/ELT provider, as Talend will degrade performance exponentially as more and more schemas are added.)

I would expect:
show primary keys in SCHEMA XXX_DEV_XXX.SCHEMA_NAME
or just
show primary keys in SCHEMA
or
show primary keys in TABLE "XXX_DEV_XXX.SCHEMA_NAME.TABLE_NAME"
or
disable it with a checkbox so it does not call anything at all!
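The scoped variants above can be issued today with a plain JDBC Statement; a sketch of building those Snowflake SHOW commands (quoting simplified, identifiers not validated or escaped):

```java
// Builds the scoped Snowflake SHOW commands instead of a database-wide scan.
// Quoting is simplified; identifiers are not validated or escaped.
class ShowPrimaryKeys {
    static String forSchema(String database, String schema) {
        return "show primary keys in schema \"" + database + "\".\"" + schema + "\"";
    }

    static String forTable(String database, String schema, String table) {
        return "show primary keys in table \"" + database + "\".\"" + schema + "\".\"" + table + "\"";
    }
}
```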

So no, this parameter, or any other set via JDBC, does not work (this is why DBeaver also introduced a configuration option for this!)

Kind regards
PM

Tested on JDBC 3.22.0 and JDBC 3.23.0.