
Darkzealot
Contributor

ADLS Gen2 Output job fails after 1 hour due to Azure AD Limitation

Hi all, we're pushing Parquet files to ADLS with the tAzureAdlsGen2Output component.

When the input file or table is large and the process runs for more than an hour, it fails with a 401. It seems we are hitting this limitation: https://docs.microsoft.com/en-us/azure/databricks/kb/data-sources/job-fails-adls-hour

Has anyone faced a similar issue? How can we work around it? Ideally, the Talend component would refresh the passthrough token by itself.

We have to use Azure AD due to security constraints =/
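
For reference, this is roughly what we'd hope the component does internally. The Azure SDK for Java refreshes the token automatically through its TokenCredential, so a hand-rolled upload never hits the one-hour wall. A minimal sketch, assuming a service principal could be used instead of user passthrough; the tenant, client, account, and path values are all placeholders:

import com.azure.identity.ClientSecretCredential;
import com.azure.identity.ClientSecretCredentialBuilder;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;

public class AdlsUploadSketch {
    public static void main(String[] args) {
        // The SDK credential caches the Azure AD token and transparently
        // fetches a new one as it nears expiry, even on multi-hour runs.
        ClientSecretCredential credential = new ClientSecretCredentialBuilder()
                .tenantId("<tenant-id>")         // placeholder
                .clientId("<client-id>")         // placeholder
                .clientSecret("<client-secret>") // placeholder
                .build();

        DataLakeServiceClient service = new DataLakeServiceClientBuilder()
                .endpoint("https://<account>.dfs.core.windows.net") // placeholder
                .credential(credential)
                .buildClient();

        DataLakeFileClient file = service
                .getFileSystemClient("<filesystem>")   // placeholder
                .getFileClient("output/data.parquet"); // placeholder path

        // Upload a local Parquet file, overwriting if it already exists.
        file.uploadFromFile("/tmp/data.parquet", true);
    }
}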

5 Replies
Anonymous
Not applicable

Hello,

Are you using the default Max Batch Size on the tAzureAdlsGen2Output component? Is it possible to tune the component's performance so that the job doesn't run for more than an hour?

Best regards

Sabrina
Darkzealot
Contributor
Author

Hey, I already tried tweaking the Max Batch Size and was able to get up to ~200,000 before hitting "request too big" errors. But the subjob still needs more than an hour =(

Anonymous
Not applicable

Hello,

Would you mind posting screenshots of your job design on the community? They would help us get more details about your current situation.

Please mask your sensitive data as well.

Best regards

Sabrina

Darkzealot
Contributor
Author

Sure, here are the screenshots, with the error logs at the end. (The error always happens after one hour.)

I get the same behavior with large CSV files or large DB tables as input.

Thanks!

[Screenshots attached: job design and tAzureAdlsGen2Output component settings]

[INFO ] 10:33:02 org.apache.parquet.hadoop.InternalParquetRecordWriter- Flushing mem columnStore to file. allocated memory: 2664135

[INFO ] 10:33:27 org.apache.parquet.hadoop.InternalParquetRecordWriter- Flushing mem columnStore to file. allocated memory: 2411351

[INFO ] 10:33:56 org.apache.parquet.hadoop.InternalParquetRecordWriter- Flushing mem columnStore to file. allocated memory: 2223920

[INFO ] 10:34:22 org.apache.parquet.hadoop.InternalParquetRecordWriter- Flushing mem columnStore to file. allocated memory: 2369118

[INFO ] 10:34:48 org.apache.parquet.hadoop.InternalParquetRecordWriter- Flushing mem columnStore to file. allocated memory: 2419499

[ERROR] 10:34:49 org.talend.components.adlsgen2.service.AdlsGen2Service- [handleResponse] InvalidAuthenticationInfo [401]: Authentication information is not given in the correct format. Check the value of Authorization header..

[ERROR] 10:34:49 org.talend.components.adlsgen2.output.AdlsGen2Output- [afterGroup] InvalidAuthenticationInfo [401]: Authentication information is not given in the correct format. Check the value of Authorization header..

[FATAL] 10:34:49 JobX- tAzureAdlsGen2Output_1 (org.talend.components.adlsgen2.runtime.AdlsGen2RuntimeException) InvalidAuthenticationInfo [401]: Authentication information is not given in the correct format. Check the value of Authorization header..

org.talend.sdk.component.api.exception.ComponentException: (org.talend.components.adlsgen2.runtime.AdlsGen2RuntimeException) InvalidAuthenticationInfo [401]: Authentication information is not given in the correct format. Check the value of Authorization header..

Anonymous
Not applicable

Hello,

The lifetime of an Azure AD passthrough token is one hour. When a command sent to the cluster takes longer than one hour, it fails if an ADLS resource is accessed after the one-hour mark. This is a known issue.

As far as we know, it is not possible to increase the lifetime of an Azure AD passthrough token. The token is retrieved by the Azure Databricks replicated principal, and you cannot edit its properties.

Could you please try to rewrite your queries so that no single command takes longer than an hour to complete? For example, split the input into chunks and process each chunk in its own short run, as sketched below.
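
A minimal sketch of that chunking idea, assuming a JDBC source that supports LIMIT/OFFSET paging; the connection URL, table, and column names are hypothetical placeholders. Because each chunk runs as its own short command, no ADLS access happens more than an hour after its token was issued:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedExportSketch {
    public static void main(String[] args) throws Exception {
        final int chunkSize = 200_000; // sized so one chunk finishes well under an hour
        String url = "jdbc:postgresql://<host>/<db>"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>")) {
            long offset = 0;
            int rows;
            do {
                rows = 0;
                try (PreparedStatement ps = conn.prepareStatement(
                        // A stable ORDER BY keeps LIMIT/OFFSET paging deterministic.
                        "SELECT * FROM source_table ORDER BY id LIMIT ? OFFSET ?")) {
                    ps.setInt(1, chunkSize);
                    ps.setLong(2, offset);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            rows++;
                            // ... write the row to a local Parquet chunk file here ...
                        }
                    }
                }
                if (rows > 0) {
                    // Upload the finished chunk to ADLS here, with a freshly
                    // issued token (e.g. via the SDK client sketched above).
                    offset += rows;
                }
            } while (rows == chunkSize); // a short page means we reached the end
        }
    }
}

Each pass through the loop stays short, so the token used for the upload is always fresh.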

Best regards

Sabrina