AnujGupta
Contributor III

Not able to run task with DatabricksDelta connector as sink

Hi Team, 

This is very strange behavior from Qlik Replicate: the same highlighted command works fine when I execute it from Databricks, but it fails when I run the task from Qlik.

Source - Oracle

Destination - ADLS GEN2 unmanaged table.

Qlik Version - November 2022 (2022.11.0.289)

Oracle edition - 19.0.0.0

Error log - 

00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Creating threads for all components (replicationtask.c:2401)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Threads for all components were created (replicationtask.c:2560)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Task initialization completed successfully (replicationtask.c:3921)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Use any Oracle Archived Redo Log Destination (oracle_endpoint_imp.c:974)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Read '63' blocks backward (oracle_endpoint_imp.c:994)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle CDC uses Oracle File Access mode (oracle_endpoint_imp.c:1010)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Wait '5' minutes for missing Archived Redo log (oracle_endpoint_imp.c:1169)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: retry timeout is '120' minutes (oracle_endpoint_imp.c:1184)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Scale is set to 10 for NUMBER Datatype (oracle_endpoint_imp.c:1215)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Retry interval is set to 5 (oracle_endpoint_imp.c:1229)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle source database version is 19.0.0.0.0 (oracle_endpoint_conn.c:611)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle Client version: 19.3.0.0.0 (oracle_endpoint_conn.c:624)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: The classic Oracle source database is used (oracle_endpoint_conn.c:1103)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle compatibility version is 19.0.0 (oracle_endpoint_conn.c:87)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Database role is 'PRIMARY' (oracle_endpoint_conn.c:133)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: SUPPLEMENTAL_LOG_DATA_PK is set (oracle_endpoint_conn.c:142)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Resetlog process is supported in Oracle Database (oracle_endpoint_imp.c:1410)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Restore bulk state. Last bulk last record id - '0', last applied record id - '0', target confirmed record id - '0', sorter confirmed record id - '0' (endpointshell.c:2016)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout = 30000 milliseconds (bulk_apply.c:563)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout Min = 1000 milliseconds (bulk_apply.c:564)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Working in bulk apply mode (endpointshell.c:2024)
00010820: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Source endpoint 'Oracle' is using provider syntax 'Oracle' (provider_syntax_manager.c:941)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 'Stop reading when memory limit reached' is set to false (sorter.c:658)
00009292: 2023-07-21T16:39:24 [STREAM_COMPONEN ]I: Going to connect to Oracle server (DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(LOAD_BALANCE=OFF)(CONNECT_TIMEOUT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5155.srv.volvo.com)(PORT= 1525))(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5156.srv.volvo.com)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=gwms4p01_rw.srv.volvo.com))) with username WMS_REPLSVC (oracle_endpoint_imp.c:871)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: All stream components were initialized (replicationtask.c:3694)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle capture start time: now (oracle_endpoint_capture.c:605)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Sorter last run state: confirmed_record_id = 0, confirmed_stream_position = '' (sorter_transaction.c:3295)
00010820: 2023-07-21T16:39:24 [TARGET_APPLY ]I: Bulk max file size: 100 MB, 102400 KB (cloud_bulk.c:150)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used difference between the Replicate machine UTC time and Oracle Local time is '-7200' seconds (oracle_endpoint_capture.c:349)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used Oracle archived Redo log destination id is '1' (oracdc_merger.c:646)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle instance uses more than one archived Redo log destination id. Please configure the correct destination id, if Redo logs of '1' destination cannot be accessed (oracdc_merger.c:650)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start processing online Redo log sequence 7442 thread 1 name /oracle/redo/gwms4p01/redo_02.log (oradcdc_redo.c:914)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle Redo compatibility version 13000000 (oradcdc_redo.c:916)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start REDO fetch from the context 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, thread 1 (oradcdc_thread.c:2652)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Opened transaction list contains '2' transactions (oracle_endpoint_capture.c:830)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Correcting source database time by 0 microseconds (sorter_transaction.c:197)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 2 open transactions. Waiting for transaction consistency (sorter_transaction.c:303)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: Start waiting for transactional consistency (replicationtask.c:3371)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]I: Task error notification received from subtask 0, thread 1, status 1020401 (replicationtask.c:3517)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]W: Task 'ora_wms' encountered a fatal error (repository.c:5935)
00009292: 2023-07-21T16:39:25 [SORTER ]I: Final saved task state. Stream position 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, Source id 3, next Target id 1, confirmed Target id 0, last source timestamp 1689950364428554 (sorter.c:781)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: Failed (retcode -1) to execute statement: 'CREATE OR REPLACE TABLE `wms3check`.`attrep_apply_exceptions` ( `TASK_NAME` VARCHAR(128) NOT NULL, `TABLE_OWNER` VARCHAR(128) NOT NULL, `TABLE_NAME` VARCHAR(128) NOT NULL, `ERROR_TIME` TIMESTAMP NOT NULL, `STATEMENT` STRING NOT NULL, `ERROR` STRING NOT NULL ) USING DELTA LOCATION 'abfss://rawdata@gtooldlsdev.dfs.core.windows.net/Databases/wmsdelta/attrep_apply_exceptions' TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)' [1022502] (ar_odbc_stmt.c:4996)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: RetCode: SQL_ERROR SqlState: 42000 NativeError: 80 Message: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:410)


5 Replies
john_wang
Support

Hello @AnujGupta ,

Thanks for posting in the community!

Several reasons may cause the same error:

Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key

In most cases it's a configuration issue. Since the Microsoft Azure Databricks target endpoint consists of two independent parts, Databricks ODBC Access and Azure Storage Data Access, let's try to isolate the issue with the following steps:

Go to the Microsoft Azure Databricks target endpoint settings.

  • Click Browse next to "Database" under Databricks ODBC Access and see if you get an error.
  • Click Browse next to "Staging directory" under Storage Staging and see if you get an error.
  • Click Test Connection to see if there are any errors.

Please also go through the Databricks Permissions and access section of the endpoint documentation, in particular:

1.    When configuring a new cluster with Microsoft Azure Data Lake Storage (ADLS) Gen2, the following line must be added to the "Spark Config" section.

spark.hadoop.hive.server2.enable.doAs false

2.    To be able to access the storage directories from the Databricks cluster, users need to add a configuration (in Spark Config) for that Storage Account and its key, as shown in the combined example after this list.

Example:

fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>

For details, refer to the Spark configuration topic in the Databricks online help.

3.    Verify that the Client ID is configured correctly in the endpoint.
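
To make points 1 and 2 concrete, a cluster's "Spark Config" section covering both settings might look like the sketch below. The storage account name gtooldlsdev is taken from the staging path in your error log; the access key placeholder is an assumption you would replace with your own value:

spark.hadoop.hive.server2.enable.doAs false
fs.azure.account.key.gtooldlsdev.dfs.core.windows.net <storage-account-access-key>

After changing the Spark Config, restart the cluster so the new settings take effect, then rerun Test Connection in the Replicate endpoint.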

Hope this helps.

Regards,

John.

AnujGupta
Contributor III
Author

fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>

This works in my case. However, I currently have my access key hard-coded in the Databricks cluster Spark config. Is there any way to avoid hard-coding the key in the Spark config settings?

 

john_wang
Support

Hello @AnujGupta ,

Good news! Thanks a lot for your update.

Databricks supports three credential types for accessing Azure Data Lake Storage Gen2:

  • OAuth 2.0 with an Azure service principal
  • Shared access signatures (SAS)
  • Account keys

Account keys are used in the current major Replicate versions. You may raise a Feature Request if you need other credential types.
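
As a side note, on the Databricks side you can keep the raw key out of the cluster's Spark Config even while Replicate itself still authenticates with an account key: store the key in a Databricks secret scope and reference it from the Spark Config using the {{secrets/...}} syntax. A minimal sketch, assuming a secret scope named adls-keys and a secret named storage-key (both hypothetical names, not values from this thread; please verify the exact syntax against the Databricks secrets documentation):

fs.azure.account.key.<storage-account-name>.dfs.core.windows.net {{secrets/adls-keys/storage-key}}

This only removes the hard-coded key from the cluster configuration; the Replicate endpoint still stores the account key it uses for the staging directory, so if you need a service principal on the Replicate side, the Feature Request route mentioned above still applies.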

thanks,

John.

AnujGupta
Contributor III
Author

Please let me know how to raise a Feature Request if I need to use a service principal.

john_wang
Support

Hello @AnujGupta ,

There is an article on this: Ideation Guidelines: Getting Started with Ideation.

Hope it helps.

Regards,

John.
