Hi Team,
This is very strange behavior from Qlik Replicate: the same highlighted command works fine when I execute it from Databricks, but it fails when I run the task from Qlik.
Source - Oracle
Destination - ADLS Gen2 unmanaged table.
Qlik Version - November 2022 (2022.11.0.289)
Oracle edition - 19.0.0.0
Error log -
00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Creating threads for all components (replicationtask.c:2401)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Threads for all components were created (replicationtask.c:2560)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Task initialization completed successfully (replicationtask.c:3921)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Use any Oracle Archived Redo Log Destination (oracle_endpoint_imp.c:974)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Read '63' blocks backward (oracle_endpoint_imp.c:994)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle CDC uses Oracle File Access mode (oracle_endpoint_imp.c:1010)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Wait '5' minutes for missing Archived Redo log (oracle_endpoint_imp.c:1169)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: retry timeout is '120' minutes (oracle_endpoint_imp.c:1184)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Scale is set to 10 for NUMBER Datatype (oracle_endpoint_imp.c:1215)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Retry interval is set to 5 (oracle_endpoint_imp.c:1229)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle source database version is 19.0.0.0.0 (oracle_endpoint_conn.c:611)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle Client version: 19.3.0.0.0 (oracle_endpoint_conn.c:624)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: The classic Oracle source database is used (oracle_endpoint_conn.c:1103)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle compatibility version is 19.0.0 (oracle_endpoint_conn.c:87)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Database role is 'PRIMARY' (oracle_endpoint_conn.c:133)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: SUPPLEMENTAL_LOG_DATA_PK is set (oracle_endpoint_conn.c:142)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Resetlog process is supported in Oracle Database (oracle_endpoint_imp.c:1410)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Restore bulk state. Last bulk last record id - '0', last applied record id - '0', target confirmed record id - '0', sorter confirmed record id - '0' (endpointshell.c:2016)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout = 30000 milliseconds (bulk_apply.c:563)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout Min = 1000 milliseconds (bulk_apply.c:564)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Working in bulk apply mode (endpointshell.c:2024)
00010820: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Source endpoint 'Oracle' is using provider syntax 'Oracle' (provider_syntax_manager.c:941)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 'Stop reading when memory limit reached' is set to false (sorter.c:658)
00009292: 2023-07-21T16:39:24 [STREAM_COMPONEN ]I: Going to connect to Oracle server (DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(LOAD_BALANCE=OFF)(CONNECT_TIMEOUT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5155.srv.volvo.com)(PORT= 1525))(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5156.srv.volvo.com)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=gwms4p01_rw.srv.volvo.com))) with username WMS_REPLSVC (oracle_endpoint_imp.c:871)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: All stream components were initialized (replicationtask.c:3694)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle capture start time: now (oracle_endpoint_capture.c:605)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Sorter last run state: confirmed_record_id = 0, confirmed_stream_position = '' (sorter_transaction.c:3295)
00010820: 2023-07-21T16:39:24 [TARGET_APPLY ]I: Bulk max file size: 100 MB, 102400 KB (cloud_bulk.c:150)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used difference between the Replicate machine UTC time and Oracle Local time is '-7200' seconds (oracle_endpoint_capture.c:349)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used Oracle archived Redo log destination id is '1' (oracdc_merger.c:646)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle instance uses more than one archived Redo log destination id. Please configure the correct destination id, if Redo logs of '1' destination cannot be accessed (oracdc_merger.c:650)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start processing online Redo log sequence 7442 thread 1 name /oracle/redo/gwms4p01/redo_02.log (oradcdc_redo.c:914)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle Redo compatibility version 13000000 (oradcdc_redo.c:916)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start REDO fetch from the context 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, thread 1 (oradcdc_thread.c:2652)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Opened transaction list contains '2' transactions (oracle_endpoint_capture.c:830)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Correcting source database time by 0 microseconds (sorter_transaction.c:197)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 2 open transactions. Waiting for transaction consistency (sorter_transaction.c:303)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: Start waiting for transactional consistency (replicationtask.c:3371)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]I: Task error notification received from subtask 0, thread 1, status 1020401 (replicationtask.c:3517)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]W: Task 'ora_wms' encountered a fatal error (repository.c:5935)
00009292: 2023-07-21T16:39:25 [SORTER ]I: Final saved task state. Stream position 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, Source id 3, next Target id 1, confirmed Target id 0, last source timestamp 1689950364428554 (sorter.c:781)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: Failed (retcode -1) to execute statement: 'CREATE OR REPLACE TABLE `wms3check`.`attrep_apply_exceptions` ( `TASK_NAME` VARCHAR(128) NOT NULL, `TABLE_OWNER` VARCHAR(128) NOT NULL, `TABLE_NAME` VARCHAR(128) NOT NULL, `ERROR_TIME` TIMESTAMP NOT NULL, `STATEMENT` STRING NOT NULL, `ERROR` STRING NOT NULL ) USING DELTA LOCATION 'abfss://rawdata@gtooldlsdev.dfs.core.windows.net/Databases/wmsdelta/attrep_apply_exceptions' TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)' [1022502] (ar_odbc_stmt.c:4996)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: RetCode: SQL_ERROR SqlState: 42000 NativeError: 80 Message: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:410)
Hello @AnujGupta,
Thanks for posting in the community!
Several different issues can cause this error:
Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
In most cases this is a configuration issue. Since the Microsoft Azure Databricks target endpoint consists of two independent parts, Databricks ODBC access and Azure storage data access, let's try to isolate the issue with the steps below:
Go to the Microsoft Azure Databricks target endpoint settings.
Please go through the Databricks permissions and access section of the endpoint help, especially:
1. When configuring a new cluster with Microsoft Azure Data Lake Storage (ADLS) Gen2, the following line must be added to the "Spark Config" section.
spark.hadoop.hive.server2.enable.doAs false
2. To be able to access the storage directories from the Databricks cluster, you need to add a Spark Config entry for that storage account and its key (a combined example follows these steps).
Example:
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
For details, refer to the Databricks online help on Spark configuration.
3. Verify that the Client ID is configured correctly in the endpoint.
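For reference, with both settings from steps 1 and 2 in place, the cluster's Spark Config section would contain lines like the following (the storage account name and access key are placeholders to substitute with your own values):
spark.hadoop.hive.server2.enable.doAs false
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
You can also verify, independently of Replicate, that the cluster itself can reach the target storage path by running a quick listing from a notebook attached to the same cluster (the path below is the LOCATION from your task log):
dbutils.fs.ls("abfss://rawdata@gtooldlsdev.dfs.core.windows.net/Databases/wmsdelta/")
If that listing fails with the same fs.azure.account.key error, the problem is in the cluster's storage configuration rather than in the Replicate endpoint settings.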
Hope this helps.
Regards,
John.
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
This works in my case. However, I currently have the access key hardcoded in the Databricks cluster's Spark Config. Is there any way to avoid hardcoding the key there?
Hello @AnujGupta,
Good news! Thanks a lot for your update.
Databricks supports three credential types for accessing Azure Data Lake Storage Gen2: account keys, SAS tokens, and service principals (OAuth 2.0).
Account keys are the type used by current major Replicate versions. You may raise a Feature Request if you need one of the other credential types.
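As a side note, if your concern is only that the key sits in plain text in the cluster configuration, Databricks lets you reference a secret in Spark Config instead of the literal key. For example, assuming you have already created a secret scope and secret (the names adls-scope and storage-key below are just illustrative):
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net {{secrets/adls-scope/storage-key}}
This still authenticates with an account key, so it works with Replicate the same way; it only keeps the literal key value out of the Spark Config text.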
thanks,
John.
Please let me know how to raise a Feature Request; I need to use a service principal.
Hello @AnujGupta,
There is an article that explains the process: Ideation Guidelines: Getting Started with Ideation.
Hope it helps.
Regards,
John.