Hi Team,
This is very strange behavior from Qlik Replicate: the same highlighted command works fine when I execute it from Databricks, but it fails when I run the task from Qlik.
Source - Oracle
Destination - ADLS Gen2 unmanaged table.
Qlik Version - November 2022 (2022.11.0.289)
Oracle edition - 19.0.0.0
Error log -
00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00011668: 2023-07-21T16:39:22 [METADATA_MANAGE ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00011668: 2023-07-21T16:39:23 [METADATA_MANAGE ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Creating threads for all components (replicationtask.c:2401)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Threads for all components were created (replicationtask.c:2560)
00011668: 2023-07-21T16:39:23 [TASK_MANAGER ]I: Task initialization completed successfully (replicationtask.c:3921)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Use any Oracle Archived Redo Log Destination (oracle_endpoint_imp.c:974)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Read '63' blocks backward (oracle_endpoint_imp.c:994)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle CDC uses Oracle File Access mode (oracle_endpoint_imp.c:1010)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Wait '5' minutes for missing Archived Redo log (oracle_endpoint_imp.c:1169)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: retry timeout is '120' minutes (oracle_endpoint_imp.c:1184)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Scale is set to 10 for NUMBER Datatype (oracle_endpoint_imp.c:1215)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Retry interval is set to 5 (oracle_endpoint_imp.c:1229)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle source database version is 19.0.0.0.0 (oracle_endpoint_conn.c:611)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle Client version: 19.3.0.0.0 (oracle_endpoint_conn.c:624)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: The classic Oracle source database is used (oracle_endpoint_conn.c:1103)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Oracle compatibility version is 19.0.0 (oracle_endpoint_conn.c:87)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Database role is 'PRIMARY' (oracle_endpoint_conn.c:133)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: SUPPLEMENTAL_LOG_DATA_PK is set (oracle_endpoint_conn.c:142)
00010800: 2023-07-21T16:39:23 [SOURCE_CAPTURE ]I: Resetlog process is supported in Oracle Database (oracle_endpoint_imp.c:1410)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Going to connect to server adb-2446663257414103.3.azuredatabricks.net database wms3check (cloud_imp.c:3965)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Target endpoint 'Databricks Lakehouse (Delta)' is using provider syntax 'DatabricksDelta' (provider_syntax_manager.c:947)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: ODBC driver version: '2.6.22.1037' (ar_odbc_conn.c:633)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Connected to server adb-2446663257414103.3.azuredatabricks.net database wms3check successfully. (cloud_imp.c:4004)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Restore bulk state. Last bulk last record id - '0', last applied record id - '0', target confirmed record id - '0', sorter confirmed record id - '0' (endpointshell.c:2016)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout = 30000 milliseconds (bulk_apply.c:563)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Set Bulk Timeout Min = 1000 milliseconds (bulk_apply.c:564)
00010820: 2023-07-21T16:39:23 [TARGET_APPLY ]I: Working in bulk apply mode (endpointshell.c:2024)
00010820: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Source endpoint 'Oracle' is using provider syntax 'Oracle' (provider_syntax_manager.c:941)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 'Stop reading when memory limit reached' is set to false (sorter.c:658)
00009292: 2023-07-21T16:39:24 [STREAM_COMPONEN ]I: Going to connect to Oracle server (DESCRIPTION=(ADDRESS_LIST=(FAILOVER=ON)(LOAD_BALANCE=OFF)(CONNECT_TIMEOUT=5)(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5155.srv.volvo.com)(PORT= 1525))(ADDRESS=(PROTOCOL=TCP)(HOST=segotl5156.srv.volvo.com)(PORT=1525)))(CONNECT_DATA=(SERVICE_NAME=gwms4p01_rw.srv.volvo.com))) with username WMS_REPLSVC (oracle_endpoint_imp.c:871)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: All stream components were initialized (replicationtask.c:3694)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle capture start time: now (oracle_endpoint_capture.c:605)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Sorter last run state: confirmed_record_id = 0, confirmed_stream_position = '' (sorter_transaction.c:3295)
00010820: 2023-07-21T16:39:24 [TARGET_APPLY ]I: Bulk max file size: 100 MB, 102400 KB (cloud_bulk.c:150)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used difference between the Replicate machine UTC time and Oracle Local time is '-7200' seconds (oracle_endpoint_capture.c:349)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Used Oracle archived Redo log destination id is '1' (oracdc_merger.c:646)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle instance uses more than one archived Redo log destination id. Please configure the correct destination id, if Redo logs of '1' destination cannot be accessed (oracdc_merger.c:650)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start processing online Redo log sequence 7442 thread 1 name /oracle/redo/gwms4p01/redo_02.log (oradcdc_redo.c:914)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Oracle Redo compatibility version 13000000 (oradcdc_redo.c:916)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Start REDO fetch from the context 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, thread 1 (oradcdc_thread.c:2652)
00010800: 2023-07-21T16:39:24 [SOURCE_CAPTURE ]I: Opened transaction list contains '2' transactions (oracle_endpoint_capture.c:830)
00009292: 2023-07-21T16:39:24 [SORTER ]I: Correcting source database time by 0 microseconds (sorter_transaction.c:197)
00009292: 2023-07-21T16:39:24 [SORTER ]I: 2 open transactions. Waiting for transaction consistency (sorter_transaction.c:303)
00011668: 2023-07-21T16:39:24 [TASK_MANAGER ]I: Start waiting for transactional consistency (replicationtask.c:3371)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]I: Task error notification received from subtask 0, thread 1, status 1020401 (replicationtask.c:3517)
00011668: 2023-07-21T16:39:25 [TASK_MANAGER ]W: Task 'ora_wms' encountered a fatal error (repository.c:5935)
00009292: 2023-07-21T16:39:25 [SORTER ]I: Final saved task state. Stream position 0000002f.408daa00.00000001.0000.00.0000:7442.94987.16, Source id 3, next Target id 1, confirmed Target id 0, last source timestamp 1689950364428554 (sorter.c:781)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: Failed (retcode -1) to execute statement: 'CREATE OR REPLACE TABLE `wms3check`.`attrep_apply_exceptions` ( `TASK_NAME` VARCHAR(128) NOT NULL, `TABLE_OWNER` VARCHAR(128) NOT NULL, `TABLE_NAME` VARCHAR(128) NOT NULL, `ERROR_TIME` TIMESTAMP NOT NULL, `STATEMENT` STRING NOT NULL, `ERROR` STRING NOT NULL ) USING DELTA LOCATION 'abfss://rawdata@gtooldlsdev.dfs.core.windows.net/Databases/wmsdelta/attrep_apply_exceptions' TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)' [1022502] (ar_odbc_stmt.c:4996)
00010820: 2023-07-21T16:39:25 [TARGET_APPLY ]E: RetCode: SQL_ERROR SqlState: 42000 NativeError: 80 Message: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:498)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.unity.EmptyHandle$.runWith(UCSHandle.scala:124)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:410)
Hello @AnujGupta,
Thanks for posting in the community!
Several different issues can cause this error:
Error running query: Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
In most cases this is a configuration issue. Since the Microsoft Azure Databricks target endpoint consists of two independent parts, Databricks ODBC access and Azure storage data access, let's try to isolate the issue with the steps below:
Go to the Microsoft Azure Databricks target endpoint settings.
Please go through the Databricks permissions and access section of the endpoint help, especially:
1. When configuring a new cluster with Microsoft Azure Data Lake Storage (ADLS) Gen2, the following line must be added to the "Spark Config" section.
spark.hadoop.hive.server2.enable.doAs false
2. To be able to access the storage directories from the Databricks cluster, you need to add a Spark Config entry for that storage account and its key (a combined example follows these steps).
Example:
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
For details, refer to the Databricks online help on Spark configuration.
3. Verify that the Client ID is configured correctly in the endpoint.
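For reference, with both settings from steps 1 and 2 in place, the cluster's Spark Config section would contain lines like the following (the storage account name and access key are placeholders to substitute with your own values):
spark.hadoop.hive.server2.enable.doAs false
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
You can also verify, independently of Replicate, that the cluster itself can reach the target storage path by running a quick listing from a notebook attached to the same cluster (the path below is the LOCATION from your task log):
dbutils.fs.ls("abfss://rawdata@gtooldlsdev.dfs.core.windows.net/Databases/wmsdelta/")
If that listing fails with the same fs.azure.account.key error, the problem is in the cluster's storage configuration rather than in the Replicate endpoint settings.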
Hope this helps.
Regards,
John.
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
This works in my case. However, I currently have the access key hardcoded in the Databricks cluster's Spark Config. Is there any way to avoid hardcoding the key there?
Hello @AnujGupta,
Good news! Thanks a lot for your update.
Databricks supports three credential types for accessing Azure Data Lake Storage Gen2: account keys, SAS tokens, and service principals (OAuth 2.0).
Account keys are the type used by current major Replicate versions. You may raise a Feature Request if you need one of the other credential types.
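As a side note, if your concern is only that the key sits in plain text in the cluster configuration, Databricks lets you reference a secret in Spark Config instead of the literal key. For example, assuming you have already created a secret scope and secret (the names adls-scope and storage-key below are just illustrative):
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net {{secrets/adls-scope/storage-key}}
This still authenticates with an account key, so it works with Replicate the same way; it only keeps the literal key value out of the Spark Config text.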
thanks,
John.
Please let me know how to raise a Feature Request; I need to use a service principal.
Hello @AnujGupta,
There is an article that explains the process: Ideation Guidelines: Getting Started with Ideation.
Hope it helps.
Regards,
John.