Chirag_
Partner - Contributor III

Hadoop as a Target

Hi Team,

I need to understand some points regarding Hadoop target endpoint.

1. If multiple jobs hit the target (Hadoop) at the same time, will they fail, wait until a connection becomes available, or resume automatically after a failure?

2. How is DDL handling carried out in Hadoop/Hive?

 

Regards,

Chirag


8 Replies
aarun_arasu
Support

Hello @Chirag_ ,

 

Thanks for reaching out to the Qlik Community.

If multiple jobs hit the target (Hadoop), Qlik Replicate generally manages the concurrency by queuing operations and executing them in the order they were received, once resources become available; it won't fail outright or wait indefinitely for a connection. If a job fails due to a connection issue or another error, Qlik Replicate may retry the operation based on its retry settings and error-handling configuration. These settings can usually be customized to suit the specific requirements of your environment.
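
To illustrate the retry idea conceptually, below is a minimal Python sketch of a retry-with-backoff loop. Everything here (function names, defaults) is an illustrative assumption, not Replicate's actual API; the real behavior is governed by the task's error-handling settings.

    import time

    def with_retries(operation, max_retries=5, initial_delay=2.0, backoff=2.0):
        # Run `operation`, retrying on connection errors with exponential backoff.
        # Conceptual stand-in for Replicate's task recovery/retry settings;
        # all names and defaults here are illustrative assumptions.
        delay = initial_delay
        for attempt in range(1, max_retries + 1):
            try:
                return operation()
            except ConnectionError as exc:
                if attempt == max_retries:
                    raise  # give up and surface the error
                print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
                time.sleep(delay)
                delay *= backoff  # wait longer before the next retry

    # Example: a write that succeeds on the third attempt
    calls = {"n": 0}
    def flaky_write():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("target connection unavailable")
        return "written"

    print(with_retries(flaky_write))  # prints "written" after two retries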

When DDL changes occur, Replicate follows the steps below (a sketch of the metadata comparison in step 3 follows the list):

  1. Captures ALTER TABLE DDLs from the transaction log without identifying the DDL type (ADD/DROP/MODIFY COLUMN).
  2. Reads the new table metadata from the source backend.
  3. Compares the previous table metadata with the new table metadata in order to determine the change. Note that a single change may include multiple DDL operations performed on the backend.
  4. Uses the new table metadata to parse the subsequent DML events.
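
To make step 3 concrete, here is a small Python sketch of comparing the previous and new table metadata to infer which DDL operations occurred. This is only an illustration of the idea; Replicate's actual comparison logic is internal.

    def diff_schemas(old, new):
        # Compare two {column: type} mappings and infer the DDL operations.
        # Note: a single captured change may yield several operations at once.
        ops = []
        for col in new.keys() - old.keys():
            ops.append(("ADD COLUMN", col, new[col]))
        for col in old.keys() - new.keys():
            ops.append(("DROP COLUMN", col, old[col]))
        for col in old.keys() & new.keys():
            if old[col] != new[col]:
                ops.append(("MODIFY COLUMN", col, f"{old[col]} -> {new[col]}"))
        return ops

    old_meta = {"id": "INT", "name": "STRING"}
    new_meta = {"id": "INT", "name": "STRING", "notes": "STRING"}
    print(diff_schemas(old_meta, new_meta))
    # [('ADD COLUMN', 'notes', 'STRING')]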

Please go through the following user guides for more information:

https://help.qlik.com/en-US/replicate/November2023/Content/Replicate/Main/Hadoop/hadoop_target.htm

https://help.qlik.com/en-US/replicate/May2022/Content/Replicate/Main/Endpoints/DDLStatements.htm

 

Thanks & Regards

Arun

aarun_arasu
Support

Hello @Chirag_ ,

 

You may also refer to the article below on the retry configuration available in Qlik Replicate:

https://community.qlik.com/t5/Official-Support-Articles/Changing-Task-Recovery-options-for-Replicate...

 

Regards

Arun


Chirag_
Partner - Contributor III
Author

Hi @aarun_arasu ,

Thank you for the response!

Regarding point 2: since Hadoop is a file system, assume I have 4 fields and 10 records in my Hive table, and later one column is added. What will be the output for those 10 records? Will the additional column show a null value or a blank? In an RDBMS system it would be NULL.

Similarly, how is this handled in text or Parquet format on HDFS?

 

Regards,

Chirag

aarun_arasu
Support

Hello @Chirag_ ,

I have not tested this scenario, but below is what we would expect the behavior to be (see the sketch after the list):

  • When using text or Parquet format in HDFS, the behavior would be similar. If a new column is added to the schema, and records are written to these files in HDFS using the updated schema, the newly added column for existing records would typically contain null values.
  • Text and Parquet files in HDFS are structured data formats, and they maintain schema information. Therefore, when writing records with an updated schema, the newly added columns would be represented as null values for existing records.
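
For Parquet in particular, this null-filling behavior can be reproduced outside Replicate with a short Python sketch using pyarrow. The file paths and column names below are made up for illustration:

    import pyarrow as pa
    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    # A file written before the DDL change: only "id" and "name"
    before = pa.table({"id": [1, 2], "name": ["alpha", "beta"]})
    pq.write_table(before, "/tmp/before_ddl.parquet")

    # A file written after a "notes" column was added
    after = pa.table({"id": [3], "name": ["gamma"], "notes": ["added later"]})
    pq.write_table(after, "/tmp/after_ddl.parquet")

    # Scan both files with the evolved schema: rows from the older file
    # get null in the column they never had.
    dataset = ds.dataset(["/tmp/before_ddl.parquet", "/tmp/after_ddl.parquet"],
                         schema=after.schema, format="parquet")
    print(dataset.to_table())  # "notes" is null for ids 1 and 2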

 

Regards

Arun

john_wang
Support

Hi @Chirag_ ,

Aarun is correct, and I've run a quick test confirming that NULL is the value of the newly added column(s) for the existing rows. A sample:

[Screenshot: sample target table showing NULL in the newly added column for pre-existing rows]

A new column, "notes", was added to the table after the row with id=2 had been replicated to the target side.

Hope this helps.

John.

Chirag_
Partner - Contributor III
Author

Hi @aarun_arasu , @john_wang ,

Thank you for the responses and for providing detailed information on the questions raised.

 

Regards,

Chirag

 

john_wang
Support

Thank you so much for your great support @Chirag_ 
