Chirag_
Partner - Contributor II

Hadoop as a Target

Hi Team,

I need to understand some points regarding the Hadoop target endpoint.

1. If multiple jobs hit the target (Hadoop) at the same time, will they fail, wait until a connection becomes available, or resume automatically after a failure?

2. How is DDL handling carried out in Hadoop/Hive?

 

Regards,

Chirag

2 Solutions

Accepted Solutions
aarun_arasu
Support

Hello @Chirag_ ,

Well, I have not tested this scenario, but I believe the behavior is as follows:

  • When using text or Parquet format in HDFS, the behavior would be similar. If a new column is added to the schema, and records are written to these files in HDFS using the updated schema, the newly added column for existing records would typically contain null values.
  • Text and Parquet files in HDFS are structured data formats, and they maintain schema information. Therefore, when writing records with an updated schema, the newly added columns would be represented as null values for existing records.
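To make the point above concrete, here is a minimal Python sketch (not Replicate internals; column names and the Ctrl-A delimiter are illustrative) of how a reader handles delimited text rows written before a column was added: missing trailing fields are padded with None, which Hive renders as NULL.

```python
# Illustrative sketch: rows written under the old schema have fewer fields;
# the reader pads the missing trailing columns with None (NULL in Hive).

OLD_COLUMNS = ["id", "name"]
NEW_COLUMNS = ["id", "name", "notes"]  # "notes" added after the DDL change

def parse_row(line, columns, delimiter="\x01"):
    """Split a delimited text row and pad missing trailing fields with None."""
    fields = line.split(delimiter)
    padded = fields + [None] * (len(columns) - len(fields))
    return dict(zip(columns, padded))

# A row written under the old schema carries only two fields.
old_row = "1\x01alice"
# A row written after the change carries the new column as well.
new_row = "3\x01carol\x01vip"

print(parse_row(old_row, NEW_COLUMNS))  # {'id': '1', 'name': 'alice', 'notes': None}
print(parse_row(new_row, NEW_COLUMNS))  # {'id': '3', 'name': 'carol', 'notes': 'vip'}
```

Parquet behaves analogously at the format level: it is self-describing, so a reader using the updated schema resolves the absent column in older files to NULL.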

 

Regards

Arun


john_wang
Support

Hi @Chirag_ ,

Aarun is correct. I've run a quick test to confirm that NULL is the value of the newly added column(s) for the existing rows; a sample:

[screenshot: john_wang_0-1708345388499.png]

A new column "notes" was added to the table after the "id=2" row had been replicated to the target side.

Hope this helps.

John.

Help users find answers! Do not forget to mark a solution that worked for you! If already marked, give it a thumbs up!


8 Replies
aarun_arasu
Support

Hello @Chirag_ ,

 

Thanks for reaching out to the Qlik Community.

If there are multiple jobs hitting the target (Hadoop), Qlik Replicate generally handles this by queuing the operations. It won't automatically fail or wait indefinitely for a connection to become available. Instead, it typically manages concurrency by queuing operations and executing them in the order they were received, once resources become available. If a job fails due to a connection issue or other reasons, Qlik Replicate may retry the operation based on its retry settings and error handling configuration. These settings can usually be customized to suit the specific requirements of your environment.
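As a rough illustration of the retry behavior described above, here is a generic retry-with-delay sketch in Python. This is not Replicate's implementation; the function and parameter names are hypothetical, and in Replicate the equivalent interval and attempt limits are set in the task's error-handling/recovery settings.

```python
import time

def apply_with_retry(operation, max_attempts=3, retry_interval=0.01):
    """Run `operation`, retrying on ConnectionError up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise                   # give up: the task moves to an error state
            time.sleep(retry_interval)  # wait, then try again

# Simulated target that fails twice before accepting the write.
attempts = {"n": 0}
def write_to_hadoop():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("target unavailable")
    return "ok"

print(apply_with_retry(write_to_hadoop))  # "ok" on the third attempt
```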

When DDL changes occur, Replicate does the following:

  1. Captures ALTER TABLE DDLs from the transaction log without identifying the DDL type (ADD/DROP/MODIFY COLUMN).
  2. Reads the new table metadata from the source backend.
  3. Compares the previous table metadata with the new table metadata in order to determine the change. Note that a single change may include multiple DDL operations performed on the backend.
  4. Uses the new table metadata to parse the subsequent DML events.
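Step 3 above can be sketched as a simple metadata diff. This is an illustration only (the column names and function are hypothetical, not Replicate code): comparing the previous and new column lists lets you infer which DDL operations were performed, including several at once.

```python
# Illustrative sketch of step 3: diff previous vs. new table metadata to
# infer which DDL operations happened on the source backend.

def diff_metadata(old_cols, new_cols):
    """Return the inferred DDL operations between two column lists."""
    old, new = set(old_cols), set(new_cols)
    ops = []
    ops += [("ADD COLUMN", c) for c in new_cols if c not in old]
    ops += [("DROP COLUMN", c) for c in old_cols if c not in new]
    return ops

previous = ["id", "name", "address"]
current  = ["id", "name", "notes"]   # "address" dropped, "notes" added

print(diff_metadata(previous, current))
# [('ADD COLUMN', 'notes'), ('DROP COLUMN', 'address')]
```

Note that a single captured ALTER TABLE event may yield multiple operations, as in the example, which is why Replicate diffs metadata rather than parsing the DDL text itself.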

Please go through the following user guide topics for more information:

https://help.qlik.com/en-US/replicate/November2023/Content/Replicate/Main/Hadoop/hadoop_target.htm

https://help.qlik.com/en-US/replicate/May2022/Content/Replicate/Main/Endpoints/DDLStatements.htm

 

Thanks & Regards

Arun

aarun_arasu
Support

Hello @Chirag_ ,

 

You may also refer to the article below on the retry configuration available in Qlik Replicate:

https://community.qlik.com/t5/Official-Support-Articles/Changing-Task-Recovery-options-for-Replicate...

 

Regards

Arun


Chirag_
Partner - Contributor II
Author

Hi @aarun_arasu ,

Thank you for the response!

Regarding point 2: as Hadoop is a file system, assume I have 4 fields and 10 records in my Hive table, and later 1 column is added. What will the output be for those 10 records? Will the additional column show a NULL value or a blank? In an RDBMS it would be NULL.

Similarly, how is this handled for Text or Parquet format in HDFS?

 

Regards,

Chirag


Chirag_
Partner - Contributor II
Author

Hi @aarun_arasu , @john_wang ,

Thank you for the response and for providing detailed information on the questions raised.

 

Regards,

Chirag

 

john_wang
Support
Support

Thank you so much for your great support @Chirag_ 
