Handling default type/data conversion of NULL entries from Source (SAP) in Replicate
Qlik Replicate by default does type conversion to a NULL/ BLANK entry from source, for e.g., a NULL entry in Date columns is converted into “00-00-0000”. Our business wants data in target as is and don’t want to do any conversion.
Need a way to handle it to all tables and columns in the Global-Transformation, as we see many tables/ columns with this behavior?
There are several key points about the NULL/BLANK convert to 00-00-0000/0 in the end-to-end path (I'm assuming the behavior occurs in both Full Load + CDC. The below steps are focusing on FL. Correct me if I'm inaccurate):
1. The read from SAP source
2. The interim CSV file
3. The data processing/transformation in Replicate
4. The write to Databricks delta files
5. The Hive client display
The behavior may be caused by any one fact in above path, for example maybe all the process in Replicate are correct however the Hive display NULL DATE as 00-00-0000 in the last step. Let's troubleshoot it step by step (corresponding to above points:).
(1) Replace your Databricks delta target by a generic FILE target, what's the result in the target file?
(2) In Databricks delta EndPoint adds two internal parameters and set them to true
keepCSVFiles
keepErrorFiles
Re-run the task and check the "***.csv" files under the task folder to see what's the values in interim files
(4) Check the real values in Databricks delta storage files
(5) create a test file and make a test table map to it in Hive then check if Hive client (or other client) interpret NULL value in DATE column to "00-00-0000".
I did not setup an env to reproduce it here. Based on the troubleshooting result then we may figure out if a transformation helps or not.
Only if we confirm there is no way (eg transformation) or no any WA available then it will be a FR item.
Firstly apologies for delays in responding to your last comment. We were working on ways to handle this situation at our target side - but no success yet due to other restrictions, would be great if we have an enhancement or a transformation option from Qlik to handle Blank/ NULL scenarios,
Below are our findings based on your suggestions,
Testing done so far:
(1) Replace your Databricks delta target by a generic FILE target, what's the result in the target file?
We don’t have license for File type as a target, can Qlik test this in their lab?
(2) We picked couple of tables having blank entries in source and below is the finding
Table Name: VBPA
Column: item_number_of_the_sd_document
Table name: VBAK
Field name: ZZ_VDATU_ORIGINL
(3) Configured databricks endpoint to drop csv, and Re-run the task, checked the "***.csv" files under the task folder to see what's the values in interim files
VBPA Screenshot:
VBAK Screenshot:
(5) Check the real values in Databricks delta storage files
Results in Databricks table (VBPA) & storage files (parquets)
Results in Databricks table (VBPA) & storage files (parquets)
Based on our observations we can see that Delta connector is indeed converting Blank entries in source to a default value for certain data types like NUMCHAR (0) and DATE (0101-01-01).
Request you to provide a solution to to handle the Blank entries in source as Blank in target as well (Databricks for our case) irrespective of data types.
I have not such an env for my test, I'd like suggest you open a support case for further investigation. And you may get a temporary license for your test, please contact your region sales representative, or account manager. Let me know if you need any additional assistance.
NOTE: Upon clicking this link 2 tabs may open - please feel free to close the one with a login page. If you only see 1 tab with the login page, please try clicking this link first: Authenticate me! then try the link above again. Ensure pop-up blocker is off.