7 Replies Latest reply: Apr 27, 2016 11:34 AM by Marcus Sommer RSS

    Replace NULL and missing values as part of ETL for data quality?

    Georg Gottsmann

      Sorry for any inconvience with posts, there were a lot of errors when trying to publish this one and you answered faster than I could repair that

       

      Dear all,

       

      Suppose within the ETL process, I extract a table as following:

       

      The NULL resulted from a JOIN, the missing value in Field2 of row "b" is just empty.

       

      My question is: As part of ETL, are the following steps any good or do they even produce more harm later?

       

      Step 1: Replace all NULLs with <NULL>; to make them selectable in Listboxes:

       

      // Treat all NULL fields as a value
      NullAsValue *;
      
      // Set the value to replace NULL. In optimized QVD-loads add the condition where 1=1 or it will not work.
      Set NullValue = '<NULL>';
      
      

       

      New Table:

       

      Step 2: Replace also the missing value with <NULL>:

       

      Set NullInterpret = ''; // This will convert all blank fields to NULL in Excel files
      

       

      New Table:

       

       

      These steps in my opinion enable advanced analyses, but I am not sure whether to implement it everywhere. Going further: Do you think it makes sense to fill all NULL or missing values in all tables already when they are extracted from the source systems?

       

      Nachricht geändert durch Georg Gottsmann