2 Replies Latest reply: Dec 5, 2017 3:58 AM by Malik Al Qassas

    Load multiple csv file using NOCONCATENATE after loading

    Malik Al Qassas

I'm trying to load all the csv files from a folder and then apply NoConcatenate.

I need this so that I can JOIN the result with another set of csv files loaded from a second folder.

       

      The steps that I want to do:

1. Load all the csv files into one table. After loading I want to use NoConcatenate. If I use it inside the loop, the script loads each file into a separate table. So I need to load all the files first and only then add the NoConcatenate line?

2. Load the other csv files the same way: load all the files first and then use a NoConcatenate line to avoid concatenation with the first table.

3. After that, join these two tables.

       

Any help? I also tried loading the table again with NoConcatenate and then dropping the old one, but the problem is still not resolved.
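Just to illustrate the behaviour I mean (the folder and table names below are only placeholders, not my real ones): Qlik auto-concatenates a LOAD into an existing table whenever the field lists are identical, so inside the loop all the files end up in one table, while a NoConcatenate inside the loop creates a separate table for each file:

// Placeholder sketch - 'lib://SOME_FOLDER' and [MyTable] are not the real names
for each vFile in filelist('lib://SOME_FOLDER/*.csv')
    [MyTable]:    // same field list on every pass, so Qlik auto-concatenates all files here
    LOAD * from [$(vFile)] (txt, embedded labels, delimiter is ',', msq);
    // a NoConcatenate in front of this LOAD would instead create MyTable, MyTable-1, MyTable-2, ...
next vFile;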

       

This is the script I used to load all the files from the first folder:

       

      // First folder

      for each file in filelist('lib://IMPORT_CSV\*.csv')

          [join_ImportError_files]:

          load

          "timestamp",

              timestamp as ts_import_join,

              "day" as day_join,

              "time" as time_join,

              "hour" as hour_join,

              requestor_lab,

              production_lab,

              last_timestamp,

              id_log,

              request_number as request_number_join,

              request_number as request_number_import,

              sample_number as sample_number_join,

              universal_test_id

          from [$(file)]

          (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

      next file;

       

       

      [join_ImportError]:

      NoConcatenate

      LOAD

          "timestamp",

          ts_import_join,

          day_join,

          time_join,

          hour_join,

          requestor_lab,

          production_lab,

          last_timestamp,

          id_log,

          request_number_join,

          request_number_import,

          sample_number_join,

          universal_test_id

      Resident join_ImportError_files;

DROP TABLE [join_ImportError_files];

       

      [join_ImportError_filter_unique]:

      inner keep

      LOAD

      request_number_join,

          MAX(timestamp) as ts_import_join,

          MAX(day_join) as day_join,

          MAX(time_join) as time_join,

          MAX(hour_join) as hour_join,

          sample_number_join

      resident join_ImportError

      Group By request_number_join, sample_number_join;

      drop table join_ImportError_filter_unique;

       

      // The second folder (DB)

      for each file in filelist('lib://DB_CSV\*.csv')

        [join_DB_SnapWroklistSample_files]:

        // NoConcatenate

        LOAD

            SAMPLECODE as sample_number_join,

            SAMPLECODE2,

            REQUESTCODE as request_number_join,

            REQUESTCODE as request_number_db,

            TIMEVALIDATED,

            TIMERESULT,

            DATETIMEREQ,

            DATETIMEREQ_TS,

            REQ_DATE,

            REQ_HOUR,

            TIMESAMPLEIN,

            TIMESAMPLEIN_TS as ts_db_join,

            SAMPLEIN_DATE,

            SAMPLEIN_HOUR

        from [$(file)]

(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
next file;

       

      [join_DB_SnapWroklistSample]:

      NoConcatenate

      LOAD

          sample_number_join,

          SAMPLECODE2,

          request_number_join,

          request_number_db,

          TIMEVALIDATED,

          TIMERESULT,

          DATETIMEREQ,

          DATETIMEREQ_TS,

          REQ_DATE,

          REQ_HOUR,

          TIMESAMPLEIN,

          ts_db_join,

          SAMPLEIN_DATE,

          SAMPLEIN_HOUR

      Resident join_DB_SnapWroklistSample_files;

      DROP TABLE [join_DB_SnapWroklistSample_files];

       

       

      [join_SnapWroklistSample_filter_unique]:

      inner keep 

      LOAD

      sample_number_join,

          request_number_join,

          MIN(ts_db_join) as ts_db_join

      resident join_DB_SnapWroklistSample

      Group By request_number_join, sample_number_join;

      drop table join_SnapWroklistSample_filter_unique;

       

       

       

       

      [join_DB_Import]:

      NoConcatenate

      Load request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join

      resident [join_ImportError];

      join

      Load request_number_join, sample_number_join, ts_db_join

      resident [join_DB_SnapWroklistSample];

       

       

      [join_DB_Import2]:  //add_diff_field

      NoConcatenate

       

       

      LOAD request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join,

      ts_db_join , Num#(Interval(ts_db_join-ts_import_join,'S'),'0.0') as diff3

      RESIDENT [join_DB_Import];

       

      DROP TABLE [join_ImportError];

      DROP TABLE [join_DB_SnapWroklistSample];

       

      DROP TABLE [join_DB_Import];

        • Re: Load multiple csv file using NOCONCATENATE after loading
          Anil Babu

After running this script, what does your data model look like?

          • Re: Load multiple csv file using NOCONCATENATE after loading
            Malik Al Qassas

            I resolved the problem. This is the solution.

             

The steps are the same:

1. Load all the csv files into one table (without NoConcatenate: we cannot use it here, because inside the loop it would generate one table per file). After that, reload the table with a resident load using NoConcatenate. Then drop the first table, the one loaded without NoConcatenate (see the minimal sketch after this list).

2. Load the other csv files with the same procedure as step 1.

3. Join these two tables.
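In compact form the pattern is the one below (a minimal sketch with placeholder folder and table names; the full working script follows):

// 1. Loop-load every file into a temporary table; auto-concatenation merges them into one
for each vFile in filelist('lib://SOME_FOLDER/*.csv')
    [MyTable_tmp]:
    LOAD * from [$(vFile)] (txt, embedded labels, delimiter is ',', msq);
next vFile;

// 2. Copy it into the final table with NoConcatenate, then drop the temporary table
[MyTable]:
NoConcatenate
LOAD * Resident [MyTable_tmp];
DROP TABLE [MyTable_tmp];

// 3. Repeat for the second folder, then join the two resulting tables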

             

            //  Load all files from the first folder (Import):

            for each file in filelist('lib://IMPORT_CSV\*.csv')

                [join_ImportError_files]:

                load

                "timestamp",

                    timestamp as ts_import_join,

                    "day" as day_join,

                    "time" as time_join,

                    "hour" as hour_join,

                    requestor_lab,

                    production_lab,

                    last_timestamp,

                    id_log,

                    request_number as request_number_join,

                    request_number as request_number_import,

                    sample_number as sample_number_join,

                    universal_test_id

                from [$(file)]

                (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

            next file;

             

            // Load again the table with NoConcatenate

            [join_ImportError]:

            NoConcatenate

            LOAD

                "timestamp",

                ts_import_join,

                day_join,

                time_join,

                hour_join,

                requestor_lab,

                production_lab,

                last_timestamp,

                id_log,

                request_number_join,

                request_number_import,

                sample_number_join,

                universal_test_id

            Resident join_ImportError_files;

// Drop the first table (the one loaded without NoConcatenate)

            DROP TABLE [join_ImportError_files];

             

// I use this to group by -> you can skip this step

            [join_ImportError_filter_unique]:

            inner keep

            LOAD

            request_number_join,

                MAX(timestamp) as ts_import_join,

                MAX(day_join) as day_join,

                MAX(time_join) as time_join,

                MAX(hour_join) as hour_join,

                sample_number_join

            resident join_ImportError

            Group By request_number_join, sample_number_join;

            drop table join_ImportError_filter_unique;
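// Note: the inner keep above runs against the previously created table (join_ImportError) and
// keeps only the rows whose shared fields match the per-key maxima, i.e. effectively the latest
// record per (request_number_join, sample_number_join); the helper table is then dropped.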

             

// Do the same steps for the second folder of csv files

            for each file2 in filelist('lib://DB_CSV\*.csv')

              [join_DB_SnapWroklistSample_files]:

              LOAD

                  SAMPLECODE as sample_number_join,

                  SAMPLECODE2,

                  REQUESTCODE as request_number_join,

                  REQUESTCODE as request_number_db,

                  TIMEVALIDATED,

                  TIMERESULT,

                  DATETIMEREQ,

                  DATETIMEREQ_TS,

                  REQ_DATE,

                  REQ_HOUR,

                  TIMESAMPLEIN,

                  TIMESAMPLEIN_TS as ts_db_join,

                  SAMPLEIN_DATE,

                  SAMPLEIN_HOUR

              from [$(file2)]

              (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

            next file2;

             

            [join_DB_SnapWroklistSample]:

            NoConcatenate

            LOAD

                sample_number_join,

                SAMPLECODE2,

                request_number_join,

                request_number_db,

                TIMEVALIDATED,

                TIMERESULT,

                DATETIMEREQ,

                DATETIMEREQ_TS,

                REQ_DATE,

                REQ_HOUR,

                TIMESAMPLEIN,

                ts_db_join,

                SAMPLEIN_DATE,

                SAMPLEIN_HOUR

            Resident join_DB_SnapWroklistSample_files;

            DROP TABLE [join_DB_SnapWroklistSample_files];

             

             

            [join_SnapWroklistSample_filter_unique]:

            inner keep 

            LOAD

            sample_number_join,

                request_number_join,

                MIN(ts_db_join) as ts_db_join

            resident join_DB_SnapWroklistSample

            Group By request_number_join, sample_number_join;

            drop table join_SnapWroklistSample_filter_unique;

             

            // Here we do the join

            [join_DB_Import]:

            NoConcatenate

            Load request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join

            resident [join_ImportError];

            join

            Load request_number_join, sample_number_join, ts_db_join

            resident [join_DB_SnapWroklistSample];
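// Note: a plain join with no table name joins into the previously created table (join_DB_Import),
// matching on the fields the two loads share (request_number_join, sample_number_join);
// without an inner/left/right prefix it behaves as an outer join.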

             

// I use this to compute the difference between the 2 timestamps -> you can skip

            [join_DB_Import2]:  //add_diff_field

            NoConcatenate

            LOAD request_number_join, sample_number_join, ts_import_join, day_join, time_join, hour_join,

            ts_db_join , Num#(Interval(ts_db_join-ts_import_join,'S'),'0.0') as diff3

               

            RESIDENT [join_DB_Import];
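// If you only need the gap in plain seconds, an equivalent (untested) expression should be
// Round((ts_db_join - ts_import_join) * 86400, 0.1) as diff3, since Qlik stores timestamps
// as fractional days.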

             

            DROP TABLE [join_ImportError];

            DROP TABLE [join_DB_SnapWroklistSample];

            DROP TABLE [join_DB_Import];