4 Replies Latest reply: Jan 9, 2013 10:37 AM by John Lifter

    Loop and Load CSV Files

      Hi,

       

      Q1. I was wondering if anyone has an idea on how I could load an arbitrary number of csv files located in a folder.  They all share the same file structure, so I thought I could create a connection and schema, but I cannot see a way of looping through the files and loading them.  My thought was to load them all into a buffer/table for use later on in other data flows.

       

      Q2. The write lookup table seems to be for RDBMS only.  I am working with CSV files only and would like to load a csv as a lookup file for later use in functions and transforms.

       

       

      Thanks

      James

        • Re: Loop and Load CSV Files

          Hi James,

           

          You can load multiple files at once by using a wildcard (*) in the file name:

          Directory;
          // The * wildcard matches every csv file in the folder
          LOAD FieldA,
               FieldB
          FROM
          [..\Desktop\*.csv]
          (txt, codepage is 1252, embedded labels, delimiter is ',', msq);

           

          Hope this helps

          Regards

          • Re: Loop and Load CSV Files

            The QlikView Expressor Read File operator does not support the type of looping you are trying to perform; however, there are three approaches to accomplishing what you want to do.

             

            First, if you purchase a license for the Standard Edition, you can write a script that looks into the source directory, determines the number of files that need to be processed, and then runs the dataflow once for each of those files.

             

            Second, you can use the Read Custom operator to perform the same sort of looping and processing.

             

            The coding to perform the processing within the Read Custom operator is a bit too involved to cover here, so you should read a reference on the Lua programming language to understand the file I/O functions you will need.  For example, io.popen("dir /b ..."):lines() returns an iterator that lets you retrieve the file names in the directory represented by ... (note that it is io.popen, not os.execute, that returns a handle you can iterate).  Then you create a file handle for one of the files and use the read function on the handle to retrieve each line of the file, repeating this processing for each of the files.  At that point, you can emit each line from the Read Custom operator and use a downstream Transform operator to parse each line into its individual fields.  A minimal sketch follows.
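
            Here is that logic in plain Lua.  Everything named below is a placeholder, not part of the Read Custom operator's actual interface: the directory path, and emit_line, which stands in for however your operator hands a record downstream.

            local dir = "C:\\data\\csv"

            -- "dir /b" lists bare file names; io.popen returns a handle whose
            -- lines() iterator yields one file name per iteration.
            for name in io.popen('dir /b "' .. dir .. '"'):lines() do
              local f = assert(io.open(dir .. "\\" .. name, "r"))
              for line in f:lines() do
                emit_line(line)  -- hypothetical stand-in for the operator's emit call
              end
              f:close()
            end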

             

            Third, again within the Read Custom operator, first retrieve a listing of the names of the files (e.g., file1 and file2), then use the Windows copy file1+file2 destination_file command to combine the files into a single file.  You can then use additional code in the Read Custom operator to process this single large file, or you can add a second step to the dataflow and read it with a Read File operator.  Note, however, that if each of your csv files contains a header row, you will need some way to recognize and remove the header rows that will be scattered throughout the combined file; the sketch below shows one way.
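
            A rough Lua sketch of that third approach (the paths are placeholders, and the header check assumes every file begins with an identical header row):

            local dir      = "C:\\data\\csv"
            local combined = "C:\\data\\combined.csv"
            local cleaned  = "C:\\data\\cleaned.csv"

            -- /b forces a binary copy; the wildcard concatenates every csv in the folder.
            os.execute('copy /b "' .. dir .. '\\*.csv" "' .. combined .. '"')

            -- Keep the first header row, drop the copies scattered through the file.
            local out = assert(io.open(cleaned, "w"))
            local header
            for line in io.lines(combined) do
              if header == nil then
                header = line            -- header row of the first file
                out:write(line, "\n")
              elseif line ~= header then -- skip the repeated header rows
                out:write(line, "\n")
              end
            end
            out:close()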

             

            Besides not requiring you to purchase a license, the second and third approaches also allow you to process all of the input files in a single execution of the dataflow, whereas the first approach runs the dataflow once for each file.

             


             

            Regarding your second question: the lookup table is designed to store data from any source.  Yes, it looks like you are defining a database table when you configure a lookup table; that is because the lookup table actually is a database table, managed by an RDBMS embedded within Expressor.  But the data used to populate this table can come from a csv file.  Upstream of the operator that writes to the lookup table you can place any type of input operator, so you can read from a csv file and store the data in a lookup table for future use.

              • Re: Loop and Load CSV Files

                Thanks for the swift reply John.

                 

                Can you expand a bit more on the approach using the server edition?  I was under the impression that the server just executes the dataflow packages.

                 

                You say to write a script that looks at the source directory; what scripting language are we talking about, Expressor scripting or something else?

                 

                Thanks

                James

                  • Re: Loop and Load CSV Files

                    Yes, the Data Integration Engine (frequently called the server) in the Expressor Standard or Enterprise Editions executes dataflow packages that contain one or more dataflows.  To run a package, you issue the etask command from within a command window or under the direction of a scheduler application.  This contrasts with the Expressor Desktop Edition, in which you may only run a dataflow from within Studio or from a QlikView script that itself must be run from within the QlikView desktop application.

                     

                    Since you run etask from within a command window, the actual statement making this call may be included in a script file such as a Windows batch file.  The etask command takes several arguments; some of them specify the name of a configuration artifact, or configuration values that override operator properties you set when designing the dataflow in Studio.  Specifically, you could use one of these arguments to change the name of the file to be read by the Read File operator.

                     

                    While it would be possible to write a Windows batch script that loops over a listing of the files to be processed, invoking etask in each iteration and passing the name of one of the files as a command line argument, these types of scripts are generally written using Expressor Datascript.  The advantage of using Expressor Datascript is that your script can invoke nearly all of the Expressor Datascript functions, such as the date or string manipulation functions, which makes your scripting much more powerful.  Also, the script can use the Expressor Datascript file I/O commands, which provide the functions needed to easily set up the loop that processes each file; a sketch follows below.
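
                    As a hedged sketch of that pattern in Expressor Datascript (which is Lua-based): the package name and the way the file name is passed on the command line below are placeholders, not the documented etask interface, so substitute the argument syntax from your release's documentation.

                    local dir = "C:\\data\\csv"

                    for name in io.popen('dir /b "' .. dir .. '"'):lines() do
                      -- Hypothetical invocation; replace "MyPackage" and the file
                      -- argument with the documented etask override syntax.
                      local cmd = 'etask MyPackage "' .. dir .. '\\' .. name .. '"'
                      local ok = os.execute(cmd)
                      -- os.execute reports success as 0 (Lua 5.1) or true (Lua 5.2+).
                      if ok ~= 0 and ok ~= true then
                        error("etask failed for file: " .. name)
                      end
                    end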

                     

                    Approach 1, which I described in an earlier posting, would place this scripting in a separate script file that you would call from within a command window.  Approach 2 would place very similar scripting into the Read Custom operator.  By moving the scripting into the Read Custom operator, you are able to process the collection of files when running the dataflow from within Studio.

                     

                    So, one of the very valuable features of the Expressor Standard and Enterprise Editions is the ability to run and coordinate execution of one or multiple dataflows from a single script file.  This gives you the ability to monitor the success of each step of a dataflow and modify the processing accordingly.

                     

                    Hope this brief discussion answered your question.