37 Replies · Latest reply: Mar 11, 2016 4:54 AM by Sebastian Haep

    Semicolon in Data Source - How to load the data properly?

    Sebastian Haep

      Hi guys,

       

      I need your help, if anything is possible here.

       

      The situation is as follows:

      I have a csv file as data source which gets updated multiple times a day. The csv file is semicolon-separated and always contains a fixed set of 20 columns. One column is a description. About twice a week it happens that in 2-5 rows the description contains a semicolon in the text.

       

      As I want to automate the whole data loading process, you see the problem, right? Is there any way to avoid an error when this happens, caused by the rows getting displaced due to the semicolon in the text? My first idea was to just replace the semicolon separator with something else when the source exports the csv file, but unfortunately that is not possible.

       

      Is there any way to overcome this?

       

      Thanks guys!

       

      Greetings,

      Seb

        • Re: Semicolon in Data Source - How to load the data properly?
          Anton Aleksandrov

          You can use the function SubField() to manually parse the fields from your file with the delimiter.

          For example, if you know that the next field is an e-mail address, then that field can be checked for the presence of the character '@'.

          if(subfield(@1,';',5) like '*@*', subfield(@1,';',4),
            if(subfield(@1,';',6) like '*@*', subfield(@1,';',4) & subfield(@1,';',5), ... )) as description,

          if(subfield(@1,';',5) like '*@*', subfield(@1,';',5),
            if(subfield(@1,';',6) like '*@*', subfield(@1,';',6), ... )) as e_mail

          It works fine if your field 'description' is one of the last fields in your input file; otherwise, all fields after 'description' must be processed through such 'if' statements.
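
          To make this pattern concrete, here is a minimal sketch, assuming the raw line is loaded as a single field, the description is column 4, the e-mail is column 5, and at most one stray semicolon per row (file and field names are placeholders):

          // Minimal sketch of the SubField() approach (assumptions: description in column 4,
          // e-mail in column 5, at most one extra semicolon inside the description)
          raw:
          LOAD [@1:n] as @1
          FROM data.csv (fix, codepage is 1252);   // whole line into one field

          parsed:
          LOAD
              SubField(@1, ';', 1) as col1,
              SubField(@1, ';', 2) as col2,
              SubField(@1, ';', 3) as col3,
              // if token 5 already looks like an e-mail, token 4 is the whole description;
              // otherwise the description was split, so rejoin tokens 4 and 5 with the lost ';'
              If(SubField(@1, ';', 5) like '*@*',
                 SubField(@1, ';', 4),
                 SubField(@1, ';', 4) & ';' & SubField(@1, ';', 5)) as description,
              If(SubField(@1, ';', 5) like '*@*',
                 SubField(@1, ';', 5),
                 SubField(@1, ';', 6)) as e_mail
              // the remaining columns would need the same conditional shift
          RESIDENT raw;

          DROP Table raw;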

          • Re: Semicolon in Data Source - How to load the data properly?
            Stefan Wühl

            Can you use quoting when you export the data to the csv file? Like enclosing each field value in double quotes:

            "10"; "20"; "Text; with semi;colon"

            • Re: Semicolon in Data Source - How to load the data properly?
              Sasidhar Parupudi

              You could probably skip those lines using a where condition, e.g.:

              where SubStringCount(yourTextLine,';') > 19   // a clean 20-column row has 19 semicolons

               

              and you can write the lines where the semicolon count is higher than expected into a different table or a log file for later analysis and rectification.
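
              A minimal sketch of that idea, assuming 20 columns (i.e. 19 semicolons per clean row) and loading the raw lines as a single field first; file and field names are placeholders:

              // read whole lines, skipping the header row
              rawLines:
              LOAD [@1:n] as textLine
              FROM yourfile.csv (fix, codepage is 1252)
              Where RecNo()>1;

              // clean rows, to be parsed further
              goodLines:
              LOAD textLine as goodLine
              Resident rawLines
              Where SubStringCount(textLine, ';') = 19;

              // suspicious rows, kept aside for analysis and rectification
              badLines:
              LOAD textLine as badLine
              Resident rawLines
              Where SubStringCount(textLine, ';') > 19;

              STORE badLines INTO badLines.csv (txt);
              DROP Table rawLines;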

               

              The best solution would be to change the delimiter in your source file.

               

              HTH

              Sasi

              • Re: Semicolon in Data Source - How to load the data properly?
                Marco Wedel

                Hi,

                 

                one solution might be:

                 

                QlikCommunity_Thread_202456_Pic2.JPG

                 

                using this csv file:

                QlikCommunity_Thread_202456_Pic1.JPG

                 

                 

                table1:
                LOAD Left([@1:n],Index([@1:n],';',2))&'"'&Mid([@1:n],Index([@1:n],';',2)+1,Index([@1:n],';',SubStringCount([@1:n],';')-16)-Index([@1:n],';',2)-1)&'"'&Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')-16)) as line
                FROM QlikCommunity_Thread_202456.csv (fix, codepage is 1252)
                Where RecNo()>1;
                
                table2:
                LOAD *
                From_Field (table1, line) (txt, codepage is 1252, no labels, delimiter is ';', msq);
                
                tabFieldNames:
                CrossTable (oldFieldName,newFieldName)
                LOAD 1,*
                FROM QlikCommunity_Thread_202456.csv (txt, codepage is 1252, no labels, delimiter is ';', msq)
                Where RecNo()=1;
                
                mapFieldNames:
                Mapping LOAD oldFieldName, newFieldName
                Resident tabFieldNames;
                
                DROP Tables table1, tabFieldNames;
                
                RENAME Fields using mapFieldNames;
                
                

                 

                hope this helps

                 

                regards

                 

                Marco

                  • Re: Semicolon in Data Source - How to load the data properly?
                    Peter Cammaert

                    Very nice solution, Marco, you're fixing bad source file formatting on the fly. I think it will be extremely difficult to find a simpler solution.

                     

                    Peter

                      • Re: Semicolon in Data Source - How to load the data properly?
                        Marco Wedel

                        Thanks.

                        I first tried to LOAD * From_Field with embedded labels, but failed doing so.

                        That's why I had to rename the fields in the end.

                        Nevertheless, I'm glad it worked.

                         

                        good night

                         

                        Marco

                          • Re: Semicolon in Data Source - How to load the data properly?
                            Sebastian Haep

                            Hi Marco,

                             

                            the outcome looks exactly like what I need.

                             

                            It would be nice if you could add comments to the script or explain what it is actually doing, as I can't figure it out properly yet.

                             

                            But yeah this looks like what I need.

                             

                            Thanks a lot for your help already!

                             

                             

                            EDIT: I tested your QVW file with my csv. It does not work properly, probably because my description column is the 5th and not the 3rd column. How can I fix that?

                             

                            And I saw that it generates a synthetic key. Is that correct?

                             

                            I would then just store table2 in a QVD to use it somewhere else, right?

                             

                            Another point is that it does not display the letters ö, ä, ü properly. I already tried replacing codepage 1252 with utf8 or ansi (which is the same as 1252), but either this was not the root cause or I took the wrong charset. Notepad++ says the file is ANSI encoded.

                              • Re: Semicolon in Data Source - How to load the data properly?
                                Stefan Wühl

                                In the first table load, Marco is reading your input file in without parsing the columns, so each line goes into a single field 'line'.

                                 

                                Left([@1:n],Index([@1:n],';',2))&'"'&Mid([@1:n],Index([@1:n],';',2)+1,Index([@1:n],';',SubStringCount([@1:n],';')-16)-Index([@1:n],';',2)-1)&'"'&Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')-16))

                                 

                                He is adding double quotes " around the part in question. To find the correct positions for them, he looks for the 2nd semicolon counted from the beginning, which he knows marks the start of the third column in the file, the text description. But he does not know how many semicolons are contained in the text, so for the end of the column he looks for the 16th semicolon counted from the end.

                                [edit: see below for an implementation shortcut. Marco is counting the semicolons from the left, using 'total number of semicolons - expected semicolons after column 3' instead of just counting from the end of the line]

                                 

                                Left([@1:n],Index([@1:n],';',2)) // Find second semicolon, end of second column in line


                                &'"'&Mid([@1:n],Index([@1:n],';',2)+1,Index([@1:n],';',SubStringCount([@1:n],';')-16)-Index([@1:n],';',2)-1)&'"' // enclose the third column with double quotes


                                Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')-16)) // add columns 4-20

                                 

                                Once the column with text is quoted, you can use MSQ style in the format specifier to read it in without treating the quoted semicolons as separators, splitting the line into fields in the data model.

                                 

                                The rest of the script is then mapping the column names to the names used in the file.

                                 

                                If you want to parse your text field located in the 5th column, you need to replace 2 by 4 and 16 by 14 in the above expression.
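
                                Applied to Marco's table1 load, that substitution would look like this (a sketch, keeping his example file name):

                                table1:
                                LOAD Left([@1:n],Index([@1:n],';',4))
                                     &'"'&Mid([@1:n],Index([@1:n],';',4)+1,Index([@1:n],';',SubStringCount([@1:n],';')-14)-Index([@1:n],';',4)-1)
                                     &'"'&Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')-14)) as line
                                FROM QlikCommunity_Thread_202456.csv (fix, codepage is 1252)
                                Where RecNo()>1;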

                                 

                                With regard to your issue with displaying the characters correctly, could you upload a small sample csv file?

                                  • Re: Semicolon in Data Source - How to load the data properly?
                                    Stefan Wühl

                                    And since Index() also takes a negative position index, you can simplify the line manipulation to:

                                     

                                    Left([@1:n],Index([@1:n],';',2))&'"'&Mid([@1:n],Index([@1:n],';',2)+1,Index([@1:n],';',-17)-Index([@1:n],';',2)-1)&'"'&Mid([@1:n],Index([@1:n],';',-17))

                                    • Re: Semicolon in Data Source - How to load the data properly?
                                      Sebastian Haep

                                      Hi, thank you for the explanation!

                                       

                                      I attached a sample file for the encoding issue.

                                        • Re: Semicolon in Data Source - How to load the data properly?
                                          Marco Wedel

                                          Hallo Sebastian,

                                           

                                          utf8 in the from_field load seems to work with umlauts:

                                           

                                          QlikCommunity_Thread_202456_Pic4.JPG

                                           

                                          // defining the number of the column that might contain semicolons
                                          LET vColumnWithSemicolon = 5;
                                          LET vColumns = 20;
                                          
                                          // load whole lines of csv file without headers and adding double quotes around values of above specified column
                                          // quoted semicolons will not be treated as delimiters. Append Record numbers as additional column to let lines be distinct
                                          table1:
                                          LOAD Left([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1))&'"'&Mid([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1)+1,Index([@1:n],';',SubStringCount([@1:n],';')+$(vColumnWithSemicolon)-$(vColumns)+1)-Index([@1:n],';',$(vColumnWithSemicolon)-1)-1)&'"'&Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')+$(vColumnWithSemicolon)-$(vColumns)+1))&';'&(RecNo()-1) as line
                                          FROM [https://community.qlik.com/servlet/JiveServlet/download/963714-208647/sample.csv] (fix, codepage is 1252)
                                          Where RecNo()>1;
                                          
                                          // load separate fields from previously loaded field "line"
                                          // This load creates field names @1 to @n (n being the ID/RecNo field)
                                          table2:
                                          LOAD *
                                          From_Field (table1, line) (txt, utf8, no labels, delimiter is ';', msq);
                                          
                                          // calculate fieldnumber of ID field
                                          LET vIDColumn = $(vColumns)+1;
                                          
                                          // create temporary table having columns for old and new field names (header row of csv file + ID field)
                                          tabFieldNames:
                                          CrossTable (oldFieldName,newFieldName)
                                          LOAD 1,*,'ID' as @$(vIDColumn)
                                          FROM [https://community.qlik.com/servlet/JiveServlet/download/963714-208647/sample.csv] (txt, codepage is 1252, no labels, delimiter is ';', msq)
                                          Where RecNo()=1;
                                          
                                          // create mapping table to translate default field names (@1 to @n) to column headers of csv file
                                          mapFieldNames:
                                          Mapping LOAD oldFieldName, newFieldName
                                          Resident tabFieldNames;
                                          
                                          // drop temporary tables
                                          DROP Tables table1, tabFieldNames;
                                          
                                          // rename fields using the previously generated mapping table
                                          RENAME Fields using mapFieldNames;
                                          

                                           

                                          hope this helps

                                           

                                          regards

                                           

                                          Marco

                                            • Re: Semicolon in Data Source - How to load the data properly?
                                              Stefan Wühl

                                              Is the challenge created by Peter still open?

                                               

                                              Another solution without a field name mapping could be:

                                               

                                              // defining the number of the column that might contain semicolons
                                              LET vColumnWithSemicolon = 5;
                                              LET vColumns = 20;
                                              LET vRight = vColumns - vColumnWithSemicolon; //# semicolons counted from the right 
                                              LET vLeft = vColumnWithSemicolon-1; //# of semicolons counted from the left
                                              
                                              
                                              // load whole lines of csv file without headers and adding double quotes around values of above specified column
                                              // quoted semicolons will not be treated as delimiters. Append lines to single LINE
                                              table1:
                                              LOAD
                                              CONCAT(line, chr(10)&chr(13) ,LineNo) as LINE;
                                              LOAD Left([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1))&'"'&Mid([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1)+1,Index([@1:n],';',-$(vRight))-Index([@1:n],';',$(vLeft))-1) &'"'&Mid([@1:n],Index([@1:n],';',-$(vRight))) as line,
                                              recno() as LineNo
                                              FROM [https://community.qlik.com/servlet/JiveServlet/download/963714-208647/sample.csv] (fix, codepage is 1252)
                                              //Where RecNo()>1
                                              ;
                                              
                                              // load separate fields from previously loaded field "LINE"
                                              // This load uses embedded field names
                                              table2:
                                              LOAD *
                                              From_Field (table1, LINE) (txt, utf8, embedded labels, delimiter is ';', msq);
                                              
                                  • Re: Semicolon in Data Source - How to load the data properly?
                                    Rob Wunderlich

                                    Marco, it's acceptable to provide comments in your script

                                     

                                    -Rob

                                  • Re: Semicolon in Data Source - How to load the data properly?
                                    Marco Wedel

                                    Hi,

                                     

                                    another version that includes an ID field to be able to load files with double rows.

                                    (and some comments, Rob ;-)

                                     

                                    QlikCommunity_Thread_202456_Pic3.JPG

                                     

                                     

                                    // defining the number of the column that might contain semicolons
                                    LET vColumnWithSemicolon = 3;
                                    LET vColumns = 20;

                                    // load whole lines of csv file without headers and adding double quotes around values of above specified column
                                    // quoted semicolons will not be treated as delimiters. Append Record numbers as additional column to let lines be distinct
                                    table1:
                                    LOAD Left([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1))&'"'&Mid([@1:n],Index([@1:n],';',$(vColumnWithSemicolon)-1)+1,Index([@1:n],';',SubStringCount([@1:n],';')+$(vColumnWithSemicolon)-$(vColumns)+1)-Index([@1:n],';',$(vColumnWithSemicolon)-1)-1)&'"'&Mid([@1:n],Index([@1:n],';',SubStringCount([@1:n],';')+$(vColumnWithSemicolon)-$(vColumns)+1))&';'&(RecNo()-1) as line
                                    FROM QlikCommunity_Thread_202456_v2.csv (fix, codepage is 1252)
                                    Where RecNo()>1;

                                    // load separate fields from previously loaded field "line"
                                    // This load creates field names @1 to @n (n being the ID/RecNo field)
                                    table2:
                                    LOAD *
                                    From_Field (table1, line) (txt, codepage is 1252, no labels, delimiter is ';', msq);

                                    // calculate fieldnumber of ID field
                                    LET vIDColumn = $(vColumns)+1;

                                    // create temporary table having columns for old and new field names (header row of csv file + ID field)
                                    tabFieldNames:
                                    CrossTable (oldFieldName,newFieldName)
                                    LOAD 1,*,'ID' as @$(vIDColumn)
                                    FROM QlikCommunity_Thread_202456_v2.csv (txt, codepage is 1252, no labels, delimiter is ';', msq)
                                    Where RecNo()=1;

                                    // create mapping table to translate default field names (@1 to @n) to column headers of csv file
                                    mapFieldNames:
                                    Mapping LOAD oldFieldName, newFieldName
                                    Resident tabFieldNames;

                                    // drop temporary tables
                                    DROP Tables table1, tabFieldNames;

                                    // rename fields using the previously generated mapping table
                                    RENAME Fields using mapFieldNames;

                                     

                                    hope this helps

                                     

                                    regards

                                     

                                    Marco

                                      • Re: Semicolon in Data Source - How to load the data properly?
                                        Sebastian Haep

                                        Hi guys,

                                         

                                        I just found out that none of your scripts loads my input file completely; I noticed this when I tried Stefan's script as well.

                                         

                                        The source file has 1.3 million rows. With Marco's script I get just 13k rows, and with Stefan's script only around 8k.

                                        If I use smaller sample files of, say, 10 rows, both scripts load all 10 rows.

                                         

                                        What is the reason for that? The rows which are loaded look correct, but there are far too few of them.

                                        I do not have a sample file with this amount of rows to share it with you.

                                         

                                        Best regards,

                                         

                                        Seb

                                          • Re: Semicolon in Data Source - How to load the data properly?
                                            Stefan Wühl

                                            Sebastian,

                                             

                                            if you get issues when loading the data, I would stick to Marco's solution, since mine has an extra aggregation step that might cause further issues with large data sets.

                                             

                                            Marco's solution also included an ID field. I would first check whether this ID field shows sequential integers, and what the last number is. This should be the line where the load got stuck.

                                             

                                            I would also suggest adding a line to the script after the first table load, so we can look at each loading stage one after the other:

                                             

                                            Exit Script;

                                             

                                            Look at your source file. Is there anything special in the line where the QV LOAD got stuck? There are sometimes issues with special characters, like EOF.

                                              • Re: Semicolon in Data Source - How to load the data properly?
                                                Sebastian Haep

                                                Hi Stefan,

                                                 

                                                I checked it and found this in the row after the last ID (in the field description):

                                                 

                                                File opened in VIM: problem_screen.jpg  File opened in Notepad++: problem_screen2.jpg

                                                 

                                                I think QlikView recognizes it as EOF. If I just load the whole thing without the check and with the 'no eof' parameter, it loads everything.

                                                 

                                                Normally the character is an em dash, but QlikView recognizes it as EOF.
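
                                                For reference, this is how the first fix-format load looks with the 'no eof' option added to the format spec (a sketch; in the full script, [@1:n] is wrapped in the quote-insertion expression shown earlier):

                                                table1:
                                                LOAD [@1:n] as line
                                                FROM QlikCommunity_Thread_202456.csv (fix, codepage is 1252, no eof)
                                                Where RecNo()>1;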

                                                 

                                                Stefan, why is your script not good for large data sets, and where is the aggregation step? Your script is insanely fast, and it would be great to get it to work with a large data set, because I am afraid that Marco's script will take pretty long with 1.3 million rows. For smaller data sets it is fast enough, but it takes about 1 minute per 10k rows.

                                                  • Re: Semicolon in Data Source - How to load the data properly?
                                                    Stefan Wühl

                                                    Sebastian,

                                                     

                                                    how does this version perform? Does it load all records? Is it fast enough?

                                                     

                                                    You may need to adapt your changes for EOF filtering in the format specs section as you've done in the last version.

                                                     

                                                    edit:

                                                    If not all records are loaded, you can play with this variable

                                                     

                                                    Let vNumRowsAgg = 5000;

                                                     

                                                    and lower the number. This variable determines the number of lines from the original file that are fed into one aggregated LINE.
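
                                                    The attached script itself isn't shown here, but as a sketch of the idea (an assumption, not the exact attached code), the grouping could concatenate every vNumRowsAgg source lines into one LINE like this:

                                                    Let vNumRowsAgg = 5000;

                                                    table1:
                                                    LOAD LineGroup,
                                                         Concat(line, chr(10)&chr(13), LineNo) as LINE
                                                    Group By LineGroup;
                                                    LOAD line,
                                                         LineNo,
                                                         Div(LineNo-1, $(vNumRowsAgg)) as LineGroup;
                                                    LOAD [@1:n] as line,       // in the real script: the quote-insertion expression
                                                         RecNo() as LineNo
                                                    FROM [sample.csv] (fix, codepage is 1252);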

                                                      • Re: Semicolon in Data Source - How to load the data properly?
                                                        Sebastian Haep

                                                        Hi Stefan,

                                                         

                                                        thanks for this one. It performs better but does not load all records yet.

                                                        With 5000 it's super fast, about 3 minutes, but only 800k rows. With 1000 I got 1.22 million rows in 6 minutes (also fast enough). With 500 I got 1.28 million rows in around 15 minutes. I'm testing with 10 right now to see if I get the whole 1.38 million rows, but that would probably be too slow.

                                                         

                                                        How does the aggregation work? Does it take 5000 rows at once and put them into one LINE record? And why is there such a difference in the number of loaded rows?

                                                         

                                                        I got an idea yesterday when I saw that several rows are duplicates. I checked it with Power Query in Excel: I loaded the whole file unseparated, so one complete line = one value, and then let Power Query delete the duplicates; of the 1.3 million rows only about 560k were left. Is this something we can use to optimize the script? I mean, we could load it distinct to get fewer rows to check and load, right?

                                                         

                                                        Thanks again for your help!

                                                          • Re: Semicolon in Data Source - How to load the data properly?
                                                            Stefan Wühl

                                                            Exactly, if you set vNumRowsAgg to 5000, you will concatenate 5000 lines of your source file into one record in the QV data model.

                                                             

                                                            The FROM_FIELD LOAD is not only looking at a single field, it is also performing a table load per record. That is what slows down Marco's approach: the overhead of 1.3 million table loads, each producing a single line and concatenating it to the previous table load.

                                                             

                                                            I must admit I have no clear idea how many lines you can concatenate before you get issues; this probably also depends on the length of the lines. Are the lines similar, or do they differ a lot in character length?

                                                             

                                                            I would also have assumed that there would not be such a gradual increase in successfully read lines, more like a certain threshold that either works or doesn't. But that's just my gut feeling.

                                                             

                                                            I also thought about just reading your file in using Marco's approach (table1), then storing the whole thing into a text file, and then reading it in using standard means and embedded labels.

                                                            It should work - as soon as I manage to tell QV not to add quotes, which otherwise makes it really hard to read the new file in flawlessly. Since this request (no quoting on STORE) has been around for years, I don't think we can expect it soon.

                                                             

                                                            Regarding your idea: If you can live with a DISTINCT table load, just give it a try.

                                                              • Re: Semicolon in Data Source - How to load the data properly?
                                                                Stefan Wühl

                                                                Maybe it's an issue with the table1 size / symbol table size.

                                                                 

                                                                Remember that you are loading more than 1 million lines; if we merge these lines into fewer LINEs, QV probably isn't able to make use of the DISTINCTness of the values, i.e. QV needs to store the full amount of characters in the symbol table.

                                                                 

                                                                BTW: Do you get an error message when the load stops below the 1.3 million records limit? Or does it just silently fail?

                                                                 

                                                                In the attached v7 of the load script, I load the original text file only partially, vNumRowsAgg lines per iteration.

                                                                Then I do the FROM_FIELD LOAD, then drop table1. Loop.

                                                                 

                                                                Maybe this has a positive effect on stability of the load of your complete text file?

                                                                  • Re: Semicolon in Data Source - How to load the data properly?
                                                                    Sebastian Haep

                                                                    Hi Stefan,

                                                                     

                                                                    to answer your questions:

                                                                    It also fails with an aggregation value of 50. There is no error message; it just stops as if it was successful. The lines differ in length depending on the length of the description; all other fields should have more or less the same length.

                                                                     

                                                                    I also got no proper result with a distinct load. I tried it in table1 and table2, but neither gave me the numbers I wanted to see. At first I was pretty close when I took an aggregation of 5000 and a distinct load for table2: I had about 515k rows, but actually there should be 518k unique rows. I lowered the aggregation to 3000 and, funnily, then had 572k rows. So this didn't work out.

                                                                     

                                                                    The v7 script you posted was super slow. I cancelled it because after 3 minutes it had only loaded about 10k rows.

                                                                     

                                                                    Marco's new version below gave me the correct number of rows after I put in the 'no eof' parameter, and it took 11 minutes for 1.39 million rows.

                                                            • Re: Semicolon in Data Source - How to load the data properly?
                                                              Peter Cammaert

                                                              One lesson learned from this story: avoid data exchange via .csv at all costs if it proves to be unreliable in the slightest way. Whenever you think your whole setup handles all cases encountered so far, and your document is published to end users, the data exchange may unexpectedly break down again and send you looking for yet another anomaly. A nice one is newline characters in the middle of text fields.

                                                               

                                                              Better methods of data exchange are binary files or direct DB connections. At least they will handle weird text strings without any unexpected problems.

                                                               

                                                              Peter

                                                        • Re: Semicolon in Data Source - How to load the data properly?
                                                          Marco Wedel

                                                          Hi,

                                                           

                                                          another approach might be:

                                                           

                                                          // defining the number of the column that might contain semicolons
                                                          LET vColumnWithSemicolon = 5;
                                                          LET vColumns = 20;
                                                          
                                                          // loading each semicolon separated subfield, and concatenating those that belong to the same field
                                                          table1:
                                                          Generic
                                                          LOAD ID,
                                                               '@'&FieldNum,
                                                               Concat(SubString,';',Seq)
                                                          Group By ID, FieldNum;
                                                          LOAD RecNo() as ID,
                                                               IterNo() as Seq,
                                                               IterNo()-RangeMin(RangeMax(IterNo()-$(vColumnWithSemicolon),0),Fields-$(vColumns)) as FieldNum,
                                                               SubField(line,';',IterNo()) as SubString
                                                          While IterNo()<=Fields;
                                                          LOAD [@1:n] as line,
                                                               SubStringCount([@1:n],';')+1 as Fields
                                                          FROM [QlikCommunity_Thread_202456_v3.csv] (fix, codepage is 1252)
                                                          Where RecNo()>1;
                                                          
                                                          // create temporary table having columns for old and new field names (header row of csv file + ID field)
                                                          tabFieldNames:
                                                          CrossTable (oldFieldName,newFieldName)
                                                          LOAD 1,*
                                                          FROM [QlikCommunity_Thread_202456_v3.csv] (txt, codepage is 1252, no labels, delimiter is ';', msq)
                                                          Where RecNo()=1;
                                                          
                                                          // create mapping table to translate default field names (@1 to @n) to column headers of csv file
                                                          mapFieldNames:
                                                          Mapping LOAD oldFieldName, newFieldName
                                                          Resident tabFieldNames;
                                                          
                                                          // drop temporary tables
                                                          DROP Tables tabFieldNames;
                                                          
                                                          // rename fields using the previously generated mapping table
                                                          RENAME Fields using mapFieldNames;
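
                                                          To see how the FieldNum expression behaves, consider a row with one stray semicolon, i.e. 21 subfields instead of 20, and vColumnWithSemicolon = 5: for IterNo() 1 to 5 the correction term RangeMin(RangeMax(IterNo()-5,0), 21-20) is 0, so FieldNum = IterNo(); for IterNo() = 6 the correction is 1, so FieldNum = 5 and the sixth subfield is concatenated back into field 5 (with the ';' restored by the Concat); for IterNo() 7 to 21 the correction stays 1, mapping them to fields 6 to 20.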
                                                          

                                                           

                                                          hope this helps

                                                           

                                                          regards

                                                           

                                                          Marco

                                                            • Re: Semicolon in Data Source - How to load the data properly?
                                                              Sebastian Haep

                                                              Hi Marco.

                                                               

                                                              this one looks pretty good! 11 minutes and all 1.39 million rows loaded. Yippie!

                                                               

                                                              Two questions here:

                                                              1. If I wanted to filter out duplicate lines (I found out that around 790k rows are duplicates), how would I do this with your script, since every field is now a separate table? The thing here is that only a whole line in total is unique. For example, you have one article with ID 1234, but this article has different categories.

                                                              so you have

                                                              ID 1234 Cat 1

                                                              ID 1234 Cat 2

                                                              ... and so on.

                                                              I mean, 11 minutes is also fine if I have all rows including duplicates, but if I can throw them out, the size of the table would be much smaller.

                                                               

                                                              2. All fields are in separate tables now. How can I easily map them back together without writing a mapping for every table again? Is it possible to do that in a loop? Afterwards I would store this table in a QVD for further use.

                                                               

                                                              Thank you so much!

                                                                • Re: Semicolon in Data Source - How to load the data properly?
                                                                  Stefan Wühl

                                                                  If you need to join the tables back into one, there are some examples here in the forum, e.g.

                                                                  The Generic Load

                                                                  • Re: Semicolon in Data Source - How to load the data properly?
                                                                    Marco Wedel

                                                                    Hi,

                                                                     

                                                                    the shortest code I know of to recombine the tables created by the generic load into one was written by Rob Wunderlich:

                                                                    http://qlikviewcookbook.com/2010/05/use-cases-for-generic-load/

                                                                     

                                                                    So a solution could be:

                                                                     

                                                                    // defining the number of the column that might contain semicolons
                                                                    LET vColumnWithSemicolon = 3;
                                                                    LET vColumns = 20;
                                                                    
                                                                    // loading each semicolon separated subfield, and concatenating those that belong to the same field
                                                                    table1:
                                                                    Generic
                                                                    LOAD ID,
                                                                        '@'&FieldNum,
                                                                        Concat(SubString,';',Seq)
                                                                    Group By ID, FieldNum;
                                                                    LOAD RecNo() as ID,
                                                                        IterNo() as Seq,
                                                                        IterNo()-RangeMin(RangeMax(IterNo()-$(vColumnWithSemicolon),0),Fields-$(vColumns)) as FieldNum,
                                                                        SubField(line,';',IterNo()) as SubString
                                                                    While IterNo()<=Fields;
                                                                    LOAD [@1:n] as line,
                                                                        SubStringCount([@1:n],';')+1 as Fields
                                                                    FROM [QlikCommunity_Thread_202456_v2.csv] (fix, codepage is 1252)
                                                                    Where RecNo()>1;
                                                                    
                                                                    // create temporary table having columns for old and new field names (header row of csv file + ID field)
                                                                    tabFieldNames:
                                                                    CrossTable (oldFieldName,newFieldName)
                                                                    LOAD 1,*
                                                                    FROM [QlikCommunity_Thread_202456_v2.csv] (txt, codepage is 1252, no labels, delimiter is ';', msq)
                                                                    Where RecNo()=1;
                                                                    
                                                                    // create mapping table to translate default field names (@1 to @n) to column headers of csv file
                                                                    mapFieldNames:
                                                                    Mapping LOAD oldFieldName, newFieldName
                                                                    Resident tabFieldNames;
                                                                    
                                                                    // drop temporary tables
                                                                    DROP Tables tabFieldNames;
                                                                    
                                                                    // rename fields using the previously generated mapping table
                                                                    RENAME Fields using mapFieldNames;
                                                                    
                                                                    // Load initial result table to join to
                                                                    tabResult:
                                                                    LOAD RecNo() as ID
                                                                    AutoGenerate FieldValueCount('ID');
                                                                    
                                                                    // recombining the generic load tables into one (see: http://qlikviewcookbook.com/2010/05/use-cases-for-generic-load/)
                                                                    FOR i = NoOfTables()-1 to 0 STEP -1 
                                                                      LET vTable=TableName($(i)); 
                                                                      IF WildMatch('$(vTable)', 'table1.*') THEN 
                                                                        LEFT JOIN (tabResult) LOAD * RESIDENT [$(vTable)]; 
                                                                        DROP TABLE  [$(vTable)]; 
                                                                      ENDIF 
                                                                    NEXT i;
                                                                    

                                                                     

                                                                    QlikCommunity_Thread_202456_Pic5.JPG

                                                                     

                                                                    To only load distinct lines from your csv, you could add a "LOAD Distinct [@1:n] as line From yourSource" at the beginning and subsequently load resident from this table instead of from the csv file.
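
                                                                    A minimal sketch of that distinct preload (file name as in the example above; the bottom LOAD of the generic load would then read "LOAD line, SubStringCount(line,';')+1 as Fields Resident tabDistinct;" instead of loading FROM the csv file):

                                                                    tabDistinct:
                                                                    LOAD Distinct [@1:n] as line
                                                                    FROM [QlikCommunity_Thread_202456_v2.csv] (fix, codepage is 1252)
                                                                    Where RecNo()>1;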

                                                                     

                                                                    hope this helps

                                                                     

                                                                    regards

                                                                     

                                                                    Marco

                                                                  • Re: Semicolon in Data Source - How to load the data properly?
                                                                    Stefan Wühl

                                                                    Another nice solution, Marco!

                                                                     

                                                                     It's always good to step back and look for alternative approaches; you've shown this, and not for the first time.

                                                                     

                                                                    I especially like the FieldNum calculation.

                                                                     

                                                                    Regards,

                                                                    Stefan