Not applicable

Semicolon in Data Source - How to load the data properly?

Hi guys,

I need your help, if anything is possible here.

The situation is as follows:

I have a CSV file as data source which gets updated multiple times a day. The CSV file is semicolon-separated. It always contains a fixed set of columns (20 columns). One column is a description. About twice a week it happens that in 2-5 rows the description contains a semicolon in the text.

As I want to automate the whole data loading process, you see the point, right? Is there any way to avoid getting an error when this happens, caused by the fact that the rows get displaced due to the semicolon in the text? My first idea was to just replace the semicolon separator with something else when the source exports the CSV file, but this is unfortunately not possible.
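To illustrate with made-up data: with 20 columns expected, a row whose description contains extra semicolons splits into too many fields, so every column after the description gets shifted:

4711;Widget A;short description;...              <- 20 fields, loads fine
4712;Widget B;nice; compact; design;...          <- 22 fields, columns displaced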

Is there any way to overcome this?

Thanks guys!

Greetings,

Seb

37 Replies
Not applicable
Author

Hi Stefan,

to answer your questions:

It also fails with an Aggr value of 50. There is no error message; it just stops as if it was successful. The lines differ in length depending on the length of the description; all other fields should have more or less the same length.

I also got no proper result with a distinct load. I tried it in table 1 and table 2, but neither gave me the numbers I wanted to see. At first I was pretty close when I used an aggregation of 5000 and a distinct load for table 2: I had about 515k rows, but actually there should be 518k unique rows. I lowered the aggregation to 3000 and, funnily enough, then had 572k rows. So this didn't work out.

The v7 script you posted was super slow. I cancelled it because after 3 minutes it had only loaded about 10k rows.

Marco's new version below gave me the correct number of rows after I put in the 'no eof' parameter, and it took 11 minutes for 1.39 million rows.
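(For reference, the 'no eof' parameter goes into the file format spec of the FROM clause; in Marco's script that would presumably look like:

FROM [QlikCommunity_Thread_202456_v2.csv] (fix, codepage is 1252, no eof)

It tells QlikView to ignore end-of-file characters embedded in the data.)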

Not applicable
Author

Hi Marco,

this one looks pretty good! 11 minutes and all 1.39 million rows loaded. Yippie!

Two questions here:

1. I found out that around 790k rows are duplicates. If I would like to filter out duplicate lines, how should I do this with your script, as every field is in a separate table? The thing here is that only a whole line in total is unique. An example: you have one article with ID 1234, but this article has different categories.

so you have

ID 1234 Cat 1

ID 1234 Cat 2

... and so on.

I mean, 11 minutes is also fine if I have all rows including duplicates, but if I can throw them out, the size of the table would be much smaller.

2. All fields are in separate tables now. How can I easily map them back together without writing a mapping for every table again? Is it possible to do that in a loop? Afterwards I would be able to store this table in a QVD for further use.

Thank you so much!

swuehl
MVP

Another nice solution, Marco!

It's always good to step back and look for alternative approaches; you've shown this not for the first time.

I especially like the FieldNum calculation.

Regards,

Stefan

swuehl
MVP

If you need to join the tables back into one, there are some examples here in the forum, e.g.

The Generic Load
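In short, a Generic load splits a key-attribute-value table into one table per attribute. A minimal sketch with made-up inline data:

tabGeneric:
Generic LOAD * INLINE [
ID, Attribute, Value
1, Color, Red
1, Size, L
2, Color, Blue
];

// this creates one table per attribute value: tabGeneric.Color and
// tabGeneric.Size, each containing ID plus one field named after the attribute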

MarcoWedel

Hi,

the shortest code I know of to recombine the tables created by the generic load into one was written by rwunderlich:

http://qlikviewcookbook.com/2010/05/use-cases-for-generic-load/

So a solution could be:

// defining the number of the column that might contain semicolons
LET vColumnWithSemicolon = 3;
LET vColumns = 20;

// loading each semicolon separated subfield, and concatenating those that belong to the same field
table1:
Generic
LOAD ID,
     '@'&FieldNum,
     Concat(SubString,';',Seq)
Group By ID, FieldNum;
LOAD RecNo() as ID,
     IterNo() as Seq,
     // clamp the field number so that the surplus subfields (created by extra
     // semicolons in the description) all map back to the semicolon column
     // and get concatenated together again
     IterNo()-RangeMin(RangeMax(IterNo()-$(vColumnWithSemicolon),0),Fields-$(vColumns)) as FieldNum,
     SubField(line,';',IterNo()) as SubString
While IterNo()<=Fields;
LOAD [@1:n] as line,
     SubStringCount([@1:n],';')+1 as Fields
FROM [QlikCommunity_Thread_202456_v2.csv] (fix, codepage is 1252)
Where RecNo()>1;

// create temporary table having columns for old and new field names (header row of csv file + ID field)
tabFieldNames:
CrossTable (oldFieldName,newFieldName)
LOAD 1,*
FROM [QlikCommunity_Thread_202456_v2.csv] (txt, codepage is 1252, no labels, delimiter is ';', msq)
Where RecNo()=1;

// create mapping table to translate default field names (@1 to @n) to column headers of csv file
mapFieldNames:
Mapping LOAD oldFieldName, newFieldName
Resident tabFieldNames;

// drop temporary tables
DROP Tables tabFieldNames;

// rename fields using the previously generated mapping table
RENAME Fields using mapFieldNames;

// load initial result table to join to
tabResult:
LOAD RecNo() as ID
AutoGenerate FieldValueCount('ID');

// recombining the generic load tables into one (see: http://qlikviewcookbook.com/2010/05/use-cases-for-generic-load/)
FOR i = NoOfTables()-1 to 0 STEP -1
  LET vTable = TableName($(i));
  IF WildMatch('$(vTable)', 'table1.*') THEN
    LEFT JOIN (tabResult) LOAD * RESIDENT [$(vTable)];
    DROP TABLE [$(vTable)];
  ENDIF
NEXT i;

[screenshot: QlikCommunity_Thread_202456_Pic5.JPG]

To only load distinct lines from your CSV, you could add a "LOAD Distinct [@1:n] as line From yourSource" at the beginning and subsequently load resident from this table instead of from the csv file.
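A minimal sketch of that variant (assuming the same file and format spec as above):

// preload only distinct raw lines
tabDistinctLines:
LOAD Distinct [@1:n] as line
FROM [QlikCommunity_Thread_202456_v2.csv] (fix, codepage is 1252)
Where RecNo()>1;

// the bottom LOAD of the preceding-load chain above would then become
// LOAD line, SubStringCount(line,';')+1 as Fields Resident tabDistinctLines;
// (and tabDistinctLines can be dropped afterwards)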

hope this helps

regards

Marco

Not applicable
Author

Hi Marco,

this solution works just fine!

But now I am experiencing an issue while reloading the document through the QMC.

I pushed my whole project to the QlikView server and everything works fine except this one document.

If I start it manually by opening the document and reloading it, no problem. Everything works fine. The output QVD is as it should be, about 69 MB, and the whole script takes 13-15 minutes to run through.

If it runs through the QMC task, it runs without any error in the scheduler, but the output is totally wrong. It has just 2 columns, ID and @. The file is 190 MB and it takes only 4 minutes to run through. It seems like the script just stops at some point and generates the QVD.

Any idea what the root cause is here? I first thought I had made a change in the code, but I took a working backup version and have the same issue with it.

The document is built with a fully licensed client. The technical user of the QV Server has Full Control on the path where the CSV is.

Thanks in advance guys and best regards,

Seb

swuehl
MVP

Could you post the document log of that server script run?

Video - Troubleshooting reload and distribution task failures

edit: Maybe also the document log when you run it manually.

Not applicable
Author

Yes sure.

Please find attached the log files from when it's started by the server and from when I start it manually with my client version.

The only difference I see is the QlikView version. The server is running V11 and my client is V12. Is that a problem?

As you can see in the log file, it does not generate these 20 tables when executed by the server; instead there is only @.

Thanks!