Not applicable

Conditional row format recognition

Hi all,

I've just started with QlikView 11, and I'm trying to follow every step in the tutorial. My regional settings are Spanish by default, but I changed them to English (United States) so that QV writes the SET statements with the comma (,) as thousands separator and also gets the correct date format.

Working through it, I ran into the following situation: a file with three fields (Date, Client, Sales) -->

Header: Date, Client, Sales,,,

  Rows: (There are many rows, but all of them match one of these two formats) ->

            4/20/2008,Atlantic Marketing,810,,,

            "5/17/2008,Atlantic Marketing,""1,000"",,,"

I'm trying to apply some functions in the script in order to read the .csv file:

LOAD Date#(Date,'D/MM/YY')as Date,

     Client,

     Num#(Sales,'#,###','.',',') as Sales

FROM

[..\Date1.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',');

In my opinion, the delimiter should be ',', as written in the script.

I've been trying different options for the Date# and Num# function parameters, but each combination only works for one of the two row formats, never both.

Is there any way to tell QV to apply a different row format depending on the starting character of the row (which is ")?
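Something like this is what I have in mind, written as pseudo-script (reading the whole line into one field and branching on its first character; @1:n is just how QV names the whole line when you load without a delimiter, and the ... parts stand for whatever parsing each format needs):

     If(Left(@1:n, 1) = '"',
        ...,     // parsing for the quoted row format
        ...)     // parsing for the plain row format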

Any other ideas? Maybe I'm just mixing in ideas from other scripting languages I've used, and there's an easier way to do it.

Thanks for your time and attention in advance.

Aitor.


5 Replies
kaushiknsolanki
Partner Ambassador/MVP

Hi,

     If the data will always come in the two formats you described, then you can try this.

     Load

     Date(Date#(PurgeChar(Date,'"'),'MM/DD/YYYY'),'DD/MM/YYYY') as Date,

     Client,

     PurgeChar(Sales,'"') as Sales

     From XYZ

     Hope this will help.

Regards,

Kaushik Solanki

Miguel_Angel_Baeyens

Hi Aitor,

Try specifying the quoting format, in the last line of your script:

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

Besides, the date format should be 'M/DD/YYYY' instead.
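Putting both suggestions together, your load would look something like this (just a sketch; adjust the format string if your dates vary):

LOAD Date#(Date, 'M/DD/YYYY') as Date,
     Client,
     Num#(Sales, '#,###', '.', ',') as Sales
FROM
[..\Date1.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq);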

Hope that helps.

Miguel

Not applicable
Author

Hi,

first of all, thanks for your answers.

After posting the original message, I had been trying to use the PurgeChar function, and then I found your suggestion, Kaushik.

Following your advice I got closer to a correct interpretation of the file, but only part of the dates are recognized as dates; the rest are recognized as text strings.

I got closer using this instead:

date(num(floor(Fecha))) as Fecha

With this statement I got every date in the file as the date data type.

I also tried using the quoting parameter in the script statement as you said, Miguel, but I didn't get the results I wanted.

Thanks for the file, but since I'm using the Personal Edition, I'm not able to open it.

I'm attaching my original file too, because it has more text-format possibilities.

My actual script looks like:

SET ThousandSep='.';

SET DecimalSep=',';

SET MoneyThousandSep=',';

SET MoneyDecimalSep='.';

SET MoneyFormat='#,##0.00 €;-#,##0.00 €';

SET TimeFormat='h:mm:ss';

SET DateFormat='MM/DD/YYYY';

SET TimestampFormat='MM/DD/YYYY h:mm:ss[.fff]';

SET MonthNames='ene;feb;mar;abr;may;jun;jul;ago;sep;oct;nov;dic';

SET DayNames='lun;mar;mié;jue;vie;sáb;dom';

Directory;

LOAD date(num(floor(PurgeChar(Fecha,'"')))) as Fecha, 

     PurgeChar(Cliente, '"') as Cliente,

     Num(PurgeChar(Ventas,'"'),'#,###','.',',') as Ventas

FROM

[..\Fecha1.csv]

(txt, codepage is 1252, embedded labels, delimiter is ',', no quotes);

Thanks again for your answers.

Kind regards, Aitor.

P.S.: The original file is in Spanish, so if necessary, the translation would be:

Fecha = Date
Cliente = Client
Ventas = Sales


Miguel_Angel_Baeyens

Hi Aitor,

That is a bit of a nightmare file, and it requires a lot of string formatting, so my first piece of advice is "get a cleaner file if possible". Anyway, the following will do:

CleanData:
LOAD Date(Date#(PurgeChar(SubField(@1:n, ',', 1), '"'), 'M/D/YYYY')) AS Fecha,
     Replace(SubField(PurgeChar(Replace(@1:n, ', ', '$$$'), '""'), ',', 2), '$$$', ', ') AS Cliente,
     Num(SubField(PurgeChar(Replace(@1:n, ', ', '$$$'), '""'), ',', 3) & SubField(PurgeChar(Replace(@1:n, ', ', '$$$'), '""'), ',', 4)) AS Ventas
FROM
[Fecha1.csv]
(fix, codepage is 1252);

As you can see, there are a lot of string functions. If your file holds some hundreds of thousands of rows, it will take a long time to load, and you will probably need to clean up the file before loading it into QlikView.

But basically, what the above does is use each line as a whole (@1:n), remove the double quotes, and use each of the comma-separated resulting values as a field, except for the Ventas field, where it concatenates the third and fourth values, since "1,200" should read 1200. I also replace ", " with $$$ (or any other nonexistent value) because I know the Cliente field has some values with commas inside, which I then restore to ", " because I want to keep them.
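To see the steps at work on the second sample row, here are the intermediate values written out by hand (illustrative, not output from QlikView):

@1:n                               "5/17/2008,Atlantic Marketing,""1,000"",,,"
Replace(@1:n, ', ', '$$$')         unchanged (this row has no comma-space)
PurgeChar(..., '"')                5/17/2008,Atlantic Marketing,1,000,,,
SubField(..., ',', 1) -> Fecha     "5/17/2008 -> PurgeChar -> 5/17/2008 -> Date# with 'M/D/YYYY'
SubField(..., ',', 2) -> Cliente   Atlantic Marketing
SubField 3 & SubField 4 -> Ventas  '1' & '000' -> '1000' -> Num -> 1000

The ", " -> $$$ -> ", " round trip only matters for client names that contain a comma; for this row it does nothing.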

It's a good example of how to wrestle with crazily formatted sources using string functions, and I don't rule out that there are other, cleaner and easier ways to get the same results, irrespective of the regional settings. Is it worth doing this work in QlikView instead of using some other tool to clean up the data? That's up to you. I'd say that, in most cases, it's not worth the time you spend on it, but anyway.

Hope that helps.

Miguel

Not applicable
Author

Hi Miguel,

thanks for your answer. It really works.

The code has so many things that I need to learn, and I agree that it's a good way to wrestle with nightmare source files. If you succeed, you end up learning a bit more.

Yesterday, while I was reading the QlikView 11 reference manual, I found the Alt() function and thought it could be a good option for this file. I'll work in that direction after studying your answer carefully and trying to understand every step.
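In case it's useful to anyone reading later, the idea I have in mind with Alt() is something like this (just a sketch; the format strings are guesses for my file):

Date(Alt(Date#(PurgeChar(Fecha, '"'), 'M/D/YYYY'),
         Date#(PurgeChar(Fecha, '"'), 'MM/DD/YYYY'))) as Fecha,

Alt() returns the first expression that evaluates to a valid number, so each row can be tried against several date formats in turn.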

About the deeper discussion: as I said at first, I'm new to this, and I have no idea how long such transformations usually take with real data. I'm just following a tutorial example, so the dataset is short and performance won't suffer because of the cleaning transformations. In my future project I'll see how things look and, keeping your words in mind, I'll probably have to dust off my Perl books and prepare a pre-processing script to format the file properly.

Once again, thanks for answering!!

Kind regards,

Aitor.