Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data validation

I have a run a script and stored the table into a QVD.

It has following fields

Server

Storage

field1

field2 etc

When I open the qvd and try to find the number of Server using text obejct  --- count(Server) it gives me  4469. But when I create a table obejct with the dimensions Server , Storage and field1 and import that into excel , I can see there are infact only 4417 servers.

When I chechked the Storage using both text obejct and excel method, again isame thing happens. Text object shows 4.6 terabytes and when i do sum(Storage) on excel shows 4.3 terabytes.

What am i missing here?. The storage is 4.3 terabytes after validating with the team  but why does qlik skew up the values?.

Note: The scripting has mapping .

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

If they are true duplicates of all fields then :

     Load Distinct * from Temp.qvd (qvd) ;

If you just need to eliminate duplicates of one field then something like this will accept the first one but reject subsequent duplicates on that field.

     load  Distinct * from Temp.qvd (qvd) where not exists ( Server );

View solution in original post

7 Replies
Anonymous
Not applicable
Author

Your data could well have values missing or duplicated that cause these results, to get a feel for your data try creating Statistic Boxes for the fields in question with all the ... count functions displayed.

Also see if count(Server) gives the same or different value as count( distinct Server) .

Not applicable
Author

Count(distinct server) gives 4363. but the statistics box still shows 4463. And the export to the excel shows it as 4417.. I am not sure why the data is soo messed up.

Is there a way to find out which server names are duplicated?

Anonymous
Not applicable
Author

You may also find the Data Profiler QlikView App - Generic Data Profiler of Steve Dark of use to look at the data your have and any quirks therein.

Anonymous
Not applicable
Author

You could create straight table :

     Dimension          Server

     Expression          Count ( Server )

and sort it descending on the Expression to see what Server values [if any] have multiple entries.

Not applicable
Author

Thanks Bill ... Using the aove method I was able to find that some of the rows were indeed getting duplicated. Is there a way to get rid of the duplicates from the script?

Anonymous
Not applicable
Author

If they are true duplicates of all fields then :

     Load Distinct * from Temp.qvd (qvd) ;

If you just need to eliminate duplicates of one field then something like this will accept the first one but reject subsequent duplicates on that field.

     load  Distinct * from Temp.qvd (qvd) where not exists ( Server );

Not applicable
Author

Thanks Bill.. That worked..