Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I'm facing a problem that I know is due to the disconnect between how I think QV stores data and how it actually does.
My data has a Unique ID and a Name field. The data has 100,00 rows.
I can see in Excel that a Unique ID exists and a Name exists for every row.
If I use Count (Name) I get 45,000. This is because many people share the same name.
Count(DISTINCT (Name) also gives me 45,000.
What expression can I use to give me the result 100,000. I am trying to identify the level of completeness in this and other fields.
For example when I move on to eMail I may only have 20,000 rows with an entry and roughly 19,800 distinct emails. In this example I would want 20,000 as my result.
Hopefully this makes sense - any help appreciated
Thanks
Stuart
Aaahh !!
So your Name field is a Key, as in a field in 2 different table that joins them. I reckon that could well be your joker.
Could you share a screenshot of your data model and load script ?
Try
COUNT(ALL Name)
and
COUNT(ALL Email)
Thanks for the quick reply Manish
When I do Count (ALL Name) I get the same results as Count (Name). 45,000 instead of 100,00
Count( Name&UniquID) -- Does that work?
Create a Statistic Boxes for your Unique ID and Name field respectively and add all the 'count' functions to be displayed.
From that you should be able to see what';s what, especially regarding Missing and Null values. If the answer is not obvious then please post the results.
Thanks but no it doesnt. I always get a count of 100,000 for that as there is a Unique ID for everything and that probably concatanates the fields.
Thanks for the hint on adding a stats box. This is very usefull.
Do stats boxes not work with Key fields as for my Name field I get n/a for everything. But my Email field I get proper results.
Aaahh !!
So your Name field is a Key, as in a field in 2 different table that joins them. I reckon that could well be your joker.
Could you share a screenshot of your data model and load script ?
if i understood, try this
Count(distinct uniqueID)
or else
while loading the data take the rowno() parameter like below : because according to AQL logic duplicated rows are removed
Load *,
rowno() as rowno,
recno() as recno
from path;
see this also
count(distinct rowno) for validating
yes it was the key that was causing the issue.
I brought the field in a second time just for counting purposes and it now works.
thanks
Stuart