Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
chrisghughes
Contributor II
Contributor II

Group By not working as I would expect

!

Hello -

I have the following table:

   

IDStart Dateactive_dateContact Date
4e1205d9-439a-48d2-bf86-49c9d413652f6/20/20135/24/20182/10/2018
4e1205d9-439a-48d2-bf86-49c9d413652f6/20/20135/24/20183/17/2018
4e1205d9-439a-48d2-bf86-49c9d413652f6/20/20135/24/20183/31/2018

I want to figure out how many times a contact occured before/after the start/active dates.

I have the following script

load

NoConcatenate

"ID",

sum(if("Contact Date" <= [Start Date],1,0)) as calls_before,

sum(if("Contact Date" > [Start Date] and "Contact Date" < "active_date",1,0)) as calls_after,

sum(if("Contact Date" > date("active_date"),1,0)) as calls_after_discharge,

count("Contact Date") as total_calls

resident res_contacts

group by "ID";

My expectation would be I would get a single row like this

    

IDcalls_beforecalls_aftercalls_after_dischargetotal_calls
4e1205d9-4 !39a-48d2-bf86-49c9d413652f0303

However, what I get is

    

IDcalls_beforecalls_aftercalls_after_dischargetotal_calls
4e1205d9-439a-48d2-bf86-49c9d413652f0909

I'm coming from a SQL background so this grouping behavior doesn't make sense to me.  I've made sure there is not assoications or linkings inside the application to make sure I'm not getting a cartesian product.  Even in Data Manager it shows the wrong values.

!

1 Solution

Accepted Solutions
rubenmarin

The last grouping doesn't removes duplicates because you are doing a count of a field that has a value, so all rows will be counted, duplicated or not...

Before group by:

UniqueID, Contact Date

1, 01/01/0217

1, 01/01/0217

1, 01/01/0217


After group by:

UniqueID, Total_calls

1, 3


There are 3 rows to counts, doesn't matter if they have the same values, if you do a count the result is '3', maybe counting different UniqueId with:

Count(Distinct UniqueID&[Contact Date]) as total_calls

View solution in original post

10 Replies
christophebrault
Specialist
Specialist

Hi,

If you want to achieve this in the script, below is a solution. No doubt there  is a more elegant way to get the same result, but it's late here and I should sleep 😉

TEMP:

load * inline [

ID, Start Date, active_date, Contact Date

4e1205d9-439a-48d2-bf86-49c9d413652f, 6/20/2013, 5/24/2018, 2/10/2018,

4e1205d9-439a-48d2-bf86-49c9d413652f, 6/20/2013, 5/24/2018, 3/17/2018,

4e1205d9-439a-48d2-bf86-49c9d413652f, 6/20/2013, 5/24/2018, 3/31/2018

];


FINAL:

LOAD ID,

only(active_date) as active_date,

only([Start Date]) as [Start Date]

Resident TEMP group by ID;


left join(FINAL)

LOAD ID,

Count([Contact Date]) as calls_before

resident TEMP

where [Contact Date]<=[Start Date]

group by ID;


left join(FINAL)

LOAD ID,

Count([Contact Date]) as calls_after

resident TEMP

where [Contact Date]>=[Start Date] and [Contact Date]<=active_date

group by ID;


left join(FINAL)

LOAD ID,

Count([Contact Date]) as calls_afterDischarge

resident TEMP

where [Contact Date]>=active_date

group by ID;


drop table TEMP;

exit script;

Inscrivez vous à ma Newletter Qlik
DoNotMissQlik- Connect with me on Linkedin
JustinDallas
Specialist III
Specialist III

Not a solution, but with qlik, you might not want to pre-aggregate/cube data like this in your script.  Generally, such things are done in the UI.

BUT, if I had to do it, I would do something like this.  It appears that with the test data, that I'm getting the expected values.

Chughes:

LOAD ID,

  Date(Date#(start_date_text,'M/DD/YYYY')) AS 'start_date',

  Date(Date#(active_date_text,'M/DD/YYYY')) AS 'active_date',

  Date(Date#(contact_date_text,'M/DD/YYYY')) AS 'contact_date'

;

LOAD * Inline

[

'ID', 'start_date_text', 'active_date_text', 'contact_date_text'

'4e1205d9-439a-48d2-bf86-49c9d413652f', '6/20/2013', '5/24/2018', '2/10/2018'

'4e1205d9-439a-48d2-bf86-49c9d413652f', '6/20/2013', '5/24/2018', '3/17/2018'

'4e1205d9-439a-48d2-bf86-49c9d413652f', '6/20/2013', '5/24/2018', '3/31/2018'

]

;

Final:

NoConcatenate

LOAD ID,

SUM( If( contact_date < start_date, 1 , 0 )) AS 'calls_before',

    SUM( If( contact_date > start_date and contact_date < active_date, 1 , 0 )) AS 'calls_after',

    SUM( If( contact_date > active_date, 1 , 0 )) AS 'calls_after_discharge',

    Count(contact_date) AS 'total_calls'

Resident Chughes

GROUP BY ID

;

DROP TABLE Chughes

;

EXIT Script

;

SumthingWrong2.PNG

Are you absolutely positive that your quotes and so forth are acting correctly?

Hope this helps.

chrisghughes
Contributor II
Contributor II
Author

Hi -

This is currently what I'm doing actually. The pre-aggregation in the script. 

I'm baffled how you are getting a different value...  Just for giggles I copied your exact code into my app and I still get 9.

christophebrault
Specialist
Specialist

Hi,

Perhaps your raw data are duplicated ?

Add a Rowno() as ID in your table to check you source table

Inscrivez vous à ma Newletter Qlik
DoNotMissQlik- Connect with me on Linkedin
chrisghughes
Contributor II
Contributor II
Author

This is what is confusing me.  Shouldn't the group by ID resolve that?  Or does Qlik do some kind of joining or processing that I'm not understanding

chrisghughes
Contributor II
Contributor II
Author

The table "final_contacts" is where I'm getting the duplicate values. What seems to be happening is the detail records in program_data seem to be causing duplicates.

I'm about to just import these CSV files into SQL and write the query I know will work.  But I'm stubborn and want to learn how the joins in QLIK work.

There are no tables loaded in the GUI/Data manager. Everything is loaded via the script

Here is my code:

Person_Header:

LOAD

DISTINCT

    "Full Name - First Name First",

    "First Name",

    "Last Name",

    "Unique ID"

FROM program_data.csv;

Program_Data:

LOAD

    "Program Name",

    "Program Timeline Date",

    "Program Timeline Active Status",

    "Program Current Status",

    "Program Timeline Status Description",

    "Unique ID"

FROM program_data.csv

Person_Contacts:

LOAD

    "Contact Date",

    "Unique ID"

FROM cpm_contacts.csv

Person_Aggregate:

LOAD

    date(min("Program Timeline Date")) as [Start Date],

    if(WildMatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],[Program Timeline Date]),'Pending Enrollment','Inactive')

and WildMatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],-[Program Timeline Date]),'Pending Enrollment','Inactive'),'Never Active','Was Active') as CPM_Involved,

date(if(wildmatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],-[Program Timeline Date]),'Active'),date(now()),date(max([Program Timeline Date])))) as active_date,

    "Unique ID"

resident Program_Data

group by "Unique ID";

res_contacts:

NoConcatenate

LOAD

"Unique ID",

    date#(min("Program Timeline Date")) as [Start Date],

    if(WildMatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],[Program Timeline Date]),'Pending Enrollment','Inactive')

and WildMatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],-[Program Timeline Date]),'Pending Enrollment','Inactive'),'Never Active','Was Active') as CPM_Involved,

date#(if(wildmatch(Firstsortedvalue(DISTINCT [Program Timeline Status Description],-[Program Timeline Date]),'Active'),date(now()),date(max([Program Timeline Date])))) as active_date

Resident Program_Data

group by "Unique ID";

left join(res_contacts)

LOAD "Unique ID",

"Contact Date"

Resident Person_Contacts;

Final_Contacts:

load

"Unique ID",

sum(if("Contact Date" <= [Start Date],1,0)) as calls_before,

sum(if("Contact Date" > [Start Date] and "Contact Date" < "active_date",1,0)) as calls_after,

sum(if("Contact Date" > date("active_date"),1,0)) as calls_after_discharge,

count("Contact Date") as total_calls

resident res_contacts

group by "Unique ID";

Drop Table res_contacts;

rubenmarin

Hi Chris, Person_Contacts or Program_Data has more than one row for a UniqueId it wil cause duplicates in the left join of rec_contacts.

Possible solutions can be:

- Create a concatenated key to have an unique id per person-program (it it's any data that needs to keep the differences)

- Use applymap instead of join

- Do a group by or load distinct before Final_Contacts to remove duplicates

- ... Knowing what you want to do and the data behind more solutions are possible

chrisghughes
Contributor II
Contributor II
Author

Thank you very much for your response.  I thought that's what I was doing with the group by.  So how can I compare contact_date for a person as I'm trying to do in final contacts without having duplicates?  Seems the grouping I have already in final contacts would accomplish this

rubenmarin

The last grouping doesn't removes duplicates because you are doing a count of a field that has a value, so all rows will be counted, duplicated or not...

Before group by:

UniqueID, Contact Date

1, 01/01/0217

1, 01/01/0217

1, 01/01/0217


After group by:

UniqueID, Total_calls

1, 3


There are 3 rows to counts, doesn't matter if they have the same values, if you do a count the result is '3', maybe counting different UniqueId with:

Count(Distinct UniqueID&[Contact Date]) as total_calls