Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
morenoju
Partner - Specialist
Partner - Specialist

Generic calendar cardinality issue

Hi guys,

I followed a technique described in this Qlik Community thread where @johnw explains to create a generic calendar:

https://community.qlik.com/t5/QlikView-Scripting/Link-to-generic-calendar/td-p/230210

The key is to create a "linkage" table like this:

 

MasterLinkage:
LOAD
event_code,
'Event' as DataType,
event_start_date as Date_A,
event_link as link_id
RESIDENT event
;
CONCATENATE (MasterLinkage)
LOAD
link_data_internal_id,
'Traffic' as DataType,
link_date as Date_A,
traffic_link as link_id
RESIDENT traffic;

 

This way I can use Date_A to filter on event_start_date and link_date at the same time, which is awesome.

However, notice that I had to create that link_data_internal_id unique identifier that otherwise I would not have. Since I started using it, my qvds and my app have grown insanely. I'm assuming it's because of the cardinality. I have millions of traffic records, so having a unique identifier does not seem like a good idea.

What can I do? How could I keep the "linkage" to the generic calendar without using a unique id for the traffic records?

Thanks much!

Juan

 

Labels (1)
2 Replies
morenoju
Partner - Specialist
Partner - Specialist
Author

I've replaced the unique id "link_data_internal_id" with three fields:

Date(Floor("start_date")) AS date_key,
Time(Round(Frac(Time("start_date",'hh:mm')),1/96),'hh:mm') as time_key,
item_id as link_key,

These three fields identify uniquely each record. And since I avoid the cardinality issue, my qvds are now 80% smaller than they used to be!

However, the three fields are creating a syntetic key that is slowing the load script... I'm going to be testing if the new loading times are acceptable.

morenoju
Partner - Specialist
Partner - Specialist
Author

Even though everything works as desired, I'm a bit concerned about the synthetic key because it's definitely affecting the reload duration.

The tables involved are:

 

[event]:
LOAD 
	[event_id],
	[code] as event_code,
	[description] as event_description,
	[event_type_id],
	[event_status_id],
	[event_impact_id],
    [event_location],
	[city_id] as event_city_id,
	[link_id] as event_link,
    [facility_way_id] as event_roadway_id,
	[start_timestamp] as event_start_timestamp,
    Date(Floor(start_timestamp)) AS event_start_date,
    Floor(Hour(start_timestamp)) As event_start_hour,
	[close_timestamp] as event_close_timestamp,
    event_duration;

[traffic]:
LOAD
    Date(Floor("start_date")) AS date_key,
    Time(Round(Frac(Time("start_date",'hh:mm')),1/96),'hh:mm') as time_key,
    item_id as link_key,
    Date(Floor("start_date")) AS link_date,
    Time(Round(Frac(Time("start_date",'hh:mm')),1/96),'hh:mm') as "link_time_15min",
    Floor(Hour(Time(Round(Frac(Time("start_date",'hh:mm')),1/96),'hh:mm'))) as link_hour,
    "item_id" as traffic_link,
    Round("speed_mph") as "link_speed_mph";

 

 

The idea is that when using this calendar, the elements from both [events] and [traffic] get filtered:

 

Calendar_A:  
Load  
  TempDate AS Date_A,
  week(TempDate) As Week_A,  
  Year(TempDate) As Year_A,  
  Month(TempDate) As Month_A, 
  Year(TempDate)&'-'& num(Month(TempDate),'00') as YearMonth_A,
  Day(TempDate) As Day_A,  
  Week(weekstart(TempDate)) & '-' & WeekYear(TempDate) as WeekYear_A,  
  WeekDay(TempDate) as WeekDay_A  
Resident TempCalendar  
Order By TempDate ASC;

 

 

To link both tables to the calendar, I've created the tables below:

 

MasterLinkage:
LOAD
event_code,
'Event' as DataType,
event_start_date as Date_A,
event_link as link_id
RESIDENT event
;
CONCATENATE (MasterLinkage)
LOAD
date_key,
time_key,
link_key,
'Traffic' as DataType,
link_date as Date_A,
traffic_link as link_id
RESIDENT traffic;

 

 

It works, but as said, having the date_key, time_key, link_key synthetic key makes the reload slow. What would you say is the best approach to this kind of challenge?


Thanks