Skip to main content
Announcements
Qlik Connect 2025: 3 days of full immersion in data, analytics, and AI. May 13-15 | Orlando, FL: Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Data Modelling Query

All,

I am new to designing QlikView data models and would appreciate any advice which can be given.

I have raw data in two source tables which both have a 'date' field and 'business unit' reference field in them. There are another two information table of fields relating to the business units and my calendar.

If I import these four tables directly in my script I obviously generate a data loop CALENDAR > DATA1 > BUS. UNIT > DATA2 > CALENDAR.

I could rename the date field in one of the data tables - but one of the primary calculations I wish to perform is a graph of the ratio of DATA1:DATA2 across time - so renaming one of the date fields would surely cause a mismatch across the two data tables when making selections on date, wouldn't it?

Anyway, my main query is to hope that someone can suggest how I best approach these kinds of data design issues, as the data stored in the two tables is of totally different types and so I don't see how I would combine in into a single 'fact' table.

I am concious of how much of an impact a bad data design can have on a document's performance and resutls

Thanks in advance

1 Solution

Accepted Solutions
johnw
Champion III
Champion III

Is there any actual problem with that data model? Are you getting correct results? On the surface, it looks like you started out just fine, then caused yourself problems trying to "solve" the synthetic key "problem". No particular reason for joining your Rooms Data and Offline Rooms is jumping out at me. As you say, the data in the two tables is very different.

View solution in original post

7 Replies
Not applicable
Author

Having two fields the same in both tables in effect give you a composite key. Using this key you can perform a join on your two datasources (like a SQL full outer join).

The load would go something like (example using just two fields, Costs and Sales) :


rawdata:
LOAD CalendarDate,
BusUnit,
Sales
FROM
[datasource1.xls]
(biff, embedded labels, table is Sheet1$);
join(rawdata)
LOAD CalendarDate,
BusUnit,
Costs
FROM
[datasource2.xls]
(biff, embedded labels, table is Sheet1$);


johnw
Champion III
Champion III

It sounds like you have this structure:

synthetic key table: synthetic key, business unit, date
table 1: synthetic key, field A
table 2: synthetic key, field B
table 3: business unit, field C
table 4: business unit, field D
calendar: date, month, year

And I'm not seeing the loop. Looks like a star schema centered on a synthetic key table. So I'm obviously not understanding your data from JUST the description of your tables. Perhaps you could list the tables and important fields something like I've done above? Take a picture of the table layout? Something like that?

Not applicable
Author

Hi John,

I realise now that the loop was being caused by the date fields in the two data tables having slighty different names (as I'd previously tried to solve the synthtic key issue).

Un-doing these edits gives me the following data structure:

I haven't tried joining the Rooms Data and Offline Rooms tables yet, as SQLMonkey suggested - largely because the data in the two tables is very different - i.e. there could be multiple records in the offline rooms table for each reporting date - and I don't know how JOIN will behave...

johnw
Champion III
Champion III

Is there any actual problem with that data model? Are you getting correct results? On the surface, it looks like you started out just fine, then caused yourself problems trying to "solve" the synthetic key "problem". No particular reason for joining your Rooms Data and Offline Rooms is jumping out at me. As you say, the data in the two tables is very different.

larsc
Partner - Contributor III
Partner - Contributor III

It is often easier to answer a data modelig query/ problem if we know the business request, and how it connected to the data model. What is it we want to solve with the QlikView-application? What queries need to be answered by the end users, and what data do we need to load into our applications to be able to do that?

For example - In your data model, I see ReportingDate in both the "Offline Rooms" and "Rooms data" tables. What is the difference between these two ReportingDate's, and is there any special reason why you need both of them?



Not applicable
Author

I just assumed that a synthetic key would cause QlikView calculation problems when it came to re-cutting the data over several dimensions etc - this was mainly based on, as all the guidance materials & manuals etc always describe trying to remove synthetic keys..

After some reconcilitations it doesn't appear to be causing any problem at all.

I have progressed significantly with my tool, and am beginning to find the answers to some of the business problems we were searching for. However, I can't quite shed the uneasy feeling that this Synthetic key will come back to bite me.

Now if only I can master set anlysis (time for another thread!)

johnw
Champion III
Champion III

Synthetic keys have a bad reputation. I believe I understand how they got that reputation, but they aren't necessarily a bad thing. I don't think the one you have is a problem; I think it's the solution. For more general information on synthetic keys, you might look at the following thread:

http://community.qlik.com/forums/t/31028.aspx