Primary Keys

Report Inappropriate Content · ‎2010-07-05

All,

We have developed a large QV document that has approx 10 million rows of data and some 400-500 separate fields.

Up till now we have avoided any synthetic keys by careful data modelling.

We now have a problem that threatens our design. We want to introduce a new repository that contains Employee information. [The Employee Database] The Primary key for that data set is called EmployeeID.

In the current document multiple EmployeeID's can be returned in a single row of data although these have all been aliased when the scripts run against the upstream views/tables.

e.g. Select * from myDBView where SystemID = 123456

Returns:

SystemID SystemName AppManagerEmpID SupportManagerEmpID UpdateEmpID DepartmentHeadEmpID UnitHeadEmpID

123456 SystemA 45362 76534 45362 87676 99876

All of the values returned for fields ending EmpID are EmployeeID's. The EmployeeID is a simple integer unique to each individual and is the Primary key in the Employee Database.

What is the best way to handle this scenario in QlikView? I want to enrich the data with attributes from the Employee Database but want to avoid circular references and synthetic keys. In the production model there are some 40 aliased fields that contain the EmployeeID.

1: Create new aliased keys to join tables? e.g. Concat EmployeeID and 'A' as EmployeeID_A, Concat EmployeeID and 'B' as EmployeeID_B etc

2: Use Set Analysis

3: Give up

Tom.

brenner_martina · ‎2010-07-05

Hi, tatierney,

never give up with qlikview, it holds allways a workarround. In your case will be the best way by renaming all ID in all tables you need. It will be helpful to qualify all fields and unqualify IDs, p.e. all keyfields start with %:

Qualify *;

Unqualify '%*';

Sometimes you need to create a new keys by joining some fields with field1 & field2 & field3 as %keyfield....

Report Inappropriate Content · ‎2010-07-05

Thanks Martina

I did also think about implementing a data island solution but with over 250K records in the Employee database alone that was a non runner due to the likely performance issues.

I suspect I will have to go down the new keys creation root due to the complexity of the production model. We consume data from around 25 upstream systems which range from Oracle and MS SQL enterprise scale database solutions through to SharePoint lists, XML, XLS and even CSV inputs.

Tom

Report Inappropriate Content · ‎2010-07-05

I'm a bit unclear about the design question. About which employee do you want the data? In the example you've given there are 5. Do you currently have separate tables to which all of these ID fields refer? Or are they just floating as integers in a table?

Making the assumption that you have a table or tables to which these IDs are all referring in your existing data model, I would append the data to the various tables at the load level, and eliminate the additional dimension table of 'Employees'

Eg: Presumably you have an Employee table, of which you've made multiple copies to refer to all of these different instances of different 'kinds' of employees. Append the dimensions from your new employee repository that you intend to use into your employee table.

If you haven't yet got instances of an employee table that these various rows are returning, you have an entirely different problem.

Unfortunately, attempting to implement all of this with set analysis is a nightmare, and would make your application impossible to maintain.

What I've done in the past (which is admittedly inelegant…) is to append the various dimensions about my 'employees' into a single table, and then selectively load them into various copies of the table; ie: department heads: Load <fields> from <employee table> where <exists condition that would result in this member appearing in the department heads>. You need to alias the various 'Employee ID' fields to match the aliases you've used elsewhere in the application.

It's brutal, but it works.

If you economize on the amount of data that you actually need, and carefully select your fields so QV's compression can do its job well, it doesn't actually add much overhead to have multiple copies of the same employee who satisfies various conditions.

I wouldn't recommend this solution if you need to then hang further tables off of each kind of employee, because that quickly becomes a nightmare. It needs to be as collapsed as possible.

johnw · ‎2010-07-06

It sounds to me like you might want something like this:

SystemID, EmployeeID, EmployeeRole
123456, 45362, App Manager
123456, 76534, Support Manager
123456, 45362, Update Emp
123456, 87676, Department Head
123456, 99876, Unit Head

SystemID, SystemName
123456, SystemA

But it depends a whole lot on how you want to INTERACT with your data, not just what your data IS. This layout makes it easy to find, for instance, every System that the employee is associated with in some way. However, it makes it much more difficult to determine which systems have 87676 as the Department Head and 76534 as the Support Manager.

Related Topics