My Qlik Replicate task replicates the tables below successfully.
Source and target tables are all in Oracle. In the scenario below, John Doe has two addresses.
In my target database there is a third table that joins Persons with their Addresses.
The target table "Person" looks as follows:

ID (primary key, sequence.nextval) | Person_ID_Source | Name
4 | 5665888 | John Doe

The target table "Address" looks as follows:

ID (primary key, sequence.nextval) | Person_ID_Source | Street_No | City
20 | 5665888 | Holland street 123 | New York
21 | 5665888 | Baker street 456 | New York
Making the connection via Person_ID_Source, I want my third target table "Person_Address" to be filled as follows:

ID (primary key, sequence.nextval) | PersonID | AddressID
1 | 4 | 20
2 | 4 | 21
If an Address is deleted at the source, it should disappear from both "Address" and "Person_Address" target tables.
If an Address is updated at the source, it should be updated in "Address" target table.
If an Address is inserted at the source, it should be inserted in "Address" and "Person_Address" target tables.
If a Person is deleted at the source, his addresses should disappear from "Address" and "Person_Address" target tables.
If a Person is inserted at the source, his addresses should be inserted to "Address" and "Person_Address" target tables.
How would I include these operations in a task, for both full load and CDC?
Thanks a million.
Does a 'Person_Address' table not exist on the source, where it would be maintained?
Replicating that 3rd source table would take care of it all.
Replicate strictly does ONE-to-ONE row processing. It is NOT an Informatica or similar ETL tool. You cannot use CDC to fix a broken source DB design.
An action (or the existence, for full load) on one row in one table creates an action on one row in the assigned target table for that source table. There is NO option to impact a second row (but you can do source or target lookups on other rows).
Now the target DB might have a TRIGGER which could try to impact other tables; in your example a target DB trigger could impact 'Person_Address'. A DELETE would be easy, but an INSERT would be impossible because of the lack of context: when an address is created, no person is known yet. But wouldn't the source application also update its 'Person_Address' table, so that table could simply be replicated as well?
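The trigger idea can be sketched as follows (SQLite via Python rather than Oracle PL/SQL, just to make it runnable; trigger and column names are illustrative). Note how the INSERT trigger only works because the Person row is already present when the Address arrives; if the Address change were applied first, the lookup would find nothing, which is exactly the missing-context problem described above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person  (ID INTEGER PRIMARY KEY, Person_ID_Source INTEGER, Name TEXT);
CREATE TABLE Address (ID INTEGER PRIMARY KEY, Person_ID_Source INTEGER, Street_No TEXT, City TEXT);
CREATE TABLE Person_Address (ID INTEGER PRIMARY KEY AUTOINCREMENT, PersonID INTEGER, AddressID INTEGER);

-- On Address insert, look up the Person via the shared source key and add a join row
CREATE TRIGGER addr_ins AFTER INSERT ON Address
BEGIN
    INSERT INTO Person_Address (PersonID, AddressID)
    SELECT p.ID, NEW.ID FROM Person p WHERE p.Person_ID_Source = NEW.Person_ID_Source;
END;

-- On Address delete, drop the join rows; no lookup needed, so this is the easy case
CREATE TRIGGER addr_del AFTER DELETE ON Address
BEGIN
    DELETE FROM Person_Address WHERE AddressID = OLD.ID;
END;

INSERT INTO Person  VALUES (4,  5665888, 'John Doe');
INSERT INTO Address VALUES (20, 5665888, 'Holland street 123', 'New York');
INSERT INTO Address VALUES (21, 5665888, 'Baker street 456',   'New York');
""")
print(con.execute("SELECT PersonID, AddressID FROM Person_Address").fetchall())
# [(4, 20), (4, 21)]

con.execute("DELETE FROM Address WHERE ID = 21")
print(con.execute("SELECT PersonID, AddressID FROM Person_Address").fetchall())
# [(4, 20)]
```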
Without the source making recognizable row changes, I get the feeling that you'll end up building an application module on the target, OR PERHAPS A VIEW which consolidates your person/address data on a regular (hourly? daily?) basis to maintain and expose the desired new connections which do not come with the raw base tables.
Groetjes,
Hein.
Hi Hein,
Thanks for your response. In the source DB there's no table that joins Persons with their Addresses directly. There's a Family table that joins a Person with a Family, and it's the Family that has one or more addresses, not a Person. So in our source database all Persons in the same Family have the same Address. Besides, no orphan addresses exist, i.e. addresses without a Person attached.
We are creating a .NET application that will be used alongside our current CRM, Siebel. The latter will be decommissioned once the .NET app is fully operational, but that will take a few years.
Consequently, we will probably be using triggers to perform the create, update and delete operations for "Person_Address", as we need changes in the source to be replicated in real time.
Houdoe,
Ole
One more thought. If the structure of the source DB and target DB are significantly different, then you may want to consider change tables instead of, or in addition to, applying changes.
Then you could run a regular job (every minute, hour or day) to consolidate those changes into the target.
The right way to design this depends on the expected change rate, row width, and volume.
Unfortunately you cannot enable change tables on a per-table basis; it's all or nothing. If only a few tables need change tables, then you could add a separate task for those tables.
The kind of data you describe typically doesn't change too often. Perhaps hundreds of changes per day for a large organization, not hundreds per second. It may be fine to set up change tables for all tables and use those that are needed, blindly truncating the others or just purging rows older than a few days. That gives the auditing/debugging benefit of being able to look back and see what happened.
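A consolidation job of that kind boils down to replaying change rows, in order, onto the target. A minimal sketch, assuming a simplified change-table layout of (operation, key, payload) with 'I'nsert / 'U'pdate / 'D'elete markers (the real change-table columns in your tool will differ; this only shows the replay logic):

```python
def apply_changes(target, changes):
    """Replay ordered change rows onto a key->row dict representing the target table."""
    for op, key, payload in changes:
        if op == "D":
            target.pop(key, None)          # delete: remove the row if present
        else:
            target[key] = payload          # 'I' and 'U' both upsert, so replay is idempotent
    return target

# Address 21 is inserted, address 20 is updated, then address 21 is deleted again.
addresses = {20: ("Holland street 123", "New York")}
changes = [
    ("I", 21, ("Baker street 456", "New York")),
    ("U", 20, ("Holland street 124", "New York")),
    ("D", 21, None),
]
print(apply_changes(addresses, changes))
# {20: ('Holland street 124', 'New York')}
```

Treating inserts and updates both as upserts is what makes rerunning the job after a partial failure safe.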
Veuillez agréer, monsieur, mes salutations distinguées,
Hein.
Thank you.
Yes, using the change tables is an interesting option.
Unfortunately, the regular job you mentioned won't work out for us, as our management wants the source data to be replicated in real time.
>> our management wants the source data to be replicated in real time.
Sure they want that, and I want a pot of gold - every week.
Everyone wants (near) realtime. But at what price? What is the actual business need?
With your help they have to define 'real time' (average, max per day, max per year), and then you get to tell them the price (development time and production resources) and how realistic their goal is.
- less than 1 second (average)? Unobtainium.
- less than 10 seconds (average)? 4 weeks development, 1/4 person all day every day to monitor, 1 CPU all day every day.
- less than a minute? 1 week development, 20% of a CPU all day.
- hourly? Negligible CPU.
Of course I don't really have an idea about the actual development cost, as that depends on skills and available tools, and the runtime resources depend on the server/DB platforms and the volume of changes to be processed.
Good luck,
Hein