Balax
Contributor

Data Lake and Primary key constraints

Data Catalyst lets you import or specify key constraints in the properties of an entity. When it is installed on a Hadoop edge node in AWS, what purpose or benefit would those constraints serve other than data lineage? After all, there are no PKs in Hadoop as such.

Is there any performance benefit when preparing data?

Will this enforce referential integrity?

I couldn't find a straight answer elsewhere, so I thought I'd ask the forum.

 

1 Reply
bagga3
Contributor

One reason these constraints are not enforced is that they are already supposed to be enforced in the OLTP system, which is the source of the OLAP data most of the time, so checking them again in the warehouse would introduce unnecessary overhead.

Another reason I can think of is that warehouse tables are not always loaded synchronously, in order to reduce the execution time of pipelines. A check that the primary record exists for the FK column in the table currently being loaded is therefore not guaranteed to succeed, because the parent table may not have been loaded yet (again, this assumes the OLTP source already holds correct data).
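
If you still want some assurance, one option is to validate the keys yourself after both tables have finished loading, rather than relying on the platform to enforce them. Here is a minimal sketch of such an orphan check in plain Python. The `find_orphans` helper, the table names, and the column names are all made up for illustration and are not part of any product API. In a real pipeline the same idea would more likely be an anti-join in your query engine.

```python
def find_orphans(child_rows, parent_rows, fk, pk):
    """Return child rows whose foreign key value has no matching parent key."""
    # Collect parent keys once so each child lookup is O(1).
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


# Hypothetical sample data standing in for two lake tables.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # parent row missing or not loaded yet
]

# Run the check only after both loads have completed.
orphans = find_orphans(orders, customers, fk="customer_id", pk="customer_id")
print(orphans)
```

Running this kind of check as a separate step after the loads keeps the loads themselves independent of each other, while still reporting rows that would have violated the foreign key.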