pedrocevil4
Contributor

Data Lakes and Pipelines

I see a lot of job postings asking for experience creating, maintaining, and modifying data lakes and pipelines.

I've also heard a LOT about pipelines, and a little bit about lakes (mostly in passing), but I don't think I've ever seen someone actually talk about them in depth. What are they exactly? How do you build them? Do you need something like Spark or Hadoop? What about Azure? Where can you go to learn this power?!

If anyone here works with these concepts, can you shed some light on this topic please? And perhaps drop some resources to study? Please?

1 Solution

Accepted Solutions
Paul-ADA-UK
Former Employee

Think of Spark and Hadoop as compute engines used to load and process your data. Hadoop, however, is a broader platform that also includes storage and a larger ecosystem of Apache products.

Originally, the part of Hadoop that you used for compute operations was the MapReduce engine. However, MapReduce has only two operations: Map and Reduce. This means that every interaction with, and calculation on, your data has to be converted into either a 'map' or a 'reduce' action, which limits the speed of Hadoop and is one of the reasons (there are a few) why Spark is quicker. Spark has 80+ APIs and does not convert everything into a 'map' or 'reduce' action.
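To make the 'map' and 'reduce' idea concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop or Spark code; the function names (`map_phase`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration, and a real MapReduce job would also shuffle and group the pairs across many machines.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(counts, pair):
    """Reduce step: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] += n
    return counts


def word_count(lines):
    # Every line is mapped independently, then all pairs are reduced.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce(reduce_phase, pairs, Counter()))


# Hypothetical input: two short lines of text.
lines = ["spark and hadoop", "hadoop and storage"]
totals = word_count(lines)
print(totals)
```

Even a simple count has to be reshaped into these two steps, which gives a feel for why a richer set of operations is more convenient to work with.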

Think of a data lake as a place where you load your data files. Data can be structured, semi-structured or unstructured. Structured and semi-structured files include formats such as CSV, Parquet and JSON. You can then use Spark to load and work with your data and apply different structures to suit your needs.
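As a rough illustration of "load the files as they are, apply structure when you read them", here is a small sketch using only the Python standard library. A temporary directory stands in for the lake, and the file names, fields and helpers (`read_records`, `apply_schema`) are assumptions invented for the example; an engine such as Spark would do the same kind of work at a much larger scale.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_records(path):
    """Load one raw file from the lake and return a list of dicts."""
    if path.suffix == ".csv":
        with path.open(newline="") as handle:
            return list(csv.DictReader(handle))
    if path.suffix == ".json":
        # Assumes one JSON object per line (JSON Lines).
        with path.open() as handle:
            return [json.loads(line) for line in handle if line.strip()]
    raise ValueError(f"unsupported format: {path.suffix}")


def apply_schema(record):
    """Apply structure at read time: pick fields and cast types."""
    return {"id": int(record["id"]), "amount": float(record["amount"])}


with tempfile.TemporaryDirectory() as lake_dir:
    lake = Path(lake_dir)
    # Files land in the lake in whatever format the source produced.
    (lake / "orders.csv").write_text("id,amount\n1,9.50\n2,20\n")
    (lake / "orders.json").write_text(
        '{"id": 3, "amount": "5.25", "note": "gift"}\n'
    )

    rows = []
    for path in sorted(lake.iterdir()):
        rows.extend(apply_schema(r) for r in read_records(path))

print(rows)
```

The raw files are never rewritten. A different `apply_schema` could keep the `note` field or ignore `amount`, which is what "apply different structures to suit your needs" amounts to in practice.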

Qlik Data Integration provides a developer-friendly platform for loading your lake and provisioning the data in it in real time.

7 Replies
Jabjab1212
Contributor

You posted in the mobile routers page.

sapacademy2007
Contributor III

For help with Azure and AWS, you can see the following link. Hope that helps 🙂

http://www.traininginchennai.co.in/aws-training-in-chennai/

Best wishes and thanks. Cheers 🙂

davidsmith
Contributor

Simplifying data analytics pipelines can significantly improve the productivity of data engineering teams, making it easier to manage projects and freeing up time to focus on use cases and data analytics applications. A consolidated data lake streamlines the development and management of those pipelines, so teams can rapidly prototype, test and launch analytics and AI projects without having to deal with migrating, securing and managing large volumes of data.

markwood45
Contributor

Data pipeline is a slightly more generic term. It refers to any set of processing elements that move data from one system to another, possibly transforming the data along the way.
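A tiny sketch of that definition, assuming an in-memory list as the source system and a dict as the target (both hypothetical, as are the `extract`, `transform` and `load` names): each stage hands records to the next, and the data is cleaned along the way.

```python
def extract(source):
    """Read raw records from the source system (here, a list)."""
    for record in source:
        yield record


def transform(records):
    """Clean records along the way: drop blank names, normalise the rest."""
    for record in records:
        name = record.get("name", "").strip()
        if name:
            yield {"name": name.title(), "visits": int(record["visits"])}


def load(records, target):
    """Write transformed records into the target system (here, a dict)."""
    for record in records:
        target[record["name"]] = record["visits"]
    return target


# Hypothetical source data with messy names and string-typed numbers.
source_system = [
    {"name": " ada ", "visits": "3"},
    {"name": "", "visits": "7"},
    {"name": "grace", "visits": "5"},
]
target_system = load(transform(extract(source_system)), {})
print(target_system)
```

Real pipelines swap the list and dict for databases, message queues or files in a lake, but the shape (move, optionally transform, deliver) stays the same.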

bajiraosingham1
Contributor

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.
