risklessbegyy
Contributor

Advice on building a data sync/ETL service with Python?

I'm going to be in charge of building an integration service at my new job. My experience is in Python, I love the language, and I'll probably get the green light to do the project in Python despite the company being almost exclusively a PHP shop. I might be out of my league, so I'm polling to see if anyone has done something similar and has any advice for structuring it.

We have a SaaS product with about 1000 clients. The basic problem to solve is that we have client data in lots of places and it needs to be synchronized between sources like:

  • Our app: a separate instance of our PHP web app per client; we'd want to pull data such as which features each client is using or ignoring

  • Sales management system

  • Client success management system (our installations require direct support from our team of implementation managers, and they track client status here)

  • Helpdesk/ticketing system (client-facing)

  • Existing data warehouse collecting summary data for all app instances

  • Support dashboard system

  • Server health and log monitoring platform

  • Google Apps analytics setup (spreadsheets)

The general architecture would probably be something like this:

  • A web app framework (Django? Flask?) running the whole app

  • A data warehouse (we use PostgreSQL) housing aggregated data from all the different sources about each client

  • Another section of the database housing information about the syncing service itself

  • An extensible model for managing integration jobs

  • Integration modules for loading and sending data to and from each source (I'm thinking separate "pull" and "push" integrations for each source, possibly more if there are separate data types from the same source)

  • A workflow/trigger model for kicking off given processes based on other conditions (like spinning up new instances and seeding initial data when a sale is finalized)

  • A task queue (probably Celery) for handling the actual extracting and loading asynchronously

  • REST APIs allowing various services to trigger actions when they have or need updated data

  • A web GUI allowing me and other developers to view job logs, check the contents of the data store and manage scheduled jobs (this can come later)

  • Authentication (probably handled by the framework or integration with our centralized auth service or Google SSO)

  • Analytics: it seems natural that this might expand into a BI platform someday.
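
To make the pull/push and trigger ideas above a bit more concrete, here is a minimal sketch using only the standard library. Every name in it (`PullIntegration`, `PushIntegration`, `SyncService`, `JobLog`, the fake helpdesk source, the `sale_finalized` event) is hypothetical and made up for illustration; it is not an existing framework. In a real setup, `run` would presumably be dispatched through a task queue such as Celery and the logs would live in PostgreSQL rather than in memory.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class PullIntegration(ABC):
    """Extracts records from one source system (hypothetical interface)."""

    @abstractmethod
    def pull(self) -> Iterable[Record]:
        ...


class PushIntegration(ABC):
    """Loads records into one target system (hypothetical interface)."""

    @abstractmethod
    def push(self, records: Iterable[Record]) -> int:
        """Store the records and return how many were written."""


@dataclass
class JobLog:
    """One row of the 'information about the syncing service itself'."""
    job: str
    started_at: datetime
    status: str
    rows: int = 0
    error: str = ""


@dataclass
class SyncService:
    jobs: Dict[str, Callable[[], int]] = field(default_factory=dict)
    triggers: Dict[str, List[str]] = field(default_factory=dict)
    logs: List[JobLog] = field(default_factory=list)

    def register(self, name: str, source: PullIntegration,
                 target: PushIntegration, on_events: Iterable[str] = ()) -> None:
        """Register a pull-then-push job and the events that kick it off."""
        self.jobs[name] = lambda: target.push(source.pull())
        for event in on_events:
            self.triggers.setdefault(event, []).append(name)

    def run(self, name: str) -> JobLog:
        """Run one job synchronously and record the outcome."""
        log = JobLog(job=name, started_at=datetime.now(timezone.utc),
                     status="running")
        try:
            log.rows = self.jobs[name]()
            log.status = "ok"
        except Exception as exc:  # a failed job should not stop the others
            log.status = "failed"
            log.error = str(exc)
        self.logs.append(log)
        return log

    def fire(self, event: str) -> List[JobLog]:
        """Run every job registered for the given event."""
        return [self.run(name) for name in self.triggers.get(event, [])]


class FakeHelpdeskPull(PullIntegration):
    """Stand-in source returning hard-coded tickets instead of calling an API."""

    def pull(self) -> Iterable[Record]:
        return [{"client_id": 1, "open_tickets": 3},
                {"client_id": 2, "open_tickets": 0}]


class InMemoryWarehouse(PushIntegration):
    """Stand-in target; a real one would write to the PostgreSQL warehouse."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def push(self, records: Iterable[Record]) -> int:
        batch = list(records)
        self.rows.extend(batch)
        return len(batch)


if __name__ == "__main__":
    service = SyncService()
    service.register("helpdesk_to_warehouse", FakeHelpdeskPull(),
                     InMemoryWarehouse(), on_events=["sale_finalized"])
    for entry in service.fire("sale_finalized"):
        print(entry.job, entry.status, entry.rows)
```

Keeping pull and push behind separate small interfaces means each source can be tested with an in-memory stand-in, and the job log gives the later web GUI something to display.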

So the scope is getting pretty big. I will either learn an enormous amount about almost everything (probable) or fall flat on my face (possible). I have a bit of familiarity with almost all of the above (as in, small side projects), and I can picture how it would all fit together. I think I can make it work, and my company is very patient and supportive.

There are also lots of friendly senior developers working around me, with experience in web app architecture, who would definitely be good resources (and would be happy to share advice). But I'm really quite a junior developer, and all those senior devs are PHP developers (so, knowledgeable but probably not able to take over my keyboard or advise on particular libraries or syntax).

And I can't shake the feeling that solving this problem using Django, Celery, Pandas and sweat might lead to reinventing the wheel. Building and maintaining this will be my entire job, but it still feels like building it from scratch might be boiling the ocean when this has to be a problem that thousands of companies have already solved. Doing it the long way might be a huge waste of energy.

Any particular advice about how to structure a project like this, or resources I should be aware of? Any solid, mature ETL libraries or integration frameworks that excel at this kind of work?

(edit: formatting)
