mylimo456
Contributor
Data Integration/ETL Process Question

Hey,

I hope this is a relevant question for this community. I'm having trouble working out the best practice or industry standard for this process question.

Let's say I connect to the Twitter API via Python to scrape Twitter data. Is it best to scrape the data in Python, clean it in Python, and then load it into a database, OR is it better to scrape the data in Python, save the results out to a flat file, and let an ETL tool handle the data transformation and the load into a database?

The first solution would be nice because everything could be done at the time of the API call and the results delivered directly to a database. This method would probably use pandas.
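A minimal sketch of that first approach, assuming a hypothetical payload (`raw_tweets` stands in for whatever the API call returns; the field names `id`, `text`, `created_at`, and `retweets` are illustrative, not the real Twitter schema), cleaning with pandas and loading straight into SQLite:

```python
import sqlite3
import pandas as pd

# Hypothetical records standing in for the raw Twitter API response
raw_tweets = [
    {"id": 1, "text": "  Hello world  ", "created_at": "2023-01-01T12:00:00Z", "retweets": "5"},
    {"id": 2, "text": None,              "created_at": "2023-01-02T08:30:00Z", "retweets": "0"},
    {"id": 3, "text": "ETL question",    "created_at": "2023-01-03T09:15:00Z", "retweets": "2"},
]

def clean(records):
    df = pd.DataFrame(records)
    df = df.dropna(subset=["text"])               # drop tweets with no text
    df["text"] = df["text"].str.strip()           # normalize whitespace
    df["retweets"] = df["retweets"].astype(int)   # fix types from the raw payload
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df

# Clean and load in one pass, right after the API call
df = clean(raw_tweets)
conn = sqlite3.connect(":memory:")                # stand-in for the real target database
df.to_sql("tweets", conn, index=False)
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

The trade-off is that the raw payload is gone after this runs: if a cleaning rule turns out to be wrong, you have to re-scrape.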

The second solution would allow the raw initial data to be stored, and a tool built specifically for extracting and cleaning to handle the transformation. Instead of doing everything in one call, each piece would be done in bite-sized steps.
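A sketch of the second approach, under the same assumed payload: the extraction step only persists the raw response to a JSON-lines flat file, and a separate `transform` function (standing in here for the ETL tool) reads and cleans it later. The file name and fields are hypothetical.

```python
import json
import os
import tempfile

# Same hypothetical raw API response as above
raw_tweets = [
    {"id": 1, "text": "  Hello world  ", "created_at": "2023-01-01T12:00:00Z", "retweets": "5"},
    {"id": 2, "text": None,              "created_at": "2023-01-02T08:30:00Z", "retweets": "0"},
    {"id": 3, "text": "ETL question",    "created_at": "2023-01-03T09:15:00Z", "retweets": "2"},
]

# Step 1: extraction only -- persist the raw payload untouched
raw_path = os.path.join(tempfile.mkdtemp(), "tweets_raw.jsonl")
with open(raw_path, "w") as f:
    for rec in raw_tweets:
        f.write(json.dumps(rec) + "\n")

# Step 2: a separate transform pass over the flat file
def transform(path):
    cleaned = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("text"):                   # skip tweets with no text
                rec["text"] = rec["text"].strip()
                rec["retweets"] = int(rec["retweets"])
                cleaned.append(rec)
    return cleaned

rows = transform(raw_path)  # step 3 (the database load) would consume these rows
```

Because the raw file survives, the transform can be rerun with new rules without hitting the API again, at the cost of an extra storage hop.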

