Hello All,
We are about to launch a new phase in which we will extract data from 350-500 data sources (Oracle, MSSQL, Impala, Greenplum, and many others).
What is the best approach/design for extracting the previous day's data (sysdate - 1) from these sources, writing it to Parquet files, and then moving the files to HDFS?
I am worried about running into memory issues on the server the jobs will run on. What should I consider, and how should I set up logging for all of these tables/jobs?
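To make the question concrete, here is a rough sketch of what I picture a single table job doing: read the previous day's rows in fixed-size batches so the whole result never sits in memory, and log one line per table. Everything in it is an assumption for illustration: the function name `extract_in_batches`, the `orders` table and `load_date` column, and the `write_batch` callback, which stands in for whatever would append each batch to a Parquet file. SQLite is used only so the sketch runs on its own; the real sources would each need their own driver, date expression, and parameter style.

```python
import logging
import sqlite3
from datetime import date, timedelta

log = logging.getLogger("extract")


def extract_in_batches(conn, query, params, write_batch, table, batch_size=10_000):
    """Stream rows from a DB-API connection in batches and log the outcome.

    write_batch(columns, rows) is a hypothetical callback; in a real job it
    would append the batch to a Parquet file instead of keeping it in memory.
    """
    cur = conn.cursor()
    total = 0
    try:
        cur.execute(query, params)
        columns = [d[0] for d in cur.description]
        while True:
            rows = cur.fetchmany(batch_size)  # at most batch_size rows at a time
            if not rows:
                break
            write_batch(columns, rows)
            total += len(rows)
        log.info("table=%s status=ok rows=%d", table, total)
        return total
    except Exception:
        # One log line per failed table, with traceback, then re-raise
        log.exception("table=%s status=failed rows_so_far=%d", table, total)
        raise
    finally:
        cur.close()


def demo():
    """Run the sketch against an in-memory SQLite table (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, load_date TEXT)")
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    today = date.today().isoformat()
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, yesterday) for i in range(25)] + [(i, today) for i in range(25, 30)],
    )

    batch_sizes = []

    def write_batch(columns, rows):
        # Placeholder for the Parquet writer: only record the batch size
        batch_sizes.append(len(rows))

    total = extract_in_batches(
        conn,
        "SELECT id, load_date FROM orders WHERE load_date = ?",  # "?" is SQLite's style
        (yesterday,),
        write_batch,
        table="orders",
        batch_size=10,
    )
    conn.close()
    return total, batch_sizes
```

Calling `demo()` reads only yesterday's 25 rows, in batches of at most 10. I am not sure this holds for every driver, though: as far as I know, some of them (psycopg2 with a default cursor, which I would likely use for Greenplum) pull the entire result set to the client on `execute`, so `fetchmany` alone would not bound memory there and a server-side cursor would be needed.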
Can anyone help me sort this out?