MikeBender27
Contributor

Loading from Azure Data Lake - CDM Formatted files

Microsoft D365 CRM and F&O applications now support automatic export to a data lake as a preview feature, and this will soon be the primary way to extract data from these business applications. When the feature is installed, each table from the application is written as CSV files to a folder in an Azure Data Lake (Gen2) container. In most cases, a folder has multiple CSV files that must be appended together to make up the entire table. The metadata for the tables is stored in .json files that define the column headers and data types for each column in the CSV files. Microsoft calls this their "CDM" format.
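
To make the layout concrete, here is a minimal Python sketch of appending the CSV parts of one table into a single row stream. It assumes the CSV files carry no header row (the column names live in the .json metadata) and that the parts have already been downloaded as text; the function name `read_partitions` and the sample column names are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator


def read_partitions(partitions: Iterable[str], columns: list[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per row across all headerless CSV parts of a table.

    `partitions` holds the text of each CSV part; `columns` is the ordered
    list of column names taken from the table's .json metadata (assumed).
    """
    for text in partitions:
        for row in csv.reader(io.StringIO(text)):
            if not row:
                continue  # skip blank lines between records
            if len(row) != len(columns):
                raise ValueError(
                    f"expected {len(columns)} fields, got {len(row)}: {row!r}"
                )
            yield dict(zip(columns, row))


# Two parts of the same (hypothetical) table, appended in order.
parts = ['1,Contoso\n2,Fabrikam\n', '3,"Adventure, Works"\n']
rows = list(read_partitions(parts, ["accountid", "name"]))
```

The field-count check is there because a mismatch between the metadata and a CSV part would otherwise silently shift values into the wrong columns.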

Has anyone successfully used Talend Studio to extract data from a data lake that is in this new CDM format? I can use the tAzureAdlsGen2Input component to read the CSV files, but I have not found a way to create a schema from the .json files. Talend Studio requires you to create a placeholder schema on the input CSV file with column headers "field0, field1, field2, etc.", and then you have to manually define the names and data types for each column in a tMap. Not a very elegant solution. Is there a way that these .json files could be read in and turned into a stored Talend schema?
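
As a rough sketch of the idea, the Python below pulls column names and data types for one table out of the metadata. It assumes the metadata follows the CDM folder `model.json` layout (an `entities` list whose items have a `name` and an `attributes` list of `name`/`dataType` pairs); exports that use `*.cdm.json` manifests instead would need a different parser. The mapping to Talend type identifiers such as `id_String` is an illustrative assumption, not something confirmed against Talend, and `entity_columns` is a made-up helper name.

```python
import json

# Assumed mapping from CDM data type names (lowercased) to Talend type ids.
# Illustrative only; adjust to whatever types the job actually needs.
CDM_TO_TALEND = {
    "string": "id_String",
    "guid": "id_String",
    "json": "id_String",
    "int64": "id_Long",
    "double": "id_Double",
    "decimal": "id_BigDecimal",
    "boolean": "id_Boolean",
    "datetime": "id_Date",
    "datetimeoffset": "id_Date",
}


def entity_columns(model_json: str, entity_name: str) -> list[tuple[str, str]]:
    """Return (column name, Talend type) pairs for one entity in model.json."""
    model = json.loads(model_json)
    for entity in model.get("entities", []):
        if entity.get("name") == entity_name:
            return [
                (
                    attr["name"],
                    # Unknown or missing types fall back to a string column.
                    CDM_TO_TALEND.get(str(attr.get("dataType", "")).lower(), "id_String"),
                )
                for attr in entity.get("attributes", [])
            ]
    raise KeyError(f"entity {entity_name!r} not found in model.json")
```

The resulting list could then be used to generate a schema definition outside Talend Studio rather than renaming "field0, field1, ..." by hand in a tMap; how best to import it into the repository is left open here.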

It is also possible to put an Azure Synapse serverless workspace on top of the data lake and then query the data lake through Synapse using the tAzureSynapseInput component, but this can be expensive since everything goes through Synapse.

Any better ideas on how to access these "CDM" data lakes?

0 Replies