<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Process data stored in Azure Data Lake Store with Databricks using Talend in Official Support Articles</title>
    <link>https://community.qlik.com/t5/Official-Support-Articles/Process-data-stored-in-Azure-Data-Lake-Store-with-Databricks/ta-p/2151411</link>
    <description>&lt;P&gt;Azure Data Lake Store Gen1 (ADLS) is a hyper-scale Big Data store. It is common for solutions within the Azure environment to store data in ADLS and process it with other compute resources, such as Azure Databricks. Azure Databricks is a consumption-based managed Spark service that simplifies the processing of Big Data and Artificial Intelligence workloads. Processing data with Talend within this environment is a common pattern. Talend 7.1 adds support for executing Jobs in Azure Databricks 3.5 LTS. This article explains the process of creating a solution that processes data stored in ADLS with Databricks using Talend.&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Creating a service principal&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Service principals are a means of authenticating within the Azure environment.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a service principal from the Azure Portal by navigating to &lt;STRONG&gt;Azure Active Directory&lt;/STRONG&gt; and selecting &lt;STRONG&gt;App Registrations&lt;/STRONG&gt;. Then select &lt;STRONG&gt;New application registration&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uoq.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124167i568F983A3667141C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uoq.png" alt="0EM5b0000074uoq.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide a name for the registration and a URL. For a service principal, the URL is required but not used. Select &lt;STRONG&gt;Web app / API&lt;/STRONG&gt; for the &lt;STRONG&gt;Application type&lt;/STRONG&gt;. Click &lt;STRONG&gt;Create&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074upK.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123431i2287F825BFD0A2D0/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074upK.png" alt="0EM5b0000074upK.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Record the resulting information, including the Application ID, also known as the Client ID. The Client ID is used when configuring the Talend Job and when creating and configuring the Databricks cluster, so be sure to save it.&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074urB.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124834i51C3151887D26E79/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074urB.png" alt="0EM5b0000074urB.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create Keys&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a key by clicking &lt;STRONG&gt;Settings&lt;/STRONG&gt; to open a blade containing the &lt;STRONG&gt;Keys&lt;/STRONG&gt; option. Enter a description and select an expiration. Click &lt;STRONG&gt;Save&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074urk.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124193iDD50630F40E92C0C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074urk.png" alt="0EM5b0000074urk.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Record the key value. &lt;STRONG&gt;Important&lt;/STRONG&gt;: this is the only opportunity to capture the key value. The key is required to configure the Talend Job and Databricks cluster.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uuA.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124424i8D4174CD10163BF0/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uuA.png" alt="0EM5b0000074uuA.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;API Permission&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Grant the service principal access to the Azure Data Lake API by selecting &lt;STRONG&gt;Required Permissions&lt;/STRONG&gt;, then click &lt;STRONG&gt;Add&lt;/STRONG&gt;. Select &lt;STRONG&gt;Azure Data Lake&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uxJ.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123386iD92AD422BC2DD214/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uxJ.png" alt="0EM5b0000074uxJ.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Grant full access to the ADLS service and click &lt;STRONG&gt;Select&lt;/STRONG&gt;, then &lt;STRONG&gt;Done&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uxT.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121990iA141EE6912AAF9B6/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uxT.png" alt="0EM5b0000074uxT.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;OAUTH 2.0 Token Endpoint&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The last Azure Active Directory data element needed is the OAUTH 2.0 TOKEN ENDPOINT.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Navigate to &lt;STRONG&gt;App Registrations&lt;/STRONG&gt; and select &lt;STRONG&gt;Endpoints&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vE5.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124993iA47BB42BC33F625B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vE5.png" alt="0EM5b0000074vE5.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click the &lt;STRONG&gt;copy&lt;/STRONG&gt; button next to the textbox containing the OAUTH 2.0 TOKEN ENDPOINT. Save the resulting value to use when you configure the Databricks cluster and the Talend Job.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vE0.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122105i1FB8C2757C194AF4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vE0.png" alt="0EM5b0000074vE0.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
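&lt;P&gt;Optionally, you can sanity-check the service principal credentials against this endpoint before configuring Databricks. The Python sketch below builds the OAuth 2.0 client-credentials request that token consumers send to the endpoint. The placeholder values are assumptions to substitute with the three values you captured, and the resource URI shown is the standard ADLS Gen1 audience; verify it for your environment.&lt;/P&gt;

```python
from urllib.parse import urlencode

def build_token_request(token_endpoint, client_id, client_secret):
    """Build the POST body of an OAuth 2.0 client-credentials token
    request against the captured token endpoint."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Audience for ADLS Gen1 tokens (verify for your environment).
        "resource": "https://datalake.azure.net/",
    })
    return token_endpoint, body

# Placeholder values -- substitute the three values you recorded:
endpoint, body = build_token_request(
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    "<client-id>",
    "<client-secret-key>",
)
# POSTing `body` to `endpoint` returns JSON containing "access_token"
# when the service principal credentials are valid.
```

&lt;P&gt;A successful token response is a quick confirmation that the Client ID, key, and endpoint were captured correctly.&lt;/P&gt;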
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Review&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;At this point, you should have captured and stored three values for future use, as shown in the examples below. &lt;STRONG&gt;Note&lt;/STRONG&gt;: these values are examples and will not work in your settings.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v0X.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123228i2AC0013F504094ED/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v0X.png" alt="0EM5b0000074v0X.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Creating and configuring Azure Data Lake Store Gen1&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;If an ADLS store does not exist, create one using the Azure Portal.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v0m.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125005i251D6034BE4D9571/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v0m.png" alt="0EM5b0000074v0m.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide a unique &lt;STRONG&gt;Name&lt;/STRONG&gt;, &lt;STRONG&gt;Subscription&lt;/STRONG&gt;, &lt;STRONG&gt;Resource&lt;/STRONG&gt; &lt;STRONG&gt;group&lt;/STRONG&gt;, and &lt;STRONG&gt;Location&lt;/STRONG&gt; for the new ADLS. Click &lt;STRONG&gt;Create&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v16.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122458i034641195296D78B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v16.png" alt="0EM5b0000074v16.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Grant Access&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Grant permission to the previously created service principal, so that it can interact with the ADLS. Navigate to ADLS Data explorer for the appropriate ADLS instance.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v1G.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124072iC1AEA7D47E2BDC07/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v1G.png" alt="0EM5b0000074v1G.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Depending on security requirements, grant the previously created service principal access to the appropriate location within the ADLS, such as a folder. Select &lt;STRONG&gt;Access &amp;gt;&lt;/STRONG&gt; &lt;STRONG&gt;Add&lt;/STRONG&gt;, then search for the service principal using the description you created earlier.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v2n.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121573i73B4A8FA7FEE04D4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v2n.png" alt="0EM5b0000074v2n.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the service principal, then click &lt;STRONG&gt;Select&lt;/STRONG&gt;. Select the appropriate permissions, the permissions scoping, and default or access permission. Once the permissions are selected, click &lt;STRONG&gt;OK&lt;/STRONG&gt;. Note that to access an object at a lower level, the account must have read and execute permissions on all ancestors of that item.&amp;nbsp;For more information on ADLS permissions, see the &lt;A href="#Access control in Azure Data Lake Storage Gen1" target="_self"&gt;Access control in Azure Data Lake Gen 1&lt;/A&gt; article referenced in the &lt;A href="#Resources" target="_self"&gt;Resources&lt;/A&gt; section of this article.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v32.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125156i5D1C1FE79718CA1B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v32.png" alt="0EM5b0000074v32.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Creating and configuring Databricks&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Databricks offers Spark as a managed service. An Azure Databricks service has zero or more clusters. This section discusses the provisioning or identification of a Databricks service instance, and the creation and configuration of a cluster within that service.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Provision or identify a Azure Databricks Service&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;If one does not exist, create an Azure Databricks Service.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEj.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125154i89B67ECA0845AA57/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEj.png" alt="0EM5b0000074vEj.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Ensure, as recommended, that the ADLS instance is in the same location as the Azure Databricks Service. For more information, see the &lt;A href="#Azure Databricks" target="_self"&gt;Azure Databricks&lt;/A&gt; article, referenced in the Resources section of this article.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEo.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124047i10634154A5C6EB20/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEo.png" alt="0EM5b0000074vEo.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Workspace&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Select the newly created Azure Databricks Service or choose an existing one.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click &lt;STRONG&gt;Launch Workspace&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123776iDD1BF23F5661DE9B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEP.png" alt="0EM5b0000074vEP.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create a Cluster&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Databricks utilizes clusters to execute Jobs. Click the &lt;STRONG&gt;Clusters&lt;/STRONG&gt; icon on the left to navigate to the &lt;STRONG&gt;Clusters&lt;/STRONG&gt; section.&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v5m.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124931i664341B82D180C6E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v5m.png" alt="0EM5b0000074v5m.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Initially, a workspace does not have clusters associated with it. Select &lt;STRONG&gt;Create Cluster&lt;/STRONG&gt; to start the creation process.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Assign a name to the cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Standard&lt;/STRONG&gt; as the &lt;STRONG&gt;Cluster Mode&lt;/STRONG&gt; type.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;3.5 LTS&lt;/STRONG&gt;, the version required by Talend 7.1, as the &lt;STRONG&gt;Databricks Runtime Version&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the appropriate sizing of the &lt;STRONG&gt;Driver Type&lt;/STRONG&gt; and &lt;STRONG&gt;Worker Type&lt;/STRONG&gt;, based on the expected workloads.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Auto Termination is appropriate in non-production environments, where cost management is of greater concern than responsiveness. When Auto Termination is enabled, the cluster shuts down after the specified period of inactivity. The default is &lt;STRONG&gt;120&lt;/STRONG&gt; minutes, but you can adjust it to your requirements.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v4t.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122624i4201D70F7838D385/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v4t.png" alt="0EM5b0000074v4t.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Spark Configuration&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The Spark Configuration section of the cluster is used to capture information necessary for the Jobs to access ADLS.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Add the items in &lt;STRONG&gt;Table 1&lt;/STRONG&gt; to the &lt;STRONG&gt;Spark Configuration&lt;/STRONG&gt; section, using the previously captured values.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: replace &lt;STRONG&gt;&amp;lt;insert client id here&amp;gt;&lt;/STRONG&gt; with the Application Id, replace &lt;STRONG&gt;&amp;lt;insert client secret key here&amp;gt;&lt;/STRONG&gt; with the key value associated with the Application/Service Principal, and replace &lt;STRONG&gt;&amp;lt;insert url endpoint here&amp;gt;&lt;/STRONG&gt; with the OAUTH 2.0 TOKEN ENDPOINT you captured earlier.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Table 1 - Spark Configuration&lt;/P&gt;
&lt;TABLE width="630"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.serializer org.apache.spark.serializer.KryoSerializer&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.access.token.provider.type ClientCredential&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.databricks.delta.preview.enabled true&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.client.id &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert client id here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.credential &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert client secret key here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.refresh.url &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert url endpoint here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Your configuration section should look like this:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v8g.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124705iF62D807C1CD8AF42/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v8g.png" alt="0EM5b0000074v8g.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
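&lt;P&gt;If you create clusters through the Databricks REST API rather than the portal UI, the Table 1 entries can be assembled programmatically. The Python sketch below is illustrative only; the placeholder arguments stand in for the values you captured earlier.&lt;/P&gt;

```python
def adls_spark_conf(client_id, client_secret, token_endpoint):
    """Return the Table 1 settings as a dict, in the shape used by the
    `spark_conf` field of a Databricks cluster definition."""
    return {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.hadoop.dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "spark.databricks.delta.preview.enabled": "true",
        "spark.hadoop.dfs.adls.oauth2.client.id": client_id,
        "spark.hadoop.dfs.adls.oauth2.credential": client_secret,
        "spark.hadoop.dfs.adls.oauth2.refresh.url": token_endpoint,
    }

conf = adls_spark_conf("<client-id>", "<client-secret-key>", "<oauth-token-endpoint>")
# In the cluster UI, each entry appears as "key value" on its own line:
ui_text = "\n".join(f"{k} {v}" for k, v in conf.items())
```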
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Databricks Endpoint&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Talend Studio requires the Databricks cluster endpoint for execution. The URL is typically in the format &lt;STRONG&gt;https://&lt;EM&gt;location&lt;/EM&gt;.azuredatabricks.net&lt;/STRONG&gt;. In this case, it is &lt;A href="https://eastus2.azuredatabricks.net" target="_blank" rel="noopener"&gt;https://eastus2.azuredatabricks.net&lt;/A&gt;. Make a note of this value, as it is needed later.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Cluster Id&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;You can capture the Cluster Id in two ways. One is by examining the URL; the second, and preferred, way is by looking at the &lt;STRONG&gt;Environment&lt;/STRONG&gt; section of the &lt;STRONG&gt;Spark UI&lt;/STRONG&gt; tab of the cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Search for &lt;STRONG&gt;ClusterId&lt;/STRONG&gt; to locate the value.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v8l.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122167i6697BE5D1FFA64FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v8l.png" alt="0EM5b0000074v8l.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Token&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;To grant Talend Studio permissions to push a Job to the Spark cluster, you must first generate a token in the Databricks workspace.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Click the &lt;STRONG&gt;User&lt;/STRONG&gt; icon on the top left of the Databricks workspace, then select &lt;STRONG&gt;User Settings&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9U.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124027iD1DA906CA40825B3/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9U.png" alt="0EM5b0000074v9U.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click &lt;STRONG&gt;Generate New Token&lt;/STRONG&gt; from the &lt;STRONG&gt;Access Token&lt;/STRONG&gt; tab.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9e.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122889i8CF7A1089773377A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9e.png" alt="0EM5b0000074v9e.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide a comment describing the purpose of the token and a lifetime in days for that token.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9j.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121681i63E78B1BE380081D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9j.png" alt="0EM5b0000074v9j.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Make a note of the generated token.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9o.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124405i85F6145D52FFB75E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9o.png" alt="0EM5b0000074v9o.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
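&lt;P&gt;With the endpoint, Cluster Id, and token in hand, you can optionally verify them outside Talend using the Databricks REST API (&lt;STRONG&gt;GET /api/2.0/clusters/get&lt;/STRONG&gt;). The Python sketch below only builds the request; the placeholder values are assumptions to replace with your own.&lt;/P&gt;

```python
from urllib.request import Request

def cluster_status_request(workspace_url, cluster_id, token):
    """Build (but do not send) a Databricks REST API request that checks
    the cluster state: GET /api/2.0/clusters/get?cluster_id=..."""
    url = f"{workspace_url}/api/2.0/clusters/get?cluster_id={cluster_id}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})

req = cluster_status_request(
    "https://eastus2.azuredatabricks.net",  # your Databricks endpoint
    "<cluster-id>",                         # from the Spark UI Environment tab
    "<personal-access-token>",              # from User Settings, Access Tokens
)
# Sending the request (urllib.request.urlopen(req)) returns JSON whose
# "state" field is e.g. "RUNNING" when all three values are correct.
```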
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Review&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;At this point you should have captured the following information:&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9t.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123252i82051ED48514BC80/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9t.png" alt="0EM5b0000074v9t.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Creating a Job&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Sources for the Job are available in the attached &lt;STRONG&gt;DatabricksADLSTempHumidFile.zip&lt;/STRONG&gt; and &lt;STRONG&gt;TempHumidData.csv&lt;/STRONG&gt; files.&lt;/P&gt;
&lt;P&gt;Talend 7.1 added support for executing Big Data Jobs in Databricks 3.5 LTS. For example, a Big Data Batch Job can now target Databricks for execution. The example Job reads a CSV file from ADLS containing a timestamp, temperature, humidity, and probe temperature. The Job then computes the average of the temperature and probe temperature and writes the results back to a different location within ADLS.&lt;/P&gt;
&lt;P&gt;The example Talend Job looks like this:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124704i4BD6DA1127EAA1CD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAN.png" alt="0EM5b0000074vAN.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt; component to provide Spark with the authentication information necessary to access ADLS. In this case, copy the previously captured values into the appropriate settings of the component. Again, Client Id corresponds to the Application Id of the service principal. The Client key value is the same value captured during the creation of the key for the service principal.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAX.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123190iBC2AF426E6AC12C7/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAX.png" alt="0EM5b0000074vAX.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;tFileInputDelimited&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component to read the input file from ADLS. Note that ADLS is case sensitive, and that the &lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt; component is being used to define storage.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAc.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122936i7A410CC2D85F4F4F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAc.png" alt="0EM5b0000074vAc.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Supplying a schema for the input file simplifies later calculations.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAm.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121593i91FA97F8499613A6/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAm.png" alt="0EM5b0000074vAm.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
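&lt;P&gt;To make the schema concrete, the snippet below parses one sample row into the four fields the Job expects. The field names, the semicolon delimiter, and the sample values are illustrative assumptions; match them to your actual file and to the schema defined in the component.&lt;/P&gt;

```python
import csv
import io

# One illustrative row: timestamp, ambient temperature (F), humidity (%),
# probe temperature (F). Delimiter and values are assumptions.
sample = "2018-11-01 12:00:00;72.5;41.0;74.3"

reader = csv.reader(io.StringIO(sample), delimiter=";")
ts, ambient, humidity, probe = next(reader)
row = {
    "Timestamp": ts,
    "AmbientTempF": float(ambient),
    "Humidity": float(humidity),
    "ProbeTempF": float(probe),
}
```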
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;tMap&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component to compute the average temperature. The AverageTemp expression is a simple average: (row2.AmbientTempF + row2.ProbeTempF) / 2.0.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAw.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124102i2F699423DECBCEF9/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAw.png" alt="0EM5b0000074vAw.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
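&lt;P&gt;Outside of Talend, the AverageTemp expression is equivalent to the following Python function, shown here only to make the computation explicit.&lt;/P&gt;

```python
def average_temp(ambient_temp_f, probe_temp_f):
    """Equivalent of the tMap expression:
    (row2.AmbientTempF + row2.ProbeTempF) / 2.0"""
    return (ambient_temp_f + probe_temp_f) / 2.0

avg = average_temp(72.5, 74.3)  # approximately 73.4
```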
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;tFileOutputDelimited&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tFileOutputDelimited&lt;/STRONG&gt; component to write out the original values, with the newly computed values.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vB6.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124225i45EAB8357642BB18/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vB6.png" alt="0EM5b0000074vB6.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Configuring a Job in Talend&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;After you create and test the Job in local mode, configure it to execute on a Databricks cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Databricks&lt;/STRONG&gt; from the &lt;STRONG&gt;Distribution&lt;/STRONG&gt; drop-down list. Populate &lt;STRONG&gt;Endpoint&lt;/STRONG&gt;, &lt;STRONG&gt;Cluster ID&lt;/STRONG&gt;, and &lt;STRONG&gt;Token&lt;/STRONG&gt; using the previously captured values. Here the token request is the token generated in the Databricks workspace under &lt;STRONG&gt;User settings&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122001i82405CAAA464B060/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBG.png" alt="0EM5b0000074vBG.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Run the Job and ensure that it completes successfully. Note that on the first run, required JARs are uploaded to the cluster’s file system. This process may take some time, depending on your connection speeds.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBa.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124878i6559AB0FCC909CC4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBa.png" alt="0EM5b0000074vBa.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Under certain circumstances, the Databricks cluster restarts before the Job executes. Under normal conditions, the cluster returns to a functioning state and the Job runs.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;View the execution results by using the ADLS data explorer.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBf.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123558i7F76C3CAA96C8D08/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBf.png" alt="0EM5b0000074vBf.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
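&lt;P&gt;The connection Talend Studio makes with these three values corresponds to the Databricks REST API. As a minimal sketch (the endpoint, cluster ID, and token below are placeholders, not working values), the request Studio effectively relies on can be built like this:&lt;/P&gt;

```python
# Sketch: building the Databricks Clusters Get API request from the three
# values captured earlier. All values shown are placeholders.

def cluster_status_request(endpoint: str, cluster_id: str, token: str):
    """Return the URL and headers for GET /api/2.0/clusters/get."""
    url = f"{endpoint.rstrip('/')}/api/2.0/clusters/get?cluster_id={cluster_id}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

url, headers = cluster_status_request(
    "https://eastus2.azuredatabricks.net",  # Databricks endpoint
    "0123-456789-abc123",                   # placeholder Cluster ID
    "dapi0000000000000000",                 # placeholder token
)
print(url)
```

&lt;P&gt;Sending a GET request to this URL with these headers returns the cluster state, which is a quick way to validate the captured values before running the Job.&lt;/P&gt;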
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Related Content&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Using HDFS components to work with Azure Data Lake Store (ADLS)&lt;BR /&gt;&lt;A href="https://help.talend.com/reader/Sm466hmdh~Y~2GehtIo6xw/NPn7PIuX_Dcqib9WP0Zl_g" target="_blank" rel="noopener"&gt;https://help.talend.com/reader/Sm466hmdh~Y~2GehtIo6xw/NPn7PIuX_Dcqib9WP0Zl_g&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Overview of Azure Data Lake Storage Gen1&lt;BR /&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview" target="_blank" rel="noopener"&gt;https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Azure Databricks&lt;BR /&gt;&lt;A href="https://azure.microsoft.com/en-us/services/databricks/" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-us/services/databricks/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Access control in Azure Data Lake Storage Gen1&lt;BR /&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control" target="_blank" rel="noopener"&gt;https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jan 2024 02:35:30 GMT</pubDate>
    <dc:creator>TalendSolutionExpert</dc:creator>
    <dc:date>2024-01-23T02:35:30Z</dc:date>
    <item>
      <title>Process data stored in Azure Data Lake Store with Databricks using Talend</title>
      <link>https://community.qlik.com/t5/Official-Support-Articles/Process-data-stored-in-Azure-Data-Lake-Store-with-Databricks/ta-p/2151411</link>
      <description>&lt;P&gt;Azure Data Lake Store Gen1 (ADLS) is a hyper-scale Big Data store. It is common for solutions within the Azure environment to store data within ADLS and process it with other compute resources, such as Azure Databricks. Azure Databricks is a consumption-based Spark managed service that simplifies processing of Big Data and Artificial Intelligence workloads. Processing data with Talend within this environment is a common pattern. Talend 7.1 adds support for executing Jobs in Azure Databricks 3.5LTS. This article explains the process of creating a solution that processes data stored in ADLS with Databricks using Talend.&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Creating a service principal&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Service principals are a means of authenticating within the Azure environment.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a service principal from the Azure Portal by navigating to &lt;STRONG&gt;Azure Active Directory&lt;/STRONG&gt; and selecting &lt;STRONG&gt;App Registrations&lt;/STRONG&gt;. Then select &lt;STRONG&gt;New application registration&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uoq.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124167i568F983A3667141C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uoq.png" alt="0EM5b0000074uoq.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide the name of the registration and the URL. For a service principal, the URL is required but not used. Select &lt;STRONG&gt;Web app / API&lt;/STRONG&gt; for the &lt;STRONG&gt;Application type&lt;/STRONG&gt;. Click &lt;STRONG&gt;Create&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074upK.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123431i2287F825BFD0A2D0/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074upK.png" alt="0EM5b0000074upK.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Record the resulting information, which includes the Application ID, also known as the Client ID. The Client ID is used when configuring the Talend Job and when creating and configuring the Databricks cluster, so save it for later use.&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074urB.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124834i51C3151887D26E79/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074urB.png" alt="0EM5b0000074urB.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create Keys&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Create a key by clicking &lt;STRONG&gt;Settings&lt;/STRONG&gt; to open a blade containing the &lt;STRONG&gt;Keys&lt;/STRONG&gt; option. Enter a description and select an expiration. Click &lt;STRONG&gt;Save&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074urk.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124193iDD50630F40E92C0C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074urk.png" alt="0EM5b0000074urk.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Record the key value. &lt;STRONG&gt;Important&lt;/STRONG&gt;: this is the only opportunity to capture the key value. The key is required to configure the Talend Job and Databricks cluster.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uuA.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124424i8D4174CD10163BF0/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uuA.png" alt="0EM5b0000074uuA.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;API Permission&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Grant the service principal access to the Azure Data Lake API by selecting &lt;STRONG&gt;Required Permissions&lt;/STRONG&gt; and clicking &lt;STRONG&gt;Add&lt;/STRONG&gt;. Select &lt;STRONG&gt;Azure Data Lake&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uxJ.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123386iD92AD422BC2DD214/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uxJ.png" alt="0EM5b0000074uxJ.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Grant full access to the ADLS service and click &lt;STRONG&gt;Select&lt;/STRONG&gt;, then &lt;STRONG&gt;Done&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074uxT.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121990iA141EE6912AAF9B6/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074uxT.png" alt="0EM5b0000074uxT.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;OAUTH 2.0 Token Endpoint&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The last Azure Active Directory data element needed is the OAUTH 2.0 TOKEN ENDPOINT.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Navigate to &lt;STRONG&gt;App Registrations&lt;/STRONG&gt; and select &lt;STRONG&gt;Endpoints&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vE5.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124993iA47BB42BC33F625B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vE5.png" alt="0EM5b0000074vE5.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click the &lt;STRONG&gt;copy&lt;/STRONG&gt; button next to the textbox containing the OAUTH 2.0 TOKEN ENDPOINT. Save the resulting value to use when you configure the Databricks cluster and the Talend Job.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vE0.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122105i1FB8C2757C194AF4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vE0.png" alt="0EM5b0000074vE0.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Review&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;At this point, you should have captured and stored three values for future use, as shown in the examples below. &lt;STRONG&gt;Note&lt;/STRONG&gt;: these values are examples and will not work in your environment.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v0X.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123228i2AC0013F504094ED/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v0X.png" alt="0EM5b0000074v0X.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Creating and configuring Azure Data Lake Store Gen1&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;If an ADLS store does not exist, create one using the Azure Portal.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v0m.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125005i251D6034BE4D9571/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v0m.png" alt="0EM5b0000074v0m.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide a unique &lt;STRONG&gt;Name&lt;/STRONG&gt;, &lt;STRONG&gt;Subscription&lt;/STRONG&gt;, &lt;STRONG&gt;Resource&lt;/STRONG&gt; &lt;STRONG&gt;group&lt;/STRONG&gt;, and &lt;STRONG&gt;Location&lt;/STRONG&gt; for the new ADLS. Click &lt;STRONG&gt;Create&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v16.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122458i034641195296D78B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v16.png" alt="0EM5b0000074v16.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Grant Access&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Grant permission to the previously created service principal so that it can interact with the ADLS instance. Navigate to the ADLS Data explorer for the appropriate ADLS instance.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v1G.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124072iC1AEA7D47E2BDC07/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v1G.png" alt="0EM5b0000074v1G.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Depending on security requirements, grant the previously created service principal access to the appropriate location within the ADLS, such as a folder. Select &lt;STRONG&gt;Access &amp;gt;&lt;/STRONG&gt; &lt;STRONG&gt;Add&lt;/STRONG&gt;, then search for the service principal using the description you created earlier.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v2n.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121573i73B4A8FA7FEE04D4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v2n.png" alt="0EM5b0000074v2n.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the service principal, then click &lt;STRONG&gt;Select&lt;/STRONG&gt;. Select the appropriate permissions, the permissions scoping, and default or access permission. Once the permissions are selected, click &lt;STRONG&gt;Ok&lt;/STRONG&gt;. Note that to access an object at a lower level, the account must have Read and Execute permissions on all ancestors of that item.&amp;nbsp;For more information on ADLS permissions, see the &lt;A href="#Access control in Azure Data Lake Storage Gen1" target="_self"&gt;Access control in Azure Data Lake Gen 1&lt;/A&gt; article referenced in the &lt;A href="#Resources" target="_self"&gt;Resources&lt;/A&gt; section of this article.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v32.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125156i5D1C1FE79718CA1B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v32.png" alt="0EM5b0000074v32.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
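&lt;P&gt;The ancestor-permission rule above can be made concrete with a small sketch. This models only the traversal logic; it is not Azure SDK code, and the paths and permission sets are hypothetical.&lt;/P&gt;

```python
# Illustrative model of the ADLS Gen1 POSIX-style rule described above:
# reaching an object requires Execute ("x") on every ancestor folder and
# Read ("r") on the object itself. Hypothetical paths; not an Azure API.

def can_read(path: str, perms: dict) -> bool:
    """perms maps a path to its permission set, e.g. {"r", "x"}."""
    parts = [p for p in path.split("/") if p]
    ancestors = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    if not all("x" in perms.get(a, set()) for a in ancestors):
        return False  # a missing Execute anywhere up the chain blocks access
    return "r" in perms.get(path, set())

perms = {"/": {"x"}, "/landing": {"x"}, "/landing/input.csv": {"r"}}
print(can_read("/landing/input.csv", perms))                          # True
print(can_read("/landing/input.csv", {"/landing/input.csv": {"r"}}))  # False
```

&lt;P&gt;The second call fails even though the file itself grants Read, because Execute is missing on the root and on the folder: this is why granting access only at the file level is not sufficient.&lt;/P&gt;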
&lt;H3&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Creating and configuring Databricks&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Databricks offers Spark as a managed service. An Azure Databricks service has zero or more clusters. This section discusses the provisioning or identification of a Databricks service instance, and the creation and configuration of a cluster within that service.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Provision or identify a Azure Databricks Service&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;If one does not exist, create an Azure Databricks Service.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEj.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/125154i89B67ECA0845AA57/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEj.png" alt="0EM5b0000074vEj.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Ensure, as recommended, that the ADLS instance is in the same location as the Azure Databricks Service. For more information, see the &lt;A href="#Azure Databricks" target="_self"&gt;Azure Databricks&lt;/A&gt; article, referenced in the Resources section of this article.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEo.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124047i10634154A5C6EB20/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEo.png" alt="0EM5b0000074vEo.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Workspace&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Select the newly created Azure Databricks Service or choose an existing one.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click &lt;STRONG&gt;Launch Workspace&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vEP.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123776iDD1BF23F5661DE9B/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vEP.png" alt="0EM5b0000074vEP.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Create a Cluster&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Databricks utilizes clusters to execute Jobs. Click the &lt;STRONG&gt;Clusters&lt;/STRONG&gt; icon on the left to navigate to the &lt;STRONG&gt;Clusters&lt;/STRONG&gt; section.&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v5m.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124931i664341B82D180C6E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v5m.png" alt="0EM5b0000074v5m.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Initially, a workspace does not have clusters associated with it. Select &lt;STRONG&gt;Create Cluster&lt;/STRONG&gt; to start the creation process.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Assign a name to the cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Standard&lt;/STRONG&gt; as the &lt;STRONG&gt;Cluster Mode&lt;/STRONG&gt; type.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;3.5 LTS&lt;/STRONG&gt;, the version required by Talend 7.1, as the &lt;STRONG&gt;Databricks Runtime Version&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select the appropriate sizing of the &lt;STRONG&gt;Driver Type&lt;/STRONG&gt; and &lt;STRONG&gt;Worker Type&lt;/STRONG&gt;, based on the expected workloads.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Auto Termination is appropriate in non-production environments, where cost management is of greater concern than responsiveness. When Auto Termination is enabled, the cluster shuts down after the specified period of inactivity. The default is &lt;STRONG&gt;120&lt;/STRONG&gt; minutes. However, you can adjust it according to your requirements.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v4t.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122624i4201D70F7838D385/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v4t.png" alt="0EM5b0000074v4t.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Spark Configuration&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The Spark Configuration section of the cluster is used to capture information necessary for the Jobs to access ADLS.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Add the items in &lt;STRONG&gt;Table 1&lt;/STRONG&gt; to the &lt;STRONG&gt;Spark Configuration&lt;/STRONG&gt; section, using the previously captured values.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: replace &lt;STRONG&gt;&amp;lt;insert client id here&amp;gt;&lt;/STRONG&gt; with the Application Id, replace &lt;STRONG&gt;&amp;lt;insert client secret key here&amp;gt;&lt;/STRONG&gt; with the key value associated with the Application/Service Principal, and replace &lt;STRONG&gt;&amp;lt;insert url endpoint here&amp;gt;&lt;/STRONG&gt; with the OAUTH 2.0 TOKEN ENDPOINT you captured earlier.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Table 1 - Spark Configuration&lt;/P&gt;
&lt;TABLE width="630"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.serializer org.apache.spark.serializer.KryoSerializer&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.access.token.provider.type ClientCredential&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.databricks.delta.preview.enabled true&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.client.id &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert client id here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.credential &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert client secret key here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD colspan="1" rowspan="1" width="630"&gt;
&lt;P&gt;spark.hadoop.dfs.adls.oauth2.refresh.url &lt;EM&gt;&lt;STRONG&gt;&amp;lt;insert url endpoint here&amp;gt;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Your configuration section should look like this:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v8g.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124705iF62D807C1CD8AF42/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v8g.png" alt="0EM5b0000074v8g.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
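&lt;P&gt;When scripting cluster setup, the Table 1 entries can also be generated programmatically. A minimal sketch, assuming the three values captured earlier (the values passed in below are placeholders, not working credentials):&lt;/P&gt;

```python
# Sketch: emitting the Table 1 Spark Configuration lines from the three
# captured values. The values passed in below are placeholders only.

def adls_spark_conf(client_id: str, client_secret: str, token_endpoint: str) -> dict:
    return {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.hadoop.dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "spark.databricks.delta.preview.enabled": "true",
        "spark.hadoop.dfs.adls.oauth2.client.id": client_id,
        "spark.hadoop.dfs.adls.oauth2.credential": client_secret,
        "spark.hadoop.dfs.adls.oauth2.refresh.url": token_endpoint,
    }

conf = adls_spark_conf(
    "11111111-2222-3333-4444-555555555555",  # Application (Client) ID
    "placeholder-client-secret",             # service principal key value
    "https://login.microsoftonline.com/placeholder-tenant/oauth2/token",
)
# One "key value" line per entry, as typed into the cluster's Spark Config box:
print("\n".join(f"{k} {v}" for k, v in conf.items()))
```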
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Databricks Endpoint&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Talend Studio requires the Databricks cluster endpoint for execution. The URL is typically in the format &lt;STRONG&gt;https://&lt;EM&gt;location&lt;/EM&gt;.azuredatabricks.net&lt;/STRONG&gt;. In this case, it is &lt;A href="https://eastus2.azuredatabricks.net" target="_blank" rel="noopener"&gt;https://eastus2.azuredatabricks.net&lt;/A&gt;. Make a note of this value, as it is needed later.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Cluster Id&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;You can capture the Cluster Id in two ways. One way is by examining the URL. The second, and preferred way, is by looking at the &lt;STRONG&gt;Environment&lt;/STRONG&gt; section of the &lt;STRONG&gt;Spark UI&lt;/STRONG&gt; tab of the cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Search for &lt;STRONG&gt;ClusterId&lt;/STRONG&gt; to locate the value.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v8l.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122167i6697BE5D1FFA64FD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v8l.png" alt="0EM5b0000074v8l.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
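&lt;P&gt;If you use the URL method instead, the cluster ID can be pulled out with a simple pattern match. The URL shape below is illustrative; the &lt;STRONG&gt;Environment&lt;/STRONG&gt; section of the &lt;STRONG&gt;Spark UI&lt;/STRONG&gt; tab remains the preferred source.&lt;/P&gt;

```python
import re

def cluster_id_from_url(url: str):
    """Extract the segment following /clusters/ in a cluster-page URL."""
    match = re.search(r"/clusters/([0-9A-Za-z-]+)", url)
    return match.group(1) if match else None

# Illustrative URL shape for a cluster's configuration page:
print(cluster_id_from_url(
    "https://eastus2.azuredatabricks.net/#/setting/clusters/0123-456789-abc123/configuration"
))  # 0123-456789-abc123
```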
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Token&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;To grant Talend Studio permissions to push a Job to the Spark cluster, you must first generate a token in the Databricks workspace.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;Click the &lt;STRONG&gt;User&lt;/STRONG&gt; icon on the top left of the Databricks workspace, then select &lt;STRONG&gt;User Settings&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9U.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124027iD1DA906CA40825B3/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9U.png" alt="0EM5b0000074v9U.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Click &lt;STRONG&gt;Generate New Token&lt;/STRONG&gt; from the &lt;STRONG&gt;Access Token&lt;/STRONG&gt; tab.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9e.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122889i8CF7A1089773377A/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9e.png" alt="0EM5b0000074v9e.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Provide a comment describing the purpose of the token and a lifetime in days for that token.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9j.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121681i63E78B1BE380081D/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9j.png" alt="0EM5b0000074v9j.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Make a note of the generated token.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9o.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124405i85F6145D52FFB75E/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9o.png" alt="0EM5b0000074v9o.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;Review&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;At this point you should have captured the following information:&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074v9t.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123252i82051ED48514BC80/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074v9t.png" alt="0EM5b0000074v9t.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Creating a Job&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Sources for the Job are available in the attached &lt;STRONG&gt;DatabricksADLSTempHumidFile.zip&lt;/STRONG&gt; and &lt;STRONG&gt;TempHumidData.csv&lt;/STRONG&gt; files.&lt;/P&gt;
&lt;P&gt;Talend 7.1 added support for executing Big Data Jobs in Databricks 3.5LTS. For example, a Big Data Batch Job can now target Databricks for execution. The example Job reads a CSV file from ADLS containing a timestamp, temperature, humidity, and probe temperature. The Job then computes the average of the temperature and probe temperature and writes the results back to a different location within ADLS.&lt;/P&gt;
&lt;P&gt;The example Talend Job looks like this:&lt;/P&gt;
&lt;P class="lia-indent-padding-left-30px"&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124704i4BD6DA1127EAA1CD/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAN.png" alt="0EM5b0000074vAN.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
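&lt;P&gt;As a plain-Python sketch of the Job's logic (not Talend or Spark code), the transformation amounts to adding one computed column. The column names and delimiter below are assumptions based on the schema shown in the component sections that follow:&lt;/P&gt;

```python
import csv
import io

# Local sketch of the example Job's transformation: read timestamp,
# temperature, humidity, and probe temperature; append the average of the
# two temperatures; write the widened rows back out. Column names and the
# ";" delimiter are assumptions for illustration.

def add_average_temp(source: str) -> str:
    """Read delimited rows, append an AverageTemp column, return new CSV text."""
    reader = csv.DictReader(io.StringIO(source), delimiter=";")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["AverageTemp"],
                            delimiter=";")
    writer.writeheader()
    for row in reader:
        # Same expression as the tMap component: (AmbientTempF + ProbeTempF) / 2.0
        row["AverageTemp"] = (float(row["AmbientTempF"]) + float(row["ProbeTempF"])) / 2.0
        writer.writerow(row)
    return out.getvalue()

sample = "Timestamp;AmbientTempF;Humidity;ProbeTempF\n2018-11-01 00:00;70.0;41.0;80.0\n"
print(add_average_temp(sample))
```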
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt; component to provide Spark with the authentication information necessary to access ADLS. In this case, copy the previously captured values into the appropriate settings of the component. Again, Client Id corresponds to the Application Id of the service principal. The Client key value is the same value captured during the creation of the key for the service principal.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAX.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123190iBC2AF426E6AC12C7/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAX.png" alt="0EM5b0000074vAX.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;tFileInputDelimited&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tFileInputDelimited&lt;/STRONG&gt; component to read the input file from ADLS. Note that ADLS is case sensitive, and that the &lt;STRONG&gt;tAzureFSConfiguration&lt;/STRONG&gt; component is being used to define storage.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAc.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122936i7A410CC2D85F4F4F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAc.png" alt="0EM5b0000074vAc.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Supplying a schema for the input file simplifies later calculations.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAm.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/121593i91FA97F8499613A6/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAm.png" alt="0EM5b0000074vAm.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
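&lt;P&gt;As an illustration of why a schema helps, the columns declared here become typed fields in the Java code Talend generates, so later arithmetic needs no parsing. Only AmbientTempF and ProbeTempF appear in this article; the other field below is a hypothetical placeholder:&lt;/P&gt;

```java
// Hypothetical row class mirroring the tFileInputDelimited schema.
// Declaring the temperature columns as double means downstream tMap
// expressions can use them in arithmetic directly.
public class SensorRow {
    public String deviceId;      // hypothetical key column
    public double AmbientTempF;  // ambient temperature in Fahrenheit
    public double ProbeTempF;    // probe temperature in Fahrenheit
}
```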
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;&lt;FONT color="#339966"&gt;tMap&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tMap&lt;/STRONG&gt; component to compute the average temperature. The AverageTemp expression is a simple average: (row2.AmbientTempF + row2.ProbeTempF) / 2.0.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vAw.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124102i2F699423DECBCEF9/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vAw.png" alt="0EM5b0000074vAw.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
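&lt;P&gt;tMap expressions are plain Java, so the AverageTemp column can be sketched as the small method below (the sample values are made up):&lt;/P&gt;

```java
public class AverageTempDemo {
    // Mirrors the tMap expression: (row2.AmbientTempF + row2.ProbeTempF) / 2.0
    static double averageTemp(double ambientTempF, double probeTempF) {
        return (ambientTempF + probeTempF) / 2.0;
    }

    public static void main(String[] args) {
        // Average of ambient 68.0 F and probe 100.0 F.
        System.out.println(averageTemp(68.0, 100.0)); // prints 84.0
    }
}
```

&lt;P&gt;Dividing by 2.0 rather than 2 keeps the arithmetic in floating point, which is why the expression in the Job uses the literal 2.0.&lt;/P&gt;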
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;tFileOutputDelimited&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H4&gt;
&lt;P&gt;Use the &lt;STRONG&gt;tFileOutputDelimited&lt;/STRONG&gt; component to write out the original values along with the newly computed values.&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vB6.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124225i45EAB8357642BB18/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vB6.png" alt="0EM5b0000074vB6.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Configuring a Job in Talend&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;After you create and test the Job in local mode, configure it to execute on a Databricks cluster.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Select &lt;STRONG&gt;Databricks&lt;/STRONG&gt; from the &lt;STRONG&gt;Distribution&lt;/STRONG&gt; drop-down list. Populate &lt;STRONG&gt;Endpoint&lt;/STRONG&gt;, &lt;STRONG&gt;Cluster ID&lt;/STRONG&gt;, and &lt;STRONG&gt;Token&lt;/STRONG&gt; using the previously captured values. The Token is the personal access token generated in the Databricks workspace under &lt;STRONG&gt;User settings&lt;/STRONG&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBG.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/122001i82405CAAA464B060/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBG.png" alt="0EM5b0000074vBG.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Run the Job and ensure that it completes successfully. Note that on the first run, required JARs are uploaded to the cluster’s file system. This process may take some time, depending on your connection speeds.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBa.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/124878i6559AB0FCC909CC4/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBa.png" alt="0EM5b0000074vBa.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;In some cases, the Databricks cluster recycles (restarts) before the Job executes. The cluster normally returns to a running state on its own, after which the Job runs.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;View the execution results by using the ADLS data explorer.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="0EM5b0000074vBf.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/123558i7F76C3CAA96C8D08/image-size/large?v=v2&amp;amp;px=999" role="button" title="0EM5b0000074vBf.png" alt="0EM5b0000074vBf.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
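&lt;P&gt;As a quick sanity check on the captured values, the Endpoint, Cluster ID, and Token map directly onto a call to the Databricks Clusters API (GET /api/2.0/clusters/get, authenticated with a Bearer token). The sketch below only assembles the request pieces; the endpoint, cluster id, and token values shown are placeholders:&lt;/P&gt;

```java
public class DatabricksRequest {
    // Build the Clusters API status URL from the workspace endpoint and cluster id.
    static String clusterStatusUrl(String endpoint, String clusterId) {
        // Trim a trailing slash so the path does not become "//api".
        String base = endpoint.endsWith("/")
                ? endpoint.substring(0, endpoint.length() - 1)
                : endpoint;
        return base + "/api/2.0/clusters/get?cluster_id=" + clusterId;
    }

    // The personal access token travels as a Bearer credential.
    static String authHeader(String token) {
        return "Bearer " + token;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own endpoint, cluster id, and token.
        System.out.println(clusterStatusUrl(
                "https://westus.azuredatabricks.net", "0123-456789-abcd123"));
        System.out.println(authHeader("dapi0000000000000000"));
    }
}
```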
&lt;H3&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Related Content&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/H3&gt;
&lt;P&gt;Using HDFS components to work with Azure Data Lake Store (ADLS)&lt;BR /&gt;&lt;A href="https://help.talend.com/reader/Sm466hmdh~Y~2GehtIo6xw/NPn7PIuX_Dcqib9WP0Zl_g" target="_blank" rel="noopener"&gt;https://help.talend.com/reader/Sm466hmdh~Y~2GehtIo6xw/NPn7PIuX_Dcqib9WP0Zl_g&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Overview of Azure Data Lake Storage Gen1&lt;BR /&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview" target="_blank" rel="noopener"&gt;https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Azure Databricks&lt;BR /&gt;&lt;A href="https://azure.microsoft.com/en-us/services/databricks/" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-us/services/databricks/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Access control in Azure Data Lake Storage Gen1&lt;BR /&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control" target="_blank" rel="noopener"&gt;https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 02:35:30 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Official-Support-Articles/Process-data-stored-in-Azure-Data-Lake-Store-with-Databricks/ta-p/2151411</guid>
      <dc:creator>TalendSolutionExpert</dc:creator>
      <dc:date>2024-01-23T02:35:30Z</dc:date>
    </item>
  </channel>
</rss>

