<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Customer Data Cleansing including data pre-processing/standardization in Data Quality</title>
    <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274128#M2944</link>
    <description>&lt;P&gt;Thanks very much for all your responses.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 22 May 2017 12:25:33 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2017-05-22T12:25:33Z</dc:date>
    <item>
      <title>Customer Data Cleansing including data pre-processing/standardization</title>
      <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274124#M2940</link>
      <description>&lt;P&gt;Apologies if this sounds like a stupid question; I have been searching relentlessly for a high-level answer. I am working on customer personal identifiers/information. Over a period of years, customer information such as Name/Address/Email/Phone has not been standardized or cleansed.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I am looking to do the following:&lt;/P&gt; 
&lt;P&gt;1) First, standardize the data, i.e. remove any white space characters, ensure each email address is valid, and so on.&lt;/P&gt; 
&lt;P&gt;2) Thereafter, de-duplicate the data based on a weighted matching algorithm:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;UL&gt; 
 &lt;LI&gt;&lt;SPAN&gt;Same Address or Payment Card: 0.3&lt;/SPAN&gt;&lt;/LI&gt; 
 &lt;LI&gt;&lt;SPAN&gt;Last Name SoundEx: 0.25&lt;/SPAN&gt;&lt;/LI&gt; 
 &lt;LI&gt;&lt;SPAN&gt;First Name SoundEx: 0.1&lt;/SPAN&gt;&lt;/LI&gt; 
 &lt;LI&gt;&lt;SPAN&gt;Title: 0.05&lt;/SPAN&gt;&lt;/LI&gt; 
 &lt;LI&gt;&lt;SPAN&gt;Email, Telephone, or Visitor Id : 0.3&lt;/SPAN&gt;&lt;/LI&gt; 
&lt;/UL&gt; 
&lt;P&gt;&lt;SPAN&gt;I would then define an upper and a lower threshold on the total score.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;The weights above add up to 1.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;So I need to compare every record with all other records, row by row, and come up with something like this:&lt;/SPAN&gt;&lt;/P&gt; 
&lt;TABLE&gt; 
 &lt;TBODY&gt; 
  &lt;TR&gt; 
   &lt;TD&gt;Customer Key1&lt;/TD&gt; 
   &lt;TD&gt;Customer Key2&lt;/TD&gt; 
   &lt;TD&gt;Match Type&lt;/TD&gt; 
  &lt;/TR&gt; 
  &lt;TR&gt; 
   &lt;TD&gt;A&lt;/TD&gt; 
   &lt;TD&gt;B&lt;/TD&gt; 
   &lt;TD&gt;Same Customer&lt;/TD&gt; 
  &lt;/TR&gt; 
  &lt;TR&gt; 
   &lt;TD&gt;A&lt;/TD&gt; 
   &lt;TD&gt;C&lt;/TD&gt; 
   &lt;TD&gt;Same Customer&lt;/TD&gt; 
  &lt;/TR&gt; 
  &lt;TR&gt; 
   &lt;TD&gt;D&lt;/TD&gt; 
   &lt;TD&gt;&amp;nbsp;&lt;/TD&gt; 
   &lt;TD&gt;NoMatch&lt;/TD&gt; 
  &lt;/TR&gt; 
 &lt;/TBODY&gt; 
&lt;/TABLE&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;In the case above, for example, A and B are two separate records, but they have the same payment card information, the same address and the same email, so they are the same customer.&lt;/P&gt; 
&lt;P&gt;Records A and C also match, as they have the same first name, last name and address. After that, I will create a golden record within each group of matching records.&lt;/P&gt; 
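&lt;P&gt;To make the idea concrete, here is a rough sketch of the scoring in plain Python (not Talend). The weights are the ones listed above; everything else is an assumption made for illustration only: the thresholds 0.6 and 0.4, the middle label "Possible Match", the field names, the sample records, and the use of exact equality for address, card, title and contact values.&lt;/P&gt; 
&lt;PRE&gt;
```python
from itertools import combinations

# Weights from the list above; they add up to 1.
WEIGHTS = {
    "address_or_card": 0.30,
    "last_name": 0.25,
    "first_name": 0.10,
    "title": 0.05,
    "contact": 0.30,
}
# Hypothetical thresholds, chosen only for this illustration.
UPPER, LOWER = 0.60, 0.40

# Soundex digit for each coded consonant; vowels, H, W and Y have no code.
CODES = {ch: str(d) for d, group in
         enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
         for ch in group}


def soundex(name):
    """Return the 4-character American Soundex code (empty string if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code, prev = letters[0], CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":  # H and W do not separate equal codes
            continue
        digit = CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit
    return (code + "000")[:4]


def same(a, b, field):
    """True when both records carry the same non-empty value for a field."""
    return bool(a.get(field)) and a.get(field) == b.get(field)


def sounds_alike(a, b, field):
    """True when both names are non-empty and share a Soundex code."""
    code = soundex(a.get(field, ""))
    return code != "" and code == soundex(b.get(field, ""))


def score(a, b):
    """Sum the weights of every criterion on which the two records agree."""
    hits = {
        "address_or_card": same(a, b, "address") or same(a, b, "card"),
        "last_name": sounds_alike(a, b, "last"),
        "first_name": sounds_alike(a, b, "first"),
        "title": same(a, b, "title"),
        "contact": any(same(a, b, f) for f in ("email", "phone", "visitor_id")),
    }
    return round(sum(WEIGHTS[k] for k, hit in hits.items() if hit), 2)


def classify(value):
    if value >= UPPER:
        return "Same Customer"
    if value >= LOWER:
        return "Possible Match"  # middle band: my own label for manual review
    return "NoMatch"


def match_table(records):
    """Compare every pair once; list matched pairs, then unmatched keys."""
    rows, matched = [], set()
    for a, b in combinations(records, 2):
        label = classify(score(a, b))
        if label != "NoMatch":
            rows.append((a["key"], b["key"], label))
            matched.update((a["key"], b["key"]))
    rows += [(r["key"], "", "NoMatch") for r in records if r["key"] not in matched]
    return rows


# Made-up sample records, for illustration only.
customers = [
    {"key": "A", "title": "Mr", "first": "John", "last": "Smith",
     "address": "1 high street", "card": "4111", "email": "js@example.com"},
    {"key": "B", "address": "1 high street", "card": "4111",
     "email": "js@example.com"},
    {"key": "C", "first": "Jon", "last": "Smyth", "address": "1 high street"},
    {"key": "D", "title": "Ms", "first": "Ann", "last": "Lee",
     "address": "9 low road", "card": "5500", "email": "ann@example.com"},
]
for row in match_table(customers):
    print(row)
```
&lt;/PRE&gt; 
&lt;P&gt;With these made-up records, A/B and A/C come out as "Same Customer" and D as "NoMatch", which is the shape of the table above. Comparing every pair grows quadratically with the number of records, so this is only meant to show the logic, not how to run it at scale.&lt;/P&gt; 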
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have tried Talend for Data Quality, but from what I can see it does data profiling only, not actual transformation: it gives you statistics on how good or bad your data is.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I have also seen Talend for Data Preparation. There I can load a file, apply my basic preparations (remove white spaces, etc.) and then use this preparation in a job.&lt;/P&gt; 
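&lt;P&gt;For reference, the basic preparations I mean are roughly the following, written as a plain Python sketch rather than anything Talend-specific. The helper names are my own, and the email pattern is a deliberately loose assumption (something@something.something), not a full address validator.&lt;/P&gt; 
&lt;PRE&gt;
```python
import re

# Loose shape check only: one "@", no whitespace, a dot in the domain part.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_text(value):
    """Trim the value and collapse runs of white space to a single space."""
    return " ".join(value.split())


def clean_email(value):
    """Strip all white space, lower-case, and return None if the shape is wrong."""
    email = "".join(value.split()).lower()
    return email if EMAIL_RE.match(email) else None


def clean_phone(value):
    """Keep digits only, so differently formatted numbers compare equal."""
    return "".join(ch for ch in value if ch.isdigit())


print(clean_text("  John   Smith "))
print(clean_email(" John.Smith@Example.COM "))
print(clean_email("not-an-email"))
print(clean_phone("+44 (0)20 7946-0000"))
```
&lt;/PRE&gt; 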
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;My fundamental question is: where can I find a component in which I can define my weights and match thresholds, and then decide which records are my customer golden records?&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I seem to have got lost.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I am looking to standardise/cleanse/merge the data into a golden customer record.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Any pointers will be greatly appreciated.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Please refer to this video:&lt;/P&gt; 
&lt;P&gt;&lt;A href="https://youtu.be/sozxWzAXLBM?list=PLZrVWXgbuqT5OEM_QwwgopJHlUHAZzp2i&amp;amp;t=1477" target="_blank" rel="nofollow noopener noreferrer"&gt;https://youtu.be/sozxWzAXLBM?list=PLZrVWXgbuqT5OEM_QwwgopJHlUHAZzp2i&amp;amp;t=1477&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;At this step in the video, the key value matches in Talend are given weights.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Thanks&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 20 May 2017 21:38:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274124#M2940</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-05-20T21:38:33Z</dc:date>
    </item>
    <item>
      <title>Re: Customer Data Cleansing including data pre-processing/standardization</title>
      <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274125#M2941</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;What you see in the video are the Data Quality components that can be used in a Talend job (namely &lt;A href="https://help.talend.com/#/reader/KxVIhxtXBBFymmkkWJ~O4Q/rq9_mEH64nfnizte_E4Axg" target="_self" rel="nofollow noopener noreferrer"&gt;tMatchGroup&lt;/A&gt; here), which address your deduplication use case. These components are only available in the commercial version of Talend Data Quality, not in Talend Open Studio for Data Quality. See the feature matrix at &lt;A href="https://www.talend.com/products/data-quality" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.talend.com/products/data-quality&lt;/A&gt; for more details.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Let me know if you need additional details.&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Regards,&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Gwendal&lt;/P&gt;</description>
      <pubDate>Mon, 22 May 2017 08:12:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274125#M2941</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-05-22T08:12:40Z</dc:date>
    </item>
    <item>
      <title>Re: Customer Data Cleansing including data pre-processing/standardization</title>
      <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274126#M2942</link>
      <description>&lt;P&gt;Thanks for your reply.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now I understand: the component label has been renamed in the demo. We have a licensed version of Talend Open Studio for Big Data. I can see that the palette has all the required Data Quality components; it would be great if you could confirm this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Many thanks for your quick reply.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BR /&gt;&lt;A href="https://community.qlik.com/legacyfs/online/tlnd_dw_files/0683p000009Lr7v"&gt;TalendDQ Palette.png&lt;/A&gt;</description>
      <pubDate>Mon, 22 May 2017 10:54:31 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274126#M2942</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-05-22T10:54:31Z</dc:date>
    </item>
    <item>
      <title>Re: Customer Data Cleansing including data pre-processing/standardization</title>
      <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274127#M2943</link>
      <description>Hi Ashish&lt;BR /&gt;&lt;BR /&gt;Talend Open Studio for Data Quality is the free, open-source studio; it does not contain the cleansing components you're looking for, such as tMatchGroup.&lt;BR /&gt;If you can find tMatchGroup in your palette, then you're on a subscription-based product.&lt;BR /&gt;HTH&lt;BR /&gt;Elisa</description>
      <pubDate>Mon, 22 May 2017 12:21:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274127#M2943</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-05-22T12:21:03Z</dc:date>
    </item>
    <item>
      <title>Re: Customer Data Cleansing including data pre-processing/standardization</title>
      <link>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274128#M2944</link>
      <description>&lt;P&gt;Thanks very much for all your responses.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 22 May 2017 12:25:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Data-Quality/Customer-Data-Cleansing-including-data-pre-processing/m-p/2274128#M2944</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2017-05-22T12:25:33Z</dc:date>
    </item>
  </channel>
</rss>

