<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CRC duplicate value when exporting Date-Host-URL from Google Analytics in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321317#M91356</link>
    <description>From the way you describe building the CRC checksum, I also have no idea what's wrong with this approach. 
&lt;BR /&gt;But why do you build a checksum? 
&lt;BR /&gt;I guess the duplicate error is caused by a unique constraint in the database? 
&lt;BR /&gt;Next: I would not build 5 very similar jobs. I would build one job that gets the URL (or host, as well as the profile ID) as a context parameter, and I would start 5 instances of this job with different values for the URL. This way you avoid copy&amp;amp;paste errors. 
&lt;BR /&gt;To find out how the wrong CRC value arises, you could use Trace Mode and inspect the values of all flows. 
&lt;BR /&gt;The last resort would be Java debugging.</description>
    <pubDate>Wed, 11 Jun 2014 21:02:38 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2014-06-11T21:02:38Z</dc:date>
    <item>
      <title>CRC duplicate value when exporting Date-Host-URL from Google Analytics</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321314#M91353</link>
      <description>I'm exporting data from Google Analytics into a MySQL database. I have data from 5 different websites to import. 
&lt;BR /&gt;I have 3 columns: 
&lt;BR /&gt;date - string (yyyyMMdd) 
&lt;BR /&gt;host - string - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; or 
&lt;A href="http://www.website2.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website2.com&lt;/A&gt; etc. (constant value in tMap -&amp;gt; different for each job, constant under the job) 
&lt;BR /&gt;URL - string 
&lt;BR /&gt;Basic logic: 
&lt;BR /&gt;1) 2014-06-10 - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; - "/" is different from 2014-06-11 - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; - "/" 
&lt;BR /&gt;2) 2014-06-10 - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; - "/" is different from 2014-06-10 - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; - "/1" 
&lt;BR /&gt;3) 2014-06-10 - 
&lt;A href="http://www.website1.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website1.com&lt;/A&gt; - "/" is different from 2014-06-10 - 
&lt;A href="http://www.website2.com" target="_blank" rel="nofollow noopener noreferrer"&gt;www.website2.com&lt;/A&gt; - "/" 
&lt;BR /&gt;So I assume there is no way the rows can be duplicates... 
&lt;BR /&gt;The length of the CRC column is set to 255 (it generates 8- to 10-digit integers); in the database the column is set to BIGINT. 
&lt;BR /&gt;1) What might cause this problem? 
&lt;BR /&gt;2) If I'm wrong about uniqueness of my rows, how can I save duplicate values to check them after the job is complete? (although I can't understand how it is possible since "HOST" variable is set as constant inside tMap and is 100% different from whatever I had in previous job runs) 
&lt;BR /&gt;Thanks, 
&lt;BR /&gt;Ivan 
&lt;BR /&gt;P.S. Any idea why my job stops at 1000000 rows when I use a long date range? 
&lt;BR /&gt;P.P.S. The same issue was reported here: 
&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCpKgCAK" target="_blank" rel="nofollow noopener noreferrer"&gt;https://community.talend.com/t5/Design-and-Development/Talend-4-2-3-taddCRCrow-same-CRC-value-for-2-different-data-set/td-p/101332&lt;/A&gt; 
&lt;BR /&gt; 
&lt;BR /&gt;UPDATE: I've manually checked a couple of codes from the statistics window, and I see both: 
&lt;BR /&gt;1) Duplicate CRCs within one "host" job 
&lt;BR /&gt;2) Duplicate CRCs across different jobs with different "host" constants. 
&lt;BR /&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MEOS.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/128136i885F1D4D88D86012/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MEOS.jpg" alt="0683p000009MEOS.jpg" /&gt;&lt;/span&gt; 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MEDu.jpg"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/131821i37A11D4243FF7CBC/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MEDu.jpg" alt="0683p000009MEDu.jpg" /&gt;&lt;/span&gt;</description>
      <pubDate>Wed, 11 Jun 2014 10:50:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321314#M91353</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-06-11T10:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: CRC duplicate value when exporting Date-Host-URL from Google Analytics</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321315#M91354</link>
      <description>At the moment I have no idea what your problem is. Could you please explain in a bit more detail what exactly does not work? 
&lt;BR /&gt;Regarding your postscript comments: 
&lt;BR /&gt;There is actually no technical reason in the tGoogleAnalyticsInput component to stop at a particular number of rows (e.g. 100,000). I guess Google sets some limit here. 
&lt;BR /&gt;Here are all the documented limits, but I have not read anything about row count limits: 
&lt;BR /&gt; 
&lt;A href="https://developers.google.com/analytics/devguides/reporting/core/v3/limits-quotas" rel="nofollow noopener noreferrer"&gt;https://developers.google.com/analytics/devguides/reporting/core/v3/limits-quotas&lt;/A&gt; 
&lt;BR /&gt;To be honest, I never run into this trouble because I design the requests so that they return only the data for one day, one hour, or one profile. Trying to gather all the data at once is bad design and prevents restart capability. 
&lt;BR /&gt;I suggest you run multiple queries, each with a reasonably small amount of data. You can always check your queries in the API console. 
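&lt;BR /&gt;A minimal sketch of the "one day per query" idea, written in Python purely for illustration (Talend jobs themselves are Java). The helper name daily_windows is hypothetical, and the sketch assumes the API takes start and end dates formatted as YYYY-MM-DD. It only produces the date windows and leaves the actual query to the job: 

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield one (start_date, end_date) pair of ISO strings per day, inclusive."""
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        yield day.isoformat(), day.isoformat()


# Illustrative date range only; each pair would drive one small query.
for start_date, end_date in daily_windows(date(2014, 6, 10), date(2014, 6, 12)):
    print(start_date, end_date)
```

&lt;BR /&gt;Each window can then be loaded and committed on its own, so a failed day can be re-run without repeating the others. 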
&lt;BR /&gt;Please keep in mind that the component receives a huge JSON file as the result, and I guess there is a natural limit to how large an answer should be.</description>
      <pubDate>Wed, 11 Jun 2014 13:15:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321315#M91354</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-06-11T13:15:13Z</dc:date>
    </item>
    <item>
      <title>Re: CRC duplicate value when exporting Date-Host-URL from Google Analytics</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321316#M91355</link>
      <description>Thanks for your reply, Jlolling. 
&lt;BR /&gt;My issue: 
&lt;BR /&gt;I have 5 websites: 
&lt;BR /&gt;TUT.BY 
&lt;BR /&gt;SPORT.TUT.BY 
&lt;BR /&gt;NEWS.TUT.BY 
&lt;BR /&gt;AUTO.TUT.BY 
&lt;BR /&gt;LADY.TUT.BY 
&lt;BR /&gt;They have different URIs, but I need to have them all in the SAME DB. 
&lt;BR /&gt;What I do: 
&lt;BR /&gt;1) Query Google Analytics: 
&lt;BR /&gt;"ga:date,ga:sourceMedium" + "ga:sessions,ga 
&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009MAB6.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/158321i00588DF41617C922/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009MAB6.png" alt="0683p000009MAB6.png" /&gt;&lt;/span&gt;ageviews,ga:bounces,ga:sessionDuration,ga:users,ga:newUsers" 
&lt;BR /&gt;2) Add a "Host" column: for the "TUT.BY" job = "TUT_BY", for the "NEWS.TUT.BY" job = "NEWS_TUT_BY", etc. So I have 5 different jobs, one per profile. (I use the tMap component for that.) 
&lt;BR /&gt;3) I pass this table into tAddCRCrow and generate a 32-bit code based on the ga:date + ga:sourceMedium + Host columns 
&lt;BR /&gt;(CRC column is set to "key" to enable future updates.) 
&lt;BR /&gt;4) Upload this data into the MySQL database. 
&lt;BR /&gt;BUT when I run it, I see a "Duplicate value error" for the CRC column in the statistics. 
&lt;BR /&gt;I.e. CRC1 for "TUT.BY" job sometimes = CRC2 for "NEWS.TUT.BY" job. 
&lt;BR /&gt;How is that possible? Or how can I fix that? 
&lt;BR /&gt;Best, 
&lt;BR /&gt;Ivan</description>
      <pubDate>Wed, 11 Jun 2014 13:39:40 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321316#M91355</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-06-11T13:39:40Z</dc:date>
    </item>
    <item>
      <title>Re: CRC duplicate value when exporting Date-Host-URL from Google Analytics</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321317#M91356</link>
      <description>From the way you describe building the CRC checksum, I also have no idea what's wrong with this approach. 
&lt;BR /&gt;But why do you build a checksum? 
&lt;BR /&gt;I guess the duplicate error is caused by a unique constraint in the database? 
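&lt;BR /&gt;For scale, a hedged back-of-the-envelope sketch (Python, for illustration only) of how often distinct keys can be expected to share a 32-bit checksum. It uses the standard birthday approximation and assumes the checksum behaves like a uniformly random 32-bit value; the function names are made up for this example: 

```python
import math


def expected_colliding_pairs(n_keys, bits=32):
    """Birthday approximation: expected number of key pairs sharing a checksum."""
    return n_keys * (n_keys - 1) / (2 * 2 ** bits)


def probability_of_any_collision(n_keys, bits=32):
    """Approximate chance that at least one pair of distinct keys collides."""
    return 1.0 - math.exp(-expected_colliding_pairs(n_keys, bits))


# Illustrative key counts only.
for n in (10_000, 100_000, 1_000_000):
    print(n, expected_colliding_pairs(n), probability_of_any_collision(n))
```

&lt;BR /&gt;Under that assumption, a million distinct keys would be expected to produce on the order of a hundred colliding pairs, so a unique constraint on a 32-bit CRC alone is likely to reject some valid rows. 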
&lt;BR /&gt;Next: I would not build 5 very similar jobs. I would build one job that gets the URL (or host, as well as the profile ID) as a context parameter, and I would start 5 instances of this job with different values for the URL. This way you avoid copy&amp;amp;paste errors. 
&lt;BR /&gt;To find out how the wrong CRC value arises, you could use Trace Mode and inspect the values of all flows. 
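&lt;BR /&gt;A small sketch of how colliding rows could be found offline, for example from an exported copy of the flow. It is Python for illustration only and assumes the key fields are joined with a separator and hashed with zlib's CRC-32; whether tAddCRCrow builds its input the same way is not confirmed here, and crc32_of and find_collisions are hypothetical names: 

```python
import zlib
from collections import defaultdict


def crc32_of(fields, sep="|"):
    """CRC-32 of the fields joined with a separator (assumed key layout)."""
    return zlib.crc32(sep.join(fields).encode("utf-8"))


def find_collisions(rows, checksum=crc32_of):
    """Map each checksum to the distinct rows that share it.

    Exact repeats of a row are collapsed, so only genuinely different
    rows with the same checksum are reported.
    """
    rows_by_checksum = defaultdict(set)
    for row in rows:
        rows_by_checksum[checksum(row)].add(tuple(row))
    return {c: sorted(r) for c, r in rows_by_checksum.items() if len(r) > 1}


# Example rows: (date, sourceMedium, host); values are illustrative only.
rows = [
    ("20140610", "google / organic", "TUT_BY"),
    ("20140610", "google / organic", "NEWS_TUT_BY"),
    ("20140610", "google / organic", "TUT_BY"),  # true duplicate row
]
print(find_collisions(rows))
```

&lt;BR /&gt;The separator keeps adjacent fields from running together, and collapsing exact repeats separates true duplicate rows from different rows that merely share a CRC. 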
&lt;BR /&gt;The last resort would be Java debugging.</description>
      <pubDate>Wed, 11 Jun 2014 21:02:38 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CRC-duplicate-value-when-exporting-Date-Host-URL-from-Google/m-p/2321317#M91356</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2014-06-11T21:02:38Z</dc:date>
    </item>
  </channel>
</rss>

