<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Detect Data type and store to parquet in App Development</title>
    <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2426461#M96201</link>
    <description>&lt;P&gt;Dear everybody,&lt;/P&gt;
&lt;P&gt;I'm trying to mass-convert QVD files to Parquet files for a demo server.&lt;/P&gt;
&lt;P&gt;Many of my QVDs contain "&lt;EM&gt;mixed&lt;/EM&gt;" data, i.e. numeric and text values in the same field, which leads to data loss when storing to Parquet:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Storing fields with mixed data types into parquet may result in loss of data
[657748] values dropped in [MY_FIELD]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I can handle mixed fields one by one using the Text() function, but my goal is a generic load and store with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;LOAD * from myqvdfile &lt;/SPAN&gt;&lt;SPAN&gt;followed by&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;store * to myparquetfile;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Does anybody have a solution?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 04 Mar 2024 08:51:13 GMT</pubDate>
    <dc:creator>NicolasAimain1</dc:creator>
    <dc:date>2024-03-04T08:51:13Z</dc:date>
    <item>
      <title>Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2426461#M96201</link>
      <description>&lt;P&gt;Dear everybody,&lt;/P&gt;
&lt;P&gt;I'm trying to mass-convert QVD files to Parquet files for a demo server.&lt;/P&gt;
&lt;P&gt;Many of my QVDs contain "&lt;EM&gt;mixed&lt;/EM&gt;" data, i.e. numeric and text values in the same field, which leads to data loss when storing to Parquet:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Storing fields with mixed data types into parquet may result in loss of data
[657748] values dropped in [MY_FIELD]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I can handle mixed fields one by one using the Text() function, but my goal is a generic load and store with&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;LOAD * from myqvdfile &lt;/SPAN&gt;&lt;SPAN&gt;followed by&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;store * to myparquetfile;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Does anybody have a solution?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2024 08:51:13 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2426461#M96201</guid>
      <dc:creator>NicolasAimain1</dc:creator>
      <dc:date>2024-03-04T08:51:13Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2426732#M96224</link>
      <description>&lt;P&gt;You could read the XML header of the QVDs, which contains a lot of information about the field data. At least numeric fields and string fields can be uniquely identified through &amp;lt;NumberFormat&amp;gt; and &amp;lt;Tags&amp;gt;. For mixed fields I'm not sure, but I think the reverse approach should work: a field that is identified as neither number nor text should be a mixed one.&lt;/P&gt;
&lt;P&gt;A somewhat simpler option may be to load all fields as strings, for example by using QvdNoOfFields() and QvdFieldName() in a loop to build an appropriate load statement in a variable, which is then used in the final table load.&lt;/P&gt;
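&lt;P&gt;A minimal, untested sketch of such a loop, assuming a hypothetical data connection MyLib and a file MyTable.qvd (neither name comes from this thread). It wraps every field in Text(), so numeric fields are also written to Parquet as strings:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;// Hypothetical paths, adjust to your environment
LET vQvd = 'lib://MyLib/MyTable.qvd';
LET vParquet = 'lib://MyLib/MyTable.parquet';
LET vFieldList = '';

// Build "Text([F1]) as [F1], Text([F2]) as [F2], ..." from the QVD header
// (field numbering is assumed to start at 1)
FOR i = 1 TO QvdNoOfFields('$(vQvd)')
    LET vField = QvdFieldName('$(vQvd)', $(i));
    LET vFieldList = vFieldList &amp;amp; If($(i) &amp;gt; 1, ', ', '') &amp;amp; 'Text([' &amp;amp; vField &amp;amp; ']) as [' &amp;amp; vField &amp;amp; ']';
NEXT i

// Load every field as text, then store the table as Parquet
Data:
LOAD $(vFieldList)
FROM [$(vQvd)] (qvd);

STORE Data INTO [$(vParquet)] (parquet);
DROP TABLE Data;&lt;/LI-CODE&gt;
&lt;P&gt;Field names that contain a closing square bracket would need extra escaping in this sketch.&lt;/P&gt;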
&lt;P&gt;&lt;A href="https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/Scripting/FileFunctions/file-functions-script.htm" target="_blank"&gt;File functions | Qlik Cloud Help&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Mar 2024 16:20:58 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2426732#M96224</guid>
      <dc:creator>marcus_sommer</dc:creator>
      <dc:date>2024-03-04T16:20:58Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2464859#M99127</link>
      <description>&lt;P&gt;Hello, are there any updates on this? We have the same issue here.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2024 09:13:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2464859#M99127</guid>
      <dc:creator>JonasS</dc:creator>
      <dc:date>2024-06-24T09:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2486863#M101432</link>
      <description>&lt;P&gt;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/200778"&gt;@NicolasAimain1&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;Did you get this issue sorted in your store statements?&lt;BR /&gt;I have the same challenge and would like to know how you got around this in your example.&lt;BR /&gt;&lt;BR /&gt;I am binary loading an app and looping through the tables and storing them out to our data lake as parquet files.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Many thanks, Carl.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2024 12:13:10 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2486863#M101432</guid>
      <dc:creator>CarlFortey</dc:creator>
      <dc:date>2024-10-14T12:13:10Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2487286#M101501</link>
      <description>&lt;P&gt;Hi, it's still not solved.&lt;/P&gt;
&lt;P&gt;I may give Marcus's solution a try within a few weeks.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2024 07:24:57 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2487286#M101501</guid>
      <dc:creator>NicolasAimain1</dc:creator>
      <dc:date>2024-10-16T07:24:57Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2487293#M101503</link>
      <description>&lt;P&gt;A QVW also contains various XML metadata within its header, which may be useful for getting the needed information.&lt;/P&gt;
&lt;P&gt;Besides this, you could loop through all fields of a table after the binary load, not with a resident load but by querying the system tables, perhaps with something like this as a starting point:&lt;/P&gt;
&lt;P&gt;for i = 0 to nooftables() - 1&lt;BR /&gt;&amp;nbsp; &amp;nbsp;let t = tablename($(i));&lt;BR /&gt;&amp;nbsp; &amp;nbsp;for ii = 1 to nooffields('$(t)')&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; let f = fieldname($(ii), '$(t)');&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; // fieldvalue() reads the distinct values of the field, one per generated row&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; c: load isnum(fieldvalue('$(f)', recno())) as C autogenerate fieldvaluecount('$(f)');&lt;BR /&gt;&amp;nbsp; &amp;nbsp;next&lt;BR /&gt;next&lt;/P&gt;
&lt;P&gt;Within the loop, directly after the load, you could check fieldvaluecount('C'): it should be 1 for a purely numeric or purely string field, and 2 for a mixed field. When it is 1, the single value of C (-1 or 0) also tells you the data type. The table c needs to be dropped before the next iteration, otherwise the values of C accumulate across fields.&lt;/P&gt;
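&lt;P&gt;Putting the loop and the check together could look like the sketch below. It is untested; FieldTypes, _Check and _IsNum are hypothetical helper names, chosen so that they are unlikely to collide with existing tables or fields, and table or field names containing single quotes are not handled:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;// Capture the table count before any helper table is created
LET vTableCount = NoOfTables();

// Empty result table: one row per checked field
FieldTypes:
LOAD * INLINE [
TableName, FieldName, FieldType
];

FOR i = 0 TO $(vTableCount) - 1
    LET t = TableName($(i));
    FOR ii = 1 TO NoOfFields('$(t)')
        LET f = FieldName($(ii), '$(t)');

        // One row per distinct value of the field: -1 if numeric, 0 if not
        _Check:
        NoConcatenate
        LOAD IsNum(FieldValue('$(f)', RecNo())) as _IsNum
        AutoGenerate FieldValueCount('$(f)');

        LET vKinds = FieldValueCount('_IsNum');   // 1 = pure, 2 = mixed
        LET vFirst = FieldValue('_IsNum', 1);     // -1 or 0

        Concatenate (FieldTypes)
        LOAD '$(t)' as TableName,
             '$(f)' as FieldName,
             If('$(vKinds)' = '2', 'mixed',
               If('$(vKinds)' = '0', 'empty',
                 If('$(vFirst)' = '-1', 'numeric', 'text'))) as FieldType
        AutoGenerate 1;

        // Drop the helper so _IsNum starts empty for the next field
        DROP TABLE _Check;
    NEXT ii
NEXT i&lt;/LI-CODE&gt;
&lt;P&gt;A field that occurs in several tables is checked once per table here. The rows marked as mixed in FieldTypes are the candidates for a Text() conversion before the store.&lt;/P&gt;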
&lt;P&gt;With a bit of additional logic you could also exclude key fields or already known fields from the check loop. For most fields the above logic is quite fast, but with tens of millions of distinct key values such an approach will take some time.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Oct 2024 07:57:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2487293#M101503</guid>
      <dc:creator>marcus_sommer</dc:creator>
      <dc:date>2024-10-16T07:57:42Z</dc:date>
    </item>
    <item>
      <title>Re: Detect Data type and store to parquet</title>
      <link>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2490179#M101903</link>
      <description>&lt;P&gt;Using a loop to load the XML header information from all my QVD files, I can see that the problematic fields have no corresponding data type tag in "String%Table".&lt;/P&gt;
&lt;P&gt;I created a table chart based on the code below, using&amp;nbsp;&lt;STRONG&gt;Concat(distinct [String%Table],'|')&lt;/STRONG&gt; to find the fields without data type tags (or with only $key tags).&lt;/P&gt;
&lt;P&gt;Tagging these fields appears to solve the problem (no more&amp;nbsp;&lt;EM&gt;[xxxxx] values dropped in [&amp;lt;field&amp;gt;]&lt;/EM&gt; message).&lt;/P&gt;
&lt;P&gt;The next steps will be to:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;confirm in the Parquet file that my data is still there&lt;/LI&gt;
&lt;LI&gt;do this &lt;STRONG&gt;concat&lt;/STRONG&gt; on the script side and check whether the resulting field is empty or contains $key, to automate the process&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
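&lt;LI-CODE lang="markup"&gt;// &amp;lt;mylib&amp;gt; is a placeholder for the data connection that holds the QVD files
for each vFile in filelist('lib://&amp;lt;mylib&amp;gt;/*.qvd')
// keep only the file name from vFile, without path and extension
// (assumes "/" as path separator and a lower-case .qvd extension)
let vFilename = SubField(SubField('$(vFile)', '/', -1), '.qvd', 1);

// one row per tag of each field
String:
LOAD
String%Table,
'$(vFilename)_'&amp;amp;%Key_QvdFieldHeader_59D66ED49CFF179D as %Key_QvdFieldHeader
FROM [lib://&amp;lt;mylib&amp;gt;/$(vFilename).qvd]
(XmlSimple, table is [QvdTableHeader/Fields/QvdFieldHeader/Tags/String]);

// one row per field; a field without any tag has no matching row in String
QvdFieldHeader:
LOAD
'$(vFilename)' as Filename,
"FieldName",
'$(vFilename)_'&amp;amp;%Key_QvdFieldHeader_59D66ED49CFF179D as %Key_QvdFieldHeader
FROM [lib://&amp;lt;mylib&amp;gt;/$(vFilename).qvd]
(XmlSimple, table is [QvdTableHeader/Fields/QvdFieldHeader]);

Next&lt;/LI-CODE&gt;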
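&lt;P&gt;The script-side concat check from step 2 could be sketched as follows, run after the loop above. This is untested and assumes the String and QvdFieldHeader tables loaded by that loop; FieldTags, UntypedFields, TagList, QvdFile and QvdField are hypothetical names:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;// Start from all fields, so fields without any tag are kept
FieldTags:
NoConcatenate
LOAD Filename, FieldName, %Key_QvdFieldHeader
RESIDENT QvdFieldHeader;

// Attach the concatenated tag list per field (null when a field has no tags)
LEFT JOIN (FieldTags)
LOAD %Key_QvdFieldHeader,
     Concat(DISTINCT [String%Table], '|') as TagList
RESIDENT [String]
GROUP BY %Key_QvdFieldHeader;

// Keep the fields whose tag list is empty or consists of $key only
UntypedFields:
NoConcatenate
LOAD Filename as QvdFile,
     FieldName as QvdField,
     TagList
RESIDENT FieldTags
WHERE Len(Trim(TagList)) = 0 OR TagList = '$key';

DROP TABLE FieldTags;&lt;/LI-CODE&gt;
&lt;P&gt;UntypedFields then lists the candidate fields per QVD, which could drive the tagging automatically before the store.&lt;/P&gt;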
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2024 14:54:44 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Detect-Data-type-and-store-to-parquet/m-p/2490179#M101903</guid>
      <dc:creator>NicolasAimain1</dc:creator>
      <dc:date>2024-10-31T14:54:44Z</dc:date>
    </item>
  </channel>
</rss>

