Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
NicolasAimain1
Partner - Contributor III
Partner - Contributor III

Detect Data type and store to parquet

Dear everybody,

I'm trying to achieve a mass conversion of QVD file to parquet file for a demo server.

Many of my QVD contain "mixed" data - ie numeric and text data in the same field which leads to a loss of data with parquet file 

 

Storing fields with mixed data types into parquet may result in loss of data
[657748] values dropped in [MY_FIELD]

 


I can deal with mixed field one by one using the text() function but my purpose is to do a general load & Store with 
LOAD * from myqvdfile then store * to myparquetfile;

Does anybody have a solution ?

Labels (1)
6 Replies
marcus_sommer

You may read the XML-Header of the QVD's which contain a lot of information about the field-data - at least numeric fields and string-fields are uniquely to identify in <NumberFormat> and <Tags>. By mixed fields I'm not sure but I think the reverse approach of not being a number or text should identify the mixed ones.

A bit simpler may be to load all fields as strings - maybe by using qvdnooffields() and qvdfieldnames() in a loop and creating an appropriate load-statement within a variable which then used in the final table-load:

File functions | Qlik Cloud Help

JonasS
Contributor
Contributor

Hello, any updates regarding this, we have the same issue here. 

CarlFortey
Contributor II
Contributor II

@NicolasAimain1 
Did you get this issue sorted in your store statements?
I have the same challenge and would like to know how you got around this in your example.

I am binary loading an app and looping through the tables and storing them out to our data lake as parquet files. 

Many thanks, Carl.

NicolasAimain1
Partner - Contributor III
Partner - Contributor III
Author

Hi, it's still not solved

I may give a try to Marcus solution within a few week 

marcus_sommer

A qvw contained also various xml meta-data within the header which may be useful to get the needed information.

Beside this you may loop through all fields from a table after the binary load but not as a resident-load else querying the system-tables, maybe with something like this as starting point:

for i = 0 to nooftables() -1
   let t = tablename($(i));
   for ii = 1 to nooffields('$(t)')
      let f = fieldname($(ii), '$(t)');
      c: load isnum([$(f)]) as C autogenerate fieldvaluecount([$(f)]);
   next
next

Within the loop and directly after the load the fieldvaluecount() of C could be checked which should be 1 for each pure numeric respectively string field and if it's 2 it's a mixed field and the single value of -1 or 0 for C returns also the data-type information.

With a bit on-top logic you may also exclude key-fields or already known fields from the check-loop. By the most fields the above logic is quite fast but by dozens of millions of distinct key-values such approach will take some time ...

NicolasAimain1
Partner - Contributor III
Partner - Contributor III
Author

Using a loop to load XML information from all my QVD file I can see that there is no corresponding data  tag in "String%Table" for problematic field

I created a chart table based on below code using Concat(distinct [String%Table],'|') to find out field without data tags (or with only $key tags)

tagging these fields appears to solve the problem (no more [xxxxx] values dropped in [<field>] message)

next step will be to

  1. confirm in parquet file that my data are still here
  2. do this concat on script side on check wether the resulting field is empty or if it contains $key to automatize the process

 

for each vFile in filelist('lib://<mylib>')
let vFilename  = <keep only filename from vFile>;

String:
LOAD
String%Table,
'$(vFilename)_'&%Key_QvdFieldHeader_59D66ED49CFF179D as %Key_QvdFieldHeader
FROM [lib://<mylib>$(vFilename).qvd]
(XmlSimple, table is [QvdTableHeader/Fields/QvdFieldHeader/Tags/String]);

QvdFieldHeader:
LOAD
'$(vFilename)' as Filename,
"FieldName",
'$(vFilename)_'&%Key_QvdFieldHeader_59D66ED49CFF179D as %Key_QvdFieldHeader
FROM [lib://<mylib>$(vFilename).qvd]
(XmlSimple, table is [QvdTableHeader/Fields/QvdFieldHeader]);

Next