Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
glencote
Contributor II
Contributor II

Integer to Float - int64 vs float64

Hi ,

I have quite a unique situation where we are extracting data to QVDs which then get parsed to Parquet files.

I have a situation where the parquet files are not being parsed because of multiple data types being detected within a field, specifically float64 and int64 (it needs to be one and not the other)

When looking at the field in QV i see the following:

Capture.PNG

I need to convert the above fields to Float.

Now i have looked at the fields values and there are mostly zeros, i have tried:
- Changing the QV Number Variables - Thousands Seperator and Decimal Seperators
- Using different derivatives of num() - using with and without decimals as format, with and without commas for thousands seperators in format etc etc
- Tried Summarizing the data by summing the fields re-storing them. But still QV stores this as the wrong type.

How do I get QV to only see the field as a float (with decimals), is this possible?

Thank you in advance for your help!

I'm sure you have further questions, please let me know.

BR, /Richard

Labels (7)
3 Replies
evan_kurowski
Specialist
Specialist

Hello Richard1989,

The field tags you see under Document properties ~> Tables are descriptive only, and do not represent the "official" data-typing when encapsulated in QVD.

There is XML metadata in the front of each QVD, and the <type> tag indicates how each field was QVD encapsulated:  REAL, INTEGER, ASCII, &  UNKNOWN.

Notice the resultant QVD metadata when running this sample script, even though some of these fields only load 1 integer, 1 float, or 1 text, the QVD still types the fields UNKNOWN.  This is not a mistake, but rather Qlik fields do not like to limit their options by "painting themselves into a data type corner".   These fields can still be appended with any other type of data, in any mixture, this data type flexibility is one of the major conveniences of the platform.

One way to "cast" the data <type> is using the Floor(), Num(#.#), or Text() functions around the first entry that is entered into the field.    You'll notice in these cases the QVD commits to REAL, INTEGER, ASCII.   And in some cases when you attempt to append non-numeric data to a "committed numeric" data type, you'll see strange kind of errors observable in other software, things like values showing up with '?' symbols. (meaning Qlik is unable to assign a numeric value to the entry)

I hadn't realized Parquet wasn't able to accept blended fields, but once you get used to not having to explicitly cast data types (at least during load scripts), you never want to go back.   (naturally data typing still comes into play when using functions designed for a specific type of data).


[TYPES]:
LOAD
     Floor(1) AS Integer_w_Num_Function,   //this produces QVD type INTEGER
     Num(1.) As Real,                           //this produces QVD type REAL
     Text('Text') As Text_Only,                //this produces ASCII type
     1 AS Integer,                             //this and the rest all produce type UNKNOWN
     'ஸ' AS UNICODE,                           //Unicode comes up as UNKNOWN
     1.1 AS Float,
     1 AS Mixed_Int_Float,
     'Text' AS AlphaNumeric_Text_Loads_First,
     1 AS AlphaNumeric_Num_Loads_First              AUTOGENERATE(1);

CONCATENATE(TYPES) LOAD
     1.1 AS Mixed_Int_Float,
     1 as AlphaNumeric,
     'Text' AS AlphaNumeric_Num_Loads_First         AUTOGENERATE(1);    

 

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <QvdTableHeader>
   <CreatorDoc>datatypes.qvw</CreatorDoc>
   <CreateUtcTime>2020-03-14 01:15:30</CreateUtcTime>
   <TableName>TYPES</TableName>
   <Fields>
     <QvdFieldHeader>
       <FieldName>Integer_w_Num_Function</FieldName>
       <BitOffset>0</BitOffset>
       <BitWidth>8</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>INTEGER</Type>
         <nDec>0</nDec>
         <UseThou>1</UseThou>
         <Fmt>###0</Fmt>
         <Dec>.</Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>0</Offset>
       <Length>5</Length>
       <Comment></Comment>
       <Tags>
         <String>$numeric</String>
         <String>$integer</String>
       </Tags>
     </QvdFieldHeader>
     <QvdFieldHeader>
       <FieldName>Real</FieldName>
       <BitOffset>8</BitOffset>
       <BitWidth>2</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>REAL</Type>
         <nDec>14</nDec>
         <UseThou>1</UseThou>
         <Fmt>##############</Fmt>
         <Dec>.</Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>5</Offset>
       <Length>7</Length>
       <Comment></Comment>
       <Tags>
         <String>$numeric</String>
         <String>$integer</String>
       </Tags>
     </QvdFieldHeader>
     <QvdFieldHeader>
       <FieldName>Text_Only</FieldName>
       <BitOffset>10</BitOffset>
       <BitWidth>2</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>ASCII</Type>
         <nDec>0</nDec>
         <UseThou>0</UseThou>
         <Fmt></Fmt>
         <Dec></Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>12</Offset>
       <Length>6</Length>
       <Comment></Comment>
       <Tags>
         <String>$ascii</String>
         <String>$text</String>
       </Tags>
     </QvdFieldHeader>
     <QvdFieldHeader>
       <FieldName>Integer</FieldName>
       <BitOffset>12</BitOffset>
       <BitWidth>2</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>UNKNOWN</Type>
         <nDec>0</nDec>
         <UseThou>0</UseThou>
         <Fmt></Fmt>
         <Dec></Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>18</Offset>
       <Length>7</Length>
       <Comment></Comment>
       <Tags>
         <String>$numeric</String>
         <String>$integer</String>
       </Tags>
     </QvdFieldHeader>
     <QvdFieldHeader>
       <FieldName>UNICODE</FieldName>
       <BitOffset>14</BitOffset>
       <BitWidth>2</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>UNKNOWN</Type>
         <nDec>0</nDec>
         <UseThou>0</UseThou>
         <Fmt></Fmt>
         <Dec></Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>25</Offset>
       <Length>5</Length>
       <Comment></Comment>
       <Tags>
         <String>$text</String>
       </Tags>
     </QvdFieldHeader>
     <QvdFieldHeader>
       <FieldName>Float</FieldName>
       <BitOffset>16</BitOffset>
       <BitWidth>2</BitWidth>
       <Bias>-2</Bias>
       <NumberFormat>
         <Type>UNKNOWN</Type>
         <nDec>0</nDec>
         <UseThou>0</UseThou>
         <Fmt></Fmt>
         <Dec></Dec>
         <Thou></Thou>
       </NumberFormat>
       <NoOfSymbols>1</NoOfSymbols>
       <Offset>30</Offset>
       <Length>13</Length>
       <Comment></Comment>
       <Tags>

 


STORE * FROM TYPES INTO TYPES.QVD (QVD);

 

 

glencote
Contributor II
Contributor II
Author

Hi Evan,

Thank you for the detailed answer and for explaining it so well with the examples. 

In my situation, after some more testing etc i have found that 0 values are my enemy when converted QVD to Parquet files are read into Google Cloud.

I have Monthly QVDs. I have a field called AMT. Some months the AMT has figures eg. 21.23 which Parquet will read as Float64 but months where the values are only 0 it reads this as INT64 and it doesn't matter what i try to do to those 0s, it always reads them as INT64. The only time i have come close is when adding a dummy row with AMT = 1.23, restoring the QVD - then Float64 is picked up again, but obviously i don't want to add dummy data to my main data store. When i drop the dummy data after creating it and re-store it converts it straight back to an Int64 again. 

Is it even possible to convert a zero to Float64 in the underlying data? num(AMT,'# ##0,00',',',' ') on a 0 doesn't help, it just changes the visual representation of 0 to 0,00 but it is still an underlying int64.

Is it time to look for another solution possibly? 

evan_kurowski
Specialist
Specialist

What happens when you export the data to .csv's?  Is the Hadoop import able to convert a set of .csv's that have both zero's and non-zero values into Parquet and preserve a consistent data-type?

If so, then maybe there's an inability on the part of the "Hadoop import wizard" that can't enforce columnar data types.  Is this something that could be scripted?  

Is it Google Cloud that is performing the "conversion from QVD to Parquet"?  or some Hadoop utility?