Hi all
I have a table in SAS that was exported to Hadoop/Hive and saved there. So the SAS and Hive tables are exactly the same, with roughly 200 million rows and one column (for testing).
I first created QVDs, and the QVD sizes were very similar. Then I read each table into a Qlik Sense app, and the app sizes were quite different. The app with the Hive data is 400 MB, while the app with the same table sourced from SAS is 10 MB.
Why is the app from Hive so much bigger? This is a problem because when I add all the columns the app size gets much bigger. An alternative is to just use the SAS table, but we were trying to move away from SAS.
One thing to note is that the connection to Qlik Sense for both tables was through OLE DB to SAS. For the Hive table I used a library in SAS to pull the table, so I guess I'm not pulling the data directly from Hive using the Apache Hive connection. The reason is that I can't seem to connect to Apache Hive from Qlik Sense, but I can connect to SAS using the OLE DB connection.
I wonder if reading the table from Hive into SAS is causing the issue, or if some properties of the data change once it goes into Hive, or perhaps there is something else going on. What I don't understand is that the QVD sizes of the Hive and SAS tables are very similar, but once I load the tables into an app the app sizes are considerably different. Also, loading the QVD for the Hive table takes much longer than loading the QVD from SAS.
Why would the app sizes be different if the tables are the same, just saved in different places (Hive vs SAS)?
It may depend on the storage type: row-level storage, with or without regard to the data type, versus column-level storage with just a data interpretation.
A classical SQL database uses row-level storage with data types, which means the required size is:
(size of Field1's data type * number of records) + (size of Field2's data type * number of records) and so on
A QVD uses column-level storage, in which only the distinct values are stored, without data types, plus bit-stuffed pointers to those values.
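To make the difference between the two storage models concrete, here is a rough Python sketch of both size estimates. It is only a toy model and not Qlik's actual file or memory format. The function names `row_store_bytes` and `column_store_bytes` are made up for illustration, and the cost of a distinct value (the byte length of its text form) is an assumption.

```python
def row_store_bytes(type_widths, n_records):
    # Row-level estimate: every record stores every field at its declared width (in bytes).
    return sum(width * n_records for width in type_widths)


def column_store_bytes(values):
    # Column-level estimate: distinct values stored once, plus one pointer per record.
    distinct = set(values)
    if not distinct:
        return 0
    # Toy assumption: each distinct value costs the UTF-8 length of its text form.
    symbol_bytes = sum(len(str(v).encode("utf-8")) for v in distinct)
    # Bits needed to address any one distinct value (at least 1 bit).
    pointer_bits = max(1, (len(distinct) - 1).bit_length())
    # Pointers are packed back to back ("bit-stuffed"), so round up once per column.
    pointer_bytes = (pointer_bits * len(values) + 7) // 8
    return symbol_bytes + pointer_bytes


if __name__ == "__main__":
    n = 1000
    few_distinct = [i % 4 for i in range(n)]   # only 4 distinct values
    all_distinct = list(range(n))              # every value is unique
    print("row-level, one 8-byte field:", row_store_bytes([8], n))
    print("column-level, few distinct: ", column_store_bytes(few_distinct))
    print("column-level, all distinct: ", column_store_bytes(all_distinct))
```

In this model the row-level size depends only on the declared widths and the record count, while the column-level size grows with the number of distinct values. So if the same data arrives with more distinct values on one path (for example because of a different interpretation or formatting of the values), the column-level size can differ a lot even though the tables look identical.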