Skip to main content
Woohoo! Qlik Community has won “Best in Class Community” in the 2024 Khoros Kudos awards!
Announcements
Nov. 20th, Qlik Insider - Lakehouses: Driving the Future of Data & AI - PICK A SESSION
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Example for you - Learn document compression

I want to make an example based on what my book says -

Consider the example of a timestamp field, which contains a date and a time. If we were to load data corresponding to a single year, it could potentially lead to 31,536,000 unique timestamp values: 365 days x 24 hours x 60 minutes x 60 seconds.

However, we could also decide to split the timestamp into two fields: a date field and a time field. In this scenario the maximum number of unique values would be reduced to 86,765, that is, 365 days for the date field and 86,400 for the time field. This is only 0.28 percent of the original volume, and can therefore have a tremendous impact on the document size, especially on larger data volumes.

So, I made a Table WebTraffic(TStamp, Visitors) to log the number of visits to a website in a each hour of 2014.

My sample data file has 365 * 24 = 8760 rows. In Qlikview, I want to decrease that to just 365 + 24 rows by splitting TStamp

into Date and Time.

The problem - Please refer to the image below. When I select date only and time separately, qlikview does not select the visitors. I know I have to put some key to link the tables. How do I make that happen ? Once done, I will delete all the timestamps which store each hour. My qlikview file is also attached.

Capture.JPG.jpg

1 Solution

Accepted Solutions
swuehl
MVP
MVP

Your tables are unlinked, that's why your selections don't affect the Visitors.

Try

LOAD TStamp,

  Date(Floor(TStamp)) as [DateOnly],

  Time(Frac(TStamp)) as [TimeOnly],

     Visitors

FROM

WebTraffic.csv

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

to link your tables, or

LOAD

  Date(Floor(TStamp)) as [DateOnly],

  Time(Frac(TStamp)) as [TimeOnly],

     Visitors

FROM

WebTraffic.csv

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

to remove the timestamp with its many distinct values.

You wouldn't need the separate dimension tables in this example.

edit: this might also be of interest:

http://community.qlik.com/blogs/qlikviewdesignblog/2014/03/11/the-importance-of-being-distinct

View solution in original post

2 Replies
swuehl
MVP
MVP

Your tables are unlinked, that's why your selections don't affect the Visitors.

Try

LOAD TStamp,

  Date(Floor(TStamp)) as [DateOnly],

  Time(Frac(TStamp)) as [TimeOnly],

     Visitors

FROM

WebTraffic.csv

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

to link your tables, or

LOAD

  Date(Floor(TStamp)) as [DateOnly],

  Time(Frac(TStamp)) as [TimeOnly],

     Visitors

FROM

WebTraffic.csv

(txt, codepage is 1252, embedded labels, delimiter is ',', msq);

to remove the timestamp with its many distinct values.

You wouldn't need the separate dimension tables in this example.

edit: this might also be of interest:

http://community.qlik.com/blogs/qlikviewdesignblog/2014/03/11/the-importance-of-being-distinct

Not applicable
Author

Thanks swuehl. The file size has reduced from 199 KB to 170KB. The DateOnly and TimeOnly columns show 8760 rows even when there are only 365 and 24 rows in them respectively.  Why does this happen ?
I added DISTINCT in those listboxes to display the correct count - = 'Time Only = ' & Count(DISTINCT TimeOnly).

FINAL FIXED FILE attached.