Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Valnar
Contributor II
Contributor II

Comparing sets of yearly data in a table that has the year as on of its dimensions

Hello to everyone. I don't know if what I am trying to do is even possible, but maybe someone can help me see a solution that actually eludes me.

Let's start with the data.  I have a list of companies with indication of their activity status for each year. Think something like this:

 

 

 

Company|Year|Status
company1|2017|Active
company1|2018|Active
company1|2019|Inactive
company1|2020|Active
company2|2018|Active
company2|2019|Active
company2|2020|Active
...

 

 

 

And so on.

What I am trying to do is making a chart showing, for each year, how many companies that were active the year before are now inactive (in one column), and how many companies that the year before were inactive or didn't exist, are now active (in another column).

I have tried with set analysis, but this obviously doesn't work, as Year is a dimension in my table. I have experimented with creating a [Year2] field, but it doesn't work either (or at least I didn't understand how to make it work).

I also tried to add, in the script level, a new field noting the change from Active to inactive with a -1 and from inactive (or non-existent) to active with a 1, like this:

 

 

Company|Year|Status|StatusChange
company1|2017|Active|1
company1|2018|Active|0
company1|2019|Inactive|-1
company1|2020|Active|1
company2|2018|Active|1
company2|2019|Active|0
company2|2020|Active|0
...

 

 

However, even though looking at the resulting data everything seems OK, the rults don't add up. For example, I have 370 companies in 2018 and 360 in 2019, but the status changes only show a +4/-12.

So, is what I am trying to do effectively impossible, or is there some other way that I didn't think of?

Labels (1)
3 Replies
marcus_sommer

It's indeed not trivial - because by using year as dimension the years above and below didn't belong to the dimension-value and are therefore not available - at least not in the standard. With set analysis you could in general overwrite this behaviour but it's much complex, slow and related with serious disadvantages to the user-experience. I wouldn't recommend to go in this direction.

I think I would remove the year dimension and not applying a single expression else n expressions - one for each year and view-type. Maybe something like:

sum(aggr(pick(match(
   count({< Year = {2018} >} distinct Year) & '|' & count({< Year = {2019} >} distinct Year),
   '0|0', '0|1', '1|0', '1|'1'), 0, 1, 0, 1), Company))

The bold-marked years and return-values would needed to be adjusted for each Year/View-Combination. Within a table-chart it should work well but within other charts there are restrictions to the number of dimensions and expressions in relation to the available chart-settings which may not always be suitable, for example by a stacked bar-chart I have doubts that it would be working. But with a line-chart a sensible visualization may be possible.

Another approach may be to remain by Year as  dimension and a single expression for a view-type with a similar logic like the shown expression above but not counting to years else using concat() with TOTAL to get all results within a string - and then applying also a pick(match()) approach to pick the appropriate values - whereby the pick-logic is probably a bit more complicated as above and may need also a second-layer.

Further I could imagine to solve such challenge with The As-Of Table - Qlik Community - 1466130.

Valnar
Contributor II
Contributor II
Author

Thanks for your answer. I had already tried using the AsOf logic, but it didn't work for me. 

I think I migh've actually found a solution using a FOR loop, iterating on the table, making a new table with only the companies from year x and then making a table with only the new companies from year x+1 using the NOT EXIST instruction.

Like this:

FOR y = 2018 TO 2023 STEP 1

  [Existing]:
  LOAD
  	[company] AS [CompanyEx]
  RESIDENT administration_signed_company
  WHERE YearC= $(y) AND [State] = 'Active';

  [New]:
  LOAD
      [company] AS [CompanyNew],
      [YearC] AS [YearNew]
  RESIDENT administration_signed_company
  WHERE YearC = ($(y)+1) AND [State] = 'Active' AND NOT EXISTS([CompanyEx], [company]);

	DROP TABLE [Existing];

NEXT;

Dropping the Existing table at the end of each iteration insures that I me consider only one year at a time. 

However exist seems unable to compare the values from the field company. I thought it might be because it is a numeric code, often with leading zeroes, however it doesn't work even if i change that value to a string by concatenating a letter in front of it.

Any ideas?

marcus_sommer

Exists() queries always against the loaded field-values completely independent from which load and in which tables this field is resident. Therefore the load-order of the entire script and the field-names itself (it's not uncommon to use multiple renaming-statements in a longer script-logic) are very important to ensure that the query would cover the wanted logic. It's not always trivial ...

If it's really possible to solve this task within the script or at least to create some supporting logic I would tend to slightly different approach not with loops else just by loading the data twice - with some adjustments, for example like:

t1: load *, Company & '|' & Year as Key inline [
Company|Year|Status|StatusChange
company1|2017|Active|1
company1|2018|Active|0
company1|2019|Inactive|-1
company1|2020|Active|1
company2|2018|Active|1
company2|2019|Active|0
company2|2020|Active|0
] (txt, delimiter is '|');

t2:
load Key, Year as YearWithPreviousYear, Status as StatusX,
           pow(2, match(Status, 'Active', 'Inactive')) as Value, 'Current' as Source
resident t1 order by Key asc;
concatenate(t2)
load Key, Year - 1 as YearWithPreviousYear, previous(Status) as StatusX,
           pow(3, match(previous(Status), 'Active', 'Inactive')) as Value, 'Previous' as Source
resident t1 order by Key asc;;

left join(t2) load Key, sum(Value) as ValueSum
resident t2 group by Key;

which would result in data like:

marcus_sommer_0-1691666195016.png

which might be regarded as a kind of an as-of-table and which already contained with ValueSum already a flag-field for the task. Not all of the included information are necessary and just integrated to demonstrate the logic whereby you might need some adjustments in the details - handling non exists years, sorting-order asc or desc, a definitely not overlapping Value-logic which means that ValueSum would be unique for each case and finally defining which ValueSum covers which use-case,

The aim is to cover all scenarios just with one field by applying something like this:

count({< ValueSum = {11}>} distinct Company)

With the appropriate ValueSum the lost and new companies as well as the remaining ones should be displayable against the year-dimension.