11 Replies Latest reply: Feb 7, 2018 8:58 AM by Katarzyna Wójcik

# histogram counts duplicates

Suppose I have two tables joined by id, where one table contains this id as unique identified and second table has multiple records with the same id.

data1:

d1_ID, param1

A1, 1

A2, 2

A3, 2

A4, 3

A5, 2

A6, 4

A6, 3

A7, 5

];

data2:

d2_ID, d1_ID, other_param

1, A1, x

2, A1, y

3, A1, z

4, A2, t

5, A2, x

6, A3, y

7, A4, z

7, A4, t

8, A6, x

];

I would like to make a histogram of param1 values, counting each once per d1_ID. But having these two tables joined, it counts '1' three times, as A1 occurs 3 times in data2 table... Any ideas how to make it?

• ###### Re: histogram counts duplicates

Hi,

here you can create a flag when loading each of your tables, like this you can filter your expresion with a set analysis..

it is one way to do it

• ###### Re: histogram counts duplicates

Hi,

Can you give me an example of how to do it?

• ###### Re: histogram counts duplicates

data1:

LOAD d1_ID, param1, 'data1' as Flag;

d1_ID, param1

A1, 1

A2, 2

A3, 2

A4, 3

A5, 2

A6, 4

A6, 3

A7, 5

];

data2:

d2_ID, d1_ID, other_param

1, A1, x

2, A1, y

3, A1, z

4, A2, t

5, A2, x

6, A3, y

7, A4, z

7, A4, t

8, A6, x

];

expression = Count ({<Flag={'data1'}>} d1_ID)

Let me know

• ###### Re: histogram counts duplicates

Thank you for your reply! However, I can't use Count in histogram - I just provide the name of field, hence I don't know how to use set analysis expression in there

• ###### Re: histogram counts duplicates

What is the expression you are using on the histogram ?

• ###### Re: histogram counts duplicates

There is no expression. I just use one dimension field

• ###### Re: histogram counts duplicates

So you can try on the Dimension:

=if(Flag='data1', your_dimension_field)

• ###### Re: histogram counts duplicates

hmm... looks good, but I just tried on this dummy data and suprisingly this problem does not occur - histogram is calculated properly.. and when tried on my real data and used other parameter as a flag, it didn't work

• ###### Re: histogram counts duplicates

you can maybe add some screen shots here, from your script for example

• ###### Re: histogram counts duplicates

I've realized that there is another part of the script that probably causes the issue... on this dummy data it would be:

Temp:

'data1' as [data1.Table] ,

'1'  as [data1.Count] ,

*

Resident data1;

Outer Join (Temp)

'data2' as [data2.Table] ,

'1'  as [data2.Count] ,

*

Resident data2;

join:

NoConcatenate

if ( [data1.Table] = 'data1' and isnull([data2.Table])

, dual('Yes',1)    , dual('No',0) )  as    [data1.Only] ,

if ( [data2.Table] = 'data2' and isnull([data1.Table])

, dual('Yes',1)    , dual('No',0)     )  as    [data2.Only] ,

if ( [data2.Table] = 'data2' and [data1.Table] = 'data1'

, dual('Yes',1)    , dual('No',0)     )  as    [data1.and.data2] ,

*

resident Temp ;

Drop Table Temp ;

Drop Table data1;

Drop Table data2;

It allows me to filter all data that occurs in both tables. Now while having this, the histogram gives false result

• ###### Re: histogram counts duplicates

Finally my colleague found the solution - posting here for anybody who ever meets this issue.

join2:

NoConcatenate