Anonymous
Not applicable

Concatenate distinct values of a field in a variable

Hello friends,

I am trying to put the distinct values of a field into a variable, concatenated with single quote marks ('), to then use them as a load filter in a query.

I'm a bit stuck. I could do it by loading the unique values, counting the rows and writing a subroutine that appends them row by row to the variable. Is there a simpler way? I have tried using Concat() in the load, but I can't use the special character (').
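
For illustration, the aim is to end up with a variable along these lines (the values here are just placeholders) that can later be used as a query filter:

SET vField1 = 'value1','value2','value3';
// later: ... WHERE field1 IN ($(vField1))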

Thank you in advance,

Luis Andrade


7 Replies
chrismarlow
Specialist II

Hi,

Does

Concat(DISTINCT [Dim],''',''')

not work?

Maybe try using something that does not appear in the field and use Replace(), like so:

''''&Replace(Concat(DISTINCT [Dim],'~'),'~',''',''')&''''

Or am I missing something as to how this is then used?

Cheers,

Chris.
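
For what it's worth, another way around the quote problem is to build the delimiter with chr(39) (the single-quote character) directly in the script. A minimal sketch, assuming a source table [Data] with a field [Dim]:

Tmp:
LOAD chr(39) & Concat(DISTINCT [Dim], chr(39) & ',' & chr(39)) & chr(39) AS List
RESIDENT Data;

LET vList = Peek('List');   // e.g. 'A','B','C'
DROP TABLE Tmp;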

 

Anonymous
Not applicable
Author

In the end I did it as shown below. I didn't use @chrismarlow's method, as I believe the fields have a character limitation, but I understand it should work using a variable instead of the special character in the concatenation.

SUB VARIABLES (vTable)

    TRACE >>> Table: $(vTable);
    LET vFields = NoOfFields('$(vTable)');
    TRACE >>> #Fields: $(vFields);

    FOR i = 1 TO $(vFields)

        LET vNameField = FieldName($(i), '$(vTable)');
        TRACE >>> Field $(i): $(vNameField);

        // helper table with the distinct values of the current field
        [$(vNameField)]:
        NOCONCATENATE
        LOAD DISTINCT
            [$(vNameField)]
        RESIDENT [$(vTable)];

        LET vRows = NoOfRows('$(vNameField)');
        TRACE >>> #Rows: $(vRows);

        // build a list in a variable named after the field:
        // opening quote, then value','value..., then closing quote
        SET $(vNameField)  = "'";
        SET vDelimiter     = "','";
        SET vEnd           = "'";

        FOR j = 1 TO $(vRows)-1
            // peek the j-th value from the distinct helper table
            LET vValueField   = Peek('$(vNameField)', $(j)-1, '$(vNameField)');
            LET $(vNameField) = '$($(vNameField))' & '$(vValueField)' & '$(vDelimiter)';
        NEXT

        // the last value gets the closing quote instead of a delimiter
        LET j = $(vRows);
        LET vValueField   = Peek('$(vNameField)', $(j)-1, '$(vNameField)');
        LET $(vNameField) = '$($(vNameField))' & '$(vValueField)' & '$(vEnd)';

        DROP TABLE [$(vNameField)];

    NEXT

    // clean up helper variables
    LET i           = Null();
    LET j           = Null();
    LET vFields     = Null();
    LET vNameField  = Null();
    LET vRows       = Null();
    LET vDelimiter  = Null();
    LET vEnd        = Null();
    LET vValueField = Null();

END SUB
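
A hypothetical call, assuming a table [MyTable] was loaded earlier in the script:

CALL VARIABLES('MyTable');
// afterwards each field name of MyTable holds its quoted value list,
// e.g. $(field1) expands to 'value1','value2',...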

 

Thank you friends

marcus_sommer

I'm not sure I understand your requirement correctly, but you used no field functions to get the distinct field values; instead you load the data with a resident load and afterwards fetch the values with Peek() within another loop. Instead of this, you may use or adapt the following approach:

// iterate directly over the distinct values of the field
for each x in FieldValueList('YourField')
   // '#' serves as a quote placeholder - it must not occur in the values
   let y = '$(y)' & ',' & '#$(x)#';
next
// drop the leading comma and turn the placeholders into single quotes
let y = replace(mid('$(y)', 2), '#', chr(39));
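
With field values A, B and C this yields y = 'A','B','C', ready to be used as $(y) inside an IN() clause.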

Besides this, are you sure that saving it within a variable has benefits over storing it as tables? What is the aim of it? If it's just to know which values exist in each application, you may be better off using the system fields $Table, $Field and so on in the UI - there is already an object for it (the last one in the list after a right-click and then New object).

- Marcus

Anonymous
Not applicable
Author

Hi @marcus_sommer, what happens is that I have a detail app with ODAG, and the user also needs some objects from this app for NPrinting. To avoid reloading ~16 million records per month for 36 periods, it is better to load only what is needed. So I had to reduce the load, and for that I needed the distinct values in a variable (vField1) to use in the database query:

...WHERE field1 IN ('value1','value2','value3',...)

written instead as:

...WHERE field1 IN ($(vField1))
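
For context, a sketch of how the variable would feed the query (the connection and table names are hypothetical):

LIB CONNECT TO 'MyDatabase';

// $(vField1) expands to 'value1','value2',... at reload time
Detail:
SQL SELECT *
FROM myschema.mytable
WHERE field1 IN ($(vField1));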

I will test your code to reduce lines.

Thanks

marcus_sommer

Usually it's not necessary to extract all field values of all fields to load a certain subset of a bigger dataset. For such use cases it's common that some keys and/or dates/timestamps and/or hierarchical structures like products or regions - maybe in combination - are used to pick the wanted dataset.

What matters is that the used filters (or filter combinations) are properly associated with the dataset - this is usually not achievable with bare field values, because they are stored distinct and their relation to the records can't simply be recombined.

Further, filtering the SQL with WHERE ... IN() isn't really performant, and IN() is often restricted to a maximum number of parameters - AFAIK around 1000.

Therefore I suggest using the above-mentioned keys, dates and so on for the filtering, and doing it completely against Qlik data. This means implementing an incremental approach: transfer all relevant historical database data to Qlik and store it there in appropriate QVDs - sliced into YYYYMM.qvd files or similar suitable datasets - so that only new or changed records are queried from the database.

And then, on top of such an incremental approach, you can apply your ODAG and/or NPrinting requirements with where exists(KEY); filters against the QVDs to keep the reloads optimized.
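
A minimal sketch of that pattern (the field and file names are assumptions):

// 1) the wanted keys, e.g. from the ODAG selection
Keys:
LOAD DISTINCT KEY FROM [path\selection.qvd] (qvd);

// 2) a single-field where exists() keeps the QVD load optimized -
//    only records whose KEY is already present are loaded
Facts:
LOAD * FROM [path\202101.qvd] (qvd)
WHERE Exists(KEY);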

- Marcus


Anonymous
Not applicable
Author

Hi @marcus_sommer ,

I already tried it with QVDs, and that is usually faster than pointing at the database, but in this case, when using WHERE EXISTS, it apparently reloaded ALL the records of each QVD (YYYYMM) and only then applied the WHERE EXISTS condition. That takes about 1 minute per period, which was not acceptable. Pointing at the database instead took only ~30 seconds for all periods (with a MATCH condition), which is why I went this route; it also saved ~250 GB of disk space.

Regards,

Luis

marcus_sommer

It's not seldom that the database resides on much more performant hardware, but it's rather rare that it is faster than a well-designed Qlik environment.

On the Qlik side it's very important to make sure the QVD loads remain optimized, which can become a bit difficult in more complex scenarios: it's not enough to avoid processing statements within a single QVD load - the entire order of the load steps matters, and the table structures must be identical. You can find more information in the last link area of this post:

Advanced topics for creating a qlik datamodel - Qlik Community - 1494747

Besides this, by switching your load approach from a wildcard load to a filelist() loop you can avoid touching datasets that aren't relevant. This means not loading:

t: load * from path\*.qvd (qvd) where exists(Period);

but rather something like:

for each file in filelist('path\*.qvd')
   // compare the period encoded in the filename with the threshold
   if keepchar('$(file)', '0123456789') >= $(vMinPeriod) then
      t: load * from [$(file)] (qvd);
   end if
next

Keepchar() is used here for simplicity - if your full path contains other numbers, you need subfield() and/or further string functions to extract the period or whatever other relevant information. Filelist() and dirlist() are very fast even on large folders, and the same is true for the embedded if-conditions that compare the information contained in the filenames with your load requirements.

I use this method for nearly all my raw-data and ETL generators and the downstream data models within incremental approaches, which lets me decide whether I want a full load, any partial dataset and/or an incremental respectively incremental+ load (a combination of a partial and an incremental load). Besides this, the filelist() + if-condition approach enables you to use another field within a where exists(condition); without the need to combine them beforehand.

A further possibility to speed up the reload of an application would be a binary load from a data model - maybe with data up to yesterday - with today's data added on top of the binary via optimized loads from the appropriate QVDs.
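
A minimal sketch of that pattern (the file and table names are assumptions):

// Binary must be the very first statement in the script
Binary [path\history.qvf];

// add today's records on top of the binary-loaded data model;
// an identical field structure keeps this QVD load optimized
Concatenate (Facts)
LOAD * FROM [path\today.qvd] (qvd);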

- Marcus