I'm supposed to analyse a document-term matrix using Qlik Sense. My rows represent documents and my columns represent words. The values of the table are the occurrences of each word in each document.
What I need is to find the words that appear most often in my corpus. For that, I have to compute the sum of each word's column across all documents (rows) and pick the maximum or, better, get a ranking from the most frequent word to the least frequent.
I tried to do it on my own, but my capabilities in Qlik Sense are very limited, especially for the script part. Can someone help me find a solution?
Thanks for your answer @loveisfail. I don't understand what "measure" and "document" stand for, since I have many documents as rows and several words as columns. I don't have just one measure but more than 2000.
Let's imagine we have two documents: "you are learning" and "they are understanding". The resulting matrix would be:
document   | you | are | learning | they | understanding
document_1 | 1   | 1   | 1        | 0    | 0
document_2 | 0   | 1   | 0        | 1    | 1
My objective is to order the words (which are in columns) from the most frequent in the corpus to the least frequent. In this example, the word "are" should come first, with 2 occurrences.
What I did was call the CrossTable function and then make an aggregation on the result. I don't know if there is a simpler way.
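To make the logic I'm after concrete, here is a minimal sketch in plain Python (not Qlik script). It only illustrates the idea of unpivoting the matrix to one row per (document, word) and then summing per word. It is built from the two example sentences, and all variable names are made up for illustration.

```python
# Two example documents from the question (names are illustrative only).
documents = {
    "document_1": "you are learning",
    "document_2": "they are understanding",
}

# Vocabulary in order of first appearance: these are the matrix columns.
vocabulary = []
for text in documents.values():
    for word in text.split():
        if word not in vocabulary:
            vocabulary.append(word)

# Document-term matrix: one row per document, one count per vocabulary word.
matrix = {
    doc: [text.split().count(word) for word in vocabulary]
    for doc, text in documents.items()
}

# Step 1: unpivot to long form, one (document, word, count) row per cell.
long_rows = [
    (doc, word, count)
    for doc, counts in matrix.items()
    for word, count in zip(vocabulary, counts)
]

# Step 2: aggregate, summing the counts of each word over all documents.
totals = {}
for _doc, word, count in long_rows:
    totals[word] = totals.get(word, 0) + count

# Step 3: rank from most to least frequent (ties broken alphabetically).
ranking = sorted(totals.items(), key=lambda item: (-item[1], item[0]))

for word, total in ranking:
    print(word, total)
```

With 2000 word columns the long form simply has 2000 rows per document, so the aggregation step stays the same regardless of how many words there are.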