Comparing Qlik Analytics and R Software for Accurate Two-Sample T-Tests Research Paper

igoralcantara · May 24, 2024 2:46:29 PM

A while ago, I conducted an internal research project with my colleagues @AlaneMiguelis and Dr. @priscilarubim to validate the accuracy of the Two-Sample T-Test in Qlik by comparing it with the gold standard of medical research: R. Although we explored several hypothesis testing methods, we prioritized formalizing our findings on the T-Test. Recently, we decided to make this document available to the whole community.

The focus of our project was to assess whether the statistical results produced by Qlik matched those generated by R, specifically for the Two-Sample T-Test. Our methodology involved running the same tests repeatedly in both tools to check that the results were reliable and consistent. Through this comparative analysis, we aimed to provide insight into the robustness of Qlik’s statistical capabilities. The code and dataset are available in a Git repository for those who want to reproduce the study.

Here is the abstract of our study. The full document can be accessed through the link at the end of this post.

ABSTRACT

Objective

The statistical determination of a large or small difference between two groups is not based on an absolute standard, but is rather an evaluation of the probability of an event.1,2 In the field of medical research, it is common to use statistical software for descriptive statistics as well as to perform statistical tests.3 However, most software provides ready-to-use functions, and researchers have almost no information about how the statistical tests are calculated inside those functions. This article evaluates the accuracy of the two-sample Student’s t-test in Qlik analytics software. The gold standard used for this evaluation is the set of standard t-test functions available in R, a widely used, robust, and reliable statistical software.5–7

Materials and Methods

The tests performed in this evaluation used a subset of the Framingham Heart Study data. The dataset contains data on 4,434 anonymous participants, collected in three periods spaced six years apart between 1956 and 1968. Five t-tests, each with two scenarios, were performed in both Qlik analytics and R, and the results were compared.

Results

In general, the results for multiple statistics obtained in Qlik analytics match the ones found in R for multiple scenarios: small and large sample sizes, small and large p-values, assuming and not assuming equal variance.
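To make the "assuming and not assuming equal variance" scenarios concrete, here is a minimal sketch in Python (standard library only) of the two textbook forms of the test statistic: the pooled-variance Student form and the Welch form with Welch-Satterthwaite degrees of freedom. This is not the code used in the study. The function name `two_sample_t` and the sample values are hypothetical and chosen only for illustration. In R, the same choice is made through the `var.equal` argument of `t.test`.

```python
import math
from statistics import mean, variance


def two_sample_t(a, b, equal_var=True):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if equal_var:
        # Student form: pooled variance, df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch form: separate variances, Welch-Satterthwaite df
        q1, q2 = v1 / n1, v2 / n2
        se = math.sqrt(q1 + q2)
        df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (m1 - m2) / se, df


# Made-up values for illustration only (NOT Framingham data)
group_a = [120.0, 132.0, 128.0, 141.0, 135.0, 126.0]
group_b = [118.0, 125.0, 121.0, 130.0, 119.0]

t_student, df_student = two_sample_t(group_a, group_b, equal_var=True)
t_welch, df_welch = two_sample_t(group_a, group_b, equal_var=False)
print(f"Student: t = {t_student:.4f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.4f}, df = {df_welch:.4f}")
```

The sketch stops at the t statistic and degrees of freedom. The p-value additionally requires the cumulative distribution function of the t distribution, which statistical packages evaluate numerically.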

Discussion

Although Qlik analytics matches all of the t-test statistics found in R, the p-value matches only up to four decimal places. This is concluded to be sufficient for hypothesis testing, since the conventional levels of significance do not go lower than 0.1.
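The reasoning above can be illustrated with a short Python sketch: two p-values that agree to four decimal places lead to the same reject or fail-to-reject decision at the usual significance levels. The helper names `agree_to_places` and `decisions`, the set of significance levels, and the p-values are all hypothetical and are not results from the study. The sketch also assumes that "matches up to four decimal places" means equality after rounding.

```python
def agree_to_places(p_a, p_b, places=4):
    """True when two p-values are identical after rounding to `places` decimals."""
    return round(p_a, places) == round(p_b, places)


def decisions(p, alphas=(0.1, 0.05, 0.01)):
    """Map each significance level to True (reject H0) or False (fail to reject)."""
    return {alpha: p < alpha for alpha in alphas}


# Hypothetical p-values for illustration only (NOT results from the study)
p_tool_a = 0.03172
p_tool_b = 0.03174

print(agree_to_places(p_tool_a, p_tool_b))
print(decisions(p_tool_a) == decisions(p_tool_b))
```

One caveat: a p-value that lies within that fourth-decimal margin of a significance level could in principle fall on either side of it, so agreement to four decimal places makes identical decisions very likely rather than guaranteed.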

Conclusion

This research concluded that Qlik analytics can be used for two-sample t-tests in multiple scenarios.

Keywords: Qlik, t-test, R language, Framingham.

Link for the full article: Comparing Qlik Analytics and R Software for Accurate Two-Sample T-Tests Research Paper (ipc-global.c...

