Functions for statistical tests and sampling
Statistical tests help determine whether two (or more) variables are associated and, if so, in which direction (positive, neutral, or negative). Statistically speaking, correlation measures the strength and direction of the association between two variables. The RevoScaleR package supports the chi-square test, Fisher's exact test, and Kendall rank correlation. Depending on the type of variable, you choose among the Kendall, Spearman, and Pearson correlation coefficients: Pearson for linear relationships between continuous variables, Spearman and Kendall for ranked or ordinal data.
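
To make the difference between the three coefficients concrete, here is a minimal sketch that uses base R's `cor()` function rather than RevoScaleR. The `hours` and `score` vectors are made-up illustrative data, not taken from any dataset in this text.

```r
# Hypothetical illustrative data (assumption, not from the text):
# hours studied and test score for eight students.
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
score <- c(52, 55, 61, 60, 68, 74, 79, 85)

# Pearson measures linear association between continuous variables.
pearson <- cor(hours, score, method = "pearson")

# Spearman is Pearson's coefficient computed on the ranks of the values.
spearman <- cor(hours, score, method = "spearman")

# Kendall compares concordant and discordant pairs of observations.
kendall <- cor(hours, score, method = "kendall")

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

All three coefficients lie between -1 and 1, and here all are positive because the score generally rises with the hours. The rank-based coefficients fall short of 1 only because of the single pair of scores (61 and 60) that is out of order.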
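For the chi-square test, we will use the rxChiSquaredTest() function, which uses a contingency table to check whether two variables are related. A small chi-square test statistic means that the observed counts fit the counts expected under independence very well, which gives little evidence of an association; a large statistic indicates that the variables are likely related. The formula for calculating chi-square is as follows:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count and $E_i$ is the expected count in cell $i$ of the contingency table.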
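
As a minimal sketch of how the chi-square statistic is computed, the following base R code works through a small contingency table by hand and cross-checks the result with `chisq.test()`. It does not use RevoScaleR, and the `observed` counts and their labels are hypothetical values chosen only for illustration.

```r
# Hypothetical 2x2 contingency table of observed counts (assumption,
# illustrative only): rows are groups, columns are outcomes.
observed <- matrix(c(30, 10,
                     20, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("A", "B"),
                                   outcome = c("yes", "no")))

# Expected counts under independence:
# (row total * column total) / grand total for each cell.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and the upper-tail p-value.
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

# Cross-check against base R's built-in test. The continuity correction
# is switched off so that the result matches the plain formula.
check <- chisq.test(observed, correct = FALSE)

print(c(chi_sq = chi_sq, dof = dof, p_value = p_value))
```

In this made-up table the observed counts differ noticeably from the expected counts, so the statistic is large relative to its degrees of freedom and the p-value is small, which points towards an association between group and outcome.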

Before running this test of statistical independence, we must have the data in the xCrossTab or xCube format...