Discovering associations between continuous and discrete variables
The last remaining case for discovering and measuring the strength of associations is the dependency between a continuous and a discrete variable. Let me start with an example. The dataset I use, the dbo.vTargetMail view from the AdventureWorksDW2017 demo database, includes variables that show the occupation and the income of each person. You would expect some association between these two variables: some occupations have a higher mean and median income, some a lower one. However, there could be a surprise hidden in the data. Imagine that an excellent basketball player, an NBA star, is recorded with the occupation skilled manual. By comparing the mean income across occupations, you could wrongly conclude that you need to take a skilled manual job to get the highest possible income. But the difference in mean income between skilled manual and the other occupations comes in this case from the variability within...
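The effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income figures are invented, they do not come from dbo.vTargetMail, and the single very large value stands in for the hypothetical NBA star. The point is to show how one extreme value inside a group moves that group's mean far more than its median.

```python
from statistics import mean, median

# Hypothetical yearly incomes per occupation. These numbers are invented
# for illustration only; they are NOT taken from dbo.vTargetMail.
incomes = {
    # The last value plays the role of the NBA star recorded as skilled manual.
    "Skilled Manual": [40_000, 45_000, 50_000, 55_000, 30_000_000],
    "Professional": [70_000, 80_000, 90_000, 100_000, 110_000],
}

# Mean and median income for each occupation group.
summary = {
    occupation: {"mean": mean(values), "median": median(values)}
    for occupation, values in incomes.items()
}

for occupation, stats in summary.items():
    print(f"{occupation}: mean={stats['mean']:,.0f}, median={stats['median']:,.0f}")
```

With this made-up data, the skilled manual group has by far the highest mean income, while its median stays below the median of the professional group. Comparing group means alone would therefore point to the wrong conclusion.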