Ordinal data – Mantel-Haenszel statistic

Quite often I encounter situations where the response variable is ordinal (rather than your traditional continuous data) and the research question is whether the response differs between 2 or more groups. Another way of phrasing this: is there an association between the ordinal response and the input factors? If you read this, your natural inclination may be to say ANOVA! That should answer the research question. However, can you meet the assumption that the response variable is normally distributed with homogeneous variance? More than likely NO! So what can you do? Try a variety of transformations? Chances are they won’t do the trick. So, let’s leave the data in its collected form and look at the Mantel-Haenszel statistic.

Why should we look at the Mantel-Haenszel statistic? I’m going to pull a quote from Categorical Data Analysis Using the SAS System by Stokes, Davis and Koch (1995), since they do an awesome job of describing this strategy:

“The Mantel-Haenszel strategy potentially removes the confounding influence of the explanatory variables that comprise the stratification and so provides a gain of power for detecting association by comparing like subjects with like subjects.  In some sense, the strategy is similar to adjustment for blocks in a two-way analysis of variance for randomized blocks; it is also like covariance adjustment for a categorical explanatory variable.”  (p 40-41)

Let’s take a look at an example from the source listed above.  The data are looking at the number of colds children contract in 2 regions.  The data are as follows:

                    Periods with Colds
Gender  Residence     0     1     2   Total
Female  Urban        45    64    71     180
Female  Rural        80   104   116     300
        Total       125   168   187     480
Male    Urban        84   124    82     290
Male    Rural       106   117    87     310
        Total       190   241   169     600

What we’re looking at are two 2 x 3 tables (2 rows by 3 columns), one for the females and a second for the males in the sample. Our research question is whether there is an association between residence (rural and urban) and the number of periods the children had colds. In other words, is there a difference in the number of colds children caught in the rural vs urban residence?

Data colds;
  input gender $ residence $ per_cold count @@;
  datalines;
female urban 0   45   female urban 1    64   female urban 2    71
female rural   0   80   female rural   1  104   female rural   2  116
male    urban 0   84   male    urban 1  124   male    urban 2    82
male    rural   0 106   male    rural   1  117   male    rural   2    87
;
run;

Proc freq data=colds;
  weight count;
  tables gender*residence*per_cold / all nocol nopct;
run;

To obtain the Mantel-Haenszel statistic we will need to use Proc Freq. The coding used in this example is coding we have all seen in the past. The only “new” feature is that we are now looking at 3 variables: gender by residence by per_cold.

Try running the code and then review the output.

The first thing you will notice is that you have 3 pages of output. The first page pertains to the cross tabulation of residence by per_cold for the females, the second is the cross tabulation of residence by per_cold for the males, and the third page is a summary page. Now how did that happen? When we asked for gender*residence*per_cold, SAS interpreted this as the crosstab of residence (row) by per_cold (column) for each level of gender (layer).
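To see what the layering buys you, here is a rough sketch of the alternative: running the two-way table separately for each gender with a BY statement. The data set name colds_sorted is my own placeholder, not something from the book. You should get the same two per-gender tables, but no summary page, because BY-group processing analyzes each gender on its own and never combines the strata.

```sas
/* Sketch only: colds_sorted is a hypothetical name for the sorted copy */
proc sort data=colds out=colds_sorted;
  by gender;                  /* BY processing needs sorted input */
run;

proc freq data=colds_sorted;
  by gender;                  /* one separate analysis per gender */
  weight count;               /* cell counts, as in the original step */
  tables residence*per_cold / chisq nocol nopct;
run;
```

The three-way request gender*residence*per_cold is what asks SAS to pool the evidence across the two gender tables.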

The second thing you’ll notice is that you have a few more statistical tests to choose from than the regular frequency and chi-square results.

Let’s work through the results:

What we are doing with this analysis is determining whether there is an association between where the children live (rural and urban) and the number of colds they’ve had. Because we have boys and girls, we want to be able to control for gender. The first page of output shows us that there is NO evidence of an association between residence and per_cold for females, Mantel-Haenszel Chi-square = 0.1059 with a p-value of 0.7448. If we look further down the page, there are a number of other measures of association available. Please read only the ones you set out to examine before you ran the analysis. In our case, we are only interested in the Mantel-Haenszel Chi-square.
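One way to keep yourself honest is to ask only for the statistics you planned to use. The ALL option is shorthand for several requests at once (CHISQ, MEASURES and CMH). A minimal sketch that drops the extra measures of association, assuming the colds data set created earlier, might look like this:

```sas
/* Sketch: request only the chi-square tests and the stratified summary */
proc freq data=colds;
  weight count;
  /* CHISQ prints the Mantel-Haenszel chi-square for each gender table; */
  /* CMH adds the summary statistics that adjust for gender.            */
  tables gender*residence*per_cold / chisq cmh nocol nopct;
run;
```

The per-table and summary values should be unchanged. There is simply less on the page to tempt you.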

If we look at the same test for the Males – we see that the Mantel-Haenszel Chi-square has a value of 0.7412 and a p-value of 0.3893, showing us that there is NO association between residence and per_cold.
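If you’re curious where that number comes from, the Mantel-Haenszel Chi-square printed for a single table is (n - 1) times the squared Pearson correlation between the row scores and the column scores. As a rough check for the males, assuming scores of 0, 1 and 2 for per_cold and any two distinct scores for residence, my hand calculation from the table above gives a squared correlation of about 0.0012374 with n = 600:

```latex
Q_{MH} = (n - 1)\, r^{2}
% males: n = 600, r^{2} \approx 0.0012374 (hand-computed from the table)
Q_{MH} \approx 599 \times 0.0012374 \approx 0.7412
```

That agrees with the SAS output. The correlation is tiny, which is another way of saying the urban and rural boys had nearly the same average number of periods with colds (about 0.99 vs 0.94).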

Our real question was whether residence and the number of colds are associated once we control for gender, and the last page gives us that final test. When we control for gender, we still see no association between residence and number of colds in this sample of children (Mantel-Haenszel Chi-square = 0.7379 with a p-value of 0.39).
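If all you want is that final adjusted test, a sketch like the following asks for the Cochran-Mantel-Haenszel summary alone and suppresses the crosstab tables. It assumes the colds data set from above and the default table scores. I would expect the 1-df rows of the summary to match the 0.7379 reported here, but check that against your own output.

```sas
/* Sketch: summary statistics only, adjusted for gender */
proc freq data=colds;
  weight count;
  /* NOPRINT on the TABLES statement hides the crosstabs but still */
  /* displays the requested statistics.                            */
  tables gender*residence*per_cold / cmh noprint;
run;
```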
