The table below classifies n = 68,694 passengers involved in automobile accidents in the state of Maine according to gender G = {male, female}, location of accident L = {rural, urban}, whether a seat belt was used S = {yes, no}, and injury I = {yes, no}. The time period over which the data were collected is unknown.

1. (4 points) Suppose we want to analyze the data to understand the impact that seat belts have on whether an injury occurs in an automobile accident. Briefly explain why the variables G = gender and L = location should be accounted for in addition to S and I. Support your answer using the data. (Note: speed limits tend to be higher in rural areas than in urban areas, so traffic accidents in rural areas tend to be more severe.)

2. (6 points) Consider fitting a logistic regression model using I as a response variable, and G, L, and S as explanatory variables.

For each possible pairwise interaction of explanatory variables, explain how its inclusion in the model would impact the probability of injury.
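
For reference, one way to write the model with all three pairwise interactions is sketched below. It assumes 0/1 indicator coding for each variable; the symbols g, l, s and the coefficient names are notation introduced here for illustration and do not come from the assignment.

```latex
\[
\log\frac{\pi}{1-\pi}
  = \beta_0 + \beta_G\, g + \beta_L\, l + \beta_S\, s
  + \beta_{GL}\, g l + \beta_{GS}\, g s + \beta_{LS}\, l s,
\qquad
\pi = P(I = \text{yes} \mid g, l, s).
\]
% Under this coding, the log odds ratio for seat belt use at fixed (g, l) is
\[
\log \mathrm{OR}_S(g, l) = \beta_S + \beta_{GS}\, g + \beta_{LS}\, l .
\]
```

Under this parameterization, an interaction involving S makes the seat belt odds ratio depend on the other variable in that term, while the G-by-L interaction changes the baseline log odds without changing the seat belt odds ratio.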

3. (8 points) Fit a logistic regression model predicting the probability of injury I using the explanatory variables G, L, and S. Include at least one pairwise interaction between explanatory variables in your model.

For each possible interaction of explanatory variables that could have been included in your model, briefly explain either why you did include it or why you excluded it. Justify your answers.

4. With respect to the model you estimated in Question #3, answer the following questions:

(a) (14 points) Estimate all the distinct effects associated with S in your model (depending
on the interactions you included, there could be as many as four distinct conditional odds
ratios associated with S, one for each combination of gender and location). Calculate 90%
confidence intervals for all distinct effects associated with seat belt use S. Use some method
to control the simultaneous confidence level if necessary. Interpret all quantities in the context
of the problem.

(b) (18 points) Estimate the probability of injury at all eight combinations of gender G, location
L, and seat belt use S. Then calculate 90% confidence intervals for the true values of all eight
probabilities, using some method to control the simultaneous confidence level. Interpret all
the quantities you calculate in context.

(c) (6 points) Assess the goodness of fit of your model. You may use formal or informal methods.
Whatever method(s) you use, justify your use of them and explain your conclusions.
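
The interval calculations in parts (a) and (b) can be sketched as Wald intervals on the log-odds scale, back-transformed and Bonferroni-adjusted. This is only one possible approach, not a required one. The function name `wald_interval`, the coefficient ordering, and every number below are hypothetical placeholders; in particular the diagonal covariance matrix is unrealistic and the estimates are not fitted to the Maine data.

```python
from itertools import product
from math import exp, sqrt
from statistics import NormalDist


def expit(eta):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-eta))


def wald_interval(beta, cov, x, transform, level=0.90, n_intervals=1):
    """Bonferroni-adjusted Wald interval for transform(x . beta)."""
    # Point estimate of the linear combination x' beta
    est = sum(xi * bi for xi, bi in zip(x, beta))
    # Variance of the linear combination: x' V x
    var = sum(x[i] * cov[i][j] * x[j]
              for i in range(len(x)) for j in range(len(x)))
    # Bonferroni: split the total error rate across n_intervals intervals
    alpha = (1 - level) / n_intervals
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(var)
    return transform(est), transform(est - half), transform(est + half)


# PLACEHOLDER estimates for a model with terms (1, g, l, s, g*s).
# Invented for illustration only; replace with your fitted values.
beta = [-2.0, 0.5, 0.6, -0.8, 0.1]
cov = [[0.01 if i == j else 0.0 for j in range(5)] for i in range(5)]

# (a) Conditional odds ratios for S: this example model has two distinct
# ones, one per gender, so the adjustment divides alpha by 2.
or_by_gender = {
    g: wald_interval(beta, cov, [0, 0, 0, 1, g], exp, 0.90, n_intervals=2)
    for g in (0, 1)
}

# (b) Injury probabilities at all eight (g, l, s) combinations.
probs = {
    (g, l, s): wald_interval(beta, cov, [1, g, l, s, g * s], expit,
                             0.90, n_intervals=8)
    for g, l, s in product((0, 1), repeat=3)
}
```

Each returned tuple is (estimate, lower limit, upper limit). Building the interval on the log-odds scale and then transforming keeps odds ratio limits positive and probability limits inside (0, 1).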

5. (2 points) Now fit a model predicting injury I from G, L, and S using only main effects. Using
a formal hypothesis test, assess the goodness of fit of this model compared to the model you fit in Question #3. Interpret the results in context.
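
As a rough sketch of the goodness-of-fit arithmetic, the block below computes the deviance and Pearson statistics for grouped binomial data and a likelihood-ratio test comparing two nested models. The function names are hypothetical, the chi-square tail probability uses the standard closed-form series for integer degrees of freedom so that only the standard library is needed, and no values from the Maine data are assumed.

```python
from math import erfc, exp, log, pi, sqrt


def grouped_gof(injured, totals, fitted):
    """Return (deviance G^2, Pearson X^2) for grouped binomial data."""
    g2 = 0.0
    x2 = 0.0
    for y, n, p in zip(injured, totals, fitted):
        # Loop over the two cells of each group: injured and not injured
        for obs, expected in ((y, n * p), (n - y, n * (1 - p))):
            if obs > 0:  # a zero count contributes 0 to the deviance
                g2 += 2.0 * obs * log(obs / expected)
            x2 += (obs - expected) ** 2 / expected
    return g2, x2


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series
    total = erfc(sqrt(x / 2))
    term = sqrt(2 * x / pi) * exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(dev_reduced, dev_full, df_diff):
    """LRT statistic and p-value for nested models from their deviances."""
    stat = dev_reduced - dev_full
    return stat, chi2_sf(stat, df_diff)
```

Here `df_diff` is the number of interaction terms dropped when moving from the larger model to the main effects model. For a single model, its deviance can be compared to a chi-square distribution with degrees of freedom equal to the number of covariate patterns (eight here) minus the number of fitted parameters.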

6. (38 points) Repeat all parts of Question #4 for the main effects model you fit in Question #5.

7. (4 points) Which model do you think is preferred? Fully justify your answer.