Instructions:

1. Calculate the infection rate (total infected / population) and the death rate (total deaths / population) for each FIPS code, and use these two as the dependent variables (Y1, Y2).
2. Conduct linear regression. Use the news desert dummy (1 or 0) as one of the independent variables, and add the following as covariates: median household income, high-income population rate (high-income population / population), median age, education rate (educational attainment population 25 years and over / population), poverty rate (family poverty / family households total), unemployment rate (unemployment / employment status civilian labor force), population density (population / land area), average annual temperature, Hospital, and population (log-transformed to bring the values into a smaller range). Run the regression twice, once for each dependent variable: Y1 (infection rate) = a*x1 (news desert dummy) + b*x2 (median household income) + c*x3 (median age) + … and Y2 (death rate) = a*x1 (news desert dummy) + b*x2 (median household income) + c*x3 (median age) + …
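The derived quantities in steps 1 and 2 (the two rates, population density, log population) could be computed along the following lines. This is a minimal sketch using pandas; the column names and the three sample rows are made up for illustration and will need to be mapped to the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical county-level rows. Every column name and value is an assumption.
df = pd.DataFrame({
    "fips": ["01001", "01003", "06037"],
    "total_infected": [500, 1200, 90000],
    "total_deaths": [10, 30, 2000],
    "population": [50000, 200000, 10000000],
    "high_income_population": [5000, 30000, 2000000],
    "land_area": [600.0, 1600.0, 4000.0],
})

# Dependent variables
df["infection_rate"] = df["total_infected"] / df["population"]  # Y1
df["death_rate"] = df["total_deaths"] / df["population"]        # Y2

# Derived covariates (the other rates in step 2 follow the same pattern)
df["high_income_rate"] = df["high_income_population"] / df["population"]
df["population_density"] = df["population"] / df["land_area"]
df["log_population"] = np.log(df["population"])  # natural log

print(df[["fips", "infection_rate", "death_rate",
          "population_density", "log_population"]])
```

Keeping FIPS as a string preserves leading zeros (for example "01001"), which matters if the code is later used as a join key.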

Get the statistical report and check (1) whether the results are statistically significant and (2) the coefficient for each X, especially the coefficient for the news desert variable.
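As an illustration of what such a report contains, here is a minimal sketch of ordinary least squares with coefficients, standard errors, t-statistics, and p-values, written with NumPy only. The data are synthetic, the helper name `ols_report` is made up for this example, and the p-values use a normal approximation to the t distribution, which is reasonable only for large samples. In practice a library such as statsmodels (`OLS(...).fit().summary()`) produces a fuller report.

```python
import math
import numpy as np

def ols_report(y, X, names):
    """Least-squares fit of y on X (X must already include an intercept column).

    Returns {name: (coef, std_err, t_stat, p_value)}.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical (homoskedastic) covariance
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # Two-sided p-value, normal approximation (an assumption; fine for large n)
    p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return {nm: (b, s, ti, pi) for nm, b, s, ti, pi in zip(names, beta, se, t, p)}

# Synthetic data standing in for the county table (values are invented).
rng = np.random.default_rng(0)
n = 500
news_desert = rng.integers(0, 2, n).astype(float)   # dummy: 0 or 1
income = rng.normal(size=n)                          # standardized covariate
y = 0.05 + 0.02 * news_desert - 0.01 * income + rng.normal(scale=0.01, size=n)

X = np.column_stack([np.ones(n), news_desert, income])
report = ols_report(y, X, ["const", "news_desert", "income"])
for name, (coef, se, t, p) in report.items():
    print(f"{name:12s} coef={coef:+.4f} se={se:.4f} t={t:+.2f} p={p:.3g}")
```

The same function would be called once with the infection rate and once with the death rate as `y`, with the full covariate list from step 2 stacked into `X`.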

3. Conduct instrumental variable regression twice, once with Y1 and once with Y2 as the dependent variable. In each analysis, use Elevation as the instrumental variable for news desert, and include transportation development level (public transportation + bicycle + taxicab, motorcycle, or other means), recreational housing (for seasonal, recreational, or occasional use / housing units), average annual temperature, and Hospital as covariates.
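The instrumental variable regression described here is commonly estimated by two-stage least squares (2SLS). The sketch below assumes that estimator (the instructions do not name one) and implements it with NumPy on synthetic data. The function name, the simulated confounder, and all coefficients are invented for illustration.

```python
import numpy as np

def two_stage_least_squares(y, endog, instrument, controls):
    """2SLS with one endogenous regressor and one instrument.

    Returns (beta, std_err); index 1 is the endogenous regressor.
    """
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, instrument, controls])  # instrument + exogenous
    X = np.column_stack([const, endog, controls])       # structural regressors

    # Stage 1: project the endogenous regressor on the instrument and controls
    gamma, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ gamma

    # Stage 2: regress y on the fitted values and the controls
    X_hat = np.column_stack([const, endog_hat, controls])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

    # Standard errors must use residuals from the ORIGINAL regressors, not X_hat
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, np.sqrt(np.diag(cov))

# Synthetic example: an unobserved confounder drives both news_desert and y.
rng = np.random.default_rng(1)
n = 5000
elevation = rng.normal(size=n)
confounder = rng.normal(size=n)
temperature = rng.normal(size=n)
hospital = rng.normal(size=n)
news_desert = (0.8 * elevation + confounder + rng.normal(size=n) > 0).astype(float)
y = (0.03 * news_desert + 0.02 * confounder + 0.005 * temperature
     + rng.normal(scale=0.01, size=n))

controls = np.column_stack([temperature, hospital])
beta, se = two_stage_least_squares(y, news_desert, elevation, controls)
print(f"news_desert coefficient: {beta[1]:.4f} (se {se[1]:.4f})")
```

Running the two stages by hand with an ordinary regression routine gives the same coefficients but understated or overstated standard errors, which is why the residuals are recomputed from the original regressors.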

Get the statistical report and check (1) whether the results are statistically significant, (2) the coefficient for the news desert variable, and (3) whether this instrumental variable makes sense.
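One common way to judge whether an instrument "makes sense" on the relevance side is the first-stage F-statistic for the excluded instrument, with F above roughly 10 as a widely used rule of thumb. The sketch below computes it with NumPy on synthetic data; the function name and the simulated variables are assumptions. It does not address the exclusion restriction (that elevation affects the outcome only through news desert status), which cannot be tested statistically with a single instrument and has to be argued on substantive grounds.

```python
import numpy as np

def first_stage_f(endog, instrument, controls):
    """F-statistic for adding one instrument to the first-stage regression."""
    n = len(endog)
    const = np.ones(n)
    X_restricted = np.column_stack([const, controls])
    X_full = np.column_stack([const, instrument, controls])

    def rss(X):
        b, *_ = np.linalg.lstsq(X, endog, rcond=None)
        r = endog - X @ b
        return r @ r

    rss_r, rss_f = rss(X_restricted), rss(X_full)
    q = 1  # number of excluded instruments
    return ((rss_r - rss_f) / q) / (rss_f / (n - X_full.shape[1]))

# Synthetic data: elevation is related to news_desert, `unrelated` is not.
rng = np.random.default_rng(2)
n = 2000
elevation = rng.normal(size=n)
temperature = rng.normal(size=n)
news_desert = (0.8 * elevation + rng.normal(size=n) > 0).astype(float)
unrelated = rng.normal(size=n)

f_strong = first_stage_f(news_desert, elevation, temperature)
f_weak = first_stage_f(news_desert, unrelated, temperature)
print(f"first-stage F with elevation: {f_strong:.1f}")
print(f"first-stage F with unrelated variable: {f_weak:.1f}")
```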

4. If you can’t get good statistical reports for steps 2) and 3), think about how to improve feature selection to get better reports for those two steps.
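One possible diagnostic when revisiting feature selection is the variance inflation factor (VIF), which flags covariates that are nearly linear combinations of the others; several covariates in step 2 (median household income, high-income rate, poverty rate) may plausibly overlap. This is only one option among many and is not prescribed by the instructions. The sketch uses NumPy and synthetic data, and the function name is made up.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ b
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic covariates: the first two are strongly correlated by construction.
rng = np.random.default_rng(3)
n = 1000
income = rng.normal(size=n)
high_income_rate = income + 0.2 * rng.normal(size=n)
median_age = rng.normal(size=n)

vifs = vif(np.column_stack([income, high_income_rate, median_age]))
for name, v in zip(["income", "high_income_rate", "median_age"], vifs):
    print(f"{name:18s} VIF={v:.1f}")
```

A high VIF suggests dropping or combining one of the overlapping covariates before refitting; thresholds such as 5 or 10 are conventions rather than hard rules.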

5. Group all data by state. Instead of using FIPS as the primary key for each record, aggregate the records by state. Once that is done, calculate the infection rate (total infected / population) and the death rate (total deaths / population) for each state, and use these two as the dependent variables (Y3, Y4).
6. Conduct the linear regression analysis from step 2) for Y3 and Y4 at the state level.
7. Conduct the instrumental variable regression analysis from step 3) for Y3 and Y4 at the state level.
8. Can you get good statistical reports for steps 6) and 7)? If not, what other methods could you use to get better reports? Try them and see whether the reports improve.
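The state-level aggregation could look like the following pandas sketch. It assumes the FIPS values are five-digit county codes stored as strings, so that the first two digits identify the state; if the dataset already has a state column, group on that instead. Column names and sample rows are invented.

```python
import pandas as pd

# Hypothetical county-level rows. Column names and values are assumptions.
county = pd.DataFrame({
    "fips": ["01001", "01003", "06037", "06059"],
    "total_infected": [500, 1200, 90000, 30000],
    "total_deaths": [10, 30, 2000, 500],
    "population": [50000, 200000, 10000000, 3000000],
})

# First two digits of a five-digit county FIPS code are the state FIPS code.
county["state_fips"] = county["fips"].str[:2]

# Sum the raw counts first, then divide. Do not average county-level rates.
count_cols = ["total_infected", "total_deaths", "population"]
state = county.groupby("state_fips", as_index=False)[count_cols].sum()
state["infection_rate"] = state["total_infected"] / state["population"]  # Y3
state["death_rate"] = state["total_deaths"] / state["population"]        # Y4

print(state)
```

Summing counts before dividing matters: in the sample rows, the two counties in state "01" have rates 0.01 and 0.006, whose simple average is 0.008, while the pooled state rate is 1700 / 250000 = 0.0068. Covariates that are not counts (median income, median age, temperature) cannot be summed this way; a population-weighted average is one possible choice, though it is only an approximation for medians. Note also that the state-level regressions have only about 50 observations, so a long covariate list leaves few degrees of freedom.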