
**Assignment 2**

The deadline for submission of the assignment is the 24th of April at 11.59 pm. Each of the five parts is worth 20%. You should explain your answers fully. Where you are using SPSS, you do not need to show the steps involved in calculating the results, but you should explain the meaning of your results fully. You should draw any diagrams by hand and then insert images of them into your answer document.

**Part I**

(a) Consider the following data:

It has been claimed that the population has a Poisson distribution with a mean of 5.6 complaints per day. Use a Chi-Square Goodness-of-Fit test to compare the sample data above with the expected frequencies for the five categories according to the Poisson distribution. Carry out the appropriate test **without** using SPSS. You can make use of Excel. Use a level of significance of 5%.
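The mechanics of the goodness-of-fit calculation can be sketched in Python as a cross-check on an Excel worksheet. The data table is not reproduced in this text, so the category boundaries and observed counts below are hypothetical and serve only to illustrate the steps. The critical value is the standard table value for 4 degrees of freedom, which assumes the mean of 5.6 is taken from the claim rather than estimated from the sample.

```python
import math

MEAN = 5.6  # claimed Poisson mean (from the question)

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# HYPOTHETICAL categories and counts: the real table is not shown here.
# Each category is (label, lowest value, highest value); None means "and above".
categories = [
    ("0-3", 0, 3),
    ("4-5", 4, 5),
    ("6-7", 6, 7),
    ("8-9", 8, 9),
    ("10+", 10, None),
]
observed = [15, 30, 30, 17, 8]  # hypothetical number of days in each category

def category_probability(low, high, mean):
    """Poisson probability of falling in the range low..high (open-ended if high is None)."""
    if high is None:
        return 1.0 - sum(poisson_pmf(k, mean) for k in range(low))
    return sum(poisson_pmf(k, mean) for k in range(low, high + 1))

n = sum(observed)
probs = [category_probability(low, high, MEAN) for _, low, high in categories]
expected = [n * p for p in probs]

# Test statistic: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameter is estimated from the sample, so df = categories - 1 = 4.
df = len(categories) - 1
CRITICAL_5PCT_DF4 = 9.488  # upper 5% point of chi-square with 4 df (from tables)

reject_h0 = chi_square > CRITICAL_5PCT_DF4
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

If any expected frequency fell below 5, adjacent categories would normally be merged before computing the statistic, with the degrees of freedom reduced accordingly.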

(b) Use SPSS and the data in “Northern Ireland Random Poll” to test the hypothesis that this sample was drawn from a population with the following frequencies. Use a level of significance of 10%.

In the datafile, 0=DUP, 1=SF, 2=UUP, 3=SDLP, 4=Alliance, 5=TUV, 6=Other

**Part II**

(a) Use SPSS and the Northern Ireland data to test the hypothesis that there is no relationship between sex and preferred political party. In the data file, 0=Male, 1=Female. Use a level of significance of 10%.

(b) We asked a random sample of students to choose one style of music from a list of four. Test the hypothesis that there is no relationship between preferred style of music and category of student. Use a level of significance of 5%. Do not use SPSS. You can use Excel.
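A chi-square test of independence on a contingency table follows the pattern sketched below. The survey table is not reproduced in this text, so the student categories, music styles and counts are all hypothetical. The critical value is the standard table value for the 6 degrees of freedom that a 3 by 4 table implies.

```python
# HYPOTHETICAL contingency table: rows are student categories,
# columns are music styles. The real survey counts are not shown here.
row_labels = ["Undergraduate", "Postgraduate", "Mature"]
col_labels = ["Rock", "Pop", "Classical", "Jazz"]
observed = [
    [30, 25, 10, 15],
    [20, 20, 20, 20],
    [10, 15, 25, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (rows - 1) * (columns - 1) = 2 * 3 = 6 for this table shape.
df = (len(row_labels) - 1) * (len(col_labels) - 1)
CRITICAL_5PCT_DF6 = 12.592  # upper 5% point of chi-square with 6 df (from tables)

reject_h0 = chi_square > CRITICAL_5PCT_DF6
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

A table with a different number of student categories would need a different degrees-of-freedom count and critical value.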

**Part III**

(a) Why might we use the Kruskal-Wallis test instead of ANOVA? (max. 60 words)

(b) Use the Kruskal-Wallis test to test the hypothesis that the following three groups all have the same mean at the population level. Use a level of significance of 10%. Do not use SPSS. You can use Excel.
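The Kruskal-Wallis H statistic is built from the ranks of the pooled observations, as the sketch below shows. The three groups are not reproduced in this text, so the values here are hypothetical. They were chosen without ties, so no tie correction is applied. The critical value is the standard chi-square table value for 2 degrees of freedom at the 10% level.

```python
# HYPOTHETICAL data for three groups: the real observations are not shown here.
groups = {
    "A": [12, 15, 11, 18, 14],
    "B": [22, 19, 24, 17, 21],
    "C": [13, 16, 20, 10, 23],
}

# Pool all observations and sort them so each value can be ranked.
pooled = sorted(v for values in groups.values() for v in values)

def average_rank(value):
    """Rank of a value in the pooled sample; tied values share the average rank."""
    first_position = pooled.index(value) + 1
    tie_count = pooled.count(value)
    return first_position + (tie_count - 1) / 2

n = len(pooled)
rank_sums = {
    name: sum(average_rank(v) for v in values)
    for name, values in groups.items()
}

# H = 12 / (n (n + 1)) * sum(R_j^2 / n_j) - 3 (n + 1), with no tie correction.
h_statistic = (
    12 / (n * (n + 1))
    * sum(rank_sums[name] ** 2 / len(values) for name, values in groups.items())
    - 3 * (n + 1)
)

df = len(groups) - 1  # number of groups - 1 = 2
CRITICAL_10PCT_DF2 = 4.605  # upper 10% point of chi-square with 2 df (from tables)

reject_h0 = h_statistic > CRITICAL_10PCT_DF2
print(f"rank sums = {rank_sums}, H = {h_statistic:.2f}, reject H0: {reject_h0}")
```

With tied observations, H is usually divided by a tie-correction factor before it is compared with the critical value.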

**Part IV**

(a) Using diagrams where appropriate, explain the concepts of covariance, correlation and slope coefficient and the relationship between them.
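The algebraic link between the three quantities can be checked numerically. The sketch below uses a small hypothetical dataset and the sample (n - 1) versions of the variance and covariance. It confirms that the simple-regression slope equals the covariance divided by the variance of x, which is the same as the correlation multiplied by the ratio of the standard deviations.

```python
import math

# HYPOTHETICAL paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and variances (divisor n - 1).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)

# Correlation: covariance rescaled by both standard deviations (unit-free).
correlation = cov_xy / (sd_x * sd_y)

# Slope of the regression of y on x: covariance rescaled by the variance of x.
slope = cov_xy / var_x

# The two are linked by slope = correlation * (sd_y / sd_x).
slope_from_correlation = correlation * sd_y / sd_x

print(f"cov = {cov_xy:.3f}, r = {correlation:.3f}, slope = {slope:.3f}")
```

All three share the same sign because each is the covariance divided by a positive quantity.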

(b) Use the dataset “Carseats”. There is an information document with the data file that describes the variables. Perform a regression analysis in which “Sales” is the dependent variable. Use the following variables as explanatory variables: CompPrice, Income, Advertising, Price, Age. Use SPSS. Write one or two sentences for each explanatory variable to explain the meaning of its coefficient in your results.

(c) What do we mean when we say that the OLS estimators are efficient, consistent and unbiased?

(d) Explain the concept of the coefficient of determination.
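The coefficient of determination can be computed directly from the sums of squares of a fitted regression. The sketch below fits a one-variable OLS line to hypothetical data (not the Carseats file) and computes R-squared as 1 minus the residual sum of squares over the total sum of squares. The names `tss`, `rss` and `ess` are illustrative.

```python
# HYPOTHETICAL data, for illustration only.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.0, 7.0, 8.0, 12.0, 15.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates for the line y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]

# Total, residual and explained sums of squares.
tss = sum((yi - mean_y) ** 2 for yi in y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ess = sum((fi - mean_y) ** 2 for fi in fitted)

# Share of the variation in y that the fitted line accounts for.
r_squared = 1 - rss / tss

print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, ESS = {ess:.2f}, R^2 = {r_squared:.4f}")
```

When the regression includes an intercept, the explained and residual sums of squares add up to the total, so R-squared lies between 0 and 1.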

**Part V**

In this course, we have typically assumed that our samples are gathered randomly. However, in practice we often encounter a problem that is called “selection bias”. I would like you to investigate this concept and write a short essay explaining some of the main kinds of selection bias and providing examples. (Max. 350 words.)

**Submission Details**

The deadline for submission of the assignment is the 25th of April at 11.59 pm.

- It must be submitted via Canvas. Submissions made after the deadline will be deemed late and subject to penalty (see below).

- The front page of the assignment should contain your name and I.D. number.

Penalties for Late Submission:

- Where work is submitted up to and including 7 days late, 10% of the total marks available shall be deducted from the mark achieved. Where work is submitted up to and including 14 days late, 20% of the total marks available shall be deducted from the mark achieved. Work submitted 15 days late or more shall not be accepted.