This is a statistics assignment from Australia on modern applied statistics.
- The file assignment3_prob1.txt contains 300 observations. We can read the observations and make a histogram as follows.
> X = scan(file="assignment3_prob1.txt", what=double())
Read 300 items
> length(X)
[1] 300
> hist(X)
We will model the observed data using a mixture of three binomial distributions. Specifically, we assume the observations X1, . . . , X300 are independent of each other, and each Xi follows this mixture model:
Zi ∼ categorical (π1, π2, 1 − π1 − π2),
Xi|Zi = 1 ∼ Binomial(20, p1),
Xi|Zi = 2 ∼ Binomial(20, p2),
Xi|Zi = 3 ∼ Binomial(20, p3).
The binomial distribution has probability mass function
f(x; m, p) = C(m, x) p^x (1 − p)^(m−x),
where C(m, x) = m!/(x!(m − x)!) is the binomial coefficient.
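As a quick sanity check, the pmf above can be written out directly and compared against R's built-in dbinom (the particular values below are arbitrary illustrations):

```r
# The binomial pmf C(m, x) p^x (1 - p)^(m - x), written out by hand;
# it should agree with R's built-in dbinom().
f <- function(x, m, p) choose(m, x) * p^x * (1 - p)^(m - x)

f(5, 20, 0.3)       # pmf evaluated from the formula
dbinom(5, 20, 0.3)  # same value from R's built-in
```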
We aim to obtain MLE of parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
(a) (5 marks) Let X = (X1, . . . , X300) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ⁰) = E_{Z|X,θ⁰}[log P(X, Z | θ)].
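One shape the answer to (a) can take is sketched below, writing γ for the posterior responsibilities (this notation is an assumption, not given in the problem) and π3 = 1 − π1 − π2:

```latex
Q(\theta,\theta^{0})
 = \sum_{i=1}^{300}\sum_{k=1}^{3}\gamma_{ik}^{0}
   \Bigl[\log \pi_k + \log\binom{20}{x_i}
         + x_i \log p_k + (20 - x_i)\log(1 - p_k)\Bigr],
\qquad
\gamma_{ik}^{0} = P(Z_i = k \mid X_i = x_i,\ \theta^{0}).
```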
(b) (3 marks) Derive the E-step of the EM algorithm.
(c) (5 marks) Derive the M-step of the EM algorithm.
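A sketch of what (b) and (c) lead to under the model above (again writing γ for responsibilities, an assumed notation):

```latex
% E-step: responsibilities at the current estimate \theta^{0}
\gamma_{ik}
 = \frac{\pi_k^{0}\, f(x_i;\, 20,\, p_k^{0})}
        {\sum_{j=1}^{3}\pi_j^{0}\, f(x_i;\, 20,\, p_j^{0})},
\qquad i = 1,\dots,300,\; k = 1,2,3.

% M-step: closed-form maximizers of Q(\theta, \theta^{0})
\pi_k = \frac{1}{300}\sum_{i=1}^{300}\gamma_{ik}, \quad k = 1, 2;
\qquad
p_k = \frac{\sum_{i=1}^{300}\gamma_{ik}\, x_i}
           {20\sum_{i=1}^{300}\gamma_{ik}}, \quad k = 1, 2, 3.
```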
(d) (5 marks) Note: Your answer for this problem should be typed. Answers including screen-captured R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X300. Set EM iterations to stop when either the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm two times with the following two different initial values and report the estimators with the highest incomplete log-likelihood.
                    π1    π2    p1    p2    p3
1st initial values  0.3   0.3   0.2   0.5   0.7
2nd initial values  0.1   0.2   0.1   0.3   0.7
For each EM run, check that the incomplete log-likelihood increases at each EM step by plotting it.
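The steps above can be sketched in R as follows. The function name em_binom_mix and the simulated data are illustrative assumptions, not part of the assignment; for the actual answer, X would be read from assignment3_prob1.txt and the two initial-value rows tried in turn.

```r
# EM for a mixture of three Binomial(20, p_k) distributions (sketch).
em_binom_mix <- function(X, pi1, pi2, p, max.iter = 100, eps = 1e-5) {
  n <- length(X)
  pis <- c(pi1, pi2, 1 - pi1 - pi2)
  loglik <- numeric(0)
  for (iter in 1:max.iter) {
    # E-step: responsibilities gam[i, k] = P(Z_i = k | X_i, theta)
    dens <- sapply(1:3, function(k) pis[k] * dbinom(X, 20, p[k]))
    gam  <- dens / rowSums(dens)
    # incomplete log-likelihood at the current parameter values
    loglik <- c(loglik, sum(log(rowSums(dens))))
    # M-step: closed-form updates for mixture weights and success probabilities
    Nk  <- colSums(gam)
    pis <- Nk / n
    p   <- colSums(gam * X) / (20 * Nk)
    if (iter > 1 && abs(diff(tail(loglik, 2))) < eps) break
  }
  list(pi = pis, p = p, loglik = loglik)
}

# Illustrative run on simulated data (replace with X read from the file):
set.seed(1)
Z    <- sample(1:3, 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
Xsim <- rbinom(300, 20, c(0.2, 0.5, 0.8)[Z])
fit  <- em_binom_mix(Xsim, 0.3, 0.3, c(0.2, 0.5, 0.7))
plot(fit$loglik, type = "b",
     xlab = "EM iteration", ylab = "incomplete log-likelihood")
```

Running this once per initial-value row and keeping the fit with the larger final element of fit$loglik gives the reported estimates; the plot call produces the required monotonicity check.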
- The file assignment3_prob2.txt contains 100 observations. We can read the 300 observations from problem 1 and the new 100 observations and make histograms as follows.
> X = scan(file="assignment3_prob1.txt", what=double())
Read 300 items
> X.more = scan(file="assignment3_prob2.txt", what=double())
Read 100 items
> length(X)
[1] 300
> length(X.more)
[1] 100
> par(mfrow=c(2,2))
> hist(X, xlim=c(0,20), ylim=c(0,80))
> hist(X.more, xlim=c(0,20), ylim=c(0,80))
> hist(c(X,X.more), xlim=c(0,20), ylim=c(0,80), xlab="X + X.more", main = "Histogram of X + X.more")
Let X1, . . . , X300 and X301, . . . , X400 denote the 300 observations from assignment3_prob1.txt and the 100 observations from assignment3_prob2.txt, respectively. We assume the observations X1, . . . , X400 are independent of each other. We model X1, . . . , X300 (from assignment3_prob1.txt) using the mixture of three binomial distributions (as we did in problem 1), but we model X301, . . . , X400 (from assignment3_prob2.txt) using one of the three binomial distributions. Specifically, for i = 1, . . . , 300, Xi follows this mixture model:
Zi ∼ categorical (π1, π2, 1 − π1 − π2),
Xi|Zi = 1 ∼ Binomial(20, p1),
Xi|Zi = 2 ∼ Binomial(20, p2),
Xi|Zi = 3 ∼ Binomial(20, p3),
and for i = 301, . . . , 400,
Xi ∼ Binomial(20, p1).
We aim to obtain MLE of parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
(a) (5 marks) Let X = (X1, . . . , X400) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ⁰) = E_{Z|X,θ⁰}[log P(X, Z | θ)].
(b) (5 marks) Derive the E-step and M-step of the EM algorithm.
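A sketch of how the M-step changes relative to problem 1, with γ_ik = P(Z_i = k | X_i, θ⁰) denoting the responsibilities for i = 1, …, 300 (assumed notation; the E-step over those 300 observations is unchanged, since X301, . . . , X400 carry no latent label):

```latex
\pi_k = \frac{1}{300}\sum_{i=1}^{300}\gamma_{ik}, \quad k = 1, 2;
\qquad
p_1 = \frac{\sum_{i=1}^{300}\gamma_{i1}\, x_i + \sum_{i=301}^{400} x_i}
           {20\Bigl(\sum_{i=1}^{300}\gamma_{i1} + 100\Bigr)};
\qquad
p_k = \frac{\sum_{i=1}^{300}\gamma_{ik}\, x_i}
           {20\sum_{i=1}^{300}\gamma_{ik}}, \quad k = 2, 3.
```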
(c) (5 marks) Note: Your answer for this problem should be typed. Answers including screen-captured R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X400. Set EM iterations to stop when either the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm two times with the following two different initial values and report the estimators with the highest incomplete log-likelihood.
                    π1    π2    p1    p2    p3
1st initial values  0.3   0.3   0.2   0.5   0.7
2nd initial values  0.1   0.2   0.1   0.3   0.7
For each EM run, check that the incomplete log-likelihood increases at each EM step by plotting it.
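The modification for problem 2 can be sketched in R as below. The function name em_binom_mix2 and the simulated data are illustrative assumptions; for the actual answer, X and X.more would be read from the two files. The extra 100 observations contribute directly to the p1 update and add a direct Binomial(20, p1) term to the incomplete log-likelihood.

```r
# EM for the problem-2 model: X ~ three-component binomial mixture,
# X.more ~ Binomial(20, p1) directly (sketch).
em_binom_mix2 <- function(X, X.more, pi1, pi2, p, max.iter = 100, eps = 1e-5) {
  n <- length(X); m <- length(X.more)
  pis <- c(pi1, pi2, 1 - pi1 - pi2)
  loglik <- numeric(0)
  for (iter in 1:max.iter) {
    # E-step applies only to the mixture observations X
    dens <- sapply(1:3, function(k) pis[k] * dbinom(X, 20, p[k]))
    gam  <- dens / rowSums(dens)
    # incomplete log-likelihood: mixture part + direct Binomial(20, p1) part
    loglik <- c(loglik, sum(log(rowSums(dens))) +
                        sum(dbinom(X.more, 20, p[1], log = TRUE)))
    # M-step: weights use the mixture observations only;
    # p1 also absorbs the X.more counts
    Nk   <- colSums(gam)
    pis  <- Nk / n
    p    <- colSums(gam * X) / (20 * Nk)
    p[1] <- (sum(gam[, 1] * X) + sum(X.more)) / (20 * (Nk[1] + m))
    if (iter > 1 && abs(diff(tail(loglik, 2))) < eps) break
  }
  list(pi = pis, p = p, loglik = loglik)
}

# Illustrative run on simulated data (replace with the observations from the files):
set.seed(2)
Z         <- sample(1:3, 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
Xsim      <- rbinom(300, 20, c(0.2, 0.5, 0.8)[Z])
Xsim.more <- rbinom(100, 20, 0.2)
fit2 <- em_binom_mix2(Xsim, Xsim.more, 0.3, 0.3, c(0.2, 0.5, 0.7))
plot(fit2$loglik, type = "b",
     xlab = "EM iteration", ylab = "incomplete log-likelihood")
```

As in problem 1, the run with the larger final log-likelihood across the two initial-value rows supplies the reported estimates.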