- In environmental sciences, it is common to encounter observations below the limit of detection (LoD), reported as "< value".
(a) Formally define the limit of detection and write down an equation for it in terms of mean concentration and standard deviation. [3 MARKS]
Detection limit: defined as the lowest concentration that can be distinguished with reasonable confidence from a blank, a hypothetical sample containing zero concentration of the analyte of interest. Limit of detection: $c_L$ (or $q_L$) is related to the smallest measure of response $x_L$ that can be detected, where $x_L = \bar{x}_B + k s_B$. Here $\bar{x}_B$ is the mean of the blanks, $s_B$ is the standard deviation of the blanks and $k$ is a numerical constant. Then $c_L = k \times s_B$, where $k$ is usually 3 in the analytical sciences literature. (total of 3 for comments and definition; bookwork).
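As a rough illustration of the definition above, the following sketch computes $x_L = \bar{x}_B + k s_B$ and $c_L = k s_B$ from a set of blank readings. The blank values and the function name `limit_of_detection` are invented for illustration and are not part of the model answer.

```python
import statistics

def limit_of_detection(blanks, k=3.0):
    """Return (x_L, c_L) for a list of blank readings.

    x_L = mean of blanks + k * SD of blanks (smallest detectable response)
    c_L = k * SD of blanks
    """
    x_b = statistics.mean(blanks)
    s_b = statistics.stdev(blanks)  # sample standard deviation (n - 1 divisor)
    return x_b + k * s_b, k * s_b

# Hypothetical blank readings, made up purely for illustration
blanks = [0.02, 0.03, 0.01, 0.04, 0.02, 0.03]
x_l, c_l = limit_of_detection(blanks)
print(f"x_L = {x_l:.4f}, c_L = {c_l:.4f}")
```

The default `k=3.0` follows the usual convention quoted in the answer; other values of `k` can be passed in if a different convention is wanted.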
(b) Briefly describe three widely used non-statistical techniques for dealing with such LoD observations in the environmental science literature and contrast them with the two statistical approaches. Comment critically on the different approaches. [5 MARKS]
Non-statistical (environmental science) approaches either ignore the "<" and simply take the numerical value, replace the "< value" by a fixed constant (e.g. 0.5 $c_L$), or treat the value as zero. Another approach is to replace the "< value" by a draw from a probability distribution (uniform over (0, $c_L$)). Statistical approaches include the Kaplan-Meier (KM) approach, based on a non-parametric estimate of the distribution function, and robust regression on order statistics, where a distribution is assumed. Note that the KM approach is common in survival analysis with right-censored data; in the environmental context the data are left censored (so we change the signs of the values). These methods are based on replacing non-detected results with values generated to match the distribution of the rest of the data set. (2 marks env sci approaches, and then 3 for the description of 2 stats approaches; bookwork)
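To see how much the choice of substitution rule can matter, the sketch below applies the simple replacement rules described above to a small made-up data set and compares the resulting means. The data, the function name `substitute` and the method labels are assumptions for illustration only; the Kaplan-Meier and regression-on-order-statistics approaches are not implemented here.

```python
import random
import statistics

def substitute(observations, method, rng=None):
    """Replace censored observations using a simple substitution rule.

    observations: list of (value, censored) pairs; for a censored
    observation, value is the reported limit in "< value".
    """
    out = []
    for value, censored in observations:
        if not censored:
            out.append(value)
        elif method == "ignore_sign":   # drop the "<" and keep the number
            out.append(value)
        elif method == "half":          # fixed constant 0.5 * limit
            out.append(0.5 * value)
        elif method == "zero":          # treat as zero
            out.append(0.0)
        elif method == "uniform":       # random draw from U(0, limit)
            out.append(rng.uniform(0.0, value))
        else:
            raise ValueError(f"unknown method: {method}")
    return out

# Hypothetical concentrations; True marks a "< 0.5" result
data = [(0.5, True), (0.8, False), (0.5, True), (1.2, False), (2.0, False)]
rng = random.Random(42)
for method in ("ignore_sign", "half", "zero", "uniform"):
    mean = statistics.mean(substitute(data, method, rng))
    print(f"{method:12s} mean = {mean:.3f}")
```

With two of five values censored, the estimated mean depends on the rule chosen, which is one basis for commenting critically on these approaches.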
(c) Define what is meant by the accuracy, precision and bias of a measurement process and illustrate their definitions graphically. [4 MARKS]
Accuracy is the closeness of agreement between a measurement and the true value (random and systematic components). Bias is the difference between the average of a series of measurements and the true value (systematic). Precision is the closeness of agreement between independent measurements. Precision does not relate to the true value. Figure to show the four combinations (accurate and precise, accurate but imprecise, inaccurate but precise, inaccurate and imprecise) as in lecture notes. (3 marks for definitions, 1 for the figure; bookwork)
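The distinction can also be illustrated numerically. The sketch below simulates two hypothetical measurement processes and summarises bias as mean minus true value and precision as the standard deviation of the measurements. Using the root mean squared error as a single summary of accuracy is an assumption made here, not something stated in the answer, and all numbers are invented.

```python
import random
import statistics

def summarise(measurements, true_value):
    """Return (bias, spread, rmse) for repeated measurements."""
    bias = statistics.fmean(measurements) - true_value   # systematic error
    spread = statistics.pstdev(measurements)             # (im)precision
    # RMSE combines both components: rmse**2 == bias**2 + spread**2
    rmse = (sum((m - true_value) ** 2 for m in measurements)
            / len(measurements)) ** 0.5
    return bias, spread, rmse

rng = random.Random(1)
TRUE_VALUE = 10.0  # hypothetical true concentration

# Precise but biased: small scatter around the wrong value
precise_but_biased = [rng.gauss(TRUE_VALUE + 1.0, 0.1) for _ in range(1000)]
# Unbiased but imprecise: large scatter around the true value
unbiased_but_imprecise = [rng.gauss(TRUE_VALUE, 1.0) for _ in range(1000)]

for name, m in [("precise/biased", precise_but_biased),
                ("unbiased/imprecise", unbiased_but_imprecise)]:
    bias, spread, rmse = summarise(m, TRUE_VALUE)
    print(f"{name:20s} bias={bias:+.3f} sd={spread:.3f} rmse={rmse:.3f}")
```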
(d) Why do we consider both the expected value and the variance (or standard error) of an estimator in the context of sampling? [4 MARKS]
The expected value helps us assess whether the estimator is biased or not. The variance reflects the precision of the estimation. The formula for the standard error, $\sqrt{\operatorname{var}(\bar{x})}$, contains $n$, so if we specify how precise we want our interval to be we can solve for $n$ and hence calculate the required sample size for a given precision.
(1 mark for bias, 1 mark for precision, 2 marks for sample size; discussed but not stated explicitly)
(e) Suppose that a soil scientist wishes to estimate the mean concentration of Chromium in soil. She has conducted a preliminary study, using 16 samples, giving a mean of 3.4396 mg/kg and sample variance of 0.3013. She will calculate a 95% confidence interval for the mean concentration and wishes the total width of the interval to be 0.2. Determine how many soil samples the scientist will need to collect to achieve this. Compare and contrast 2 different random statistical sampling schemes that could be used to identify the soil sample locations in a large 1 km$^2$ area. [4 MARKS]
Sample size calculation:
The total width of the 95% CI is $4\sqrt{\mathrm{var}/n}$ (taking the 95% multiplier as approximately 2), so $0.2 = 4\sqrt{0.3013/n}$ and hence $n = 0.3013/(0.2/4)^2 = 120.52$, so 121 samples are needed to obtain a CI of this width.
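The same calculation can be checked with a few lines of code. This is a minimal sketch in which the function name `required_sample_size` and the parameter `z` are illustrative; the default `z = 2` mirrors the approximate multiplier used in the answer.

```python
import math

def required_sample_size(variance, total_width, z=2.0):
    """Smallest n such that 2 * z * sqrt(variance / n) <= total_width."""
    half_width = total_width / 2.0
    return math.ceil(z ** 2 * variance / half_width ** 2)

n_approx = required_sample_size(0.3013, 0.2)          # z = 2, as in the answer
n_exact = required_sample_size(0.3013, 0.2, z=1.96)   # normal 97.5% quantile
print(n_approx, n_exact)  # prints: 121 116
```

Using the more exact multiplier 1.96 instead of 2 gives a slightly smaller required sample size, so the rounded value of 2 is the more conservative choice.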
Different sampling schemes could be used, so students could compare, for example, simple random sampling (SRS), stratified random sampling and systematic sampling.
They should comment on the efficiency of the two methods they choose. For stratified random sampling, they should indicate the basis of the strata; for systematic sampling, they should comment on issues such as hidden periodicity and the need for a random starting point.
(2 marks for the CI calculation and 2 marks for the comparison of the sampling schemes, seen similar)
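The three schemes mentioned above can be sketched as ways of generating sample coordinates in a 1 km by 1 km square. In this sketch the strata are assumed to be equal-sized grid cells purely for convenience (in practice they might be based on soil type or land use), and all function names are illustrative.

```python
import random

SIDE = 1000.0  # side length in metres of the 1 km x 1 km study area

def simple_random(n, rng):
    """SRS: n locations drawn uniformly over the whole area."""
    return [(rng.uniform(0, SIDE), rng.uniform(0, SIDE)) for _ in range(n)]

def stratified(cells_per_side, per_cell, rng):
    """Stratified: per_cell uniform locations within each grid-cell stratum."""
    w = SIDE / cells_per_side
    pts = []
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            for _ in range(per_cell):
                pts.append((rng.uniform(i * w, (i + 1) * w),
                            rng.uniform(j * w, (j + 1) * w)))
    return pts

def systematic(points_per_side, rng):
    """Systematic: regular grid with a single random starting point."""
    step = SIDE / points_per_side
    x0, y0 = rng.uniform(0, step), rng.uniform(0, step)
    return [(x0 + i * step, y0 + j * step)
            for i in range(points_per_side)
            for j in range(points_per_side)]

rng = random.Random(2024)
srs_pts = simple_random(121, rng)      # 121 locations, as in the calculation
strat_pts = stratified(11, 1, rng)     # 11 x 11 strata, one location each
sys_pts = systematic(11, rng)          # 11 x 11 grid, random start
print(len(srs_pts), len(strat_pts), len(sys_pts))
```

The systematic scheme uses only one random draw (the starting point), which is why a hidden periodicity in the soil that matches the grid spacing can bias the result.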
- (a) Figure 1 shows the spatial locations of radioactive particles detected on Sandside beach near Dounreay, in Northern Scotland.
Explain why this dataset can be interpreted as a spatial point pattern data set.
In particular, say why this is not an example of geostatistical data. [3 MARKS]
A spatial point pattern is a realisation of a spatial point process, which models the spatial location of objects or events in space. Here, the locations of the particles and the pattern formed by these are of interest and considered random. This is different from geostatistical data, where a spatially continuous phenomenon is measured at a finite number of locations, which are predetermined and hence fixed.
(2 marks for explanation as to why spp, 2 marks for comparison with geostats; unseen context)
(b) A researcher analysing the data intends to estimate Ripley’s K-function for the pattern in Figure 1.
Figure 1: Locations of radioactive particles detected on Sandside beach between 1999 and 2013.
- What is the purpose of using the K-function when analysing a spatial point pattern and how is it interpreted? [3 MARKS]
The K-function is used to assess whether a point pattern exhibits complete spatial randomness. Its value is known for the homogeneous Poisson process, and the estimated function for an empirical pattern is compared to that of the Poisson process. Values above the Poisson curve are interpreted as clustering, values below as regularity/repulsion. Students don’t need to state the formula for the K-function nor its value in the Poisson case.
(1 mark for correct explanation of purpose, 2 marks for comparison with Poisson; bookwork)
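For illustration only, the comparison with the Poisson case can be sketched with a naive estimator of the K-function. The estimator below counts ordered pairs of points within distance r and applies no edge correction, which is a simplification assumed here (practical analyses normally use an edge-corrected estimator). The reference value for a homogeneous Poisson process in the plane is $\pi r^2$. The two patterns are deterministic toy examples, not the Sandside data.

```python
import math

def k_hat(points, r, area):
    """Naive K-function estimate at distance r, without edge correction."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

r = 0.05
poisson_k = math.pi * r ** 2  # theoretical K(r) under complete spatial randomness

# Toy regular pattern: 10 x 10 grid with spacing 0.1 on the unit square
grid = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(10) for j in range(10)]
# Toy clustered pattern: 50 points packed into a short segment
cluster = [(0.5 + 0.001 * i, 0.5) for i in range(50)]

print("Poisson  :", poisson_k)
print("regular  :", k_hat(grid, r, area=1.0))     # below the Poisson value
print("clustered:", k_hat(cluster, r, area=1.0))  # above the Poisson value
```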
- What would you expect the estimated K-function to look like for the pattern in Figure 1? In particular, sketch the shape of that K-function and comment on the suitability of the K-function in this case. [4 MARKS]
The pattern in Figure 1 looks rather non-homogeneous. The K-function is only defined for homogeneous patterns and is hence unsuitable here; the estimated K-function is likely to indicate clustering at all distances.
(2 marks for identifying that pattern is not homogeneous; 2 for describing function; advanced)
(c) In 4 different lakes close to a nuclear reactor, equally sized water samples have been taken repeatedly. For each of these samples the number of radioactive particles has been determined. The resulting data are plotted in Figure 2. The analyst dealing with these data is concerned that outliers are present in the data. The results of an analysis she has run in R are given below, where radioact.all refers to the data from all lakes and radioact.3 to those from lake 3 only.
Grubbs test for one outlier