Statistics in Engineering
With examples in MATLAB® and R

Andrew Metcalfe, David Green, Tony Greenfield, Mahayaudin Mansor, Andrew Smith and Jonathan Tuke.

Chapter 6 solutions to odd numbered exercises

Exercise 6.1
The following R code gives the answers.
> n1=35;xb1=27.052;s1=0.030;n2=20;xb2=27.047;s2=0.037
> m=(n1*xb1+n2*xb2)/(n1+n2)
> print(m)
[1] 27.05018
> v=((n1-1)*s1^2+(n2-1)*s2^2)/(n1+n2-2)
> print(v)
[1] 0.001068132
> s=sqrt(v)
> print(s)
[1] 0.03268229
> propout=pnorm(27,m,s)+(1-pnorm(27.1,m,s))
> print(propout)
[1] 0.1260522
1. The weighted estimate of the population mean is \(27.050\)
2. The weighted estimate of the population variance is \(0.001068132\)
3. The weighted estimate of the population standard deviation is \(0.033\)
4. The estimated proportion outside the specification is \(0.126\).
  [The proportion outside specification is too high. Can the process standard deviation be reduced by improving procedures? Can the specification be relaxed? If neither is possible, is it worth the supplier investing in higher-precision machinery?]

Exercise 6.3
1. 1. \(c = \rho \times\sigma_{X} \times \sigma_{Y}\)
  2. The mean of \(X + a\) is \(\mu_{X} + a\) and similarly for \(Y\) and therefore \(X + a - (\mu_{X} +a) = X - \mu_{X}\) and similarly for \(Y\) and the covariance will be unchanged.
  3. We can choose \(a = -\mu_{X}\) and \(b = -\mu_{Y}\), and the covariance of mean adjusted variables is identical to the covariance of the original variables. The practical consequence is that we can assume variables have a mean of \(0\), without loss of generality, when proving results about covariances.
2. 1. \(\overline{Y} = \sum \dfrac{Y_{i}}{n}\).
  2. We can assume without loss of generality that the means of \(X\) and \(Y\) are both \(0\). Then the covariance of \(\overline{X}\) and \(\overline{Y}\) is
    \(E\left[\dfrac{\sum_{i} X_{i}}{n} \times \dfrac{\sum_{j} Y_{j}}{n} \right] = \dfrac{1}{n^2} \sum_{i} \sum_{j} E\left[X_{i} Y_{j}\right] = \dfrac{nc}{n^2} = \dfrac{c}{n}\), since \(X_{i}\) and \(Y_{j}\) are independent unless \(i = j\).
  3. The correlation between \(\overline{X}\) and \(\overline{Y}\) is \(\dfrac{\frac{c}{n}}{\frac{\sigma_{X}}{\sqrt{n}} \times \frac{\sigma_{Y}}{\sqrt{n}}} = \dfrac{c}{\sigma_{X} \sigma_{Y}} = \rho\)
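The result for the sample means can be checked by simulation. The following is a minimal sketch, assuming bivariate normal pairs and hypothetical values for \(\rho\), \(\sigma_{X}\), \(\sigma_{Y}\) and \(n\) that are not part of the exercise.

```r
# Simulation check: cov(Xbar, Ybar) should be near c/n and the
# correlation of the means should stay near rho.
# All parameter values below are hypothetical, chosen for illustration.
set.seed(1)
rho <- 0.6
sigma_x <- 2
sigma_y <- 3
n <- 10         # sample size for each pair of means
reps <- 20000   # number of simulated samples

sim_means <- function() {
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  x <- sigma_x * z1
  # y has standard deviation sigma_y and correlation rho with x
  y <- sigma_y * (rho * z1 + sqrt(1 - rho^2) * z2)
  c(mean(x), mean(y))
}

means <- replicate(reps, sim_means())   # 2 x reps matrix
c_xy <- rho * sigma_x * sigma_y         # covariance of X and Y
cov_means <- cov(means[1, ], means[2, ])
cor_means <- cor(means[1, ], means[2, ])

print(c(theory_cov = c_xy / n, sim_cov = cov_means))
print(c(theory_cor = rho, sim_cor = cor_means))
```

The simulated values agree with \(c/n\) and \(\rho\) only up to sampling error, which shrinks as `reps` increases.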

Exercise 6.5

> time=seq(0,18,1.5)
> conc=c(21,19,15,12.5,10.5,9,7.8,7,6.2,5.7,5.4,5,4.7)
> par(mfrow=c(1,2))
> plot(time,conc)
> plot(time,log(conc))
> cor(time,conc)
[1] -0.9326413
> cor(time,log(conc))
[1] -0.9826006
1. The correlation is \(-0.933\). The concentration decreases with time, so the correlation is negative and large in absolute value. However, the plot shows that the rate of decrease slows as time goes on, so the relationship is not linear.
  If we plot the logarithm of concentration against time the relationship is closer to linear and the correlation is \(-0.983\).
  It is likely that taking \(\log{(conc-L)}\) where \(L\) is some lower limit would give an even closer linear relationship (for example, taking \(L=4\) gives a correlation of \(-0.9989983\)).
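The effect of subtracting a lower limit before taking logarithms can be explored as follows. This is a sketch only: `cor_shifted` is a hypothetical helper name, and the trial values of \(L\) are illustrative choices that must stay below the smallest concentration.

```r
# Correlation between time and log(conc - L) for several trial values of L.
time <- seq(0, 18, 1.5)
conc <- c(21, 19, 15, 12.5, 10.5, 9, 7.8, 7, 6.2, 5.7, 5.4, 5, 4.7)

# Hypothetical helper: correlation after shifting by a lower limit L
cor_shifted <- function(L) cor(time, log(conc - L))

L_values <- c(0, 2, 4)   # each must be below min(conc) = 4.7
r <- sapply(L_values, cor_shifted)
names(r) <- paste0("L=", L_values)
print(r)
```

Taking \(L = 0\) reproduces the correlation with \(\log(conc)\), so the printed vector shows how the fit to a straight line changes as \(L\) increases.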

Exercise 6.7

> CO=c(28,155,190,68,55,56,96,133,55,120,56,195,110,128,55,105)/10
> bp=c(5,1,8,9,10,11,39,40,13,57,15,60,73,81,22,95)/10
> cor(CO,bp)
[1] 0.3550973
> plot(CO,bp,xlab="CO",ylab="benzoapyrene")
> cov(CO,bp)
[1] 5.511042
> cov(CO,bp)/(sd(CO)*sd(bp))
[1] 0.3550973

The correlation is positive but only slight. There is a discernible tendency for benzoapyrene to increase as CO increases, but there is considerable scatter about a linear relationship.
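For comparison with the built-in functions, the covariance and correlation can also be computed directly from their definitions. This is a minimal sketch; the names `cov_manual` and `r_manual` are illustrative.

```r
# Sample covariance and correlation from the definitions,
# using the n - 1 divisor that cov() and sd() use.
CO <- c(28, 155, 190, 68, 55, 56, 96, 133, 55, 120, 56, 195, 110, 128, 55, 105) / 10
bp <- c(5, 1, 8, 9, 10, 11, 39, 40, 13, 57, 15, 60, 73, 81, 22, 95) / 10

n <- length(CO)
cov_manual <- sum((CO - mean(CO)) * (bp - mean(bp))) / (n - 1)
r_manual <- cov_manual / (sd(CO) * sd(bp))

print(c(cov_manual = cov_manual, cov_builtin = cov(CO, bp)))
print(c(r_manual = r_manual, r_builtin = cor(CO, bp)))
```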

Exercise 6.9
\(F(x,y) = \displaystyle \int_0^y \int_0^x (u+v) du dv\)
\(\hspace{14mm}= \displaystyle \int_0^y \left[\frac{u^2}{2} + u v \right]_0^x dv\)
\(\hspace{14mm}= \displaystyle \int_0^y \left(\frac{x^2}{2} + x v \right)dv\)
\(\hspace{14mm}= \displaystyle \left[v \frac{x^2}{2} + \frac{xv^2}{2} \right]_0^y\)
\(\hspace{14mm}= \displaystyle \frac{1}{2} \left(x^2y + xy^2\right), \quad 0 \le x,y \le 1\)

Check that \(F(1,1) = 1\)
1. \(F(0.5,0.5) = \frac{1}{2}\left(0.125 + 0.125\right) = 0.125\)
2. \(1 - F(1,0.5) - F(0.5,1) +F(0.5,0.5)\)
  \(= 1- 0.375 -0.375 + 0.125 = 0.375\)
3. \(f(x) = \displaystyle \int_0^1 (x+y) dy\)
  \(\hspace{10mm} = \left[xy + \frac{y^2}{2}\right]_0^1\)
  \(\hspace{10mm} = \frac{1}{2} + x, \quad 0 \le x \le 1\)
4. \(f(x|y) = \displaystyle\frac{f(x,y)}{f(y)}\)
  \(\hspace{12mm}= \displaystyle\frac{x + y}{\frac{1}{2} + y}, \quad 0 \le x \le 1 \mbox{ for } 0 \le y \le 1\)
  
  Check by finding \(F(x|y) = \displaystyle\frac{\frac{x^2}{2} +xy}{\frac{1}{2} + y}\) and verifying that \(F(0|y) = 0\) and \(F(1|y) = 1\)
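
The rectangle probabilities can be checked numerically by integrating the joint pdf \(f(x,y) = x + y\) directly. The sketch below uses nested calls to R's `integrate`; `rect_prob` and `F_closed` are hypothetical helper names.

```r
# Joint pdf from the exercise: f(x, y) = x + y on the unit square
f <- function(x, y) x + y

# Probability of the rectangle [x_lo, x_hi] x [y_lo, y_hi] by nested
# one-dimensional numerical integration
rect_prob <- function(x_lo, x_hi, y_lo, y_hi) {
  inner <- function(y) {
    sapply(y, function(yy) integrate(function(x) f(x, yy), x_lo, x_hi)$value)
  }
  integrate(inner, y_lo, y_hi)$value
}

# Closed-form cdf F(x, y) = (x^2 y + x y^2) / 2
F_closed <- function(x, y) 0.5 * (x^2 * y + x * y^2)

total <- rect_prob(0, 1, 0, 1)        # whole square, should be 1
lower <- rect_prob(0, 0.5, 0, 0.5)    # P(X < 0.5 and Y < 0.5)
upper <- rect_prob(0.5, 1, 0.5, 1)    # P(X > 0.5 and Y > 0.5)
upper_from_cdf <- 1 - F_closed(1, 0.5) - F_closed(0.5, 1) + F_closed(0.5, 0.5)

print(c(total = total, lower = lower, F_lower = F_closed(0.5, 0.5),
        upper = upper, upper_from_cdf = upper_from_cdf))
```

The numerical and closed-form values should agree to within the integration tolerance.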

Exercise 6.15
Let \(X\) be the clearance: hole diameter - rivet diameter
The random selection justifies an assumption that hole and rivet diameters are independent.
Mean of \(X\) is \(2.35 - 2.30 = 0.05\)
Variance of \(X\) is \(0.1^2 + 0.05^2 = 0.0125\)
(covariance is \(0\) since hole and rivet diameters are independent)
Assume that the rivet will fit if the clearance is positive.
> mu=2.35-2.30
> v=0.1^2+0.05^2
> v
[1] 0.0125
> sig=sqrt(v)
> sig
[1] 0.1118034
> 1-pnorm(0,mu,sig)
[1] 0.6726396
The probability of a fit is \(0.673\). This is far too low for sustainable manufacturing.
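The calculation can be cross-checked by simulating independent hole and rivet diameters. This is a sketch under assumptions: normal distributions, and the standard deviation \(0.1\) assigned to the holes and \(0.05\) to the rivets. That assignment is an assumption here, and the result does not depend on it because only the sum of the variances matters.

```r
# Monte Carlo check of P(clearance > 0) for independent normal diameters.
set.seed(42)
n_sim <- 100000
hole  <- rnorm(n_sim, mean = 2.35, sd = 0.10)   # assumed sd for holes
rivet <- rnorm(n_sim, mean = 2.30, sd = 0.05)   # assumed sd for rivets

clearance <- hole - rivet
p_fit_sim <- mean(clearance > 0)                 # simulated proportion that fit
p_fit_exact <- 1 - pnorm(0, 2.35 - 2.30, sqrt(0.1^2 + 0.05^2))

print(c(simulated = p_fit_sim, exact = p_fit_exact))
```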

Exercise 6.17
1. \(T\) has mean \(100 + 100 + 100 = 300\)
  \(T\) has variance \(30^2 + 30^2 + 30^2 = 2700\)
  The standard deviation is the square root of the variance, \(51.96\)
  The distribution is normal because \(T\) is a linear combination of normal random variables.
2. We need the probability that \(T\) exceeds \(200\)
  > mu=300
  > sig=sqrt(3*30^2)
  > sig
  [1] 51.96152
  > 1-pnorm(200,mu,sig)
  [1] 0.9728541
  The probability that three batteries will suffice is \(0.973\).
3. Let \(M\) be the mission length. We need the distribution of \(T - M\), which is normal with mean \(300 - 200 = 100\), and given the independence of \(T\) and \(M\), standard deviation \(= \sqrt{2700+50^2}\).
  We need the probability that \(T - M\) is positive.
  > mud=300-200
  > sigd=sqrt(2700+50^2)
  > 1-pnorm(0,mud,sigd)
  [1] 0.9172411
  The probability that three batteries will suffice is now \(0.917\). Notice that it is reduced because of the uncertainty about the duration of the mission.
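
Both parts use the same calculation, which can be wrapped in one function. The sketch below is illustrative only: `p_suffice` and its argument names are hypothetical, and it assumes independent normal battery lifetimes and a normal mission length, as in the solution.

```r
# Probability that k batteries outlast the mission.
# T - M is normal with mean k*batt_mean - mission_mean and
# variance k*batt_sd^2 + mission_sd^2 (independence assumed).
p_suffice <- function(k, mission_mean = 200, mission_sd = 0,
                      batt_mean = 100, batt_sd = 30) {
  diff_mean <- k * batt_mean - mission_mean
  diff_sd <- sqrt(k * batt_sd^2 + mission_sd^2)
  1 - pnorm(0, diff_mean, diff_sd)
}

p_fixed  <- p_suffice(3)                    # mission length known to be 200
p_random <- p_suffice(3, mission_sd = 50)   # mission length uncertain
print(c(fixed = p_fixed, random = p_random))
```

Setting `mission_sd = 0` recovers the fixed-length case, so the two results differ only through the extra variance term.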

Exercise 6.19
1. 1. \(W\) has mean \(5 \times 720 = 3600\) and, given independence, variance \(5 \times 75^2 = 28125\).
    It follows that the standard deviation is \(167.7\)
  2. \(P(4000 < W)\)
    > mu=5*720
    > print(mu)
    [1] 3600
    > va=5*75^2
    > sda=sqrt(va)
    > print(sda)
    [1] 167.7051
    > 1-pnorm(4000,mu,sda)
    [1] 0.008536331
    The probability is \(0.0085\)
2. 1. The mean of \(W\) is \(5 \times 720 = 3600\)
    The variance of \(W\) is the sum of the variances for the \(5\) days plus twice the sum of the covariances between each pair of days:
    \(Day1\) and \(Day2\), ... , \(Day1\) and \(Day5\), \(Day2\) and \(Day3\), ... , \(Day2\) and \(Day5\), ... , \(Day4\) and \(Day5\):
    \(5 \times 75^2 + 2 \times 75^2 \times (0.8 + 0.8^2 + 0.8^3 + 0.8^4 + 0.8 + 0.8^2 + 0.8^3 + 0.8 + 0.8^2 + 0.8) = 101853\)
    The standard deviation is \(319.1\)
  2. > vb=5*75^2 + 2*75^2*(0.8+0.8^2+0.8^3+0.8^4+0.8+0.8^2+0.8^3+0.8+0.8^2+0.8)
    > vb
    [1] 101853
    > 1-pnorm(4000,mu,sqrt(vb))
    [1] 0.1050388
    The probability has increased to \(0.1050\). The increased variance increases the probability of large deviations from the mean.
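
The same variance can be obtained by building the covariance matrix of the five daily values and summing all its entries, which avoids listing the pairs by hand. This is a sketch assuming, as in the sum above, that the correlation between days \(i\) and \(j\) is \(0.8^{|i-j|}\).

```r
# Variance of the weekly total as the sum of all entries of the
# 5 x 5 covariance matrix with entries 75^2 * 0.8^|i - j|.
sigma <- 75
rho <- 0.8
days <- 1:5

lag <- abs(outer(days, days, "-"))   # matrix of |i - j|
Sigma <- sigma^2 * rho^lag           # covariance matrix of the daily values

var_W <- sum(Sigma)                  # variances plus twice the covariances
sd_W <- sqrt(var_W)
p_exceed <- 1 - pnorm(4000, mean = 5 * 720, sd = sd_W)

print(c(variance = var_W, sd = sd_W, p_exceed = p_exceed))
```

Summing the whole matrix counts each off-diagonal covariance twice, which is exactly the factor of \(2\) in the hand calculation.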

Exercise 6.21
1. \(mean = \mu_{1} + \mu_{2} + \mu_{3}\)
  \(variance = \sigma_{1}^2 + \sigma_{2}^2 + \sigma_{3}^2 + 2(g_{12} + g_{13} + g_{23})\)
  \(sd = \sqrt{variance}\)
2. \(mean = a_{1} \times \mu_{1} + a_{2} \times \mu_{2} + a_{3} \times \mu_{3}\)
  \(variance = a_{1}^2 \times \sigma_{1}^2 + a_{2}^2 \times \sigma_{2}^2 + a_{3}^2 \times \sigma_{3}^2 + 2(a_{1} \times a_{2} \times g_{12} + a_{1} \times a_{3} \times g_{13} + a_{2} \times a_{3} \times g_{23})\)
  \(sd = \sqrt{variance}\)
3. \(mean = \sum_{i=1}^n a_{i} \mu_{i}\)
  The variance is neatly expressed as a double sum
  \(variance = \sum_{i=1}^n \sum_{j=1}^n a_{i} a_{j} g_{ij}\), where \(g_{ii}\) is the variance of \(X_{i}\). Check that it gives the answer in part 2 when \(n = 3\).
  The standard deviation is the square root of the variance.
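
The double sum is the quadratic form \(a^{T} G a\), where \(G\) is the covariance matrix with entries \(g_{ij}\). The sketch below computes it in R and compares it with the expanded three-variable formula; `lin_comb_moments` is a hypothetical helper name and the numerical values are made up for illustration.

```r
# Mean, variance and sd of a linear combination sum(a_i * X_i),
# given the means mu and the covariance matrix G (g_ii = variances).
lin_comb_moments <- function(a, mu, G) {
  stopifnot(length(a) == length(mu), all(dim(G) == length(a)))
  variance <- drop(t(a) %*% G %*% a)   # double sum of a_i a_j g_ij
  list(mean = sum(a * mu), variance = variance, sd = sqrt(variance))
}

# Hypothetical example with n = 3
a  <- c(1, -2, 0.5)
mu <- c(10, 20, 30)
G  <- matrix(c(4,    1,  0.5,
               1,    9, -2,
               0.5, -2, 16), nrow = 3, byrow = TRUE)

res <- lin_comb_moments(a, mu, G)

# Expanded three-variable formula for comparison
var_expanded <- a[1]^2 * G[1, 1] + a[2]^2 * G[2, 2] + a[3]^2 * G[3, 3] +
  2 * (a[1] * a[2] * G[1, 2] + a[1] * a[3] * G[1, 3] + a[2] * a[3] * G[2, 3])

print(c(mean = res$mean, variance = res$variance, expanded = var_expanded))
```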

(To be continued)
