Lab 08: Statistics & Random Sampling

Examples (by instructor)

In this lab, you’ll apply the statistics tools from Chapter 13 to real engineering data: summarizing datasets with descriptive statistics, working with probability distributions (uniform and normal), computing probabilities using the CDF and PPF, and building confidence intervals for the mean.

Example 1: Descriptive Statistics

Given 10 batch yields (%): 82.1, 85.4, 79.8, 83.7, 86.2, 81.0, 84.5, 80.3, 85.9, 82.8

Compute the mean, median, sample standard deviation, and the fraction of batches with a yield below 81%.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

yields = np.array([82.1, 85.4, 79.8, 83.7, 86.2, 81.0, 84.5, 80.3, 85.9, 82.8])
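
One possible way to carry out these calculations is sketched below; it is a minimal illustration, not necessarily the instructor's solution. It assumes that "sample std" means `ddof=1` and that "below 81%" is a strict `<` comparison. The variable names (`mean_y`, `frac_below`, and so on) are illustrative.

```python
import numpy as np

yields = np.array([82.1, 85.4, 79.8, 83.7, 86.2, 81.0, 84.5, 80.3, 85.9, 82.8])

mean_y = np.mean(yields)
median_y = np.median(yields)
std_y = np.std(yields, ddof=1)        # ddof=1 gives the sample standard deviation

# A boolean mask averages to the fraction of True entries.
# Assumption: "below 81%" is strict, so a yield of exactly 81.0 is not counted.
frac_below = np.mean(yields < 81.0)

print(f"Mean          = {mean_y:.2f} %")
print(f"Median        = {median_y:.2f} %")
print(f"Sample std    = {std_y:.2f} %")
print(f"Fraction < 81 = {frac_below:.2f}")
```

Taking the mean of a boolean array avoids an explicit loop or a separate count-and-divide step.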

Example 2: Normal Distribution — CDF and PPF

Reactor temperature follows \(\mathcal{N}(\mu=350\text{ K},\; \sigma=5\text{ K})\).

1. What fraction of the time is the temperature below 343 K?
2. What temperature is exceeded only 5% of the time?
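
The two questions map onto the CDF and the PPF (inverse CDF) of the distribution. The sketch below is one way to answer them with `scipy.stats.norm`; the names `dist_T`, `p_below_343`, and `T_95` are illustrative. It reads "exceeded only 5% of the time" as the 95th percentile.

```python
from scipy import stats

# Frozen normal distribution: loc is the mean, scale is the standard deviation
dist_T = stats.norm(loc=350, scale=5)

# 1. P(T < 343 K): the CDF evaluated at 343
p_below_343 = dist_T.cdf(343)

# 2. Temperature exceeded 5% of the time: 95% of values lie below it,
#    so it is the 95th percentile, obtained from the PPF
T_95 = dist_T.ppf(0.95)

print(f"P(T < 343 K) = {p_below_343:.4f}")
print(f"Temperature exceeded 5% of the time = {T_95:.2f} K")
```

The same result for the second question could be obtained with `dist_T.isf(0.05)`, the inverse survival function.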

Example 3: Confidence Interval for the Mean

Five repeat measurements of outlet concentration (mol/L): 2.31, 2.28, 2.35, 2.29, 2.33

Compute the 95% confidence interval using the \(t\)-distribution.

C = np.array([2.31, 2.28, 2.35, 2.29, 2.33])
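
A minimal sketch of the interval calculation follows, assuming the usual two-sided form: the sample mean plus or minus the critical \(t\) value times the standard error \(s/\sqrt{n}\), with \(n-1\) degrees of freedom. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

C = np.array([2.31, 2.28, 2.35, 2.29, 2.33])

n = len(C)
mean_C = np.mean(C)
s_C = np.std(C, ddof=1)              # sample standard deviation
se_C = s_C / np.sqrt(n)              # standard error of the mean

# Two-sided 95% interval: 2.5% in each tail, n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)

ci_lo = mean_C - t_crit * se_C
ci_hi = mean_C + t_crit * se_C

print(f"Mean   = {mean_C:.3f} mol/L")
print(f"95% CI = [{ci_lo:.3f}, {ci_hi:.3f}] mol/L")
```

With only five measurements, the \(t\) critical value is noticeably larger than the normal value of about 1.96, so the interval is wider than a normal-based one would be.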

Warm-Up: Syntax Practice

Short exercises to get comfortable with the key statistics functions before the main problems.

Exercise 1 — Descriptive statistics.

Given the following 8 reactor conversion measurements (%), compute the mean, median, sample standard deviation, and interquartile range (IQR). Then print whether the mean and median are close (within 1 percentage point of each other).

78.2, 81.5, 76.9, 83.1, 79.4, 80.7, 77.8, 82.3

import numpy as np

X = np.array([78.2, 81.5, 76.9, 83.1, 79.4, 80.7, 77.8, 82.3])

mean   = ___                          # mean
median = ___                          # median
std    = ___                          # sample std dev (ddof=1)
iqr    = ___ - ___                    # Q3 - Q1 (use np.percentile)

print(f"Mean   = {mean:.2f} %")
print(f"Median = {median:.2f} %")
print(f"Std    = {std:.2f} %")
print(f"IQR    = {iqr:.2f} %")

if abs(mean - median) < 1.0:
    print("Mean and median are close — distribution is roughly symmetric.")
else:
    print("Mean and median differ — distribution may be skewed.")

Exercise 2 — Random sampling with a seed.

Generate 1000 samples from a Normal distribution with mean 350 K and standard deviation 15 K using np.random.default_rng(seed=7). Then:

Print the sample mean and sample standard deviation
Print what fraction of samples exceed 375 K

import numpy as np

rng = np.random.default_rng(seed=___)   # use seed=7

samples = rng.normal(___, ___, ___)     # loc=350, scale=15, size=1000

print(f"Sample mean : {np.mean(samples):.2f} K")
print(f"Sample std  : {np.std(samples, ddof=1):.2f} K")

frac_above = ___                        # fraction of samples > 375
print(f"Fraction > 375 K : {frac_above:.4f}")
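
The reason for passing a seed can be illustrated with a short sketch. It uses a hypothetical seed of 123 and a standard normal distribution, deliberately different from the exercise values: two generators created with the same seed produce identical samples, while further draws from the same generator continue the stream and give new values.

```python
import numpy as np

# Hypothetical seed, chosen only for illustration
rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)

a = rng_a.normal(0.0, 1.0, 5)    # loc, scale, size
b = rng_b.normal(0.0, 1.0, 5)

# Same seed and same sequence of calls -> identical samples
same_first_draw = np.array_equal(a, b)

# Drawing again from rng_a continues the stream instead of restarting it
c = rng_a.normal(0.0, 1.0, 5)
repeats = np.array_equal(a, c)

print(f"Same seed gives same samples : {same_first_draw}")
print(f"Second draw repeats the first: {repeats}")
```

To reproduce a set of samples later in a notebook, create a fresh generator with the same seed rather than reusing one that has already been drawn from.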

Exercise 3 — CDF and PPF.

Flow rate through a valve follows a Normal distribution \(\mathcal{N}(\mu=50,\, \sigma=4)\) L/min.

Use scipy.stats.norm to answer:

1. What fraction of the time is the flow rate below 45 L/min? (cdf)
2. What flow rate is exceeded 90% of the time? (ppf)

from scipy import stats

dist_flow = stats.norm(loc=___, scale=___)   # mu=50, sigma=4

# 1. P(flow < 45)
p_below_45 = ___                             # use .cdf()
print(f"P(flow < 45 L/min) = {p_below_45:.4f}")

# 2. Flow rate exceeded 90% of the time  →  10th percentile
flow_p10 = ___                               # use .ppf(0.10)
print(f"Flow exceeded 90% of the time = {flow_p10:.2f} L/min")

Practice Problems (by students)

Problem 1: Pressure Relief Valve — Distribution Analysis

A safety engineer tests 25 pressure relief valves. The opening pressure (bar) for each valve is recorded:

Valve #	Pressure (bar)
1–5	10.2, 9.8, 10.5, 10.1, 9.6
6–10	10.4, 9.9, 10.3, 10.7, 9.7
11–15	10.0, 10.6, 9.5, 10.2, 10.4
16–20	10.8, 9.9, 10.1, 10.3, 9.8
21–25	10.5, 10.0, 9.7, 10.6, 10.2

Valves are rejected if the opening pressure falls outside [9.8, 10.6] bar.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

P = np.array([10.2, 9.8, 10.5, 10.1, 9.6,
              10.4, 9.9, 10.3, 10.7, 9.7,
              10.0, 10.6, 9.5, 10.2, 10.4,
              10.8, 9.9, 10.1, 10.3, 9.8,
              10.5, 10.0, 9.7, 10.6, 10.2])

lo_spec, hi_spec = 9.8, 10.6

(a) Compute the mean, median, sample standard deviation, minimum, maximum, and IQR. Print all values.

# ── (a) Descriptive statistics ────────────────────────────────────────────────

print("(a) Descriptive Statistics")

(b) How many valves are rejected? What is the rejection rate (%)? Print the values of the rejected valves.

# ── (b) Rejection analysis ────────────────────────────────────────────────────

print(f"\n(b) Rejection Analysis")

(c) Assume the opening pressure follows a Normal distribution with the sample mean and std you computed. Using scipy.stats.norm, compute the theoretical probability that a valve falls outside [9.8, 10.6] bar. Compare to the observed rejection rate.

# ── (c) Theoretical rejection probability ─────────────────────────────────────

print(f"\n(c) Theoretical vs Observed Rejection Rate")
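
As a hint for this kind of calculation, the probability of landing outside an interval is the sum of the two tail areas. The sketch below uses a hypothetical distribution \(\mathcal{N}(20,\, 2)\) with hypothetical limits [17, 24], not the valve data; `sf` is the survival function, equal to `1 - cdf`.

```python
from scipy import stats

# Hypothetical values for illustration only (not the valve data)
mu, sigma = 20.0, 2.0
lo_lim, hi_lim = 17.0, 24.0

dist = stats.norm(loc=mu, scale=sigma)

p_low = dist.cdf(lo_lim)      # area below the lower limit
p_high = dist.sf(hi_lim)      # area above the upper limit, same as 1 - cdf
p_outside = p_low + p_high

# Equivalent route: one minus the area inside the interval
p_inside = dist.cdf(hi_lim) - dist.cdf(lo_lim)

print(f"P(outside) = {p_outside:.4f}")
print(f"Check      = {1 - p_inside:.4f}")
```

Both routes give the same number; adding the two tails makes it easier to see which limit contributes more to the total.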

(d) Plot a histogram of the data with:

A vertical line at the mean
Vertical dashed lines at the spec limits (9.8 and 10.6 bar)

# ── (d) Histogram plot ────────────────────────────────────────────────────────
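
The plotting pattern can be sketched on a different dataset. The example below reuses the batch yields from Example 1 and draws hypothetical reference limits at 81% and 86%, which are not part of the lab; the bin count and colors are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

yields = np.array([82.1, 85.4, 79.8, 83.7, 86.2, 81.0, 84.5, 80.3, 85.9, 82.8])
lo_ref, hi_ref = 81.0, 86.0          # hypothetical limits, for illustration only

fig, ax = plt.subplots(figsize=(6, 4))
counts, edges, _ = ax.hist(yields, bins=5, edgecolor="black", alpha=0.7)

# Solid line at the mean, dashed lines at the two reference limits
ax.axvline(np.mean(yields), color="red", linestyle="-", label="Mean")
ax.axvline(lo_ref, color="black", linestyle="--", label="Reference limits")
ax.axvline(hi_ref, color="black", linestyle="--")

ax.set_xlabel("Batch yield (%)")
ax.set_ylabel("Count")
ax.legend()
plt.show()
```

Labeling only one of the two dashed lines keeps the legend from showing a duplicate entry.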

Problem 2: Random Sampling and Distribution Analysis

A flow meter has a known measurement error that follows a Uniform distribution over \([-0.5,\; +0.5]\) L/min. The true flow rate is 100 L/min, so each measured value is \(100 + \text{Uniform}(-0.5, 0.5)\) L/min.

(a) Using np.random.default_rng(seed=21), generate 500 simulated measurements. Compute the sample mean and sample std. Compare them to the theoretical mean and std of the Uniform distribution.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

true_flow = 100.0
lo_spec, hi_spec = 99.7, 100.3

# ── (a) Uniform instrument: 500 samples ──────────────────────────────────────
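
A common stumbling block is that `scipy.stats.uniform` is parameterized by `loc` and `scale`, covering the interval from `loc` to `loc + scale`, rather than by its two endpoints. The sketch below uses a hypothetical interval [2, 6], not the flow meter values, and checks the result against the textbook formulas: mean \((a+b)/2\) and standard deviation \((b-a)/\sqrt{12}\).

```python
import numpy as np
from scipy import stats

# Hypothetical interval [a, b], for illustration only
a, b = 2.0, 6.0

# scipy.stats.uniform covers [loc, loc + scale], so scale is the width b - a
dist_u = stats.uniform(loc=a, scale=b - a)

mean_theory = (a + b) / 2
std_theory = (b - a) / np.sqrt(12)

print(f"Mean : scipy = {dist_u.mean():.4f}, formula = {mean_theory:.4f}")
print(f"Std  : scipy = {dist_u.std():.4f}, formula = {std_theory:.4f}")
print(f"P(X < 3) = {dist_u.cdf(3.0):.2f}")
```

Passing the upper endpoint as `scale` by mistake would silently describe a wider interval, so comparing against the formulas is a cheap sanity check.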

(b) What fraction of measurements fall outside \([99.7,\; 100.3]\) L/min? Compute it both from the simulated samples and theoretically using scipy.stats.uniform.

# ── (b) Fraction outside spec — Uniform ──────────────────────────────────────

(c) Now suppose the flow meter is replaced with a better instrument whose error follows a Normal distribution \(\mathcal{N}(0,\; 0.2^2)\), that is, mean 0 and standard deviation 0.2 L/min. Using the same seed and 500 samples, what fraction of measurements now fall outside \([99.7,\; 100.3]\) L/min? Again compute it both from the simulation and from the Normal CDF.

# ── (c) Normal instrument: same seed, 500 samples ────────────────────────────

(d) Plot side-by-side histograms of the simulated measurements for both instruments on the same x-axis range. Add vertical lines at 99.7 and 100.3. Which instrument is more likely to give a reading within spec?

# ── (d) Side-by-side histograms ───────────────────────────────────────────────