Statistics for Surveyors

Mean, standard deviation, variance, confidence intervals, and probability distributions in surveying.

Statistics provides the mathematical tools to analyze measurements, quantify uncertainty, and make informed decisions about data quality. For the surveyor, statistics is not an academic abstraction -- it is a daily working necessity. Every time you report a distance, an elevation, or a coordinate, you are implicitly making a statistical statement about the reliability of that value. Understanding the statistical foundations covered here allows you to determine the most probable value from repeated measurements, express the uncertainty of your results in a rigorous and defensible manner, detect and deal with outliers, and combine measurements of varying quality into a single best estimate.

"The theory of probability and statistics forms the foundation of the adjustment of observations, quality analysis, and much of the decision-making process in the geospatial sciences." -- Ghilani & Wolf, Elementary Surveying, 13th Ed., Ch. 2

The Arithmetic Mean

The arithmetic mean is the most fundamental statistic in surveying. When a surveyor takes multiple measurements of the same quantity under the same conditions, the mean of those measurements is the most probable value (MPV).

For $n$ equally weighted measurements $x_1, x_2, \ldots, x_n$, the arithmetic mean is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The mean has two important optimality properties:

  1. The sum of the residuals equals zero: $\sum_{i=1}^{n} v_i = 0$, where $v_i = x_i - \bar{x}$. The mean is the balancing point of the data.
  2. The sum of the squared residuals is a minimum: $\sum_{i=1}^{n} v_i^2 = \text{minimum}$. No other value produces a smaller sum of squared deviations.

These properties are not arbitrary -- they follow directly from calculus. If you take the derivative of $\sum (x_i - c)^2$ with respect to $c$ and set it to zero, you get $c = \bar{x}$. This is the mathematical basis for the method of least squares, which extends the same principle to more complex adjustment problems.
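Both optimality properties are easy to confirm numerically. A minimal sketch in plain Python (standard library only), using the same five baseline measurements as the worked example below:

```python
# Sketch: verify the two optimality properties of the mean numerically.
# Data are the five baseline measurements from the worked example.
obs = [285.332, 285.338, 285.335, 285.330, 285.340]

mean = sum(obs) / len(obs)
residuals = [x - mean for x in obs]

# Property 1: residuals sum to zero (to floating-point precision).
assert abs(sum(residuals)) < 1e-9

# Property 2: any candidate value other than the mean gives a larger
# sum of squared deviations.
sse_mean = sum(v ** 2 for v in residuals)
for c in (mean - 0.002, mean + 0.002):
    assert sum((x - c) ** 2 for x in obs) > sse_mean
```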

Example

A surveyor measures a baseline distance five times:

| Measurement | Value (m) |
|---|---|
| 1 | 285.332 |
| 2 | 285.338 |
| 3 | 285.335 |
| 4 | 285.330 |
| 5 | 285.340 |

$$\bar{x} = \frac{285.332 + 285.338 + 285.335 + 285.330 + 285.340}{5} = 285.335 \text{ m}$$

Residuals

A residual is the difference between an individual measurement and the mean:

$$v_i = x_i - \bar{x}$$

Residuals are the foundation of statistical analysis in surveying. They tell us how individual measurements deviate from the most probable value. For the example above:

| Measurement | $x_i$ | $v_i = x_i - \bar{x}$ | $v_i^2$ |
|---|---|---|---|
| 1 | 285.332 | -0.003 | 0.000009 |
| 2 | 285.338 | +0.003 | 0.000009 |
| 3 | 285.335 | 0.000 | 0.000000 |
| 4 | 285.330 | -0.005 | 0.000025 |
| 5 | 285.340 | +0.005 | 0.000025 |

Check: $\sum v_i = -0.003 + 0.003 + 0.000 - 0.005 + 0.005 = 0.000$ (as expected).

The residuals themselves carry important information. Large residuals suggest either low precision in the measurement process or the possible presence of a blunder. The pattern of residuals -- whether they appear random or systematic -- can reveal problems with the measurement procedure that raw values alone might not expose.

Standard Deviation

The standard deviation quantifies the spread (dispersion) of measurements around the mean. It is the most commonly used measure of precision in surveying.

Standard Deviation of a Single Measurement

For a sample of $n$ measurements, the standard deviation of a single measurement is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} v_i^2}{n - 1}}$$

The denominator is $n - 1$, not $n$. This is known as Bessel's correction and accounts for the fact that we used one degree of freedom to compute the mean. Since the residuals are computed from $\bar{x}$ rather than from the true (unknown) population mean $\mu$, dividing by $n$ would systematically underestimate the true variance. Using $n - 1$ produces an unbiased estimate of the population variance.

"The denominator $n - 1$ is the number of degrees of freedom and is equal to the number of observations minus the number of unknowns determined from them." -- Ghilani, Adjustment Computations, 6th Ed., Ch. 2

Continuing the example:

$$s = \sqrt{\frac{0.000009 + 0.000009 + 0.000000 + 0.000025 + 0.000025}{5 - 1}} = \sqrt{\frac{0.000068}{4}} = \sqrt{0.000017} = 0.004 \text{ m}$$

This tells us that any single measurement in this set is expected to be within about $\pm 0.004$ m of the mean, roughly 68% of the time.

Standard Deviation of the Mean

The mean of $n$ measurements is more precise than any single measurement. The standard deviation of the mean is:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

For our example:

$$s_{\bar{x}} = \frac{0.004}{\sqrt{5}} = \frac{0.004}{2.236} = 0.002 \text{ m}$$

This is a powerful result: by taking multiple measurements, the uncertainty of the reported value decreases proportionally to $1/\sqrt{n}$. Doubling the precision requires four times the measurements -- a relationship of diminishing returns that every surveyor should keep in mind when planning fieldwork.
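Both spreads can be computed directly with Python's standard library `statistics` module. A minimal sketch using the example data:

```python
import math
import statistics

obs = [285.332, 285.338, 285.335, 285.330, 285.340]

s = statistics.stdev(obs)          # sample std dev (n - 1 denominator)
s_mean = s / math.sqrt(len(obs))   # std dev of the mean, s / sqrt(n)

print(round(s, 3), round(s_mean, 3))   # -> 0.004 0.002
```

Note that `statistics.stdev` already applies Bessel's correction; `statistics.pstdev` is the population version with $n$ in the denominator.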

The 68-95-99.7 Rule

For data that follows a normal distribution:

  • 68.3% of measurements fall within $\pm 1\sigma$ of the mean
  • 95.4% of measurements fall within $\pm 2\sigma$ of the mean
  • 99.7% of measurements fall within $\pm 3\sigma$ of the mean

A measurement lying beyond $3\sigma$ is expected only 0.3% of the time -- roughly 3 in 1,000 observations. Values this extreme warrant investigation as potential blunders.
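These percentages follow directly from the normal cumulative distribution function; `statistics.NormalDist` in the Python standard library reproduces them:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability of falling within +/- k standard deviations of the mean.
for k, expected in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    p = nd.cdf(k) - nd.cdf(-k)
    assert round(p, 3) == expected
```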

Variance

The variance is the square of the standard deviation:

$$s^2 = \frac{\sum_{i=1}^{n} v_i^2}{n - 1}$$

While the standard deviation has the same units as the measurements (making it more intuitive to interpret), the variance has a critical mathematical property: variance is additive. If two independent error sources contribute variances $s_1^2$ and $s_2^2$, the total variance is:

$$s_{\text{total}}^2 = s_1^2 + s_2^2$$

This property makes variance the natural quantity to work with in error propagation. When computing a result from multiple measured quantities, each with its own uncertainty, the variance of the result is a function of the individual variances. Standard deviations, by contrast, do not simply add.
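A toy illustration of additivity -- the two error-source magnitudes (0.003 m and 0.004 m) are hypothetical values chosen for this sketch, not figures from the text:

```python
import math

# Two hypothetical independent error sources, in metres.
s1, s2 = 0.003, 0.004

# Variances add for independent sources; standard deviations do not.
s_total = math.sqrt(s1 ** 2 + s2 ** 2)

assert round(s_total, 3) == 0.005   # sqrt(0.000009 + 0.000016)
assert s_total < s1 + s2            # naive addition would overstate it
```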

Covariance and Correlation

When two measured quantities are not independent -- for example, two angles measured from the same instrument setup -- the relationship between their errors matters. The covariance between two variables $x$ and $y$ is:

$$s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

The correlation coefficient normalizes covariance to a dimensionless value between $-1$ and $+1$:

$$r = \frac{s_{xy}}{s_x \cdot s_y}$$

A correlation of $r = 0$ indicates no linear dependence; $|r| = 1$ indicates perfect linear dependence. In surveying, correlated errors arise in situations such as GPS baselines sharing a common endpoint, angles observed from the same setup, and leveling circuits with shared benchmarks. Ignoring correlations when they exist leads to overly optimistic (or pessimistic) estimates of uncertainty in adjusted results.
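The two formulas can be evaluated directly. The angle-residual series below are invented purely for illustration (two sets of residuals, in arc-seconds, imagined as observed from the same setup):

```python
import math

# Hypothetical residuals (arc-seconds) of two angle series observed
# from the same setup -- invented values for illustration only.
x = [2.0, -1.0, 0.5, -2.5, 1.0]
y = [1.5, -0.5, 1.0, -2.0, 0.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (n - 1 denominators).
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))

r = s_xy / (s_x * s_y)
assert 0.0 < r <= 1.0   # these two series are positively correlated
```

On Python 3.10+, `statistics.covariance` and `statistics.correlation` compute the same quantities.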

The Normal Distribution

Survey measurements subject to many small, independent, random influences tend to follow the normal (Gaussian) distribution. This is not an assumption made for convenience -- it is a consequence of the Central Limit Theorem, which states that the sum (or mean) of a large number of independent random variables tends toward a normal distribution, regardless of the underlying distribution of the individual variables.

The probability density function (PDF) of the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

where $\mu$ is the population mean and $\sigma$ is the population standard deviation. The curve is symmetric about $\mu$, and the mean, median, and mode all coincide.

Standard Normal Distribution and $z$-Scores

Any normal distribution can be transformed to the standard normal distribution (with $\mu = 0$ and $\sigma = 1$) using the $z$-score transformation:

$$z = \frac{x - \mu}{\sigma}$$

The $z$-score tells you how many standard deviations a particular measurement lies from the mean. Standard normal tables (or software) then give the probability of observing a value at least that extreme. For example, $z = 1.96$ corresponds to the 97.5th percentile, meaning 95% of the distribution lies between $z = -1.96$ and $z = +1.96$.
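In place of a printed table, `statistics.NormalDist` supplies both directions of the lookup:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal

# z = 1.96 sits at (approximately) the 97.5th percentile...
z = nd.inv_cdf(0.975)
assert round(z, 2) == 1.96

# ...so about 95% of the distribution lies between -1.96 and +1.96.
p = nd.cdf(1.96) - nd.cdf(-1.96)
assert round(p, 2) == 0.95
```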

"The normal distribution, or bell-shaped curve, applies when a large number of random errors are present. It describes the expected frequency of various sized errors and provides the basis for confidence interval estimation." -- Ghilani & Wolf, Elementary Surveying, 13th Ed., Ch. 2

Why Surveying Measurements Tend Toward Normality

A single distance measurement is affected by many small, independent sources of error: atmospheric refraction, instrument centering, pointing, reading, target centering, plumbing, temperature gradients, and more. Each source contributes a small random component. The Central Limit Theorem guarantees that the aggregate effect of these sources will be approximately normally distributed -- even if any individual source is not. This is why the normal distribution is the default model for random measurement errors in surveying.

Confidence Intervals

A single number (the mean) without an associated uncertainty statement is incomplete. Confidence intervals provide a range within which the true value is expected to lie, at a specified level of probability.

Using the Normal Distribution (Large Samples)

For large samples ($n \geq 30$), the confidence interval for the mean is:

$$\bar{x} \pm z_{\alpha/2} \cdot s_{\bar{x}}$$

where $z_{\alpha/2}$ is the critical value from the standard normal distribution for the desired confidence level. Common values:

| Confidence Level | $z_{\alpha/2}$ |
|---|---|
| 68.3% | 1.000 |
| 90.0% | 1.645 |
| 95.0% | 1.960 |
| 99.0% | 2.576 |
| 99.7% | 3.000 |

The $t$-Distribution for Small Samples

In surveying, we rarely have 30 or more repeated measurements of the same quantity. For small samples, the standard normal distribution underestimates the true uncertainty because the sample standard deviation $s$ is itself uncertain. The Student's $t$-distribution accounts for this additional uncertainty.

The $t$-distribution is similar to the standard normal but has heavier tails. The shape depends on the degrees of freedom $\nu = n - 1$. As $\nu \to \infty$, the $t$-distribution converges to the standard normal.

The confidence interval using the $t$-distribution is:

$$\bar{x} \pm t_{\alpha/2, \nu} \cdot s_{\bar{x}}$$

For example, with $n = 5$ measurements ($\nu = 4$) and a 95% confidence level, $t_{0.025, 4} = 2.776$. This is notably larger than the normal approximation of $1.960$, reflecting the greater uncertainty inherent in small samples.

Example (Continued)

For our baseline measurement with $\bar{x} = 285.335$ m, $s_{\bar{x}} = 0.002$ m, $n = 5$:

$$285.335 \pm 2.776 \times 0.002 = 285.335 \pm 0.006 \text{ m}$$

We can state with 95% confidence that the true distance lies between $285.329$ m and $285.341$ m.
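The same interval can be sketched in code. Python's standard library has no $t$-distribution, so the tabulated critical value $t_{0.025, 4} = 2.776$ is entered directly, and the text's rounded $s_{\bar{x}} = 0.002$ m is reused:

```python
mean = 285.335   # m, mean from the worked example
s_mean = 0.002   # m, rounded standard deviation of the mean
t_crit = 2.776   # t_{0.025, 4}: tabulated, since the Python standard
                 # library has no t-distribution

half_width = t_crit * s_mean
lower = round(mean - half_width, 3)
upper = round(mean + half_width, 3)
print(lower, upper)   # -> 285.329 285.341
```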

Surveying Convention

The surveying profession generally reports results at 95% confidence. This corresponds to approximately $\pm 2\sigma$ (more precisely, $\pm 1.96\sigma$ for large samples). When you see a survey accuracy reported as, say, $\pm 0.01$ m, it typically implies a 95% confidence interval unless stated otherwise. Always specify the confidence level when reporting uncertainty.

Weighted Mean

Not all measurements are created equal. A distance measured with a total station and a distance measured with a tape have different precisions. A GPS baseline observed for four hours is more reliable than one observed for fifteen minutes. When combining measurements of different quality, we need a weighted mean.

The weighted mean is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i \, x_i}{\sum_{i=1}^{n} w_i}$$

where the weight $w_i$ assigned to each measurement is inversely proportional to the square of its standard deviation:

$$w_i = \frac{1}{\sigma_i^2}$$

This weighting scheme gives more influence to precise measurements and less to imprecise ones. It can be shown that the weighted mean, computed in this way, is the minimum-variance unbiased estimator -- the most probable value considering the varying quality of the observations.

Example

A distance is measured by two methods:

| Method | Value (m) | $\sigma$ (m) | Weight $w = 1/\sigma^2$ |
|---|---|---|---|
| Total station | 500.325 | 0.003 | 111,111 |
| GPS | 500.331 | 0.008 | 15,625 |

$$\bar{x}_w = \frac{(111{,}111)(500.325) + (15{,}625)(500.331)}{111{,}111 + 15{,}625} = \frac{55{,}591{,}611.075 + 7{,}817{,}671.875}{126{,}736} = 500.326 \text{ m}$$

The result is pulled strongly toward the total station measurement because it carries much greater weight. The GPS observation still contributes, but its influence is proportional to its precision.

The standard deviation of the weighted mean is:

$$s_{\bar{x}_w} = \frac{1}{\sqrt{\sum w_i}} = \frac{1}{\sqrt{126{,}736}} = 0.003 \text{ m}$$
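The whole computation can be sketched in a few lines of Python, using the (value, sigma) pairs from the example above:

```python
import math

# (value m, a-priori sigma m) pairs from the example above.
observations = [(500.325, 0.003), (500.331, 0.008)]

# Weights inversely proportional to the variances.
weights = [1.0 / sigma ** 2 for _, sigma in observations]

# Weighted mean and its standard deviation.
x_w = sum(w * x for w, (x, _) in zip(weights, observations)) / sum(weights)
s_xw = 1.0 / math.sqrt(sum(weights))

print(round(x_w, 3), round(s_xw, 3))   # -> 500.326 0.003
```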

"In combining quantities measured with differing precisions, the weighted mean gives the most probable value. The weights are inversely proportional to the variances." -- Ghilani, Adjustment Computations, 6th Ed., Ch. 3

Rejection of Outliers

Occasionally, a measurement set will contain one or more values that are far removed from the rest. These outliers may result from blunders (reading errors, transposition mistakes, equipment malfunction) or from genuinely extreme random errors. The question is: should they be included in the analysis or rejected?

Chauvenet's Criterion

Chauvenet's criterion provides a systematic rule for outlier rejection. A measurement is rejected if the probability of obtaining a residual at least as large as its own is less than $1/(2n)$, where $n$ is the number of observations.

The procedure is:

  1. Compute $\bar{x}$ and $s$ from all observations (including the suspect value).
  2. Compute the ratio $|v_i| / s$ for the suspect observation.
  3. Look up the probability $P$ of exceeding this ratio in a normal distribution.
  4. If $P < 1/(2n)$, reject the observation.

Equivalently, for a given $n$, there is a maximum allowable ratio $|v_i|/s$:

| $n$ | Max $\lvert v\rvert/s$ |
|---|---|
| 3 | 1.38 |
| 5 | 1.65 |
| 7 | 1.80 |
| 10 | 1.96 |
| 15 | 2.13 |
| 25 | 2.33 |
| 50 | 2.57 |

Cautions

Outlier rejection must be handled with care. There is an inherent tension between two types of error: rejecting a valid measurement (which discards real information and biases the result) and keeping a blunder (which corrupts the entire analysis). Several principles should guide your approach:

  • Investigate before rejecting. A measurement should not be discarded purely on statistical grounds. Check the field notes. Was there a known equipment issue? An unusual atmospheric condition? A recording error?
  • Never iterate blindly. After rejecting an outlier and recomputing the mean and standard deviation, another value may now appear to be an outlier. Repeated application of rejection criteria can strip away legitimate data, artificially inflating apparent precision.
  • Document everything. Any rejected observation should be noted in the project records with a reason for rejection. This is both good practice and, in many jurisdictions, a professional requirement.

"Suspected outliers should never be discarded simply because they fail a statistical test. The reasons for their rejection should be justified and documented." -- Ghilani & Wolf, Elementary Surveying, 13th Ed., Ch. 2

Key Takeaways

  • The arithmetic mean is the most probable value of equally weighted measurements and minimizes the sum of squared residuals.
  • Residuals reveal the quality of individual measurements and provide the raw material for computing standard deviation.
  • Standard deviation quantifies precision; the standard deviation of the mean decreases in proportion to $1/\sqrt{n}$, providing a clear incentive for repeated measurement.
  • Variance is additive, making it the natural quantity for error propagation calculations.
  • Survey measurements tend toward the normal distribution by the Central Limit Theorem, validating the use of Gaussian statistics.
  • Confidence intervals express uncertainty at a stated probability level; surveying convention uses 95% confidence, and the $t$-distribution should be used for small samples.
  • The weighted mean properly combines measurements of different precision by weighting inversely with variance.
  • Outlier rejection (e.g., Chauvenet's criterion) must be applied judiciously -- always investigate before rejecting, and document every decision.

References

  • Ghilani, C. D., & Wolf, P. R. (2012). Elementary Surveying: An Introduction to Geomatics (13th ed.). Pearson.
  • Ghilani, C. D. (2018). Adjustment Computations: Spatial Data Analysis (6th ed.). John Wiley & Sons.
  • Mikhail, E. M., & Gracie, G. (1981). Analysis and Adjustment of Survey Measurements. Van Nostrand Reinhold.