Overview#
Least squares adjustment is the standard mathematical method for obtaining the most probable values from a set of redundant measurements. Whenever a surveyor takes more measurements than the minimum needed to determine the unknowns -- and good practice demands that they always do -- there will be small discrepancies among the observations. These discrepancies arise from the random errors inherent in every measurement. Least squares provides a rigorous, mathematically optimal procedure for resolving those discrepancies: it finds the set of adjusted values that minimizes the sum of the weighted squares of the residuals.
What sets least squares apart from simpler adjustment methods is that it simultaneously solves for all unknowns using all measurements, properly accounts for the relative quality of different observations through weighting, and produces a complete set of statistical quality measures for every adjusted quantity. The result is not just a set of "best" coordinates or elevations, but a full accounting of how good those values are -- standard deviations, error ellipses, confidence regions, and diagnostics for detecting blunders.
"The method of least squares is the most rigorous adjustment procedure available to surveyors and is applicable to any measurement situation." -- Ghilani, Adjustment Computations: Spatial Data Analysis (6th Ed.), Ch. 1, p. 3
Every modern commercial survey adjustment package -- whether for traverses, level networks, GNSS baselines, or combined networks -- uses least squares as its computational engine. Surveyors who understand the method can interpret their software's output with confidence: they know what the residuals mean, what the standard deviations represent, and when a result should be questioned.
The Principle#
The fundamental premise of least squares is straightforward. Given a set of $m$ measurements used to determine $n$ unknowns, where $m > n$, the system is overdetermined. No single set of unknown values will satisfy all observations exactly. Each measurement will have a residual -- the difference between the adjusted (computed) value and the observed value. Least squares requires that the unknowns be determined so that the sum of the weighted squares of these residuals is minimized:

$$\sum_{i=1}^{m} w_i v_i^2 = \mathbf{v}^T \mathbf{W} \mathbf{v} \rightarrow \min$$
where $v_i$ is the residual for the $i$-th observation, $w_i$ is its weight, $\mathbf{v}$ is the column vector of all residuals, and $\mathbf{W}$ is the diagonal weight matrix (or, more generally, the full weight matrix when observations are correlated).
The weight assigned to each measurement reflects its relative precision. More precise measurements receive larger weights and therefore exert more influence on the adjusted values. The standard formulation sets the weight inversely proportional to the variance of the observation:

$$w_i = \frac{\sigma_0^2}{\sigma_i^2}$$
where $\sigma_i^2$ is the variance of the $i$-th observation and $\sigma_0^2$ is an arbitrary reference variance (often set to 1). This ensures that a measurement with half the standard deviation receives four times the weight -- exactly as it should.
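As a quick numerical check of this rule, the sketch below (with invented standard deviations, not values from the text) computes weights as $\sigma_0^2 / \sigma_i^2$ and confirms the four-to-one ratio:

```python
# Weights from a priori standard deviations, with sigma_0^2 = 1.
# The standard deviations below are invented for illustration.
sigma0_sq = 1.0
sigmas = [0.004, 0.002, 0.008]  # observation standard deviations (m)

weights = [sigma0_sq / s ** 2 for s in sigmas]

# An observation with half the standard deviation gets four times the weight:
ratio = weights[1] / weights[0]  # sigma 0.002 m vs 0.004 m -> ratio of 4
```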
"The principle of least squares states that the most probable values of a set of unknown quantities, upon which observations have been made, are obtained by making the sum of the weighted squares of the residuals a minimum." -- Ghilani & Wolf, Elementary Surveying: An Introduction to Geomatics (13th Ed.), Ch. 3, p. 62
This minimization principle is not an arbitrary choice. Under the assumption that measurement errors follow a normal (Gaussian) distribution, the least squares solution yields the maximum likelihood estimate of the unknowns -- the values that are most consistent with the observed data given the known precision of each measurement.
Redundancy#
The concept of redundancy is central to least squares adjustment. Redundancy is the number of measurements in excess of the minimum required to compute a unique solution. It is expressed as the degrees of freedom:

$$r = m - n$$
where $m$ is the number of observations and $n$ is the number of unknowns.
If $r = 0$, there are exactly as many measurements as unknowns. The system has a unique solution -- the measurements completely determine the unknowns with nothing left over. In this case, all residuals are zero, and there is no way to assess the quality of the result or detect errors. This is the situation in, for example, an unadjusted closed traverse with just enough angle and distance measurements.
If $r > 0$, the system is overdetermined. The extra measurements create redundancy, and redundancy is valuable for two reasons:
- Better values. The adjusted unknowns are more reliable than they would be from any minimum-data subset of the observations. The redundant measurements provide independent checks that pull the solution toward the truth.
- Quality assessment. With $r > 0$, the residuals are generally non-zero, and their magnitudes provide a direct measure of internal consistency. Statistical tests can be applied to detect blunders, evaluate the overall fit, and compute uncertainties for every adjusted quantity.
As a rule, more redundancy produces better results. A level network with 20 observations and 10 unknowns ($r = 10$) will yield more reliable adjusted elevations and tighter standard deviations than one with 12 observations and 10 unknowns ($r = 2$), assuming comparable measurement quality. This is why professional surveying standards routinely require redundant measurements.
Observation Equations#
Each measurement in a survey produces one observation equation -- a mathematical relationship between the observed value, the unknown quantities, and the residual. In a general survey network, these relationships are typically nonlinear functions of the unknown coordinates. For example, the distance between two points $i$ and $j$ is:

$$L_{ij} = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
which is a nonlinear function of the four coordinate unknowns $x_i$, $y_i$, $x_j$, and $y_j$.
Least squares adjustment handles nonlinearity through linearization. The nonlinear observation equations are expanded in a Taylor series about approximate values of the unknowns, and only the first-order (linear) terms are retained. This produces a set of linear equations in the corrections to the approximate values.
The linearized system for all observations takes the compact matrix form:

$$AX = L + V$$
where:
- $A$ is the design matrix (also called the coefficient matrix or Jacobian), whose elements are the partial derivatives of the observation equations with respect to the unknowns, evaluated at the approximate values
- $X$ is the vector of corrections to the approximate unknowns
- $V$ is the residual vector
- $L$ is the vector of observed minus computed values (the discrepancies between the actual observations and the values computed from the approximate unknowns)
The design matrix encodes the geometry of the survey network. Its structure determines how each measurement contributes to the determination of each unknown. A well-designed network produces a well-conditioned design matrix with strong geometric connections between the observations and the unknowns.
"The elements of the A matrix are the partial derivatives of each observation equation with respect to each unknown, evaluated at the approximate coordinates." -- Ghilani, Adjustment Computations: Spatial Data Analysis (6th Ed.), Ch. 11, p. 213
Because the linearization is an approximation, the solution is iterative. After solving for $X$, the approximate values are updated, new values of $A$ and $L$ are computed, and the process repeats until $X$ becomes negligibly small (convergence). For well-conditioned problems with reasonable approximate values, convergence typically occurs in two to four iterations.
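The linearize-solve-update loop can be sketched in a few lines. The example below is entirely hypothetical (invented station coordinates, with distances generated exactly from the point (30, 40)): it positions one unknown 2D point from three observed distances by repeatedly linearizing the distance equations, solving the 2x2 normal equations, and updating the approximate coordinates:

```python
import math

# Hypothetical example: fix one unknown 2D point from three observed
# distances to known stations. Because the synthetic "observations" are
# exact, the iteration should converge to (30, 40).
stations = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
observed = [50.0, math.hypot(70.0, 40.0), math.hypot(30.0, 60.0)]
weights = [1.0, 1.0, 1.0]  # equal weights for simplicity

x, y = 25.0, 45.0  # approximate coordinates to linearize about
for _ in range(10):
    A, L = [], []
    for (xs, ys), d_obs in zip(stations, observed):
        d0 = math.hypot(x - xs, y - ys)           # distance at approximations
        A.append(((x - xs) / d0, (y - ys) / d0))  # partials dD/dx, dD/dy
        L.append(d_obs - d0)                      # observed minus computed
    # Normal equations N.X = t for the two corrections (2x2, solved directly).
    n11 = sum(w * a[0] * a[0] for w, a in zip(weights, A))
    n12 = sum(w * a[0] * a[1] for w, a in zip(weights, A))
    n22 = sum(w * a[1] * a[1] for w, a in zip(weights, A))
    t1 = sum(w * a[0] * li for w, a, li in zip(weights, A, L))
    t2 = sum(w * a[1] * li for w, a, li in zip(weights, A, L))
    det = n11 * n22 - n12 * n12
    dx = (n22 * t1 - n12 * t2) / det
    dy = (n11 * t2 - n12 * t1) / det
    x, y = x + dx, y + dy                         # update the approximations
    if max(abs(dx), abs(dy)) < 1e-9:              # corrections negligible
        break
```

With $r = 1$ here (three observations, two unknowns), the loop converges in a handful of iterations, illustrating the two-to-four-iteration behavior described above.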
Normal Equations#
The minimization condition $\mathbf{v}^T \mathbf{W} \mathbf{v} \rightarrow \min$ is enforced by substituting $V = AX - L$ into the expression for $\mathbf{v}^T \mathbf{W} \mathbf{v}$, expanding, and taking partial derivatives with respect to each element of $X$. Setting these partial derivatives to zero yields the normal equations:

$$(A^T W A)X = A^T W L$$
This is conventionally written in shorthand as:

$$NX = t$$
where $N = A^T W A$ is the normal equation matrix and $t = A^T W L$ is the constant vector.
The solution for the corrections is obtained by inverting the normal equation matrix:

$$X = N^{-1}t = (A^T W A)^{-1} A^T W L$$
The matrix $N$ is symmetric and positive-definite (provided the network geometry is sufficient to determine all unknowns), which guarantees that a unique solution exists. The inverse $N^{-1}$ is not merely a computational byproduct -- it is the cofactor matrix of the adjusted unknowns, and it plays a central role in the statistical analysis that follows.
A Simple Example#
Consider a small leveling network with three benchmarks: BM-A (known elevation $H_A$), BM-B, and BM-C. Five leveling runs produce the following observed elevation differences:
| Run | From | To | Observed (m) | Distance (km) |
|---|---|---|---|---|
| 1 | BM-A | BM-B | 12.018 | 2.0 |
| 2 | BM-A | BM-C | 25.511 | 3.0 |
| 3 | BM-B | BM-C | 13.502 | 1.5 |
| 4 | BM-A | BM-B | 12.014 | 2.5 |
| 5 | BM-B | BM-C | 13.496 | 2.0 |
There are $m = 5$ observations and $n = 2$ unknowns (the elevations of BM-B and BM-C), giving $r = 3$ degrees of freedom.
Setting Up the Observation Equations
Let $H_B$ and $H_C$ denote the unknown elevations, and let $H_A$ be the known elevation of BM-A. Each observation equation takes the form: function of unknowns = observed value + residual.
- Run 1: $H_B - H_A = 12.018 + v_1$
- Run 2: $H_C - H_A = 25.511 + v_2$
- Run 3: $H_C - H_B = 13.502 + v_3$
- Run 4: $H_B - H_A = 12.014 + v_4$
- Run 5: $H_C - H_B = 13.496 + v_5$
Since these equations are already linear, no linearization is needed. Set weights inversely proportional to leveling distance $d_i$ (a standard approach, since variance in leveling is proportional to distance):

$$w_i = \frac{1}{d_i}$$
Forming the Normal Equations
The design matrix $A$ and observation vector $L$ (using the unknowns $H_B$ and $H_C$ directly, with the known $H_A$ carried on the observation side) are:

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 1 \\ 1 & 0 \\ -1 & 1 \end{bmatrix}, \qquad L = \begin{bmatrix} H_A + 12.018 \\ H_A + 25.511 \\ 13.502 \\ H_A + 12.014 \\ 13.496 \end{bmatrix}$$
Computing $N = A^T W A$:

$$N = \begin{bmatrix} w_1 + w_3 + w_4 + w_5 & -(w_3 + w_5) \\ -(w_3 + w_5) & w_2 + w_3 + w_5 \end{bmatrix}$$
Computing $t = A^T W L$:

$$t = \begin{bmatrix} w_1 L_1 - w_3 L_3 + w_4 L_4 - w_5 L_5 \\ w_2 L_2 + w_3 L_3 + w_5 L_5 \end{bmatrix}$$
Solving
Inverting the $2 \times 2$ normal equation matrix and solving $X = N^{-1}t$ yields the adjusted elevations:

$$H_B = H_A + 12.016 \text{ m}, \qquad H_C = H_A + 25.514 \text{ m}$$
The residuals are:
| Run | Observed (m) | Adjusted (m) | Residual (mm) |
|---|---|---|---|
| 1 | 12.018 | 12.016 | -2 |
| 2 | 25.511 | 25.514 | +3 |
| 3 | 13.502 | 13.498 | -4 |
| 4 | 12.014 | 12.016 | +2 |
| 5 | 13.496 | 13.498 | +2 |
Note that the residuals are small and roughly balanced in sign -- a healthy indicator that the observations are consistent and free of blunders.
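The whole example fits in a short script. This is a sketch, not production code: it sets $H_A = 0$ so the unknowns are elevations relative to BM-A, takes the weights from the tabulated distances, and reproduces the adjusted values above to within about a millimetre (tiny discrepancies can come from rounding in the published figures):

```python
# The leveling example, coded directly. H_A is set to 0, so the unknowns
# hb and hc are the elevations of BM-B and BM-C relative to BM-A.
runs = [  # (coeff of H_B, coeff of H_C, observed diff in m, distance in km)
    (+1, 0, 12.018, 2.0),   # Run 1: A -> B
    (0, +1, 25.511, 3.0),   # Run 2: A -> C
    (-1, +1, 13.502, 1.5),  # Run 3: B -> C
    (+1, 0, 12.014, 2.5),   # Run 4: A -> B
    (-1, +1, 13.496, 2.0),  # Run 5: B -> C
]
w = [1.0 / run[3] for run in runs]  # weight = 1 / distance

# Form the normal equations N.X = t by direct summation.
n11 = sum(wi * r[0] * r[0] for wi, r in zip(w, runs))
n12 = sum(wi * r[0] * r[1] for wi, r in zip(w, runs))
n22 = sum(wi * r[1] * r[1] for wi, r in zip(w, runs))
t1 = sum(wi * r[0] * r[2] for wi, r in zip(w, runs))
t2 = sum(wi * r[1] * r[2] for wi, r in zip(w, runs))

# Solve the 2x2 system with the closed-form inverse.
det = n11 * n22 - n12 * n12
hb = (n22 * t1 - n12 * t2) / det   # adjusted H_B - H_A
hc = (n11 * t2 - n12 * t1) / det   # adjusted H_C - H_A

residuals = [r[0] * hb + r[1] * hc - r[2] for r in runs]  # adjusted - observed
```

A useful sanity check on any least squares solution is that $A^T W \mathbf{v}$ vanishes -- the weighted residuals are orthogonal to the columns of the design matrix.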
Statistical Output#
Beyond the adjusted values themselves, least squares produces a rich set of statistical information that quantifies the quality of every result.
Reference Variance
The reference variance $\hat{\sigma}_0^2$ (also called the variance factor or variance of unit weight) is:

$$\hat{\sigma}_0^2 = \frac{\mathbf{v}^T W \mathbf{v}}{r} = \frac{\sum w_i v_i^2}{m - n}$$
This is the weighted sum of squared residuals divided by the degrees of freedom. It serves as an overall indicator of how well the observations agree with each other. If the a priori weights accurately reflect the true measurement precision, then $\hat{\sigma}_0^2$ should be close to 1. A value significantly greater than 1 suggests the measurements are less precise than assumed (or that a blunder is present); a value significantly less than 1 suggests the precision was underestimated.
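Plugging in the leveling example makes the formula concrete. The residuals below are the adjusted-minus-observed values from the table in the example, and the weights assume the tabulated distances in kilometres:

```python
# Reference variance for the leveling example: weighted sum of squared
# residuals divided by the degrees of freedom r = m - n = 5 - 2.
v = [-0.002, 0.003, -0.004, 0.002, 0.002]          # residuals (m)
w = [1 / 2.0, 1 / 3.0, 1 / 1.5, 1 / 2.5, 1 / 2.0]  # weights = 1 / distance
dof = 5 - 2                                        # degrees of freedom

s0_sq = sum(wi * vi ** 2 for wi, vi in zip(w, v)) / dof  # reference variance
s0 = s0_sq ** 0.5                                  # reference std deviation
```

Because the weights here are relative (1/km) rather than derived from true a priori variances, this $\hat{\sigma}_0^2$ is interpreted against the chosen weighting convention rather than compared directly with 1.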
"The reference variance is an overall indicator of fit. It provides a single number that summarizes how well the observations are explained by the adjustment model." -- Ghilani, Adjustment Computations: Spatial Data Analysis (6th Ed.), Ch. 7, p. 137
Cofactor and Covariance Matrices
The cofactor matrix of the adjusted unknowns is the inverse of the normal equation matrix:

$$Q_{xx} = N^{-1} = (A^T W A)^{-1}$$
The covariance matrix is obtained by scaling the cofactor matrix by the reference variance:

$$\Sigma_{xx} = \hat{\sigma}_0^2 \, Q_{xx}$$
The diagonal elements of $\Sigma_{xx}$ are the variances of the adjusted unknowns. Their square roots give the standard deviations:

$$\sigma_{x_i} = \hat{\sigma}_0 \sqrt{q_{x_i x_i}}$$
These standard deviations are the primary quality measures reported for adjusted quantities. They tell the surveyor how precise the adjusted values are, given the quality and geometry of the measurements.
Error Ellipses
The off-diagonal elements of $\Sigma_{xx}$ express the correlation between adjusted unknowns. For two-dimensional horizontal positions, the covariance submatrix for a station's easting and northing defines an error ellipse -- an elliptical confidence region around the adjusted position. The semi-major axis ($a$), semi-minor axis ($b$), and orientation ($\theta$) of the standard error ellipse are derived from the eigenvalues and eigenvectors of the covariance submatrix:

$$a = \sqrt{\lambda_1}, \qquad b = \sqrt{\lambda_2}$$

with $\theta$ given by the direction of the eigenvector belonging to $\lambda_1$.
where $\lambda_1$ and $\lambda_2$ are the larger and smaller eigenvalues, respectively. Error ellipses provide a visual and intuitive representation of positional uncertainty that captures both magnitude and directional dependence.
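For a symmetric 2x2 covariance submatrix the eigenvalues have a closed form, so the ellipse parameters can be sketched directly. The covariance values below are hypothetical:

```python
import math

# Standard error ellipse from a 2x2 covariance submatrix, using the
# closed-form eigenvalues of a symmetric 2x2 matrix.
# Hypothetical values: var(E), var(N), cov(E, N) in m^2.
var_e, var_n, cov_en = 4.0e-6, 1.0e-6, 0.6e-6

half_trace = (var_e + var_n) / 2.0
root = math.sqrt(((var_e - var_n) / 2.0) ** 2 + cov_en ** 2)
lam1, lam2 = half_trace + root, half_trace - root  # eigenvalues, lam1 >= lam2

a = math.sqrt(lam1)  # semi-major axis (m)
b = math.sqrt(lam2)  # semi-minor axis (m)
# Direction of the major axis, measured here from the easting axis
# (reporting conventions vary; surveying texts often measure from north).
theta = 0.5 * math.atan2(2.0 * cov_en, var_e - var_n)
```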
Practical Applications#
Traverse Adjustment
Traverses are among the most common survey configurations adjusted by least squares. A closed or connecting traverse with angles and distances produces an overdetermined system when redundant observations are present. Least squares adjusts all angles and distances simultaneously, distributing the misclosure in a way that is consistent with the precision of each measurement. Unlike the compass rule or transit rule, which distribute the closure error based on simple geometric proportions, least squares uses the actual measurement weights to produce a statistically optimal result.
Level Network Adjustment
Differential leveling networks -- particularly those with multiple loops and connections to benchmarks -- benefit greatly from least squares. The method simultaneously adjusts all elevation differences, properly weighting each by its section length (or number of setups), and produces adjusted elevations with full statistical quality measures.
GNSS Network Adjustment
Modern GNSS surveys produce baseline vectors between stations. When multiple baselines form a network with loops and connections to control points, least squares adjusts all baselines simultaneously. The three-dimensional covariance matrices of the individual baselines propagate through the adjustment to produce positional uncertainties for every station. This is particularly important because GNSS baseline precision is direction-dependent -- the covariance matrix captures this anisotropy, and the adjustment propagates it correctly.
Control Network Adjustment
Large-scale control networks combining terrestrial angles, distances, GNSS baselines, and astronomical observations represent the most general application of least squares. The method accommodates any combination of observation types, each with its own precision characteristics, in a single unified adjustment. This flexibility is one of the method's greatest practical strengths.
"The advantage of least squares over other adjustment methods is that it simultaneously adjusts all observations, provides the most probable values, furnishes the precision of the adjusted quantities, and enables detection of blunders." -- Ghilani & Wolf, Elementary Surveying: An Introduction to Geomatics (13th Ed.), Ch. 16, p. 470
Advantages Over Other Methods#
For several decades, traverse adjustment in practice relied on simpler methods: the compass rule (Bowditch), the transit rule, and Crandall's method. These approaches are computationally convenient and still appear on licensing exams, but they are limited compared to least squares.
Simultaneous Use of All Measurements
The compass rule distributes a traverse's linear misclosure proportionally to traverse leg length. The transit rule distributes it proportionally to latitude and departure components. Both methods apply corrections sequentially and do not consider the full geometry of the network. Least squares, by contrast, uses all observations simultaneously to determine all unknowns, exploiting every geometric relationship in the data.
Proper Weighting
The compass rule implicitly assumes all measurements have equal relative precision. In reality, a 30-m distance measured with a steel tape and a 500-m distance measured with an EDM have very different precisions. Least squares assigns each observation a weight that reflects its actual quality, ensuring that the best measurements have the most influence on the result.
Statistical Quality Measures
Neither the compass rule nor the transit rule produces standard deviations for the adjusted coordinates. There is no cofactor matrix, no error ellipses, no reference variance. The surveyor gets adjusted coordinates but no rigorous way to assess their quality. Least squares provides a complete statistical characterization of every adjusted quantity.
Arbitrary Network Geometry
The compass and transit rules apply only to traverses. They cannot handle networks with multiple loops, branches, or mixed observation types. Least squares handles any network geometry -- open, closed, branching, or fully interconnected -- and any combination of angles, distances, directions, azimuths, and GNSS baselines.
Blunder Detection
The residuals from a least squares adjustment provide a powerful tool for detecting blunders. An observation with an unusually large residual -- one that fails a statistical test such as the tau criterion or a data-snooping test -- is flagged as a potential blunder. This capability is unavailable with the simpler methods, which distribute errors mechanically without any diagnostic feedback.
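A minimal sketch of this screening, with invented residuals and a 3.29 rejection threshold (a common data-snooping critical value at the 0.1% significance level):

```python
# Screening residuals for blunders: compare each residual with its own
# standard deviation and flag any that exceed a rejection threshold.
# All numbers here are invented for illustration.
residuals = [0.002, -0.003, 0.021, 0.001]  # metres
sigma_v = [0.003, 0.004, 0.003, 0.002]     # std dev of each residual (metres)
THRESHOLD = 3.29                           # rejection criterion

standardized = [abs(v) / s for v, s in zip(residuals, sigma_v)]
suspects = [i for i, t in enumerate(standardized) if t > THRESHOLD]
# Only observation index 2 (residual 21 mm against sigma 3 mm) is flagged.
```

In practice the residual standard deviations come from the adjustment itself (the covariance matrix of the residuals), and a flagged observation is re-examined or re-measured rather than silently deleted.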
"Residual analysis after a least squares adjustment is the primary tool for detecting blunders in survey observations." -- Ghilani, Adjustment Computations: Spatial Data Analysis (6th Ed.), Ch. 21, p. 449
Key Takeaways#
- Least squares adjustment minimizes the weighted sum of squared residuals, $\mathbf{v}^T \mathbf{W} \mathbf{v}$, to find the most probable values from redundant measurements. It is the standard of practice for survey adjustment.
- Redundancy ($r = m - n$) is essential. Without redundant observations, there is no way to assess quality or detect errors. More redundancy yields better results.
- The method requires forming observation equations, linearizing them into the system $AX = L + V$, and solving the normal equations $(A^T W A)X = A^T W L$.
- The inverse of the normal equation matrix ($N^{-1}$) is the cofactor matrix, which, scaled by the reference variance, gives the covariance matrix of adjusted unknowns -- the source of standard deviations and error ellipses.
- The reference variance $\hat{\sigma}_0^2$ is an overall goodness-of-fit measure. Values near 1 indicate the assumed weights are realistic.
- Least squares surpasses simpler methods (compass rule, transit rule) by using all measurements simultaneously, properly weighting observations, providing full statistical output, handling any network geometry, and enabling blunder detection.
- Modern survey software performs least squares internally. Understanding the output -- residuals, standard deviations, error ellipses, and statistical test results -- is essential for every practicing surveyor.
References#
- Ghilani, C.D. Adjustment Computations: Spatial Data Analysis (6th Ed.). Wiley, 2017. Chapters 1, 7, 10--12, 21.
- Ghilani, C.D. & Wolf, P.R. Elementary Surveying: An Introduction to Geomatics (13th Ed.). Pearson, 2012. Chapters 3, 15--16.
- Mikhail, E.M. & Gracie, G. Analysis and Adjustment of Survey Measurements. Van Nostrand Reinhold, 1981.
- Leick, A., Rapoport, L. & Tatarnikov, D. GPS Satellite Surveying (4th Ed.). Wiley, 2015. Chapters 4--5.
- National Geodetic Survey. "Guidelines for Establishing GPS-Derived Ellipsoid Heights." NOAA Technical Memorandum NOS NGS-58, 2008.