That’s a pretty bold title. Why should you believe it? To understand why proximal soil sensing (PSS) will obsolete soil coring and laboratory analysis (specifically for digital soil mapping and precision agriculture), we must first understand how each arrives at its answers. And, we must consider how close those answers can ever get to the truth of what a plant experiences in situ.
First, recognize that any method of measurement that quantifies properties of a substance, be it PSS or laboratory analysis, involves a model of some sort that relates the output of the method back to the property being ascertained. With sensors, we typically call this mathematical model a calibration, and we often forget that most laboratory methods require some sort of calibration, too.
Also consider that laboratory results almost always involve some form of sensing, even if not proximal. Rather than sensing an attribute of the soil directly, most laboratory analyses sense the result of a physical or chemical operation performed on the soil and relate that back to the property of interest. A texture analysis may be the result of laser diffraction, or of sensing the mass of soil on a plate after a sieving operation, or of sensing the pressure in a fluid suspension of soil particles as they settle out. Organic carbon determined by combustion is likewise sensed as a mass change, while titration methods for chemical species often rely on colorimetric change sensed by an optical device, and conductivity, pH, and redox potential are measured using electrical sensing methods.
Unlike direct sensing methods, however, laboratory methods additionally rest on an assumption of the validity of process models — models of how a series of physical and chemical manipulations involved in the laboratory method will isolate or reveal a specific property of the soil that can be read by a sensor. These processes are usually performed by a human, and humans aren't highly repeatable.
Repeatability of laboratory methods is reduced, and uncertainty is added to the result, at every step of the physical and chemical manipulations performed on a soil sample before a result is finally sensed by a laboratory transducer and inferences are drawn about the value of a soil parameter from the process model and the mathematical model(s) implicit in the method. Sometimes, as with certain colorimetric wet chemistry methods, the sensor on which the laboratory ultimately depends may itself be a human, further reducing objectivity and repeatability.
To summarize so far, some important points to understand about PSS methods versus laboratory analysis methods include:
- Both methods require the application of models, and models are never perfect;
- Sensing results are based on mathematical models;
- Laboratory results rest on the validity of both mathematical models (via sensing) and process models (via physical and chemical separations and reactions), as well as the assumptions inherent in both;
- Process models carry their own errors and uncertainties, and depend significantly on human objectivity and repeatability;
- PSS does not rely significantly on human objectivity and repeatability.
By now, there should be no doubt that laboratory methods of soil analysis present inherent opportunities for both inaccuracy and lack of repeatability that are at least on par with proximal soil sensing. In another article, we have discussed the advantages of fusing the results of multiple orthogonal sensing modalities, which is another strength of PSS.
What we haven’t talked about yet is how profoundly the process of retrieving a soil sample to send to a laboratory alters both in situ and bulk soil parameters (if not also intrinsic parameters). The reality is that, no matter how accurate or inaccurate the laboratory method, gross and myriad misrepresentations of the actual state of the soil in the field begin with the soil coring and sampling process before even reaching the lab.
“legacy methods of retrieving and sampling soil specimens for laboratory analysis are limited in their potential for accuracy and for resolving fine detail with respect to actual in-field conditions”
Methods of retrieving physical samples of soil from below the ground surface include augering, digging soil pits, and direct push soil coring. Augering obviously destroys any semblance of in situ structure, grain packing and orientation, compaction state, and clarity of horizonation. Pits are the best way to retrieve a sample in close to its in situ condition by careful incision into the exposed face, but pits are time-consuming, expensive, disruptive to other operations, and disturb a lot of surrounding soil. The time and cost required for spatially significant sampling by way of pits discourages their use for mapping large sites. Consequently, soil cores are the most widely used option to characterize the full soil profile because they offer an attractive balance among affordability, operational disruption, and disturbance relative to in situ conditions. But even careful collection of soil cores impacts the representativeness of the samples recovered and the delineation of layers that end up in the sample tube.
Some problems that arise when using soil coring include: clogging of the cutting mouth, bridging of the sample tube, compaction ahead of the core barrel, and loss of soil upon retraction. All of these contribute to incomplete sample recovery. Recovery is the length of soil returned in the sample tube relative to the depth that the sampler was advanced into the ground.
In agricultural soil sampling, core recovery is almost always less than 100%, and the shortfall is often substantial. When that happens, it is almost impossible to determine at what depth each interface between soil horizons was encountered. You can’t simply scale the length of the recovered core to the depth of the hole, because you don’t know if one layer compacted more than another, if soil temporarily bridged the sample tube before allowing soil in again, if you encountered a rock that you pushed to target depth or merely a few inches before it pushed aside, if sticky wet clays adhered to the inside of the tube while less cohesive soils pushed past them (offsetting and smearing interfaces), or if loose soil fell out through the mouth of the core barrel as you withdrew it from the ground. Only in the last case, if the length of soil recovered exactly matches the remaining depth of the hole, and the tube is free of sticky soils, can you have any confidence in reconstructing the depths at which interfaces were encountered. Even then, vertical mixing may have happened in the tube when some soil fell out the bottom and the rest shifted down. People can and do correct for incomplete recovery, but it is a painstaking process that requires highly trained workers to maintain focus day after day in often tough working conditions, adding another layer of human subjectivity and variance to the outcome.
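To see why incomplete recovery is so corrosive to depth information, consider the naive correction it invites. The sketch below (our own illustration, with made-up numbers) computes recovery and then rescales an observed interface depth, an operation that is only valid under an assumption the field routinely violates:

```python
# Sketch: core recovery and the naive depth rescaling it invites.
# All numbers are illustrative, not from any particular survey.

advanced_depth_cm = 100.0   # how far the sampler was pushed
recovered_length_cm = 82.0  # length of soil actually in the tube

recovery = recovered_length_cm / advanced_depth_cm  # 0.82, i.e. 82%

# Naive correction: stretch observed interface depths by 1/recovery.
# This silently assumes every layer shortened by the same factor,
# which bridging, plugging, and differential compaction all violate.
observed_interface_cm = 41.0  # interface seen at 41 cm in the tube
rescaled_interface_cm = observed_interface_cm / recovery  # 50 cm

print(f"recovery = {recovery:.0%}")
print(f"naively rescaled interface depth = {rescaled_interface_cm:.0f} cm")
```

The uniform-shortening assumption is exactly what the failure modes listed above break, which is why depth registration from cores carries irreducible uncertainty.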
Once recovered, the process of extracting the soil sample from the barrel can cause further disturbance, which can be especially problematic when studying undisturbed soil layers. Then comes the act of compositing — mixing soils from over a discrete interval of depth to obtain ‘average’ properties over the interval (a misguided conception from the beginning), or simply to provide the lab with enough material to perform all the analyses you are ordering. A requirement of 500 to 1000 grams per sample is not uncommon to support a comprehensive suite of physical and chemical parameters. This corresponds to a whopping 20 to 40 cm (8 to 16 inches) of core assuming a midrange bulk density. Good luck obtaining a full suite of soil parameters from any layer that’s thinner than 6 inches. A PSS technology such as LandScan’s Digital Soil Core™ (DSC™), in contrast, produces over 1200 data points from seven orthogonal sensing modalities every centimeter.
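The core-length arithmetic is easy to check. Assuming a midrange bulk density of about 1.4 g/cm³ and a core diameter near 4.5 cm (both assumed values for illustration; tube geometries vary):

```python
import math

# Assumed values, for illustration only.
bulk_density_g_cm3 = 1.4   # midrange bulk density
core_diameter_cm = 4.5     # an assumed direct-push tube size

# Cross-sectional area of the core, ~15.9 cm^2
area_cm2 = math.pi * (core_diameter_cm / 2) ** 2

def core_length_cm(sample_mass_g: float) -> float:
    """Length of core needed to supply a given sample mass."""
    return sample_mass_g / (bulk_density_g_cm3 * area_cm2)

print(f"{core_length_cm(500):.0f} cm for 500 g")    # roughly 22 cm
print(f"{core_length_cm(1000):.0f} cm for 1000 g")  # roughly 45 cm
```

Any layer thinner than the computed length simply cannot supply a full analysis suite on its own, which is what forces compositing across layers in the first place.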
Compositing and homogenization are two sub-processes that are matters of scale, and they affect not only the scale to which one can resolve important structural variations affecting the fate, transport, and root availability of water and nutrients, but also the representativeness with which intrinsic soil parameters are determined by the laboratory. As noted above, compositing is intended to aggregate enough sample to support the number of analyses to be conducted, each analysis requiring its own aliquot separate from the others. The choice to composite is also driven by cost. Cost is saved by analyzing a single composited sample from within a depth interval that outwardly appears to be all the ‘same’ soil. But we know, from evaluating our proximal sensing systems in soil pits, and with the fine spatial resolution that our DSC™ provides, that what often outwardly appears to be homogeneous is actually rarely so.
Homogenization is intended to thoroughly mix the composited sample, thus ensuring physical correlation among analyzed parameters, rather than each lab analysis deriving from a slightly different sub-sample. While coring and compositing can lose important information about variability that occurs among thin layers, homogenization can destroy important physical variability that occurs at even smaller scales, such as the scale of soil aggregates. For example, if a soil sample is homogenized using a mechanical grinder, the grinding process can destroy small soil aggregates, changing the soil structure in ways that redistribute organic matter and other soil components (including microbiota) at a small scale. This redistribution can bring together soil components that were kept separated in situ, facilitating chemical reactions and transformations that had not occurred in the soil in its natural state. In other words, fine scale homogenization that breaks up soil aggregates can alter what we usually regard as intrinsic soil properties.
“We have literally observed stark heterogeneities in oxidative state at sub-millimeter aggregate scale using our sensors.”
In contrast, a proximal sensor riding through the soil in a profiling probe preserves vertical spatial variability, thin layering, and the accurate referencing of soil parameters to depth. A narrow penetrometer senses bulk soil properties and structure in a much less disturbed state than core samples shipped to the lab. In addition, PSS can provide benefits that coring, compositing, and homogenization could never provide, such as observing the natural distribution of soil water within the structural arrangement revealed, and the actual in situ bulk electrical conductivity rather than that of a saturated paste extract, which is a far cry from what roots and microbiota actually experience in their environment.
Getting back to the subject of mathematical models, we explained earlier how PSS and laboratory results both arrive at a quantitative estimate of a soil parameter by applying some form of mathematical model. One of the simplest forms of a mathematical model is simple linear regression. You’ve no doubt seen linear regression before — it’s a statistical method that draws a line of ‘best fit’ through a scattering of points that pair one variable with another variable. A common use of linear regression is in instrument calibration — the equation of the line of best fit is used to predict a material property from the output of a sensor transducer, or the mass of a chemical present to the quantity of a reagent used to neutralize or transform it.
The math of linear regression minimizes the deviation of the data from the model only in the direction of the dependent variable, while assuming the independent variable is known without error. You may have noticed that if you swap the dependent and independent variables as input to a simple linear regression, you get a different model, with the line of best fit in a new place. This is because you’ve swapped which variable all the model error gets ascribed to.
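The swap is easy to demonstrate with a few lines of NumPy (synthetic data standing in for a sensor/lab pairing; the numbers are ours, not from any real calibration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)                  # e.g. a sensor reading
y = 2.0 * x + 1.0 + rng.normal(0, 2, 50)    # e.g. a noisy lab value

# Fit y on x, then x on y: each ascribes all the error to one variable.
slope_yx, _ = np.polyfit(x, y, 1)   # minimizes vertical residuals
slope_xy, _ = np.polyfit(y, x, 1)   # minimizes horizontal residuals

# The two lines agree only for perfectly correlated data:
# slope_yx * slope_xy equals the squared correlation r^2 < 1.
r2 = np.corrcoef(x, y)[0, 1] ** 2
print(slope_yx, 1.0 / slope_xy)  # two different 'best fit' slopes
```

The product of the two fitted slopes is exactly the squared correlation of the data, so the two lines coincide only when every point lies on the line, which noisy measurements never achieve.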
When both the independent and dependent variables are themselves the result of some imperfect inference of truth, such as when comparing the results of PSS to the results of a laboratory method, neither order of variables can be said to produce the best model, and always in question is which variable deviates more from the truth. Methods such as orthogonal distance regression (ODR) and total least squares (TLS) regression were developed to address this issue. These methods recognize that there may be error and uncertainty in both variables. And as we have shown above, significant deviations should be expected due to the sampling process. Which do you think gets closer to what a plant experiences in situ?
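For the curious, a minimal TLS line fit can be written in a few lines using the singular value decomposition (a toy implementation of ours for intuition; production work would use a vetted routine such as SciPy's `scipy.odr` package):

```python
import numpy as np

def tls_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Total least squares line fit: minimizes perpendicular
    distances, treating x and y as equally uncertain."""
    xm, ym = x.mean(), y.mean()
    A = np.column_stack([x - xm, y - ym])
    # The first right singular vector points along the fitted line.
    _, _, vt = np.linalg.svd(A)
    direction = vt[0]
    slope = direction[1] / direction[0]
    intercept = ym - slope * xm
    return slope, intercept

# Sanity check on exactly linear data: TLS recovers the true line.
x = np.linspace(0, 10, 20)
slope, intercept = tls_line(x, 3.0 * x + 2.0)
print(slope, intercept)
```

Unlike ordinary least squares, this fit is symmetric: swapping x and y yields the reciprocal slope of the same line, so no arbitrary choice of "truth" variable is baked in.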
Taking this to a practical level, consider, based on the principle of composability, that most modern machine learning techniques, such as those that comprise the calibrations for PSS methods like VNIR reflectance spectroscopy, are interconnected networks of relatively simple statistical models like linear regression whose accuracy, versatility, and robustness arise from the complexity of their connections. The branches of decision trees that comprise various flavors of random forests (RFs) are multivariate regressors and classifiers with non-linear activation functions, as are the perceptrons (i.e., artificial neurons) that are wired together to compose artificial neural networks (ANNs). Fundamentally, the challenges of what the model should consider truth, and of how to ascribe the errors due to imperfect data, remain the same as in simple linear regression. But due to their complexity, and with careful attention to proper training techniques, sophisticated ML models can actually yield a more accurate estimation of the true value of an output parameter than the target data that was used to train them. In other words, even though laboratory soil results are known to include a significant amount of error and uncertainty relative to true soil conditions in the field, when used to train ML models that process a sufficient amount of highly repeatable sensor readings as input, the trained models can perform more accurately on the PSS data than laboratory analyses of soil cores.
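The claim that a model can out-perform its own training targets is easy to demonstrate in miniature. In the toy below (our illustration of the underlying noise-averaging principle, not LandScan's actual pipeline), a simple model fit to many noisy 'lab' targets lands closer to the underlying truth than the targets themselves, because fitting averages out uncorrelated target noise:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
sensor = np.linspace(0, 10, n)           # repeatable sensor input
truth = 1.5 * sensor + 4.0               # unknown true soil parameter
lab = truth + rng.normal(0, 1.0, n)      # noisy reference targets

# Train on the noisy targets...
coef = np.polyfit(sensor, lab, 1)
pred = np.polyval(coef, sensor)

# ...yet predictions sit closer to the truth than the targets do.
rmse_targets = np.sqrt(np.mean((lab - truth) ** 2))  # ~1.0
rmse_model = np.sqrt(np.mean((pred - truth) ** 2))   # far smaller
print(rmse_model < rmse_targets)  # True
```

The effect requires the target noise to be unbiased; if every lab result were shifted the same way, no amount of averaging would recover the truth, which is why the caveat about quantitative bias below matters.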
Especially when provided with multivariate input from orthogonal sensing modalities, ML models can get closer to the truth than can poorly repeatable reference methods. This is because the ML models can learn patterns and relationships in the data that are not immediately obvious to a human observer, or that have been obscured by alterations owing to soil extraction, sampling, and laboratory processes. These patterns and relationships can help the ML model make more accurate predictions, even in the presence of noise and uncertainty, so long as target training data is not uniformly tainted by quantitative bias.
Using adeptly and knowledgeably applied techniques such as regularization, imputation of missing data, training data amplification, multi-target training, and ensemble methods, sophisticated ML models can be made to produce more accurate estimates of truth than the reference methods that generated their target training data. ML models applied to inputs from multiple sensors that all respond to variability in soil parameters in different and complementary ways can more accurately estimate the true value of a parameter in situ than can poorly repeatable laboratory methods applied to soil samples which have been inescapably disturbed from their in situ condition by the extraction and sampling process.
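To make the multi-sensor idea concrete, here is a deliberately simplified linear toy (real sensor responses are nonlinear and learned from data, and the parameter names and coefficients below are invented): when two sensors each respond to a mix of two entangled soil parameters, and their sensitivities differ enough, the mixture can be inverted to recover both parameters.

```python
import numpy as np

# Assumed response matrix: each row is one sensor's sensitivity to
# (moisture, salinity). Neither sensor isolates either parameter.
R = np.array([[0.8, 0.3],
              [0.2, 0.9]])

true_state = np.array([0.25, 1.4])   # (moisture, salinity), made up
readings = R @ true_state            # what the two sensors report

# Because the rows of R are linearly independent ('orthogonal enough'),
# the entangled parameters can be recovered by inverting the model.
recovered = np.linalg.solve(R, readings)
print(np.allclose(recovered, true_state))  # True
```

A single sensor (one row of R) could never separate the two parameters; it is the complementary second response that makes the deconvolution possible, which is the intuition behind fusing orthogonal modalities.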
Granted, there are some parameters for which lab methods currently provide a more direct measure than does PSS. But not with the same spatial resolution, and not at the same in situ soil condition. Furthermore, parameter-specific sensor transducer technology will continue to advance, and the combination of ML with PSS enables the fusion of information from multiple sensing modalities into each estimate of a soil parameter (see the parable of the campfire and the blow dryer in this article) — something few laboratory techniques do.
To summarize the case:
- Proximal soil sensing (PSS), especially deployed in a narrow profiling penetrometer, collects information about soils in a state that is closest to what roots and microbiota experience, and the state that controls the transport and fate of water and nutrients;
- PSS avoids the types of errors and uncertainties associated with subjective and poorly repeatable human operations that can taint laboratory results;
- Even with theoretically perfect laboratory processes and calibrations, samples retrieved and sent to the lab will always present an inaccurate representation of conditions in the field due to retrieval-related disturbance, compositing, homogenization, and uncertain depth registration;
- Some amount of structural detail is always lost through sampling, and conversely, PSS can differentiate and quantify variability at very fine spatial scales;
- Multiple orthogonal sensors that each respond in a different way to a given soil parameter can be combined to deconvolve and correct for the effects of intertwined soil parameters on any single modality, thus increasing the accuracy achievable from PSS;
- Sophisticated ML approaches, properly applied, can overcome noise and uncertainty in training targets attributable to the reference methods used to develop the target data, to produce more accurate estimates of soil parameters than the reference methods.
We assert that legacy methods of retrieving and sampling soil specimens for laboratory analysis are limited in their potential for accuracy and for resolving fine detail with respect to actual in-field conditions, and will never practically or affordably provide an accurate and detailed enough representation of the distribution of soil and structural parameters in the field to fully inform precision agricultural management. We believe proximal soil sensing done well (as with LandScan’s 7-sensor Digital Soil Core™) is not constrained by the same limitations, and in concert with powerful ML, will soon obsolete soil sampling and laboratory analysis for producing digital soil maps to inform precision agricultural management.
This article was co-authored with LandScan CEO Daniel J. Rooney, PhD, and LandScan CPO Woody Wallace.