Pierre Gy created and single-handedly developed the Theory of Sampling (TOS) over a period of 25 years (1950-1975), thereby initiating a new scientific discipline that has been growing significantly ever since. The 8th World Conference on Sampling and Blending will be held in Perth from 9-11 May 2017 and will feature a session dedicated to his legacy.

Pierre Gy began his career in 1946 in French Equatorial Africa (Congo), working at the small M’Fouati lead mine as the mineral process engineer, where he was in charge of the processing plant and associated laboratories. In 1947, the Paris-based head office asked Pierre to estimate the grade of a 200 000 t, apparently low-grade stockpile that had been dormant since 1940. He soon realised that fragments on the stockpile varied from several tonnes to fine dust, that he knew nothing about sampling, that there was no meaningful literature available and that he would have to improvise. This request planted the seed of a lifelong interest in his mind.

On his return to Paris in 1949, his work in a mineral processing laboratory also constantly brought issues of sampling to his attention, in particular the question of ‘the minimum sample weight necessary to achieve a certain degree of reliability’ (Gy, 2004a). In his search through the limited available literature, Gy found that Brunton (1895) claimed that the minimum sample weight was proportional to the cube of the top particle size, while Richards (1908) suggested that the square of the particle size was important. Brunton based his ideas on the ‘constant proportionality factor’, meaning that for samples with different fragment top sizes, the same number of fragments was required, but Gy (2004a) was concerned that variations in grade or density had not been properly incorporated.
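The practical difference between the two historical rules can be illustrated with a minimal sketch. It assumes a simple power law anchored at a reference point; the function name and the reference values (1 kg adequate at a 10 mm top size) are hypothetical and are not taken from Brunton or Richards.

```python
def scaled_sample_mass(ref_mass_kg, ref_top_size_mm, new_top_size_mm, exponent):
    """Scale a reference minimum sample mass to a new top size by a power law.

    exponent=3 mimics Brunton's cube rule, exponent=2 Richards' square rule.
    """
    return ref_mass_kg * (new_top_size_mm / ref_top_size_mm) ** exponent


# Hypothetical reference point: 1 kg is assumed adequate at a 10 mm top size.
brunton = scaled_sample_mass(1.0, 10.0, 20.0, exponent=3)   # cube rule
richards = scaled_sample_mass(1.0, 10.0, 20.0, exponent=2)  # square rule
print(f"Cube rule: {brunton} kg, square rule: {richards} kg")
```

Doubling the top size multiplies the required mass by eight under the cube rule but only by four under the square rule, which shows how much the choice of exponent matters for coarse material.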

It was the magnitude of financial transactions in the coal trade based on assays for ash and sulfur in ‘coal samples’ that promoted much of the early research into sampling. Gy tells of UK- and US-based researchers who ‘realised that sampling actually generated errors that could have a financial impact’, and so he began investigating coal properties in regard to particle top size, sample mass and sample variance. He mentions a Professor Hassialis from Columbia University, New York, who wrote a chapter on sampling based on a statistical multinomial model in the *Mineral Processing Engineer’s Bible*, first published in 1927. Because many of the influencing parameters could never be known, this approach could not be implemented in practice. French mining engineer R Duval proposed a binomial model, in which the world is made up only of white and black balls representing pure gangue and pure mineral, and in which all fragments were considered to have the same physical mass. While Gy (2004a) understandably found aspects of this model ‘dangerously misleading’, it germinated the seed of interest sown by his earlier experiences, leading to his 1949 decision to study the theoretical issues around sampling in earnest.

Gy now expressed his intention to develop a mathematical model relating the variance of the sampling error to the mass of the lot, the sample mass and the knowable physical properties of the material being sampled. Such a relationship would allow the minimum sample mass needed to achieve an acceptable sampling variance to be determined. Gy’s hopes of addressing the question ‘how much’ had to be pursued in his own time as his employer provided neither time nor resources for this research. Even with these obstacles, he devised and wrote up the formula and the basic tenets of the ‘theory of sampling’ in two internal, unpublished notes for his company entitled ‘A formula for the minimum sample mass’ and ‘Minimum sample mass required to represent a batch of ore’ as early as 1950. Historically, the theory of sampling was born in 1950.

This endeavour led to a first theoretical model specifically for particulate solids, and to generalised models for solids of animal and vegetable origin and for types of domestic and industrial waste. Models for liquids and gases were also developed. By this stage, Gy recognised that the models had universal validity and that it was scale rather than physical state that differentiated between the range of applications.

## The formula established

The progression in Gy’s logic in regard to formulating the variance model, as early as 1950, is fascinating. He first identified all the unknown but physically well-defined parameters, including the number of fragments making up the lot, the corresponding fragments making up the sample, the grade of the sample, the number of fragments and the individual fragment mass. From these, he devised strict, algebraically simple mathematical relationships into which he introduced simplifications and approximations to produce easily implementable and practical formulae. At some point in this work, he realised that he needed a more thorough education in statistics, and only a few years later he was awarded his second PhD. This was necessary for him to work more rigorously with the crucial approximations and simplifications of the full mathematical descriptions.

In this context, among his most important ideas was the concept of heterogeneity, which ‘lies at the root of all sampling errors’. Gy expressed the heterogeneity contribution carried by each fragment in the lot, which crucially can be summed up and, when divided by the number of fragments in the lot and given an appropriate statistical weight, leads to the desired approximate measure of the variance of the total sampling error.

After many years of trial and error, in which he tested numerous simplifications and approximations for correlating the variance to the physical properties of the lot material, he arrived at the now well-known general equation that is publicly referred to as ‘Gy’s formula’ (Gy, 2004a).
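The article does not reproduce the formula itself. In the form commonly quoted in the sampling literature, the relative variance of the fundamental sampling error is the product of a material-dependent sampling constant, the cube of the top particle size and the difference between the reciprocal sample and lot masses. The sketch below assumes that form; the function names, the units (centimetres and grams) and the example factor values are illustrative assumptions, not figures from Gy's work.

```python
def fse_relative_variance(c, f, g, l, d_cm, sample_mass_g, lot_mass_g):
    """Relative variance of the fundamental sampling error (commonly quoted form).

    c: mineralogical (constitution) factor, f: shape factor,
    g: granulometric factor, l: liberation factor, d_cm: top particle size.
    """
    sampling_constant = c * f * g * l
    return sampling_constant * d_cm ** 3 * (1.0 / sample_mass_g - 1.0 / lot_mass_g)


def minimum_sample_mass(c, f, g, l, d_cm, target_rel_variance,
                        lot_mass_g=float("inf")):
    """Invert the formula above for the sample mass that meets a target variance."""
    k = c * f * g * l * d_cm ** 3
    # variance = k * (1/Ms - 1/ML)  =>  1/Ms = variance/k + 1/ML
    return 1.0 / (target_rel_variance / k + 1.0 / lot_mass_g)


# Hypothetical material factors, chosen only to exercise the functions.
factors = dict(c=100.0, f=0.5, g=0.25, l=0.1)
variance = fse_relative_variance(d_cm=1.0, sample_mass_g=1000.0,
                                 lot_mass_g=1.0e6, **factors)
mass = minimum_sample_mass(d_cm=1.0, target_rel_variance=variance,
                           lot_mass_g=1.0e6, **factors)
print(f"relative variance: {variance:.6g}, recovered sample mass: {mass:.1f} g")
```

The inverse function answers the ‘how much’ question directly: given an acceptable variance, it returns the smallest sample mass that achieves it under the stated assumptions.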

## Practical experimentation with the formula

Gy originally attempted to validate the formula by calculating the variance of a lead ore using 16 ‘equally split’ samples of pulverised material, with splitting taking the role of sampling (Gy, 2004a). His experimental total sampling error was several times larger than the theoretical value, which he interpreted as indicating that the fundamental sampling error was only one of several components in play. He suggested that the other components of sampling error were the grouping and segregation error as well as sampling bias introduced through incorrect use of the riffle splitter. His research in the mid-1950s then led to the development of a circular cardboard sampling nomogram and later a sampling slide rule. The formula was first presented in English to the Society of Mining Engineers of the American Institute of Mining Engineers in 1957. However, it was only in 1965 that his research was presented in London at a meeting of the Institution of Mining and Metallurgy.

## Sampling of flowing streams

Pierre Gy’s 1960-62 research into flowing streams of materials on conveyor belts and liquid launders brought to his attention the importance of sampling the ‘whole stream’ for a fraction of the time (ie any increment must be a physical full slice of the stream). He identified the key issues in regard to cross-stream sampler operations, namely that the cutter velocity through the stream, the width of the cutter opening and the shape of the cutter are all-important, but it wasn’t until 1977 that these issues were scientifically resolved.

He also recognised that increments extracted at constant intervals from a flowing stream are not independent from one another, but that some level of auto-correlation exists between time series sample data. As early as 1962, Gy started publishing his work on chronostatistics, as it later became known, by borrowing the idea of spatial correlation between samples using concepts and data from the semi-variogram feature proposed by Matheron (1965) and later by David (1988) within geostatistics and transferring it to linear auto-correlation of time series data.
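The idea of measuring auto-correlation between increments can be sketched with an experimental semi-variogram of a one-dimensional series: for each lag, half the mean squared difference between values that lag apart. This is a minimal illustration using absolute values rather than Gy's own formulation; the function name and the grade series are hypothetical.

```python
def experimental_variogram(values, max_lag):
    """Return {lag: semi-variance} for a series sampled at constant intervals."""
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must be between 1 and len(values) - 1")
    variogram = {}
    for lag in range(1, max_lag + 1):
        # Squared differences between all pairs of values `lag` steps apart.
        squared_diffs = [(values[i + lag] - values[i]) ** 2 for i in range(n - lag)]
        variogram[lag] = sum(squared_diffs) / (2 * len(squared_diffs))
    return variogram


# Hypothetical increment grades (per cent) taken at constant time intervals.
grades = [2.1, 2.3, 2.2, 2.6, 2.8, 2.7, 3.0, 2.9, 3.2, 3.1]
for lag, gamma in experimental_variogram(grades, max_lag=4).items():
    print(f"lag {lag}: {gamma:.4f}")
```

If the semi-variance grows with the lag, neighbouring increments are more alike than distant ones, which is the auto-correlation described above.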

At this stage of his life, Gy chose to dedicate himself to writing and further research around the theory and practice of sampling, rather than continuing in his comfortable managerial position at Minerais et Metaux in Paris. This led to a 40-year period of theoretical research, consulting, troubleshooting, lecturing, teaching and writing articles and books that have gradually been disseminated all over the world.

## Theory of sampling introduced – and challenged

This time of progressive successes was not without serious challenges. Some parties and individuals strongly opposed Gy’s ideas and objected to his 1967 publication ‘Sampling of Particulate Materials’ (Gy, 2004b). No story is only about success – it is a sad historical fact that the response from ISO standards committees has been less than unanimous in accepting the work and insights of Pierre Gy (although this situation has been dramatically turned around since 2003 by a dedicated effort by the sampling community). However, the world now has at its disposal a first standard dedicated to the universal principles of representative sampling, DS 3077.

The notion of correct sampling and its linkages to probabilistic sampling were first proposed by Gy only in 1972. In modern parlance, the fundamental tenet is that a sample is correct if, and only if, each lot fragment has the same statistical probability of being selected for the sample as every other fragment. Under any other circumstances, the sampling procedure is said to be incorrect and will therefore result in unrepresentative lot ‘samples’ (better designated as ‘specimens’ for optimal distinction).

About this time, Gy found that some members of the scientific communities resisted his ideas about sampling as a scientific endeavour. His 1971 book *Sampling of Particulate Materials, Volume 2* was soon followed by another book, *The Theory and Practice of the Sampling of Particulate Materials*, in 1975, but only a few hundred copies were ever sold (Gy, 2004a). In this particular book, Gy made a very significant step in that he built ‘the mathematical bridge between selecting conditions and sampling errors’. He identified for the first time the distinction between *a priori* conditions of sample selection (conditions we can do something about before taking the sample) and *a posteriori* conditions (conditions we observe, but about which we can do very little after the fact). The selection process itself can further be either probabilistic or non-probabilistic, and even if probabilistic, it can be correct or incorrect. Sampling errors are random errors, characterised by their statistical distribution and moments. Sampling can be accurate or biased (property of the mean), reproducible or not (property of the variance) and representative or not (property of the mean-squared error).
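The three properties in the last sentence are linked by a standard statistical identity: the mean-squared error equals the squared mean of the errors plus their variance. The sketch below illustrates this with relative sampling errors. It assumes the true lot grade is known, which in practice is only the case for simulations or reference materials; the function name and the replicate values are hypothetical.

```python
from statistics import fmean, pvariance


def sampling_error_summary(sample_grades, true_grade):
    """Return (bias, variance, mean-squared error) of the relative sampling errors."""
    errors = [(grade - true_grade) / true_grade for grade in sample_grades]
    bias = fmean(errors)                    # accuracy: property of the mean
    variance = pvariance(errors)            # reproducibility: property of the variance
    mse = fmean([e * e for e in errors])    # representativeness: mean-squared error
    return bias, variance, mse


# Hypothetical replicate sample grades against an assumed true lot grade of 2.0.
bias, variance, mse = sampling_error_summary([2.1, 2.3, 1.9, 2.2, 2.0], true_grade=2.0)
print(f"bias={bias:.4f}, variance={variance:.5f}, mse={mse:.5f}")
print(f"bias^2 + variance = {bias ** 2 + variance:.5f}")
```

A procedure can therefore be reproducible yet unrepresentative if its bias is large, because the squared bias enters the mean-squared error directly.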

## Proportional sampling

Gy’s first encounter with metallurgical balance reconciliation was in a number of North African lead-zinc flotation plants, where he summarised the idea of balance by saying that ‘whatever comes in must ultimately come out, one way or another.’ He noted that if this principle of balance is not observed, then there must be ‘measurement biases or unsuspected losses’, and that with a single exception in his 45 years of consulting, what came out was always less than what went in (Gy, 2004a). Eventually, after checking every sampling and measurement device, he reached the conclusion that the principal culprit for the 2-3 per cent deficit was the calibration of the conveyor belt scales. After observing numerous conveyor belts over the years, Gy concluded that they suffer from a structural lack of reliability, the main problem being the conversion of an electrical current into an accurate measurement of tonnes of ore. Rather than the 0.5 per cent accuracy claimed by manufacturers, plant personnel found a more realistic figure to be a deviation of about ten per cent.

## Bed blending

Perhaps the most important aspect of feeding a metallurgical furnace is to blend the raw materials in such a way that the average composition of the feed will be more or less uniform and homogeneous in the one dimension of the ingoing material stream. Gy’s work on bed blending began with a study of material processed in cement kilns. The lack of flexibility and sensitivity of cement kilns is such that feed materials must be as uniform as possible to avoid costly damage. For this reason, a large cement company introduced a bed blending system to homogenise, as far as possible, the ingoing raw materials. Good sampling equipment aided by online analysers allowed major components in the cement to be determined every few minutes. Computerised assistance to calculate the average composition of the stockpiled kiln feed allowed the composition of the blending pile to be known with accuracy, providing an almost ideal feed to the kiln (Gy, 2004a).

## Gy’s publications

It is not possible to tell Gy’s story of discovery without mentioning the more than 250 scientific books and papers that he published on the theory of sampling. His last textbook, *Heterogeneite, Echantillonnage, Homogeneisation* (Heterogeneity, Sampling, Homogenising), published in 1988, was immediately translated into English and came to press in 1992. It was the French version of this book that Francis Pitard digested and shortened to produce his volume *Pierre Gy’s Sampling Theory and Sampling Practice, Heterogeneity, Sampling Correctness and Statistical Process Control*. The second edition of this book has become a world-famous publication that is used by many practitioners and taught in leading universities.

The *Proceedings of the First World Conference on Sampling and Blending* (WCSB1) in 2003 was published as a special issue of *Chemometrics and Intelligent Laboratory Systems* as a tribute to Pierre Gy’s life and work on the theory of sampling. This volume was to be the first in a series of proceedings, the eighth of which is associated with the Eighth World Conference on Sampling and Blending (WCSB8) to be held in Perth in May 2017. The proceedings from all of the World Conferences on Sampling and Blending are indispensable for anyone wanting to get into the theory and practice of sampling.

The sampling community honoured Pierre Gy in a special issue of *TOS Forum*, which contains personal tributes to Pierre Gy’s work and life from Francis Pitard, Dominique Francois-Bongarcon, Ralph Holmes, Ana Carolina Chieregati, Pentti Minkkinen and Kim H Esbensen. There are many more who would also like to be able to pay proper tribute to Pierre Gy’s life and legacy, which is why a session at the upcoming WCSB8 is dedicated exclusively to this purpose.

**References**

Brunton D W, 1895. The theory and practice of ore sampling, Trans AIME Volume XXV, 826-844.

David M, 1988. Handbook of Applied Advanced Geostatistical Ore Reserve Estimation, 232 p (Elsevier: Amsterdam).

Gy P, 1988. Heterogeneite, Echantillonnage, Homogeneisation (Heterogeneity, Sampling, Homogenizing), xiv+607 p (Masson: Paris).

Gy P, 2004a. Part IV: 50 years of sampling theory – a personal history, Chemometrics and Intelligent Laboratory Systems, 74:49-60.

Gy P, 2004b. Sampling of discrete materials – a new introduction to the theory of sampling. I. Qualitative approach, Chemometrics and Intelligent Laboratory Systems, 74:7-24.

Matheron G, 1965. Les variables regionalisees et leur estimation (Regionalized variables and their estimation), PhD thesis, Masson, Paris.

Richards R, 1908. Ore Dressing Volume 2, 508 p (McGraw-Hill).