1. Introduction
The relationship between theory and phenomena involves an interplay of theory, modeling, and experiment during which both the identification of parameters and the physical operations suitable for measuring them are jointly determined. This interplay has sometimes been suspected of threatening the objectivity of science. Thus, Kosso (1989, 245–46) called for a “declaration of independence” between theory and experiment, which Alan Chalmers (2003) aptly describes as a preventative measure against theoretical nepotism. But would thorough prevention not leave the experimenter theoretically illiterate? That such theoretical neutrality is not feasible is a theme familiar from Thomas Kuhn (1961) on the function of measurement, who critically assesses both the cliché that the theory can be inferred back from the data and the companion cliché that what counts as experiment, measurement, or data is independent of theory or neutral between theories.
But Kosso's cautions are not entirely idle, and Kuhn's moral cannot very well be ‘credo ut intelligam’. To elicit the precise role of theory or modeling in measurement, we need to examine seminal examples of measurements proposed, carried out, and assessed by scientists. For a specific operation used to gather information or generate numbers, two questions are pertinent: does it count as a measurement, and if so, what quantity does it measure? More important, what is the basis for answers to these questions, and to what extent is that basis independent of theory?
2. What Counts as Measurement, and What Is Measured?
Undoubtedly theories are tested by confrontation of the empirical implications or numerical simulations of their models with data derived from measurement outcomes. But for this confrontation to occur, it must first be a settled matter what count as relevant measurement procedures for physical quantities represented in those models. What settles that? I submit that the classification of a physical procedure as measurement of a parameter in such a model or simulation is itself provided by at least a core of the theory itself:
Whether a procedure is a measurement and, if so, what it measures are questions that have, in general, answers only relative to a theory.
To support this, I will explore several examples in physics. But this thesis is qualified by the recognition that
those answers, provided by theory, are part of what allows a theory to meet the stringent requirement of empirical grounding.
Skeptical conclusions and the fear of theoretical nepotism therefore can be disarmed, although only if we are able to set aside certain traditional foundationalist impulses concerning the possibility of confirmation, evidence, and evidential support.
3. Examination of Measurement Criteria in Action
3.1. Galileo Measures the Force of the Vacuum
In his Dialogues Concerning Two New Sciences Galileo presented the design of an apparatus to measure the force of the vacuum. Given Galileo's hypothesis concerning the vacuum, this does measure the magnitude of that force, although from a later point of view it is measuring a parameter absent from Galileo's theory, namely, atmospheric pressure.
The prevailing opinion concerning the vacuum in Galileo's time was that in nature there is a horror vacui, that a true vacuum is impossible. Galileo saw some evidence for this view but reinterpreted that evidence as equally supporting the weaker thesis that, indeed, there is an aversion of nature for the vacuum, but it is not an absolute. Rather, there is a force, the force of the vacuum, that tends to eliminate it by drawing the borders together, and this force has a definite but limited magnitude. Here is his initial evidence for the attractive force of the vacuum: “If you take two highly polished and smooth plates of marble, metal, or glass and place them face to face, one will slide over the other with the greatest ease, showing conclusively that there is nothing of a viscous nature between them. But when you attempt to separate them and keep them at a constant distance apart, you find the plates exhibit such a repugnance to separation that the upper one will carry the lower one with it and keep it lifted indefinitely, even when the latter is big and heavy” (Galilei 1914, 11).
Clearly this adhesion can be brought to an end, although not without difficulty. If indeed the adhesion is due to an attractive force, then the magnitude of that force should be measurable. So Galileo takes the bull by the horns and designs a measuring instrument. Presupposing his theory of the force of the vacuum, he presents a procedure for measuring that force, that is, determining its value, under suitable conditions. He describes the design and provides a diagram (Galilei 1914, 14, fig. 4), adding the following instructions (see footnote 1):
The air having been allowed to escape and the iron wire having been drawn back so that it fits snugly against the conical depression in the wood, invert the vessel, bringing it mouth downwards, and hang on the hook K a vessel which can be filled with sand or any heavy material in quantity sufficient to finally separate the upper surface of the stopper, EF, from the lower surface of the water to which it was attached only by the resistance of the vacuum. Next weigh the stopper and wire together with the attached vessel and its contents; we shall then have the force of the vacuum. (Galilei 1914, 14)
The snug fit of the stopper duplicates the arrangement of the two smooth marble plates. But now this arrangement has been turned into a measuring instrument, with the force measured by the amount of weight it can support, so that a quantitative comparison is made possible.
In retrospect, we do not see things in the same way. Torricelli's reasoning and, more important, not much later that century, Pascal's barometer and his experiment on the Puy de Dome establish the reality of atmospheric pressure. From that point on, Galileo's instrument has a new theoretical classification: it is still a measuring instrument, but what it measures is a quite different parameter, the force the atmosphere exerts on the bottom surface of the stopper.
In this case, the instrument is on both sides recognized as a measuring apparatus. But relative to the two different theories, what it measures are two different physical quantities.
3.2. Atwood's Machine: Credentialing Newton's Conception
Atwood's machine, still often used in class demonstrations, was described by the Reverend George Atwood in his “A Treatise on the Rectilinear Motion and Rotation of Bodies, with a Description of Original Experiments Relative to the Subject” (1784). Was this a measuring instrument, and if so, what can it be used to measure? The apparatus is described thus: “The Machine consists of two boxes, which can be filled with matter, connected by a string over a pulley. … Result: In the case of certain matter placed in the boxes, the machine is in neutral equilibrium regardless of the position of the boxes; in all other cases, both boxes experience uniform acceleration, with the same magnitude but opposite in direction” (299–300). Newton's second law implies that, with masses M and m and gravitational acceleration g, the acceleration equals $a = g(M - m)/(M + m)$. Assuming the second law, therefore, it is possible to calculate values for the theoretical quantities from the empirical results. The value of g is determined via the acceleration of a freely falling body (also assuming the second law); hence, measuring the acceleration then determines the mass ratio $M/m$. Choosing a unit for mass, and assuming the third law that action = reaction (tested earlier in a different way by colliding pendulums), the result tests the second law itself (e.g., Hanson 1958, 100–102).
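The calculation just described can be sketched numerically. This is only an illustration under the idealizing assumptions of a massless string and a frictionless, massless pulley; the function names and the default value of g are hypothetical choices, not taken from Atwood or from the text.

```python
def atwood_acceleration(M, m, g=9.81):
    """Acceleration predicted by Newton's second law for an ideal Atwood machine.

    Assumes a massless string and a frictionless, massless pulley (idealization).
    """
    return g * (M - m) / (M + m)


def mass_ratio_from_acceleration(a, g=9.81):
    """Invert a = g(M - m)/(M + m) to obtain the mass ratio M/m.

    The inversion itself presupposes the second law: the measured
    acceleration yields a mass ratio only relative to that theory.
    """
    if not -g < a < g:
        raise ValueError("acceleration must lie strictly between -g and g")
    return (g + a) / (g - a)


if __name__ == "__main__":
    a = atwood_acceleration(3.0, 1.0, g=10.0)
    print(a, mass_ratio_from_acceleration(a, g=10.0))
```

The second function makes the theory dependence visible: what is observed is an acceleration (lengths and durations), and the mass ratio is obtained only by a calculation that assumes the law under test.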
Kuhn describes how this originated in a crucial scientific controversy, “that of deriving testable numerical predictions from Newton's three Laws of motion and from his principle of universal gravitation. … The first direct and unequivocal demonstrations of the Second Law awaited the development of the Atwood machine, … not invented until almost a century after the appearance of the Principia” (1961, 168–69). Cartesian physics had not died with Descartes, and Newton's theory too had to struggle for survival, for almost a century. Atwood himself notes the controversy:
Many experiments, however, have been produced, as tending to disprove the Newtonian measure of the quantities of motion communicated to bodies, and to establish another measure instead of it, viz. the square of the velocity and quantity of matter; and it immediately belongs to the present subject, to examine whether the conclusions which have been drawn from these experiments arise from any inconsistency between the Newtonian measures of force and matter of fact, or whether these conclusions are not ill founded, and should be attributed to a partial examination of the subject: but some considerations concerning the principles of retarded motions should [be] premised. (1784, 30)
Thus, relying on what he could take to be a measure of mass, and assuming the value of gravitational acceleration, Atwood carefully verified that the objects accelerated at the predicted rate. But as we just saw, those values are themselves determined assuming Newton's laws.
The reasoning and its rationale were thoroughly investigated in the nineteenth and early twentieth centuries (e.g., Poincaré 1905/1952, 97–105; Mach 1960, chap. 2, secs. 1 and 5). As Mach points out, Atwood's machine allows one to measure more precisely the constant acceleration postulated in Galileo's law of falling bodies, by calculations independent of Newton's theory. But its use to measure Newton's dynamic quantities is an operation that counts as such a measurement only relative to Newtonian theory. The Cartesian critique of Newtonian physics was that by introducing mass and force, not definable in terms of spatial and temporal extension, Newton had brought back the medievals’ occult qualities. The Newtonian response was, in effect, that admittedly what is measured directly in any setup are lengths and durations but that nevertheless they could show how to measure mass and force; Atwood's machine is a paradigm of how this could be done. This could not possibly satisfy the Cartesian. But for us it should display what the just requirements are on a newly developing theory: to show how certain procedures, modeled in accordance with that theory, count as measurements that will under propitious circumstances determine values for the theoretical quantities.
3.3. Quantum Mechanics: What Counts as a Measurement at All?
The examples so far are of procedures taken, on all hands, as measurements. The question, “which quantity was measured,” however, had only theory-relative answers. Quantum mechanics brought a greater rupture in the conception of measurement (e.g., Grünbaum 1957, 713–15). Heisenberg's uncertainty relations imply a statistical relation between the outcomes of concurrently conducted position and momentum measurements: the standard deviations will satisfy the relation that their product is greater than or equal to a certain constant. On the face of it, any such statistical relation is compatible with position and momentum having precise values at all times.
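For reference, the statistical relation in question can be written in its standard textbook form. The symbols are not the article's own: $\sigma_x$ and $\sigma_p$ stand for the standard deviations of position and momentum outcomes in a given state, and $\hbar$ is the reduced Planck constant.

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2}
```

Read this way, the relation is a lower bound on the product of two statistical spreads, which is why, taken by itself, it does not rule out precise simultaneous values.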
Bohr denied insistently that the Heisenberg uncertainty principle is merely a principle of limited measurability. The initial arguments by Heisenberg and Bohr, however, seemed to invoke merely operational incompatibility of what would classically have counted as measurement procedures and were challenged by designs for operationally feasible position-plus-momentum measurements.
Most salient here is that ‘time-of-flight’ measurement, a technique that makes perfect sense in quantum physics, has been subject to rigorous theoretical analysis (e.g., Heisenberg 1930, 20; Feynman 1965, 96–98) and is of common experimental and practical use (e.g., Wcirnar et al. 2000). Thus, in time-of-flight mass spectrometry, ions are accelerated by an electrical field to the same kinetic energy, with the velocity of the ion depending on the mass-to-charge ratio. The time of flight is used to measure their velocity, from which the mass-to-charge ratio can be determined.
When this procedure is used together with a record of the emission and reception of the particles, values for velocity and position at, for example, the time of reception can be retrospectively assigned. Appeal to this technique to design a putative measurement of simultaneous sharp position and momentum values appears to be both persistent and recurrent in the literature (e.g., Dyson 2004).
So, operational incompatibility is not at issue. Bohr's next reaction was to point out that the crucial term here is “retrospectively”: “Indeed, the position of an individual at two given moments can be measured with any desired degree of accuracy; but if, from such measurements, we would calculate the velocity of the individual in the ordinary way, it must be clearly realized that we are dealing with an abstraction, from which no unambiguous information concerning the previous or future behavior of the individual can be obtained” (1963, 66). The retrospective judgment will not match any theoretically possible quantum mechanical state for the particle. Therefore, within the theory, there can be no prediction based on those putative measurement outcomes. Bohr asserts, in addition, that the spread in outcomes of subsequent measurements shows that no rule of any sort could improve on this predictive failure.
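The inference from flight time to mass-to-charge ratio can be sketched as follows. This is a simplified illustration, not a description of any particular instrument: it assumes ions start from rest, are accelerated through a potential difference V, and then drift at constant velocity over a field-free length L, so that qV = (1/2)mv² and v = L/t. All names and parameters are hypothetical.

```python
def mass_to_charge(voltage, drift_length, flight_time):
    """Mass-to-charge ratio m/q (kg/C) from an idealized time-of-flight run.

    Assumptions (illustrative only): ions start at rest, gain kinetic
    energy q*V, and then drift at constant speed over `drift_length`;
    time spent in the acceleration region is neglected.
    """
    if voltage <= 0 or drift_length <= 0 or flight_time <= 0:
        raise ValueError("all inputs must be positive")
    velocity = drift_length / flight_time  # what is directly timed
    return 2.0 * voltage / velocity ** 2   # from q*V = (1/2) m v^2


if __name__ == "__main__":
    # Doubling the flight time quadruples the inferred m/q.
    print(mass_to_charge(2.0, 1.0, 1.0), mass_to_charge(2.0, 1.0, 2.0))
```

What is recorded directly are two positions and two times; the velocity and the mass-to-charge ratio are both outputs of a calculation within the theory.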
As stated, this could still allow that the procedure is indeed a measurement of simultaneous position and momentum values, with the qualification that the outcomes do not have any practical value. But the conclusion is stronger: the procedure does not count as a measurement at all. If that is so, then it is theory that decides on not only what is measured, if a measurement is made, but what counts as a measurement in the first place. It is the criterion for the latter judgment that is first given true rigor and precision in the foundations of quantum mechanics.
First Criterion for Counting as Measurement
The time-of-flight procedure offered a good example for this analysis and is analyzed at length, for this purpose, in articles by Margenau (1958) and by Park and Margenau (1968). The direct measurements in the time-of-flight procedure are all of positions. But a calculation is presented, drawing on these direct measurement results, to yield a value for velocity or momentum. Should this procedure (call it P) be accepted as a true, complex, measurement of momentum? There is one minimal theoretical criterion, a coherence criterion, that is quite straightforward:
• the theory already provides a theoretical probability distribution for outcomes of momentum measurements given any quantum mechanical state;
• the procedure P in question also admits a quantum mechanical theoretical description that implies a probability distribution for its outcomes, given any quantum mechanical state;
• the criterion for P being a measurement of momentum is that these theoretically calculated probability distributions should coincide for all states.
This is a coherence condition; it is required on the basis of consistency. If this criterion were not satisfied for a given procedure P and yet P were counted as a measurement procedure for values of momentum, then the theory would yield inconsistent predictions. Momentum is only an example for this general point. So here already, with this minimal necessary condition (not to be taken as sufficient), the question, whether a given procedure counts as a measurement at all, requires a theoretical answer: the question can only be answered completely relative to a theory.
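The logic of the coherence criterion can be illustrated with a toy case. The sketch below uses a single qubit rather than position and momentum, so it is an analogy and not the time-of-flight analysis itself; the reference observable, the candidate procedure, and all names are hypothetical. Note also that checking a finite list of states can only refute the criterion, never establish it "for all states".

```python
import math


def normalize(state):
    """Return the state (alpha, beta) rescaled to unit norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in state))
    return tuple(c / norm for c in state)


def born_z(state):
    """Born probabilities for the reference observable (spin along z)."""
    a, b = normalize(state)
    return (abs(a) ** 2, abs(b) ** 2)


def procedure_x(state):
    """Outcome probabilities of a candidate procedure (here: spin along x)."""
    a, b = normalize(state)
    return (abs(a + b) ** 2 / 2, abs(a - b) ** 2 / 2)


def coincide(dist1, dist2, tol=1e-9):
    """True if two outcome distributions agree within tolerance."""
    return all(abs(p - q) <= tol for p, q in zip(dist1, dist2))


def passes_on(procedure, reference, states):
    """Check the coherence criterion on a finite sample of states."""
    return all(coincide(procedure(s), reference(s)) for s in states)


SAMPLE_STATES = [(1, 0), (0, 1), (1, 1), (1, 1j), (1, -1j), (2, 1 + 1j)]

if __name__ == "__main__":
    print(coincide(procedure_x((1, 1j)), born_z((1, 1j))))  # agreement for one state
    print(passes_on(procedure_x, born_z, SAMPLE_STATES))    # failure across states
```

The candidate reproduces the reference distribution for one special preparation but not in general, which mirrors the distinction the criterion is meant to enforce.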
What about the putative time-of-flight measurement of momentum then? To begin, at least ideally, the time-of-flight technique does satisfy this criterion for a measurement of momentum, for a particular case (see footnote 2). With a particle prepared in a definite position state at time 0 (i.e., localized within a small although finite region, a state with compact support) and a later measurement showing its position, a value for its momentum at time 0 can be calculated. So in this situation we see a sequence of direct position measurements plus a calculation of a value for momentum for the time of the first position measurement. And, for this state preparation, the predicted probability distribution of outcomes of this procedure is the same as the Born conditional probability for outcomes of momentum measurements on systems in that state.
However, the above criterion, although minimal, is strong: the final words, “for all states,” are crucial. We cannot conclude that momentum can be equated with a function of positions over time, on the basis that the measurement outcome predictions for the two will be the same in a particular sort of case. Since position and momentum are incompatible observables in quantum mechanics, that theory implies that there can be no functional relationship in general between outcomes of any series of position measurements and outcomes of momentum measurements. So we have to distinguish: in the particular case of a freely moving initially localized particle, the time-of-flight procedure is legitimate. It will, according to the theory, present no data that would conflict with the predictions for direct measurement of momentum. But it is not true that this procedure qualifies as a momentum measurement procedure.
Specifically, there is no warrant for concluding that the system is in a state similarly ‘localized’ with respect to momentum. The only conclusion that is legitimate is that if the time-of-flight ‘measurement’ of momentum is performed in a ‘large enough’ collective of systems prepared in the same state, then the distribution of outcomes will be the same as in another such collective subject to regular momentum measurements.
Second Criterion for Counting as Measurement
Why can we not just conclude that we have a measurement here of momentum, with a restricted domain of application? If we conclude that, and keep in mind that we have a simultaneous position measurement, then we will imply that we also have a derived measurement of such defined quantities as position + momentum. There are no observables of that sort in the theoretical framework. So then we would have putative measurements that are not measurements of any observables—hence, as far as the theory is concerned, not measurements of anything at all, hence, not measurements after all.
There is thus also a stronger requirement, apart from the above minimal coherence condition. For a procedure to be a measurement, relative to the theory, there must be a quantity that it measures. A simple way to make the point is this: for a procedure to qualify as a simultaneous joint measurement of quantities A and B, the theory would (according to the criterion displayed above) have to imply that the probabilities of its outcomes match the joint probabilities assigned to A and B. But if A and B do not commute, the theory affords no joint probabilities for their measurement outcomes. Hence the criterion cannot be satisfied, no matter what that procedure is like.
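What it means for two observables not to commute can be shown with a small finite-dimensional example. The Pauli matrices used here are a standard stand-in chosen for illustration; they are not the position and momentum operators discussed in the text, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def commutator(A, B):
    """Return AB - BA for 2x2 matrices."""
    AB, BA = matmul(A, B), matmul(B, A)
    return tuple(tuple(AB[i][j] - BA[i][j] for j in range(2)) for i in range(2))


def commute(A, B, tol=1e-12):
    """True if AB - BA vanishes within tolerance."""
    return all(abs(x) <= tol for row in commutator(A, B) for x in row)


SIGMA_X = ((0, 1), (1, 0))   # spin along x (in units of hbar/2)
SIGMA_Z = ((1, 0), (0, -1))  # spin along z (in units of hbar/2)

if __name__ == "__main__":
    print(commutator(SIGMA_X, SIGMA_Z))
    print(commute(SIGMA_X, SIGMA_Z), commute(SIGMA_Z, SIGMA_Z))
```

When the commutator is nonzero, the two operators have no common basis of eigenvectors, so the Born rule supplies no joint distribution against which a putative joint measurement could be checked.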
Or again, in the case of elementary quantum mechanics, all physical quantities are represented by Hermitean operators. If a procedure qualifies as a simultaneous measurement that yields a pair of values of A and B, then there needs to be such an operator representing the quantity measured. But then any linear function of that quantity, such as A + B, will also be represented by such an operator. As von Neumann already saw, if the operators representing A and B are noncommuting then there will be no such representing operator for A + B. So there cannot be a procedure that can count as a simultaneous measurement of such pairs of quantities.
The Criterion Applied to Uses of Entangled States
There is another putative procedure for simultaneous measurement of noncommuting observables, in addition to the time-of-flight argument. As made famous by the Einstein-Podolski-Rosen paradox, it is possible for two systems to form a total system in an entangled state of this sort:
the system composed of particles X and Y is in a pure state that is a superposition of the correlated states $|a(i)\rangle \otimes |b(i)\rangle$, for i = 1, 2, … , which is at the same time a superposition of the correlated states $|a'(i)\rangle \otimes |b'(i)\rangle$,
and this is possible although
the values a(i) are values of observable A, while the values a′(i) are values of observable A′, which does not commute with A, and similarly for values b(i) and b′(i) of noncommuting observables B and B′.
When all this is the case, the following statements hold:
Suppose A is measured on the first particle and value a(k) is found. Then the probability of finding value b(k), if B is measured on the second particle, equals 1.
Suppose B′ is measured on the second particle and value b′(k) is found. Then the probability of finding value a′(k), if A′ is measured on the first particle, equals 1.
In view of this, one could propose the following procedure: measure A on the first particle and B′ on the second; if values a(k) and b′(m) are found, declare the pair a(k), a′(m) to be the outcome of a joint measurement of A and A′ on the first particle.
Just as with the time-of-flight example, we can cite empirical justification for the claim that this procedure is reliable, for the theory predicts a very stable distribution for the actually found outcome pairs a(k), b′(m) for any given prepared joint state of this sort and, hence, also for the ‘inferred’ a(k), a′(m) outcome pairs arrived at by direct measurement plus inference. But from the point of view of the theory, that complex procedure of measurement plus ‘inference’ is not a measurement procedure at all, for there just is no observable that is being measured (see footnote 3).
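The perfect correlations that the proposed procedure relies on can be checked in the simplest case. The sketch below assumes a two-qubit Bell state, with A and B taken as spin along z and A′ and B′ as spin along x; this choice of state and observables is an illustrative assumption, not the example of the original paradox, and the function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
Z_BASIS = [(1, 0), (0, 1)]    # eigenvectors of A (particle X) and B (particle Y)
X_BASIS = [(S, S), (S, -S)]   # eigenvectors of A' and B'
# Amplitudes of (|00> + |11>)/sqrt(2), indexed by z-basis labels (i, j).
BELL = {(0, 0): S, (1, 1): S}


def joint_probability(state, vec_x, vec_y):
    """Born probability of finding particle X in vec_x and particle Y in vec_y."""
    amplitude = sum(
        coef * complex(vec_x[i]).conjugate() * complex(vec_y[j]).conjugate()
        for (i, j), coef in state.items()
    )
    return abs(amplitude) ** 2


def conditional(state, basis_x, basis_y, k, m):
    """Probability of outcome m on particle Y given outcome k on particle X."""
    marginal = sum(joint_probability(state, basis_x[k], v) for v in basis_y)
    return joint_probability(state, basis_x[k], basis_y[m]) / marginal


if __name__ == "__main__":
    print(conditional(BELL, Z_BASIS, Z_BASIS, 0, 0))  # A and B perfectly correlated
    print(conditional(BELL, X_BASIS, X_BASIS, 1, 1))  # A' and B' perfectly correlated
    print(joint_probability(BELL, Z_BASIS[0], X_BASIS[1]))  # mixed pair A, B'
```

The mixed pair has a stable, well-defined distribution, which is what makes the procedure look reliable; what the theory withholds is an observable on the first particle of which the 'inferred' pair would be the value.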
Conclusion: Counting as Measurement at All Is Theory Relative
Relative to quantum theory, therefore, we can draw the following conclusion:
a) The quantities of the theory are those which appear as parameters or variables within models provided by the theory for the representation of phenomena.
b) Whether a given procedure counts as a measurement procedure (and whether the physical apparatus in use counts as a measurement apparatus) depends on whether there is a quantity of the theory for which this procedure, as modeled within the theory, meets the above criteria.
Thus, not only what a procedure measures, if it is a measurement procedure, but whether it is a measurement in the first place is a question whose answer is in general determined by theory, not solely by operational or empirical characteristics.
4. Empirical Grounding
In conclusion I will then locate this view of measurement in the larger picture of science subject to a demand for empirical grounding. Proper understanding of this criterion will be a corrective to worries about ‘theory infection’ of measurement. To the extent that they presume or presuppose independence between theory and evidence, traditional ideas about justification or confirmation of scientific theories are indeed threatened. A different view concerning the demands and norms pertaining to measurement operative in scientific practice was clearly, if briefly, spelled out by Hermann Weyl (1927/1963, 121–22; see Glymour 1975, 1980; van Fraassen 2009). In slogan form, the demand on theories is that they be empirically grounded, which involves both theoretical and empirical tasks. The crafting of a relationship between theory and phenomena is an interplay of theory, modeling, and experiment during which both the identification of parameters and the physical operations suitable for measuring them are determined. I will present the main features here briefly to relate them to what counts as measurement or counts as measurement of what.
One epistemological point may sound quite paradoxical:
a) a theory cannot be less likely to be true (or empirically adequate) than any of its stronger extensions,
b) but when a theory is still weak there can in general be very little or even no evidence relevant to its support.
The reason is that, if there is to be relevant evidence at all, it must be possible to design experiments whose outcomes can furnish evidence. To design such an experiment, one has to draw on the implications of the theory, and a weak theory does not imply very much.
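Point (a) is an elementary consequence of the probability calculus, which can be stated compactly. The notation is introduced here for illustration only: $T$ is a theory, $T'$ any stronger extension of it, and $E$ a body of evidence with nonzero probability.

```latex
T' \models T \quad\Longrightarrow\quad P(T' \mid E) \;\le\; P(T \mid E)
```

Since every possibility in which $T'$ holds is one in which $T$ holds, strengthening a theory can never raise its probability on fixed evidence; what strengthening can do is change which procedures yield evidence at all.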
Specifically, when first introduced, a model or theory may involve theoretically postulated physical quantities for which there is as yet no measurement procedure available. Thus with the advent of atomic theory in the early nineteenth century, mass ratios of the atoms or molecules played a significant part in the models offered for chemical processes but could not be determined from the measurement data. During that century the theory was extended by adding hypotheses (beginning with Avogadro's), and it became possible to connect theoretical quantities to measurable ones. Such development, simultaneously strengthening the theory and introducing new measurement procedures, is not adventitious or optional: it is a fundamental demand in the empirical sciences. But in view of (a) above, the extended theory cannot be more likely to be true than the original, relative to any given body of evidence. The air of paradox may disappear if we reflect that it was the strengthening of the theory that made it possible for new procedures to count as measurements, thus producing evidence relevant to the extended theory that would have no significance relative to the weaker original.
Empirical grounding is this process of simultaneously, harmoniously extending both the theory and the range of relevant evidence. There are three parts to it, two emphasized by Weyl and a third by Glymour (1975):
Determinability: any theoretically significant parameter must be such that there are conditions under which its value can be determined on the basis of measurement.
Concordance, which has two aspects:
Theory Relativity: this determination can, may, and generally must be made on the basis of the same theoretically posited connections.
Uniqueness: the quantities must be ‘uniquely coordinated’; there needs to be concordance in the values thus determined by different means.
Refutability, which is also relative to the theory itself: there must be an alternative possible outcome for the same measurements that would have refuted the hypothesis on the basis of the same theoretically posited connections.
The main aim of the current article was to explore the necessity, indeed inevitability, of the clause “on the basis of the same theoretically posited connections” that appears twice in the above components of the demand for empirical grounding. Determination of the value of a physical quantity, represented in a model of certain phenomena, must be by measurements performed on those phenomena—but with the outcomes related to the model by calculations within the theory itself. That is precisely what we saw in the examples examined above; the point is brought to light by showing the alternatives in the meaning of measurement outcomes relative to different theories.