There is a sense that the text written by Miksza and Elpus is a model of research thinking. The authors identify problems and errors that may, and often do, appear in quantitative research in music education, ranging from the simple to the complex. By simple, I mean that the authors establish a framework by describing scientific inquiry, theories, constructs, and even systematic bias. Having a mental picture of music education, its problems, telling questions, key concepts, values, goals, and objectives is helpful in approaching this text. The authors appear to have developed the material based on the profession's research needs, not merely to improve upon existing music education and education research textbooks. I had to reject thinking of the book as a possible text for a one- or two-semester music education research methods course; it is a reference. The book is extensive, consisting not only of 16 chapters of rich, detailed information on the major concepts of quantitative research but also of a password-protected companion website. The authors state that the website has data files, video demonstrations of several common data analysis procedures using SPSS, supplementary conceptual and analytical exercises and resources aligned with the primary concepts of the chapters, and data files for the analyses necessary to complete the supplementary exercises. That's a lot.
The first two chapters, "Prelude" and "Characteristics of Scientific Inquiry", are necessary to inform the reader that this text is limited to scientific approaches to music education: qualitative research, narrative research, and the multiple variations that can be found in today's dissertations, along with philosophical, historical, and even speculative research, are not addressed in this text. Part I contains chapters on descriptive research design and analysis, inferential analysis, and correlational design and analysis. The depth of these chapters forced me to consider how this text might differ from a text used in a statistics class. I grabbed the closest statistics book, which happened to be Robert Fried's Introduction to Statistics, which focuses on the nature of empirical observations, populations and samples, and the application of descriptive, inferential, and correlational statistical procedures. Other topics were frequency distributions, histograms, elementary probability, measures of central tendency and variability, testing hypotheses, analysis of variance, and measures of association. Voila! A match. Having taken a basic statistics class would be advantageous in reading Part I of Miksza and Elpus. The authors recognize this crossover, and there is an expectation that students have a rich background, including familiarity with canned computer programs for basic statistics. There are multiple references to helpful statistical software, and readers are often referred to more in-depth statistics texts, for example for advice on how to proceed if assumptions are not met, or if they are interested in how a t test is calculated. Randomization is basic to much quantitative research and is emphasized by the authors, reinforced by statements such as “in nearly every introductory statistics class the basic inferential statistical procedures are introduced with a clear caveat—assume a simple random sample from a population” (p. 163).
The writing is clear and complete, if somewhat complex. When the authors introduce a concept or a formula, they expect the material to be remembered, as the concepts and abbreviations reappear later in the text without a second explanation. A table of formulas would be helpful. The authors' rebuttal to this comment would be that in the modern world formulas and critical-value tables are wholly unnecessary: “All statistical software packages will not only calculate the t value for you they will also provide an exact estimate of statistical significance” (p. 105). The appendix “Inferential analysis with nonparametric tests” is only three pages long, with a one-page table that is concise and helpful. The intent of the authors is to present quantitative concepts in the context of music education research; unfortunately, there is not much quality music education research employing the concepts addressed other than that of the authors themselves.
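The hand calculation that the authors deem unnecessary is nonetheless short. As an illustration only (in Python rather than the SPSS the book uses, and with invented data), the pooled-variance independent-samples t statistic that software reports can be sketched as:

```python
# Illustrative sketch: computing an independent-samples t statistic "by hand,"
# the calculation the authors note is automated by statistical software.
from statistics import mean, variance  # variance() is the sample variance

def t_independent(a, b):
    """Two-sample t statistic using the pooled (equal-variance) form."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Invented scores for two hypothetical groups:
group1 = [72, 75, 78, 80, 74]
group2 = [68, 70, 73, 71, 69]
print(round(t_independent(group1, group2), 3))
```

Software then compares this statistic against the t distribution with na + nb − 2 degrees of freedom to report the exact estimate of statistical significance the authors describe.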
There is little to be gained by a review commenting on the statistics and designs, as the authors are solid in their explanations, if brief. In discussing multiple comparisons with more than two groups, the authors inform the reader that the statistical solution is to use the Bonferroni correction, although it corrects harshly for Type I error. Their explanation of how to do this is one sentence long. Students often fail to distinguish comparisons within groups from comparisons between groups; this distinction is covered mainly in footnotes 13 and 14 on page 114. Footnotes in this text are not there to make the authors appear more scholarly; they are essential for understanding. The authors emphasize that, in addition to randomization, research design, validity, and interpretation of research data in an appropriate context are requisite in every research effort. Bravo. The authors take a positive approach: they do not use examples of poor or inappropriate research and do not critique the related research in the examples chosen. Quantitative research requires valid dependent measures, yet measurement receives a light treatment despite its long history of concerns about validity, reliability, and usability. The authors do ask the reader to recall from “your studies of educational measurement that, according to classical test theory, any measurement instrument only yields an approximation of an individuals’ unknowable true score on the construct being measured. The observed score is the true score plus or minus an unknowable amount of random measurement error” (p. 258).
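The authors' one-sentence Bonferroni explanation can be unpacked in a few lines. A minimal sketch (my own illustration with invented p values, not the authors' code): with m comparisons, each test is simply held to the stricter threshold alpha / m.

```python
# Minimal sketch of the Bonferroni correction: with m pairwise comparisons,
# each test is run against the stricter threshold alpha / m so that the
# family-wise Type I error rate stays at or below alpha.
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and (p, significant?) decisions."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [(p, p < threshold) for p in p_values]

# Three invented pairwise comparisons among, say, three teaching conditions:
threshold, decisions = bonferroni([0.012, 0.030, 0.004])
# threshold is 0.05 / 3, roughly .0167; only p = .012 and p = .004 survive.
```

The harshness the authors acknowledge is visible here: p = .030 would pass an uncorrected .05 criterion but fails the corrected one.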
A little further thought matched the concepts in Miksza and Elpus with the possible material in a course on evaluation that includes tests and measurements; another match. Evaluation has many of the same problems caused by a lack of randomization. Evaluation has a major subjective component that requires one to make judgments about the merit, value, significance, credibility, and utility of the research or of a program, policy, or performance. Rigour in evaluation means randomized controlled trials. Evaluation is also faced with circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief. Quantitative research, with its dependent variables, must become more common if results are to have any impact on teaching and learning. Evaluation is what separates good designs from bad ones, and good design compatible with theory is a major thrust of this volume. What remains unsolved is how one recognizes, detects, and exposes bad and sloppy science; science true at one moment is subject to change. Statistics is replacing calculus as the desired high school competency, and the Next Generation Science Standards are teaching data analysis. Miksza and Elpus would applaud.
Miksza and Elpus is unique in its thoroughness, although a simplified approach for beginning coursework has been advocated among music education researchers for decades. Donald Campbell and Julian Stanley's Experimental and Quasi-Experimental Designs for Research was originally published in the 1963 Handbook of Research on Teaching, edited by Nate Gage. In 1966 the chapter became a separate publication and was probably the best-selling research publication in education, selling more than a million copies. Campbell and Stanley list some 10 sources of internal invalidity and four of external invalidity. Miksza and Elpus may make a major impact. In 1974, Jason Millman and D. Bob Gowin published Appraising Educational Research: A Case Study Approach, describing eight research articles with critiques. It would be an excellent elementary companion to Miksza and Elpus, as its purpose is to describe the nature of criticism with suggested questions, student responses from over 800 students at 27 colleges and universities, followed by comments on the eight studies by the book's authors. Campbell and Stanley provide the same format, with author comments on strengths and weaknesses of research design. Miksza and Elpus discuss all of the issues raised by Millman and Gowin as well as Campbell and Stanley, with a step-by-step approach to conducting quantitative research using today's computer programs and forty years of experience in data gathering, statistical analysis, judgments of worth and value to the profession, and appropriate interpretation of research findings.
Miksza and Elpus have been successful in publishing a document that can be a continuing resource and reference throughout a student's education and career. The writing is almost terse in its concision, leaving scant room to describe situations that support their position. The need for replication is mentioned in several places, such as on page 8: “It is essential that the scientific community be able to independently evaluate and replicate the processes involved in any given research study”, but examples would be helpful. (There are few examples of true replication in music education.) Replication is a major research concern, with encouragement to account for studies that have negative or null results. Drawing on findings from the Campbell Collaboration, the ClinicalTrials.gov registry, and the Open Science Framework initiated by Brian Nosek, we know that about 70 per cent of published research studies could not be replicated. Nosek's investigation of 100 research studies published in three psychology journals revealed that 36 per cent of the replications found p values below .05, contrasted with 97 per cent of the original studies reporting significant effects. This information on replication justifies, in my mind, the attention to research design detail in Miksza and Elpus and their emphasis that no degree of methodological or statistical sophistication can make up for a lack of meaningful research questions and sound reasoning. They do admit that scientific knowledge is always subject to correction and revision, with research having some degree of uncertainty (p. 11). The material on correlation and surveys is especially strong, as the authors recommend solutions to missing data in survey research. (They would not find response rates of 20 or 30 per cent acceptable.) Unfortunately, according to the Economist (May 26), response rates to the labor force survey in Great Britain declined last year to 43 per cent, down from 70 per cent in 2001. Statisticians who have applied a variety of corrections have found that even these are failing.
The book is chock-full of illustrations of the use of t and F tests, ANCOVA, regression, factor analysis, structural equation modelling, and more, which defies description in this review when it took the authors some 200 pages. I can only mention a few caveats. Idiosyncratic and systemic bias is to be anticipated, acknowledged, and mitigated: research will undoubtedly include the researcher's subjective values and perspective, and one's personal life and social realities bring some bias (p. 9). Researchers are often interested in less tangible and concrete perceptions, beliefs, motivations, attitudes, and so on; the authors name these constructs, and any measurements of such constructs are latent in nature. Variation is the sine qua non of statistical analyses, the purpose being to summarize, organize, and help explain the variation that exists among a great deal of data (p. 3). The authors do state that a collection of related studies whose findings converge in a systematic way could lead to insights (p. 9), but this vague sentence does not do justice to the importance of a critical review of all related research. It is only in the concluding, uplifting chapter that the authors admonish the reader to evaluate research critically but not cynically, although how this knowledge is to be applied is left to the judgment of the student. A few sentences stuck with me: failure to reject a null hypothesis cannot be interpreted as accepting a null hypothesis (p. 58); statistical significance is not a measure of truth, a measure of generalizability, nor a direct statement about the likelihood of replication; Bayesian statistics are gaining traction, and the inferences drawn differ substantially from those logically possible under frequentist NHST analyses, which is especially true in music education research, where student characteristics are often critical. And don't forget to consider Cohen's d for effect size.
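Since the paragraph above closes with Cohen's d, a minimal sketch of the common pooled-standard-deviation form may help (my own illustration with invented data, not an example from the book):

```python
# Minimal sketch of Cohen's d for two independent groups, using the
# pooled standard deviation. Data are invented for illustration only.
from statistics import mean, variance  # variance() is the sample variance

def cohens_d(a, b):
    """Standardized mean difference: (mean(a) - mean(b)) / pooled SD."""
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * variance(a) + (nb - 1) * variance(b))
                 / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

# Hypothetical scores for a treatment and a control group:
treatment = [82, 85, 88, 90, 84]
control = [78, 80, 83, 81, 79]
print(round(cohens_d(treatment, control), 2))
```

Unlike a p value, d expresses the size of the group difference in standard-deviation units, which is why effect sizes complement rather than replace significance tests.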
As the authors suggest, a student might have to draw on knowledge from other coursework: good research is careful and thoughtful.