Stefanie Markovits’ chapter thinks about counting and accountability, and how they inform literary representations of the military man, one of the most visible of war’s outcomes in mid-Victorian Britain. Markovits reflects on this period as one which saw ‘the rise of statistics as a discipline of social science and a method of statecraft in Britain’, and with it the growing need for accountability in public affairs. The figure of the soldier is both hero and statistic, individual and number, in a period when fiction, philosophy, and popular commentary were preoccupied with how individuals realised their fully individualised potential. The soldier’s cultural and political potency is enabled because his being is aligned with the numbers that account for him. In the work of Tennyson, Harriet Martineau, and Dickens we see how ‘the mid-century soldier becomes such a potent figure precisely because his “type” aligns so closely with numbers’.
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. It introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. Over 210 end-of-part exercises and numerous examples introduce students to contemporary applications in statistics, machine learning, and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes and variational principles, Kolmogorov's metric entropy, strong data-processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors, and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
Three Stanford-educated Chicanos took the stand in support of MAS, and these witnesses were central to Judge Tashima’s final ruling. Specifically, they detailed in a scholarly way the academic integrity of the department and the efficacy of taking the classes, and demonstrated how state representatives used racist “code words” in cementing their opposition to the program. We detail their testimony and how the state desperately tried to trip them up.
This chapter links the creation of MAS to the historical creation of Ethnic Studies – setting the record straight on the nature of this type of education amidst massive amounts of local and national misinformation. It details what MAS was and the effects of the program on student academic success, while examining how critically engaged, educated Mexican American students came to be seen as such a “threat” to the state.
In this article, we present the findings of an oral history project on the past, present, and future of psychometrics, as obtained through structured interviews with twenty past Psychometric Society presidents. Perspectives on how psychometrics should be practiced vary strongly. Some presidents are psychology-oriented, whereas others take a more mathematical or statistical approach. The originally strong relationship between psychometrics and psychology has weakened, and contemporary psychometrics has become a diverse and multifaceted discipline. The presidents are confident that psychometrics will continue to be relevant but believe it needs to become better at selling its strong points to relevant research areas. We recommend that psychometrics cherish its plurality and make its goals and priorities explicit.
Focusing on methods for data that are ordered in time, this textbook provides a comprehensive guide to analyzing time series data using modern techniques from data science. It is specifically tailored to economics and finance applications, aiming to provide students with rigorous training. Chapters cover Bayesian approaches, nonparametric smoothing methods, machine learning, and continuous time econometrics. Theoretical and empirical exercises, concise summaries, bolded key terms, and illustrative examples are included throughout to reinforce key concepts and bolster understanding. Ancillary materials include an instructor's manual with solutions and additional exercises, PowerPoint lecture slides, and datasets. With its clear and accessible style, this textbook is an essential tool for advanced undergraduate and graduate students in economics, finance, and statistics.
Background
A critical unmet curricular need for an introductory statistics and study design course was identified in the largest graduate unit of the Faculty of Medicine of a major Canadian university. Based on the collective findings of an external institute review, both quantitative and qualitative data were used to design, develop, implement, evaluate, and refine such a course.
Methods
In response to the identified need and the inherent challenges of streamlining curriculum development and instructional design in research-based graduate programs representing many biomedical disciplines, the institute used the analyze, design, develop, implement, and evaluate (ADDIE) instructional design model to guide the data-driven development and ongoing monitoring of a new study design and statistics course.
Results
The results demonstrated that implementing recommendations from the first iteration of the course (Fall 2021) into the second iteration (Winter 2023) led to an improved student learning experience, with the weighted average rating rising from 3.18/5 (Fall 2021) to 3.87/5 (Winter 2023). In the second iteration of the course, a self-perceived statistics anxiety test was administered; it showed a reduction in statistics anxiety levels after completing the course, with the weighted average falling from 2.41/4 before the course to 1.65/4 after.
Conclusion
Our experiences serve as a valuable resource for educators seeking to implement similar improvement approaches in their educational settings. Furthermore, our findings offer insights into tailoring course development and teaching strategies to optimize student learning.
The use of programming languages in archaeological research has witnessed a notable surge in the last decade, particularly with R, a versatile statistical computing language that fosters the development of specialized packages. This article introduces the tesselle project (https://www.tesselle.org/), a comprehensive collection of R packages tailored for archaeological research and education. The tesselle packages are centered on quantitative analysis methods specifically crafted for archaeology. They are designed to complement both general-purpose and other specialized statistical packages. These packages serve as a versatile toolbox, facilitating the exploration and analysis of common data types in archaeology—such as count data, compositional data, or chronological data—and enabling the construction of reproducible workflows. Complementary packages for visualization, data preparation, and educational resources augment the tesselle ecosystem. This article outlines the project's inception, its objectives, design principles, and key components, along with reflections on future directions.
This chapter is written for conversation analysts and is methodological. It discusses, in a step-by-step fashion, how to code practices of action (e.g., particles, gaze orientation) and/or social actions (e.g., inviting, information seeking) for purposes of their statistical association in ways that respect conversation-analytic (CA) principles (e.g., the prioritization of social action, the importance of sequential position, order at all points, the relevance of codes to participants). As such, this chapter focuses on coding as part of engaging in basic CA and advancing its findings, for example as a tool of both discovery and proof (e.g., regarding action formation and sequential implicature). While not its main focus, this chapter should also be useful to analysts seeking to associate interactional variables with demographic, social-psychological, and/or institutional-outcome variables. The chapter’s advice is grounded in case studies of published CA research utilizing coding and statistics (e.g., those of Gail Jefferson, Charles Goodwin, and the present author). These case studies are elaborated by discussions of cautions when creating code categories, inter-rater reliability, the maintenance of a codebook, and the validity of statistical association itself. Both misperceptions and limitations of coding are addressed.
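To make the chapter's reference to inter-rater reliability concrete, the sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic for two coders. The two-category coding scheme and the data are hypothetical illustrations, not material from the chapter.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per case."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Observed agreement: proportion of cases on which the coders match.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(codes_a) | set(codes_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten turns, e.g. whether a turn does "inviting" or something else.
coder_1 = ["invite", "other", "invite", "invite", "other", "other", "invite", "other", "invite", "other"]
coder_2 = ["invite", "other", "invite", "other", "other", "other", "invite", "other", "invite", "invite"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.6: agreement corrected for chance
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; what counts as acceptable reliability varies by field and by coding scheme.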
Much research shows that the ratings that critics, judges, and consumers assign to wines are heteroscedastic. An observed rating is one draw from a latent distribution that is wine- and judge-specific. Estimating the shape of a rating’s distribution by minimizing a sum of cross entropies has been proposed and tested. This article proposes a method of improving the accuracy of that estimate by using information about the context of a wine competition or cross-section ratings data. Tests using the distributions implied by 90 blind triplicate ratings show that, by the sum-of-squared-errors criterion, the solution using context or cross-section information is 50% more accurate than the solution that does not use such information and over 99% more accurate than ignoring the uncertainty about a rating.
The goal of public health is to improve the overall health of a population by reducing the burden of disease and premature death. In order to monitor our progress towards eliminating existing problems and to identify the emergence of new problems, we need to be able to quantify the levels of ill health or disease in a population. Researchers and policy makers use many different measures to describe the health of populations. In this chapter we introduce more of the most commonly used measures so that you can use and interpret them correctly. We first discuss the three fundamental measures that underlie both the attack rate and most of the other health statistics you will come across in health-related reports: the incidence rate, the incidence proportion (also called risk or cumulative incidence), and prevalence. We then look at how they are calculated and used in practice. We finish by considering other, more elaborate measures that attempt to get closer to describing the overall health of a population. As you will see, this is not always as straightforward as it might seem.
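As a quick illustration of how the three fundamental measures named here are calculated (using standard epidemiological definitions and made-up numbers, not the chapter's own examples), consider a small hypothetical cohort:

```python
# Hypothetical cohort: 1,000 disease-free people followed for 2 years, 50 new cases,
# plus 4 existing cases in the population at the start of follow-up.
new_cases = 50
population_at_risk = 1_000            # disease-free at the start of follow-up
existing_cases_at_baseline = 4
total_population = population_at_risk + existing_cases_at_baseline
person_years = 1_900                  # approximate person-time actually at risk

incidence_proportion = new_cases / population_at_risk             # 2-year risk
incidence_rate = new_cases / person_years                         # cases per person-year
point_prevalence = existing_cases_at_baseline / total_population  # proportion diseased at baseline

print(f"Incidence proportion (2-year risk): {incidence_proportion:.1%}")
print(f"Incidence rate: {incidence_rate * 1000:.1f} per 1,000 person-years")
print(f"Point prevalence at baseline: {point_prevalence:.2%}")
```

The incidence proportion is a risk over a stated period, the incidence rate uses person-time in the denominator, and prevalence describes existing cases at a point in time; keeping the three distinct is essential when reading health statistics.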
Focusing on the physics of the catastrophe process and addressed directly to advanced students, this innovative textbook quantifies dozens of perils, both natural and man-made, and covers the latest developments in catastrophe modelling. Combining basic statistics, applied physics, natural and environmental sciences, civil engineering, and psychology, the text remains at an introductory level, focusing on fundamental concepts for a comprehensive understanding of catastrophe phenomenology and risk quantification. A broad spectrum of perils is covered, including geophysical, hydrological, meteorological, climatological, biological, extraterrestrial, technological and socio-economic, as well as events caused by domino effects and global warming. Following industry standards, the text provides the necessary tools to develop a CAT model from hazard to loss assessment. Online resources include a CAT risk model starter-kit and a CAT risk modelling 'sandbox' with a Python Jupyter tutorial. Every process, described by equations, (pseudo)codes and illustrations, is fully reproducible, allowing students to solidify knowledge through practice.
Taking a simplified approach to statistics, this textbook teaches students the skills required to conduct and understand quantitative research. It provides basic mathematical instruction without compromising on analytical rigor, covering the essentials of research design; descriptive statistics; data visualization; and statistical tests including t-tests, chi-squares, ANOVAs, Wilcoxon tests, OLS regression, and logistic regression. Step-by-step instructions with screenshots are used to help students master the use of the freely accessible software R Commander. Ancillary resources include a solutions manual and figure files for instructors, and datasets and further guidance on using STATA and SPSS for students. Packed with examples and drawing on real-world data, this is an invaluable textbook for both undergraduate and graduate students in public administration and political science.
This chapter considers the role of neuropsychology in the diagnostic process. It covers who can undertake a neuropsychological assessment, when to undertake an assessment, and some of the assumptions underlying neuropsychological assessment. Basic psychometrics are covered, using the premise that understanding a few basic concepts is sufficient for most practitioners, as more complex ideas are developed from these basics. This includes the normal distribution, different types of average, the standard deviation, and the correlation. Next, the relationship between different types of metrics is discussed, focusing on IQ/Index scores, T-scores, scaled scores, and percentiles.
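The relationship between these metrics can be sketched using the standard conventions (index/IQ scores with mean 100 and SD 15, T-scores with mean 50 and SD 10, scaled scores with mean 10 and SD 3, and percentiles taken from the normal distribution); the snippet below is a generic illustration, not material from the chapter.

```python
from statistics import NormalDist

def convert(z):
    """Convert a z-score into the common clinical score metrics."""
    return {
        "index/IQ score": 100 + 15 * z,            # mean 100, SD 15
        "T-score": 50 + 10 * z,                    # mean 50, SD 10
        "scaled score": 10 + 3 * z,                # mean 10, SD 3
        "percentile": 100 * NormalDist().cdf(z),   # assumes a normal distribution
    }

# A score one standard deviation below the mean of the normative sample.
for metric, value in convert(-1.0).items():
    print(f"{metric}: {value:.1f}")
```

So a performance one standard deviation below the mean corresponds to an index score of 85, a T-score of 40, a scaled score of 7, and roughly the 16th percentile.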
Chapter 4 examines efforts in the nineteenth and early twentieth centuries to better understand Britain’s rain. Meteorologists had attempted to investigate the distribution of rain prior to the 1850s, but observation points remained few and inadequately distributed. The solution to answering questions about the geographies of the rain was the establishment of a rainfall observatory network that covered the entirety of the British Isles. The network of rainfall observing stations was established by George Symons and became known as the British Rainfall Organisation. It relied almost exclusively on volunteer labour. The first section of the chapter details the early years of Symons’s Rainfall Organisation and its key administrative features, before moving on to discussions about rain gauges and station exposures. The chapter then examines a series of experimental trials that ran from 1863 to 1890 and discusses the ensuing controversy regarding the value of the experiments and of the observatory network more generally. The chapter then looks at contemporary discussions about the value of various statistical treatments of rain data, before finishing with Alexander Buchan’s and Hugh Robert Mill’s rainfall maps and the maps’ contributions to data management and public utility.
This chapter describes the basics of scientific figures. It provides tips for identifying different types of figures, such as experimental protocol figures, data figures, and summary figures. There is a description of ways to compare groups and of different types of variables. A short discussion of statistics is included, describing elements such as central tendency, dispersion, uncertainty, outliers, distributions, and statistical tests to assess differences. Following that is a short overview of a few of the more common graph types, such as bar graphs, boxplots, violin plots, and raincloud plots, describing the advantages that each provides. The chapter ends with an “Understanding Graphs at a Glance” section, which gives the reader a step-by-step outline for interpreting many of the graphs commonly used in neuroscience research, applicable independently of the methodology used to collect those data.
Network science is a broadly interdisciplinary field, pulling from computer science, mathematics, statistics, and more. The data scientist working with networks thus needs a broad base of knowledge, as network data calls for—and is analyzed with—many computational and mathematical tools. One needs a good working knowledge of programming, including data structures and algorithms, to analyze networks effectively. In addition to graph theory, probability theory is the foundation for any statistical modeling and data analysis. Linear algebra provides another foundation for network analysis and modeling because matrices are often the most natural way to represent graphs. Although this book assumes that readers are familiar with the basics of these topics, here we review the computational and mathematical concepts and notation that will be used throughout the book. You can use this chapter as a starting point for catching up on the basics, or as a reference while delving into the book.
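As a minimal illustration of why matrices are a natural representation of graphs (a generic sketch, not code from the book), the adjacency matrix below encodes a four-node undirected graph; its row sums give node degrees, and its powers count walks between nodes.

```python
import numpy as np

# Adjacency matrix of a small undirected graph on nodes 0-3:
# edges 0-1, 0-2, 1-2, 2-3.  A[i, j] = 1 if nodes i and j are connected.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

degrees = A.sum(axis=1)                     # row sums give node degrees
walks_len2 = np.linalg.matrix_power(A, 2)   # entry (i, j) counts walks of length 2

print("degrees:", degrees)                          # [2 2 3 1]
print("2-step walks from 0 to 3:", walks_len2[0, 3])  # 1 (via node 2)
```

This matrix view is what lets standard linear-algebra tools be applied directly to questions about networks.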
Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding and develop effective working patterns in network calculations and analysis. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast track to network data expertise.
This chapter introduces the reader to the history of facial recognition technology (FRT) and its development from the perspective of science and technology studies. Beginning with the traditionally accepted origins of FRT in 1964–1965, developed by Woody Bledsoe, Charles Bisson, and Helen Wolf Chan in the United States, Simon Taylor discusses how FRT builds on earlier applications in mug shot profiling, imaging, biometrics, and statistical categorisation. Grounded in the history of science and technology, the chapter demonstrates how critical aspects of FRT infrastructure are aided by scientific and cultural innovations from different times and locations: that is, mugshots in nineteenth-century France; mathematical analysis of caste in nineteenth-century British India; and innovations by Chinese closed-circuit television companies and computer vision start-ups conducting bio-security experiments on farm animals. This helps us understand FRT development beyond the United States-centred narrative. The aim is to deconstruct historical data, mathematical, and digital materials that act as ‘back-stage elements’ to FRT and are not so easily located in infrastructure yet continue to shape uses today. Taylor’s analysis lays a foundation for the kinds of frameworks, developed in the following chapters, that can better help regulate and govern FRT as a means of power over populations.