The use of differential equations on graphs as a framework for the mathematical analysis of images emerged about fifteen years ago, and since then it has burgeoned, with applications extending to machine learning. The authors offer a bird's-eye view of theoretical developments that will enable newcomers to quickly get a flavour of key results and ideas. Additionally, they provide a substantial bibliography that points readers to where fuller details and other directions can be explored. This title is also available as open access on Cambridge Core.
We report the results of a field experiment designed to increase honest disclosure of claims at a U.S. state unemployment agency. Individuals filing claims were randomized to a message (‘nudge’) intervention, while an off-the-shelf machine learning algorithm calculated claimants’ risk of committing fraud (underreporting earnings). We study the causal effects of algorithmic targeting on the effectiveness of nudge messages: without algorithmic targeting, the average treatment effect of the messages was insignificant; in contrast, algorithmic targeting revealed significant heterogeneous treatment effects across claimants. Claimants predicted by the algorithm to behave unethically were more likely to disclose earnings when receiving a message relative to a control condition, with those predicted most likely to behave unethically being almost twice as likely to disclose earnings when shown a message. In addition to providing a potential blueprint for targeting more costly interventions, our study offers a novel perspective on the use and efficiency of data science in the public sector without violating citizens’ agency. However, we caution that, while algorithms can enable tailored policy, their ethical use must be ensured at all times.
Weed infestations have been identified as a major cause of yield reductions in rapeseed (Brassica napus L.), a vital oil crop that has gained significant prominence in Iran, especially within Fars Province. Weed management using machine learning algorithms has become a crucial approach within the framework of precision agriculture for enhancing the efficacy and efficiency of weed control strategies. The evolution of habitat suitability models for weeds represents a significant advancement in agricultural technology, offering the capability to predict weed occurrence and proliferation accurately and reliably. This study focuses on the issue of dominant weed infestation in rapeseed cultivation, particularly emphasizing the prevalence and impact of wild oat (Avena fatua L.) as the dominant weed species in rapeseed farming in 2023. We collected data on 12 environmental variables related to topography, climate, and soil properties to develop habitat suitability models. Three machine learning techniques, random forest (RF), support vector machine (SVM), and boosted regression tree (BRT), were used to model the distribution of A. fatua. Model performance was quantified using the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to identify the best predictive algorithm. The findings indicated that the RF, BRT, and SVM models exhibited accuracies of 99%, 97%, and 96%, respectively, for the habitat suitability of A. fatua. The Boruta feature selection method identified the slope variable as significantly influential in wild oat habitat suitability modeling, followed by plan curvature, clay, temperature, and silt. This study serves as a case study that highlights the utility of machine learning for habitat suitability predictions when information on multiple environmental variables is available. This approach supports effective weed management strategies, potentially enhancing rapeseed productivity and mitigating the ecological impacts associated with weed infestation.
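For readers new to this workflow, a minimal sketch of the model-comparison step is given below in Python with scikit-learn, using synthetic stand-ins for the survey data; the column layout (12 environmental predictors) mirrors the study, but the names, data, and hyperparameters are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
# Hypothetical predictors standing in for the 12 environmental variables
# (slope, plan curvature, clay, temperature, silt, ...).
X = rng.normal(size=(n, 12))
# Synthetic presence/absence labels standing in for A. fatua occurrence.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# Evaluate with the area under the ROC curve, as in the study.
auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"RF test AUC: {auc:.3f}")
# Impurity-based importances give a rough analogue of the Boruta ranking.
print("feature importances:", rf.feature_importances_.round(3))
```

The same train/evaluate loop would be repeated for the SVM and BRT learners to reproduce the study's three-way comparison.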
The Asian corn borer, Ostrinia furnacalis (Guenée), is a significant threat to maize cultivation, inflicting substantial damage upon the crops. In particular, its larval stage represents a critical point with significant economic consequences for maize yield. To manage infestations of this pest effectively, timely and precise identification of its larval stages is required. Currently, the absence of techniques capable of addressing this need poses a formidable challenge to agricultural practitioners. To mitigate this issue, the current study aims to establish models for the identification of larval stages. Furthermore, this study aims to devise predictive models for estimating larval weights, thereby enhancing the precision and efficacy of pest management strategies. To this end, 9 classification and 11 regression models were established using four feature datasets based on the following features: geometry, colour, and texture. The effectiveness of the models was determined by comparing metrics such as accuracy, precision, recall, F1-score, coefficient of determination, root mean squared error, mean absolute error, and mean absolute percentage error. Furthermore, Shapley Additive exPlanations (SHAP) analysis was employed to analyse the importance of features. Our results revealed that, for instar identification, the DecisionTreeClassifier model exhibited the best performance with an accuracy of 84%. For larval weight, the SupportVectorRegressor model performed best with an R2 of 0.9742. Overall, these findings present a novel and accurate approach to identifying instars and predicting the weight of O. furnacalis larvae, offering valuable insights for the implementation of management strategies against this key pest.
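The classification-plus-attribution pipeline can be illustrated as follows. This is a hedged sketch using scikit-learn and the shap package on synthetic features (stand-ins for larval geometry, colour, and texture), not the study's dataset or tuned models.

```python
import numpy as np
import shap  # pip install shap
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
# Hypothetical features: e.g. length, width, hue, saturation, contrast, entropy.
X = rng.normal(size=(n, 6))
# Synthetic labels standing in for four instar classes.
y = np.digitize(X[:, 0] + 0.3 * X[:, 2], bins=[-1.0, 0.0, 1.0])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = DecisionTreeClassifier(max_depth=5, random_state=1).fit(X_tr, y_tr)
print(f"instar accuracy: {clf.score(X_te, y_te):.2f}")

# SHAP attributions: per-sample, per-feature contributions to each class.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)
```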
Human activity recognition (HAR) is a vital component of human–robot collaboration. Recognizing the operational elements involved in an operator’s task is essential for realizing such collaboration, and HAR plays a key role in achieving this. However, recognizing human activity in an industrial setting differs from recognizing activities of daily living: an operator’s activity must be divided into fine elements to ensure efficient task completion. Despite this, there is relatively little related research in the literature. This study aims to develop machine learning models to classify the sequential movement elements of a task. To illustrate this, three logistic operations in an integrated circuit (IC) design house were studied, with participants wearing 13 inertial measurement units manufactured by XSENS to mimic the tasks. The kinematics data were collected to develop the machine learning models. The time series data preprocessing involved applying two normalization methods and three different window lengths. Eleven features were extracted from the processed data to train the classification models. Model validation was carried out using the subject-independent method, with data from three participants excluded from the training dataset. The results indicate that the developed model can efficiently classify operational elements when the operator performs the activity accurately. However, incorrect classifications occurred when the operator missed an operation or performed the task awkwardly. RGB video clips helped identify these misclassifications, which can be used by supervisors for training purposes or by industrial engineers for work improvement.
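The windowing and feature-extraction step described above can be sketched as follows, assuming a synthetic three-axis IMU stream; the study's exact window lengths, normalization methods, and 11 features are not reproduced here, only the general pattern.

```python
import numpy as np

def windows(signal: np.ndarray, length: int, step: int):
    """Yield overlapping windows of shape (length, channels)."""
    for start in range(0, signal.shape[0] - length + 1, step):
        yield signal[start:start + length]

def extract_features(window: np.ndarray) -> np.ndarray:
    """Per-channel summary statistics for one window (illustrative set)."""
    return np.concatenate([
        window.mean(axis=0),
        window.std(axis=0),
        window.min(axis=0),
        window.max(axis=0),
    ])

stream = np.random.default_rng(2).normal(size=(1000, 3))      # fake 3-axis IMU
stream = (stream - stream.mean(axis=0)) / stream.std(axis=0)  # z-score normalize
X = np.stack([extract_features(w) for w in windows(stream, length=100, step=50)])
print(X.shape)  # (num_windows, 4 * channels)
```

Each row of `X` would then be paired with a movement-element label and fed to a classifier, with whole participants held out for subject-independent validation.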
This article presents an ultrawide bandpass filter structure with a notch band, developed using a small rectangular impedance resonator. The proposed filter structure consists of a coupled rectangular resonator (CRR), an open stub, and a complementary split-ring resonator (CSRR) at the bottom of the structure. In-band and out-of-band properties are improved by the CRR and the open stub. The notch band is obtained by placing the CSRR below the rectangular resonator. A filter with a compact size of 0.15 × 0.10 λg is obtained at a lower cutoff frequency of 3.0 GHz, where λg is the corresponding guided wavelength. The proposed structure has been fabricated on a Rogers 5880 substrate with a thickness of 0.787 mm and a dielectric constant of 2.2. Additionally, equivalent lumped parameters were obtained, and a lumped equivalent circuit was created to explain how the proposed filter operates. The electromagnetic (EM)-simulated results are in good agreement with the circuit-simulated and measured results. Various machine learning approaches, such as artificial neural network, K-nearest neighbour, decision tree, random forest (RF), and extreme gradient boosting algorithms, are applied to optimize the design; among these, the RF algorithm achieves more than 90% accuracy in predicting the S-parameters of the ultrawideband filter.
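The surrogate-modeling idea (a learner predicting S-parameters from design parameters) can be sketched as below; the geometric parameters, frequency grid, and toy response are hypothetical stand-ins for the EM-simulation data used in the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 300
geom = rng.uniform(size=(n, 4))              # e.g. stub length, gap, CSRR radius, width
freq = rng.uniform(3.0, 11.0, size=(n, 1))   # sampled frequency points (GHz)
X = np.hstack([geom, freq])
# Toy |S21| response in dB, standing in for EM-simulated samples.
s21 = -20 * np.abs(np.sin(3 * freq[:, 0]) * geom[:, 0])

X_tr, X_te, y_tr, y_te = train_test_split(X, s21, random_state=3)
rf = RandomForestRegressor(n_estimators=300, random_state=3).fit(X_tr, y_tr)
print(f"R^2 on held-out samples: {rf.score(X_te, y_te):.3f}")
```

A surrogate of this kind lets the designer sweep geometries cheaply instead of re-running full EM simulations for every candidate.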
Prediction of dynamic environmental variables at unmonitored sites remains a long-standing challenge for water resources science. The majority of the world’s freshwater resources have inadequate monitoring of the critical environmental variables needed for management. Yet the need for widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction, with their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics and process knowledge into classical, deep learning, and transfer learning methodologies. The analysis here suggests most prior efforts have focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions at unmonitored sites, including how to incorporate dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques into modern machine learning frameworks.
In working towards meeting the rapidly rising demand for livestock products in the face of challenges such as climate change, limited forage land availability, and inadequacies in water availability and quality, it is imperative to consider sustainability in farm and grazing land management, water resources conservation, and biodiversity management and conservation. Geophysics, GIS, and remote sensing have been useful tools. Emerging technologies such as biotechnology, advanced sensor technologies, machine learning algorithms, the internet of things, artificial intelligence, unmanned aerial vehicles, and robotics are also being employed in agriculture and other aspects of human concern. There is potential for better utilization of these emerging technologies in livestock production and management. However, relevant knowledge and skills remain relatively inadequate, especially in developing countries; hence the need for this review, which aims to enhance knowledge for research and improved productivity. Efforts should be made to advance knowledge and skills acquisition so as to optimize these developments for improved livestock production and management.
Vibration-based structural health monitoring (SHM) of (large) infrastructure through operational modal analysis (OMA) is a commonly adopted strategy. This is typically a four-step process, comprising estimation, tracking, data normalization, and decision-making. These steps are essential to ensure structural modes are correctly identified, and results are normalized for environmental and operational variability (EOV). Other challenges, such as nonstructural modes in the OMA, for example, rotor harmonics in (offshore) wind turbines (OWTs), further complicate the process. Typically, these four steps are considered independently, making the method simple and robust, but rather limited in challenging applications, such as OWTs. Therefore, this study aims to combine tracking, data normalization, and decision-making through a single machine learning (ML) model. The presented SHM framework starts by identifying a “healthy” training dataset, representative of all relevant EOV, for all structural modes. Subsequently, operational and weather data are used for feature selection and a comparative analysis of ML models, leading to the selection of tree-based learners for natural frequency prediction. Uncertainty quantification (UQ) is introduced to identify out-of-distribution instances, crucial to guarantee low modeling error and ensure only high-fidelity structural modes are tracked. This study uses virtual ensembles for UQ through the variance between multiple truncated submodel predictions. Practical application to monopile-supported OWT data demonstrates the tracking abilities, separating structural modes from rotor dynamics. Control charts show improved decision-making compared to traditional reference-based methods. A synthetic dataset further confirms the approach’s robustness in identifying relevant natural frequency shifts. This study presents a comprehensive data-driven approach for vibration-based SHM.
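The virtual-ensemble idea can be approximated with scikit-learn's staged predictions, as in the hedged sketch below: the spread across truncated gradient-boosting submodels serves as an uncertainty score for flagging out-of-distribution inputs. The features, data, and truncation point are illustrative, not the monitoring setup of the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
# Hypothetical operational/weather features, e.g. wind speed, rotor rpm, temperature.
X = rng.uniform(-1, 1, size=(2000, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=2000)  # proxy natural frequency

gbr = GradientBoostingRegressor(n_estimators=400, random_state=4).fit(X, y)

def virtual_ensemble_std(model, X_new, first_stage=200):
    """Spread across late-stage truncated submodel predictions."""
    staged = np.stack([pred for i, pred in enumerate(model.staged_predict(X_new))
                       if i >= first_stage])
    return staged.std(axis=0)

X_in = rng.uniform(-1, 1, size=(5, 3))   # inputs inside the training envelope
X_out = rng.uniform(4, 5, size=(5, 3))   # inputs far outside the training range
print("in-distribution spread:    ", virtual_ensemble_std(gbr, X_in).round(4))
print("out-of-distribution spread:", virtual_ensemble_std(gbr, X_out).round(4))
```

In the SHM framework, predictions whose spread exceeds a chosen bound would be excluded from tracking, so that only high-fidelity structural modes feed the control charts.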
Displacement continues to increase at a global scale and is increasingly happening in complex, multicrisis settings, leading to more complex and deeper humanitarian needs. Humanitarian needs are therefore increasingly outgrowing the available humanitarian funding. Thus, responding to vulnerabilities before disaster strikes is crucial, but anticipatory action is contingent on the ability to accurately forecast what will happen in the future. Forecasting and contingency planning are not new in the humanitarian sector, where scenario-building continues to be an exercise conducted in most humanitarian operations to strategically plan for coming events. However, the accuracy of these exercises remains limited. To address this challenge, and with the objective of providing the humanitarian sector with more accurate forecasts to enhance the protection of vulnerable groups, the Danish Refugee Council has developed several machine learning models. The Anticipatory Humanitarian Action for Displacement model uses machine learning to forecast displacement in subdistricts of the Liptako-Gourma region of the Sahel, covering Burkina Faso, Mali, and Niger. The model is mainly built on data related to conflict, food insecurity, vegetation health, and the prevalence of underweight to forecast displacement. In this article, we detail how the model works, its accuracy and limitations, and how we are translating the forecasts into action by using them for anticipatory action in South Sudan and Burkina Faso, including concrete examples of activities that can be implemented ahead of displacement in places of origin, along routes, and in places of destination.
We propose a framework for identifying discrete behavioural types in experimental data. We re-analyse data from six previous studies of public goods voluntary contribution games. Using hierarchical clustering analysis, we construct a typology of behaviour based on a similarity measure between strategies. We identify four types with distinct stereotypical behaviours, which together account for about 90% of participants. Compared to previous approaches, our method produces a classification in which different types are more clearly distinguished in terms of strategic behaviour and the resulting economic implications.
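A minimal sketch of this typology construction follows, assuming Euclidean distance between round-by-round contribution vectors as a stand-in for the paper's similarity measure, with synthetic data in place of the six experimental datasets.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
# 120 hypothetical participants x 10 rounds of contributions in [0, 20].
strategies = np.clip(rng.normal(10, 6, size=(120, 10)), 0, 20)

dist = pdist(strategies, metric="euclidean")   # pairwise strategy distances
tree = linkage(dist, method="ward")            # hierarchical (Ward) clustering
types = fcluster(tree, t=4, criterion="maxclust")  # cut the tree into 4 types
print(np.bincount(types)[1:])                  # participants per behavioural type
```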
Detecting and removing hate speech content in a timely manner remains a challenge for social media platforms. Automated techniques such as deep learning models offer solutions that can keep up with the volume and velocity of user content production. Research in this area has mainly focused on either binary classification or on classifying tweets into generalised categories such as hateful, offensive, or neither. Less attention has been given to multiclass classification of online hate speech into the type of hate or the group at which it is directed. By aggregating and re-annotating several relevant hate speech datasets, this study presents a dataset for classifying tweets into the categories ethnicity, gender, religion, sexuality, and non-hate, and evaluates several models on it: logistic regression, LSTM, BERT, and GPT-2. For the LSTM model, we assess a range of NLP features and conclude that the highest-performing feature combination consists of word n-grams, character n-grams, and dependency tuples. We show that while more recent, larger models can achieve slightly higher performance, increased model complexity alone is not sufficient to achieve significantly improved models. We also compare this approach with a binary classification approach and evaluate the effect of dataset size on model performance.
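A hedged approximation of that feature combination using scikit-learn is shown below: word and character n-grams are combined via TF-IDF and fed to a logistic-regression baseline rather than the LSTM, and dependency tuples are omitted since they require a parser. The texts and labels are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

texts = ["example tweet one", "another example tweet", "a third tweet"]
labels = ["non-hate", "gender", "ethnicity"]  # five classes in the full dataset

features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(texts, labels)
print(model.predict(["a new unseen tweet"]))
```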
Machine learning models have been used extensively in hydrology, but issues persist with regard to their transparency, and there is currently no identifiable best practice for forcing variables in streamflow or flood modeling. In this paper, using data from the Centre for Ecology & Hydrology’s National River Flow Archive and from the European Centre for Medium-Range Weather Forecasts, we present a study that focuses on the input variable set for a neural network streamflow model to demonstrate how certain variables can be internalized, leading to a compressed feature set. By highlighting this capability to learn effectively using proxy variables, we demonstrate a more transferable framework that minimizes sensing requirements and that enables a route toward generalizing models.
Environmental data science for spatial extremes has traditionally relied heavily on max-stable processes. Even though the popularity of these models has perhaps peaked with statisticians, they are still perceived and considered as the “state of the art” in many applied fields. However, while the asymptotic theory supporting the use of max-stable processes is mathematically rigorous and comprehensive, we think that it has also been overused, if not misused, in environmental applications, to the detriment of more purposeful and meticulously validated models. In this article, we review the main limitations of max-stable process models, and strongly argue against their systematic use in environmental studies. Alternative solutions based on more flexible frameworks using the exceedances of variables above appropriately chosen high thresholds are discussed, and an outlook on future research is given. We consider the opportunities offered by hybridizing machine learning with extreme-value statistics, highlighting seven key recommendations moving forward.
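As a concrete illustration of the threshold-exceedance alternative, the following sketch fits a generalized Pareto distribution (GPD) to exceedances over a high quantile, the basic peaks-over-threshold building block; the data are synthetic, and in practice the threshold choice requires careful diagnostics and validation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.standard_t(df=4, size=10_000)   # heavy-tailed environmental proxy

u = np.quantile(x, 0.95)                # high threshold
exceedances = x[x > u] - u
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)
print(f"threshold u={u:.2f}, GPD shape={shape:.3f}, scale={scale:.3f}")

# Tail probability beyond a level z > u: P(X > z) = P(X > u) * P(X - u > z - u).
z = u + 2.0
p_u = (x > u).mean()
print("P(X > z) =", p_u * stats.genpareto.sf(z - u, shape, loc=0.0, scale=scale))
```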
Herbicide-resistant weeds are fast becoming a substantial global problem, causing significant crop losses and food insecurity. Late detection of resistant weeds leads to increasing economic losses. Traditionally, genetic sequencing and herbicide dose-response studies are used to detect herbicide-resistant weeds, but these are expensive and slow processes. To address this problem, an artificial intelligence (AI)-based herbicide-resistant weed identifier program (HRIP) was developed to quickly and accurately distinguish acetolactate synthase (ALS) inhibitor-resistant from -susceptible common chickweed plants. A regular camera was converted to capture light wavelengths from 300 to 1,100 nm. Full-spectrum images from a two-year experiment were used to develop a hyperparameter-tuned convolutional neural network (CNN) model utilizing a “train from scratch” approach. This novel approach exploits the subtle differences in the spectral signatures of ALS-resistant and -susceptible common chickweed plants as they react differently to ALS herbicide treatments. The HRIP was able to identify ALS-resistant common chickweed as early as 72 hours after treatment at an accuracy of 88%. It has broad applicability due to its ability to distinguish ALS-resistant from -susceptible common chickweed plants regardless of the type of ALS herbicide or dose used. Utilizing tools such as the HRIP will allow farmers to make timely interventions to prevent herbicide-escaped plants from completing their life cycle and adding to the weed seedbank.
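A minimal "train from scratch" CNN of the kind described can be sketched in PyTorch as follows; the architecture, image size, and channel count are illustrative only, not the tuned hyperparameters of the HRIP.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny binary classifier: full-spectrum image -> resistant/susceptible."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
images = torch.randn(8, 3, 128, 128)   # stand-in full-spectrum image batch
labels = torch.randint(0, 2, (8,))     # 0 = susceptible, 1 = resistant
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()                        # gradients for one training step
print(float(loss))
```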
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. It introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning, and modern communication theory. This textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes and the variational principle, Kolmogorov's metric entropy, strong data processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
There has been growing recognition of the significant role played by the human gut microbiota in altering the bioavailability as well as the pharmacokinetic and pharmacodynamic aspects of orally ingested xenobiotic and biotic molecules. Determining species-specific contributions to the metabolism of biotic and xenobiotic molecules has the potential to aid the development of new therapeutic and nutraceutical molecules that can modulate the human gut microbiota. Here we present GutBugDB, an open-access digital repository that provides information on potential gut microbiome-mediated biotransformation of biotic and xenobiotic molecules using predictions from the GutBug tool. The database is constructed from metabolic proteins of 690 gut bacterial genomes and 363,872 protein enzymes annotated with their EC numbers (with representative Expasy IDs and domains present). It provides information on gut microbiome enzyme-mediated metabolic biotransformation for 1,439 FDA-approved drugs and nutraceuticals. GutBugDB is publicly available at https://metabiosys.iiserb.ac.in/gutbugdb/.
Super-resolution of turbulence is a term used to describe the prediction of high-resolution snapshots of a flow from coarse-grained observations. This is typically accomplished with a deep neural network, and training usually requires a dataset of high-resolution images. An approach is presented here in which robust super-resolution can be performed without access to high-resolution reference data, as might be expected in an experiment. The training procedure is similar to data assimilation, wherein the model learns to predict an initial condition that leads to accurate coarse-grained predictions at later times, while only being shown coarse-grained observations. Implementation of the approach requires a fully differentiable flow solver in the training loop to allow for time-marching of predictions. A range of models are trained on data generated from forced, two-dimensional turbulence. The networks have reconstruction errors similar to those obtained with ‘standard’ super-resolution approaches using high-resolution data. Furthermore, the methods are comparable in performance to standard data assimilation for state estimation on individual trajectories, outperforming these variational approaches at the initial time and remaining robust when unrolled in time, where the performance of the standard data-assimilation algorithm improves.
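The training idea can be sketched, heavily simplified, in PyTorch: a network proposes a high-resolution initial condition from a coarse observation, a differentiable solver (here a toy diffusion step standing in for a real flow solver) marches it forward, and the loss compares coarse-grained rollouts to coarse observations only. Everything below is a stand-in, not the paper's solver or architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def solver_step(u: torch.Tensor) -> torch.Tensor:
    """Toy differentiable time step: periodic 2-D diffusion."""
    lap = (torch.roll(u, 1, -1) + torch.roll(u, -1, -1)
           + torch.roll(u, 1, -2) + torch.roll(u, -1, -2) - 4 * u)
    return u + 0.1 * lap

coarsen = lambda u: F.avg_pool2d(u, kernel_size=4)    # 64x64 -> 16x16

net = nn.Sequential(                                  # coarse obs -> fine state
    nn.Upsample(scale_factor=4, mode="bilinear"),
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

truth = torch.randn(1, 1, 64, 64)                     # unknown fine-scale state
obs, state = [coarsen(truth)], truth
for _ in range(5):                                    # coarse observations only
    state = solver_step(state)
    obs.append(coarsen(state))

pred = net(obs[0])                                    # predicted initial condition
loss = 0.0
for k in range(1, len(obs)):                          # unrolled, differentiable
    pred = solver_step(pred)
    loss = loss + F.mse_loss(coarsen(pred), obs[k])
loss.backward()                                       # gradients flow through the solver
opt.step()
print(float(loss))
```

Because the solver is differentiable, the mismatch at later times propagates gradients back to the network's proposed initial condition, which is what removes the need for high-resolution training data.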
Rapid urbanization poses several challenges, especially when faced with an uncontrolled urban development plan. It often leads to anarchic occupation and expansion of cities, resulting in the phenomenon of urban sprawl (US). To support sustainable decision-making in urban planning and policy development, a more effective approach to addressing this issue through US simulation and prediction is essential. Despite the work published in the literature on the use of deep learning (DL) methods to simulate US indicators, almost no work has been published to assess what has already been done, the potential, the issues, and the challenges ahead. By synthesising existing research, we aim to assess the current landscape of the use of DL in modelling US. This article elucidates the complexities of US, focusing on its multifaceted challenges and implications, and, through an examination of DL methodologies, highlights their effectiveness in capturing the complex spatial patterns and relationships associated with US. In addition, the article examines the synergy between DL and conventional methods, highlighting the advantages and disadvantages of each. It emerges that the use of DL in the simulation and forecasting of US indicators is increasing, and its potential is very promising for guiding strategic decisions to control and mitigate this phenomenon. This is not without major challenges, however, both in terms of data and models and in terms of strategic city planning policies.
Risk-based surveillance is now a well-established paradigm in epidemiology, involving the differential distribution of sampling efforts in time, space, and within populations, based on multiple risk factors. To assess and map the risk of the presence of the bacterium Xylella fastidiosa, we have compiled a dataset that includes factors influencing plant development and thus the spread of this harmful organism. To this end, we have collected, preprocessed, and gathered information and data related to land types, soil compositions, and climatic conditions to predict and assess the probability of risk associated with X. fastidiosa in relation to environmental features. This resource can be of interest to researchers conducting analyses on X. fastidiosa and, more generally, to researchers working on geospatial modeling of risk related to plant infectious diseases.