Recent progress in deep learning and natural language processing has given rise to powerful models that are primarily trained on a cloze-like task and show some evidence of having access to substantial linguistic information, including some constructional knowledge. This groundbreaking discovery presents an exciting opportunity for a synergistic relationship between computational methods and Construction Grammar research. In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models. We touch upon the first two approaches as a contextual foundation for the use of computational methods before providing an accessible, yet comprehensive overview of deep learning models, which also addresses reservations construction grammarians may have. Additionally, we delve into experiments that explore the emergence of constructionally relevant information within these models while also examining the aspects of Construction Grammar that may pose challenges for these models. This chapter aims to foster collaboration between researchers in the fields of natural language processing and Construction Grammar. By doing so, we hope to pave the way for new insights and advancements in both these fields.
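As a concrete illustration of the cloze-like training objective mentioned above, a masked language model can be queried directly for its word-in-context predictions. This minimal sketch uses a public BERT checkpoint via the Hugging Face pipeline; the example sentence (a caused-motion construction) is our own illustrative choice, not taken from the chapter.

```python
# A small cloze probe: ask a masked language model to fill in a blank.
# Checkpoint and sentence are illustrative assumptions.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for cand in fill("She sneezed the napkin off the [MASK].")[:3]:
    print(cand["token_str"], round(cand["score"], 3))
```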
Recent advances in natural language processing (NLP) have opened new avenues in semantic data analysis. A promising application of NLP is data harmonization in questionnaire-based cohort studies, where it can serve as an additional method, specifically when only different instruments are available for one construct, as well as for the evaluation of potentially new construct constellations. The present article therefore explores the potential of embedding models to detect opportunities for semantic harmonization.
Methods
Using models such as SBERT and OpenAI’s ADA, we developed a prototype application (“Semantic Search Helper”) to facilitate the harmonization process by detecting semantically similar items within extensive health-related datasets. The approach’s feasibility and applicability were evaluated through a use case analysis involving data from four large cohort studies with heterogeneous data obtained with different sets of instruments for common constructs.
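The core operation behind such a tool, scoring item pairs across studies by the cosine similarity of their sentence embeddings, can be sketched as follows. The checkpoint, items, and threshold are illustrative assumptions, not the authors’ exact configuration.

```python
# A minimal sketch of embedding-based item matching with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

items_study_a = ["How often do you feel nervous?", "Do you sleep well at night?"]
items_study_b = ["I frequently feel anxious.", "My sleep quality is good."]

emb_a = model.encode(items_study_a, convert_to_tensor=True)
emb_b = model.encode(items_study_b, convert_to_tensor=True)

# Cosine similarity between every item pair across the two studies.
scores = util.cos_sim(emb_a, emb_b)

# Flag pairs above a (tunable) threshold as harmonization candidates.
for i, item_a in enumerate(items_study_a):
    for j, item_b in enumerate(items_study_b):
        if scores[i][j] > 0.6:
            print(f"{item_a!r} <-> {item_b!r}: {scores[i][j]:.2f}")
```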
Results
With the prototype, we effectively identified potential harmonization pairs, which significantly reduced manual evaluation efforts. Expert ratings of semantic similarity candidates showed high agreement with model-generated pairs, confirming the validity of our approach.
Conclusions
This study demonstrates the potential of embeddings for semantic similarity matching as a promising add-on tool to assist the harmonization of multiple data sets and instruments with similar content, within and across studies.
Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not ideal for the creation of items for non-cognitive constructs. Progress in this area has been made recently by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires. To date, such items have been produced unconditionally, without the possibility of selectively targeting personality domains. In this article, we offer a brief synopsis of past developments in natural language processing and explain why the automatic generation of construct-specific items has become attainable only due to recent technological progress. We propose that pre-trained causal transformer models can be fine-tuned to achieve this task using implicit parameterization in conjunction with conditional generation. We demonstrate this method in a tutorial-like fashion and finally compare aspects of validity in human- and machine-authored items using empirical data. Our study finds that approximately two-thirds of the automatically generated items show good psychometric properties (factor loadings above .40) and that one-third even have properties equivalent to established and highly curated human-authored items. Our work thus demonstrates the practical use of deep neural networks for non-cognitive automatic item generation.
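As a rough sketch of conditional generation with a causal transformer, one can fine-tune on items prefixed with a domain tag and then prompt with that tag at generation time. The checkpoint, tag format, and decoding settings below are illustrative assumptions, not the authors’ exact setup (which the article presents in tutorial form).

```python
# A hedged sketch of conditional item generation via control prefixes,
# assuming a model fine-tuned on lines of the form "<domain> | <item text>".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a fine-tuned checkpoint

# Conditioning on a target personality domain steers generation toward it.
prompt = "extraversion | I"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```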
Word embeddings are now a vital resource for social science research. However, obtaining high-quality training data for non-English languages can be difficult, and fitting embeddings therein may be computationally expensive. In addition, social scientists typically want to make statistical comparisons and do hypothesis tests on embeddings, yet this is nontrivial with current approaches. We provide three new data resources designed to address this combination of issues: (1) a new version of fastText model embeddings, (2) a multilanguage “a la carte” (ALC) embedding version of the fastText model, and (3) a multilanguage ALC embedding version of the well-known GloVe model. All three are fit to Wikipedia corpora. These materials are aimed at “low-resource” settings where the analysts lack access to large corpora in their language of interest or to the computational resources required to produce high-quality vector representations. We make these resources available for 40 languages, along with a code pipeline for another 117 languages available from Wikipedia corpora. We extensively validate the materials via reconstruction tests and other proofs-of-concept. We also conduct human crowdworker tests for our embeddings for Arabic, French, (traditional Mandarin) Chinese, Japanese, Korean, Russian, and Spanish. Finally, we offer some advice to practitioners using our resources.
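The ALC idea itself is compact: a new or rare term is embedded by averaging the pretrained vectors of its context words and applying a pre-fit linear transform. A minimal numpy sketch, with random placeholders standing in for the pretrained vectors and the fitted transform:

```python
# "A la carte" (ALC) embedding sketch: transform the mean of context vectors.
import numpy as np

rng = np.random.default_rng(0)
dim = 300
# Placeholders for pretrained fastText/GloVe vectors and a pre-fit ALC transform.
pretrained = {w: rng.normal(size=dim) for w in ["the", "court", "ruled", "against"]}
A = rng.normal(size=(dim, dim))

def alc_embed(context_tokens, embeddings, transform):
    """Embed a rare or new term from the contexts it appears in."""
    ctx = np.mean([embeddings[t] for t in context_tokens if t in embeddings], axis=0)
    return transform @ ctx

vec = alc_embed(["the", "court", "ruled", "against"], pretrained, A)
print(vec.shape)  # (300,)
```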
Due to their significant role in creative design ideation, databases of causal ontology-based models for biological and technical systems have been developed. However, creating structured database entries through system models using a causal ontology requires the time and effort of experts. Researchers have worked toward developing methods that can automatically generate representations of systems from documents using causal ontologies by leveraging machine learning (ML) techniques. However, these methods use limited, hand-annotated data for building the ML models and have manual touchpoints that are not documented. While opportunities exist to improve the accuracy of these ML models, more importantly, the complete process of generating structured representations using a causal ontology needs to be understood. This research proposes a new method and a set of rules to extract information relevant to the constructs of the SAPPhIRE model of causality from natural language descriptions of technical systems, and reports the performance of this process. This process aims to understand the information in the context of the entire description. The method starts by identifying the system interactions involving material, energy and information and then builds the causal description of each system interaction using the SAPPhIRE ontology. This method was developed iteratively, verifying the improvements through user trials in every cycle. The user trials of this new method and rules with specialists and novice users of SAPPhIRE modeling showed that the method helps in accurately and consistently extracting the information relevant to the constructs of the SAPPhIRE model from a given natural language description.
We analyze the disclosures of sustainable investing by Dutch pension funds in their annual reports by introducing a novel textual analysis approach using state-of-the-art natural language processing techniques to measure the awareness and implementation of sustainable investing. We find that a pension fund's size increases both the awareness and implementation of sustainable investing. Moreover, we analyze the role of signing a sustainable investment initiative. Although signing this initiative increases the specificity of pension fund statements about sustainable investing, we do not find an effect on the implementation of sustainable investing.
Recent advances in large language models (LLMs), such as GPT-4, have spurred interest in their potential applications across various fields, including actuarial work. This paper introduces the use of LLMs in actuarial and insurance-related tasks, both as direct contributors to actuarial modelling and as workflow assistants. It provides an overview of LLM concepts and their potential applications in actuarial science and insurance, examining specific areas where LLMs can be beneficial, including a detailed assessment of the claims process. Additionally, a decision framework for determining the suitability of LLMs for specific tasks is presented. Case studies with accompanying code showcase the potential of LLMs to enhance actuarial work. Overall, the results suggest that LLMs can be valuable tools for actuarial tasks involving natural language processing or structuring unstructured data and as workflow and coding assistants. However, their use in actuarial work also presents challenges, particularly regarding professionalism and ethics, for which high-level guidance is provided.
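As a flavor of the kind of workflow assistance described, the following hedged sketch asks an LLM to structure an unstructured claim note into fixed fields. The model name, prompt, and field schema are our illustrative assumptions, not the paper’s accompanying code.

```python
# A sketch of using an LLM to structure unstructured claims text into JSON.
# Assumes OPENAI_API_KEY is set; model and schema are illustrative.
import json
from openai import OpenAI

client = OpenAI()

claim_note = "Policyholder reports rear-end collision on 2023-05-14; bumper damage, no injuries."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract claim fields as JSON with keys: date, peril, damage, injuries."},
        {"role": "user", "content": claim_note},
    ],
)
print(json.loads(response.choices[0].message.content))
```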
Attempts to use artificial intelligence (AI) for psychiatric disorders have shown moderate success, highlighting the potential of incorporating information from clinical assessments to improve these models. This study focuses on using large language models (LLMs) to detect suicide risk from medical text in psychiatric care.
Aims
To extract information about suicidality status from the admission notes in electronic health records (EHRs) using privacy-sensitive, locally hosted LLMs, specifically evaluating the efficacy of Llama-2 models.
Method
We compared the performance of several variants of the open source LLM Llama-2 in extracting suicidality status from 100 psychiatric reports against a ground truth defined by human experts, assessing accuracy, sensitivity, specificity and F1 score across different prompting strategies.
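The evaluation step reduces to comparing model-extracted binary labels against the expert ground truth. A minimal sketch with dummy labels (the study used 100 annotated reports):

```python
# Computing accuracy, sensitivity, specificity, and F1 from binary labels.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # expert labels (1 = suicidality documented)
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]  # labels extracted by the LLM

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"acc={accuracy:.3f} sens={sensitivity:.3f} spec={specificity:.3f} "
      f"f1={f1_score(y_true, y_pred):.3f}")
```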
Results
A German fine-tuned Llama-2 model showed the highest accuracy (87.5%), sensitivity (83.0%) and specificity (91.8%) in identifying suicidality, with significant improvements in sensitivity and specificity across various prompt designs.
Conclusions
The study demonstrates the capability of LLMs, particularly Llama-2, to accurately extract information on suicidality from psychiatric records while preserving data privacy. This suggests their application in surveillance systems for psychiatric emergencies and in improving the clinical management of suicidality through systematic quality control and research.
Social determinants of health (SDoH), such as socioeconomics and neighborhoods, strongly influence health outcomes. However, standardized SDoH data are largely lacking in electronic health records (EHRs), a significant barrier to research and care quality.
Methods:
We conducted a PubMed search using “SDOH” and “EHR” Medical Subject Headings terms, analyzing included articles across five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions.
Results:
Of 685 articles identified, 324 underwent full review. Key findings include implementation of tailored screening instruments, census and claims data linkage for contextual SDoH profiles, NLP systems extracting SDoH from notes, associations between SDoH and healthcare utilization and chronic disease control, and integrated care management programs. However, variability across data sources, tools, and outcomes underscores the need for standardization.
Discussion:
Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical for SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
The rise of populism concerns many political scientists and practitioners, yet the detection of its underlying language remains fragmentary. This paper aims to provide a reliable, valid, and scalable approach to measure populist rhetoric. For that purpose, we created an annotated dataset based on parliamentary speeches of the German Bundestag (2013–2021). Following the ideational definition of populism, we label moralizing references to “the virtuous people” or “the corrupt elite” as core dimensions of populist language. To identify, in addition, how the thin ideology of populism is “thickened,” we annotate how populist statements are attached to left-wing or right-wing host ideologies. We then train a transformer-based model (PopBERT) as a multilabel classifier to detect and quantify each dimension. A battery of validation checks reveals that the model has a strong predictive accuracy, provides high qualitative face validity, matches party rankings of expert surveys, and detects out-of-sample text snippets correctly. PopBERT enables dynamic analyses of how German-speaking politicians and parties use populist language as a strategic device. Furthermore, the annotator-level data may also be applied in cross-domain applications or to develop related classifiers.
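The underlying setup, an encoder with one independent sigmoid output per populism dimension trained with binary cross-entropy, can be sketched as follows. The checkpoint and label names are illustrative assumptions rather than the released PopBERT artifacts.

```python
# A multilabel transformer classifier sketch: one sigmoid per dimension.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["anti-elitism", "people-centrism", "left-wing host", "right-wing host"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-cased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # BCE loss, independent sigmoids
)

inputs = tokenizer("Die korrupte Elite betrügt das Volk.", return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).squeeze()
for name, p in zip(labels, probs):
    print(f"{name}: {p:.2f}")  # untrained head: scores are meaningless until fine-tuned
```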
Stance detection is the task of identifying the beliefs expressed in a document. While researchers widely use sentiment analysis for this purpose, recent research demonstrates that sentiment and stance are distinct. This paper advances text analysis methods by precisely defining stance detection and outlining three approaches: supervised classification, natural language inference, and in-context learning. I discuss how document context and trade-offs between resources and workload should inform the choice of method. For all three approaches, I provide guidance on application and validation techniques, as well as coding tutorials for implementation. Finally, I demonstrate how newer classification approaches can replicate supervised classifiers.
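Of the three approaches, natural language inference is perhaps the easiest to sketch: the document serves as the premise and a stance statement as the hypothesis, with the entailment probability read off as the stance score. A minimal example using a public NLI checkpoint (the texts are invented):

```python
# NLI-based stance detection via the zero-shot classification pipeline.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

document = "Congress should act now; the costs of inaction on climate grow every year."
result = nli(document, candidate_labels=["supports climate action", "opposes climate action"])
print(result["labels"][0], round(result["scores"][0], 3))
```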
Military Servicemembers and Veterans are at elevated risk for suicide, but rarely self-identify to their leaders or clinicians regarding their experience of suicidal thoughts. We developed an algorithm to identify posts containing suicide-related content on a military-specific social media platform.
Methods
Publicly-shared social media posts (n = 8449) from a military-specific social media platform were reviewed and labeled by our team for the presence/absence of suicidal thoughts and behaviors and used to train several machine learning models to identify such posts.
Results
The best performing model was a deep learning (RoBERTa) model that incorporated post text and metadata and detected the presence of suicidal posts with relatively high sensitivity (0.85), specificity (0.96), precision (0.64), F1 score (0.73), and an area under the precision-recall curve of 0.84. Compared to non-suicidal posts, suicidal posts were more likely to contain explicit mentions of suicide, descriptions of risk factors (e.g. depression, PTSD) and help-seeking, and first-person singular pronouns.
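For reference, the area under the precision-recall curve reported above can be computed from model scores with scikit-learn; the labels and scores below are dummy placeholders:

```python
# Area under the precision-recall curve (average precision) from model scores.
from sklearn.metrics import average_precision_score

y_true = [1, 0, 1, 1, 0, 0, 0, 1]                    # 1 = post labeled as suicidal
y_score = [0.9, 0.2, 0.7, 0.6, 0.3, 0.1, 0.4, 0.8]   # model probabilities
print(f"AUPRC = {average_precision_score(y_true, y_score):.2f}")
```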
Conclusions
Our results demonstrate the feasibility and potential promise of using social media posts to identify at-risk Servicemembers and Veterans. Future work will use this approach to deliver targeted interventions to social media users at risk for suicide.
We compared study characteristics of randomized controlled trials (RCTs) funded by industry (N=697) to those not funded by industry (N=835). RCTs published in high-impact journals are more likely to be blinded, more likely to include a placebo, and more likely to post trial results on ClinicalTrials.gov. Our findings emphasize the importance of evaluating the quality of an RCT based on its methodological rigor, not its funder type.
Several disciplines, such as economics, law, and political science, emphasize the importance of legislative quality, namely well-written legislation. Low-quality legislation cannot be easily implemented because the texts create interpretation problems. To measure the quality of legal texts, we use information from the syntactic and lexical features of their language and apply these measures to a dataset of European Union legislation that contains detailed information on its transposition and decision-making process. We find that syntactic complexity and vagueness are negatively related to member states’ compliance with legislation. The finding on vagueness is robust to controlling for member states’ preferences, administrative resources, length of texts, and discretion. However, the results for syntactic complexity are less robust.
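Simple proxies for syntactic complexity, such as sentence length and dependency-parse depth, can be computed with an off-the-shelf parser. A hedged sketch with spaCy follows; the paper’s actual complexity and vagueness measures are more elaborate than these two proxies.

```python
# Sentence length and dependency-tree depth as syntactic-complexity proxies.
import spacy

nlp = spacy.load("en_core_web_sm")

def tree_depth(token):
    """Depth of a token's dependency subtree."""
    children = list(token.children)
    return 1 + max((tree_depth(c) for c in children), default=0)

text = ("Member States shall take appropriate measures, where necessary, "
        "to ensure adequate implementation of this Directive.")
doc = nlp(text)
for sent in doc.sents:
    print(len(sent), tree_depth(sent.root))  # token count and parse depth
```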
Multilingual question answering (MQA) provides effective access to multilingual data, returning accurate and precise answers irrespective of language. Although a wide range of datasets is available for monolingual QA systems in natural language processing, benchmark datasets specifically designed for MQA are considerably limited. The absence of comprehensive benchmark datasets hinders the development and evaluation of MQA systems. To overcome this issue, the proposed work develops the EHMQuAD dataset, an MQA dataset for low-resource languages such as Hindi and Marathi alongside English. The EHMQuAD dataset is developed using a synthetic corpora generation approach, and an alignment is performed after translation to make the dataset more accurate. Further, the EHMMQA model is proposed as an abstract framework that uses a deep neural network which accepts question-context pairs and returns an accurate answer based on those questions. Shared question and shared context representations have been designed separately to develop this system. Experiments with the proposed model are conducted on the MMQA, Translated SQuAD, XQuAD, MLQA, and EHMQuAD datasets, with EM and F1-score used as performance measures. The proposed model (EHMMQA) is compared with state-of-the-art MQA baseline models in all possible monolingual and multilingual settings. The results signify that EHMMQA is a considerable step toward an MQA system for Hindi and Marathi, establishing a new state of the art for these languages.
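The EM and F1 measures used here are the standard SQuAD-style answer metrics: exact match compares normalized strings, while F1 scores token overlap between predicted and gold answers. A minimal self-contained sketch:

```python
# SQuAD-style exact match (EM) and token-overlap F1 for QA evaluation.
from collections import Counter

def exact_match(pred, gold):
    return int(pred.strip().lower() == gold.strip().lower())

def f1(pred, gold):
    p_toks, g_toks = pred.lower().split(), gold.lower().split()
    common = Counter(p_toks) & Counter(g_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p_toks), overlap / len(g_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("New Delhi", "new delhi"),
      round(f1("the city of New Delhi", "New Delhi"), 2))
```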
This chapter surveys the history and main directions of natural language processing research in general, and for Slavic languages in particular. The field has grown enormously since its beginning. Especially since 2010, the amount of digital texts has been rapidly growing; furthermore, research has yielded an ever-greater number of highly usable applications. This is reflected in the increasing number and attendance of NLP conferences and workshops. Slavic countries are no exception; several have been organising international conferences for decades, and their proceedings are the best place to find publications on Slavic NLP research. The general trend of the evolution of NLP is difficult to predict. It is certain that deep learning, including various new types (e.g. contextual, multilingual) of word embeddings and similar ‘deep’ models will play an increasing role, while predictions also mention the increasing importance of the Universal Dependencies framework and treebanks and research into the theory, not only the practice, of deep learning, coupled with attempts at achieving better explainability of the resulting models.
Housing is an environmental social determinant of health that is linked to mortality and clinical outcomes. We developed a lexicon of housing-related concepts and rule-based natural language processing methods for identifying these housing-related concepts within clinical text. We piloted our methods on several test cohorts: a synthetic cohort generated by ChatGPT for initial infrastructure testing, a cohort with substance use disorders (SUD), and a cohort diagnosed with problems related to housing and economic circumstances (HEC). Our methods successfully identified housing concepts in our ChatGPT notes (recall = 1.0, precision = 1.0), our SUD population (recall = 0.9798, precision = 0.9898), and our HEC population (recall = N/A, precision = 0.9160).
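A rule-based lexicon matcher of this general kind can be sketched in a few lines; the lexicon terms and the note below are illustrative placeholders, not the authors’ curated housing lexicon.

```python
# Rule-based lexicon matching over clinical text with word-boundary regexes.
import re

housing_lexicon = ["homeless", "eviction", "unstable housing", "shelter"]
pattern = re.compile(r"\b(" + "|".join(map(re.escape, housing_lexicon)) + r")\b",
                     re.IGNORECASE)

note = "Patient reports recent eviction and is currently staying at a shelter."
matches = [m.group(0) for m in pattern.finditer(note)]
print(matches)  # ['eviction', 'shelter']
```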
Not all scientific publications are equally useful to policy-makers tasked with mitigating the spread and impact of diseases, especially at the start of novel epidemics and pandemics. The urgent need for actionable, evidence-based information is paramount, but the nature of preprint and peer-reviewed articles published during these times is often at odds with such goals. For example, a lack of novel results and a focus on opinions rather than evidence were common in coronavirus disease (COVID-19) publications at the start of the pandemic in 2019. In this work, we seek to automatically judge the utility of these scientific articles, from a public health policy-making perspective, using only their titles.
Methods:
Deep learning natural language processing (NLP) models were trained on scientific COVID-19 publication titles from the CORD-19 dataset and evaluated against expert-curated COVID-19 evidence to measure their real-world feasibility at screening these scientific publications in an automated manner.
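A much simpler baseline conveys the task setup: classify titles as useful or not from surface features. The sketch below uses TF-IDF with logistic regression on invented toy examples, in contrast to the deep models the study actually trained.

```python
# A toy title-utility classifier baseline: TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = [
    "Efficacy of dexamethasone in hospitalized COVID-19 patients",
    "Reflections on pandemic uncertainty: an opinion piece",
]
labels = [1, 0]  # 1 = useful for policy, 0 = not (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(titles, labels)
print(clf.predict(["Remdesivir trial outcomes in severe COVID-19"]))
```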
Results:
This work demonstrates that it is possible to judge the utility of COVID-19 scientific articles, from a public health policy-making perspective, based on their titles alone, using deep NLP models.
Conclusions:
NLP models can be successfully trained on scientific articles and used by public health experts to triage and filter the hundreds of new daily publications on novel diseases such as COVID-19 at the start of pandemics.
In this chapter, a case is made for the inclusion of computational approaches to linguistics within the theoretical fold. Computational models aimed at application are a special case of predictive models. The status quo in the philosophy of linguistics is that explanation is scientifically prior to prediction. This is a mistake. Once corrected, the theoretical place of prediction is restored and, with it, computational models of language. The chapter first describes the history behind the emergence of explanation over prediction views in the general philosophy of science. It’s then suggested that this post-positivist intellectual milieu influenced the rejection of computational linguistics in the philosophy of theoretical linguistics. A case study of the predictive power already embedded in contemporary linguistic theory is presented through some work on negative polarity items. The discussion moves to the competence–performance divide informed by the so-called Galilean style in linguistics that retains the explanatory over prediction ideal. In the final sections of the chapter, continuous methods, such as probabilistic linguistics, are used to showcase the explanatory and predictive possibilities of nondiscrete approaches, before a discussion of the contemporary field of deep learning in natural language processing (NLP), where these predictive possibilities are further amplified.
Innovation, typically spurred by reusing, recombining and synthesizing existing concepts, is expected to result in an exponential growth of the concept space over time. However, our statistical analysis of TechNet, which is a comprehensive technology semantic network encompassing over 4 million concepts derived from patent texts, reveals a linear rather than exponential expansion of the overall technological concept space. Moreover, there is a notable decline in the originality of newly created concepts. These trends can be attributed to the constraints of human cognitive abilities to innovate beyond an ever-growing space of prior art, among other factors. Integrating creative artificial intelligence into the innovation process holds the potential to overcome these limitations and alter the observed trends in the future.