Christiansen & Chater (C&C) describe how the brain's limited capacity to retain language input (the Now-or-Never bottleneck) constrains and shapes human language processing and acquisition.
Interestingly, there is a very close correspondence between the characteristics of processing and learning under the Now-or-Never bottleneck and recent computational models used in the field of natural language processing (NLP), especially in syntactic parsing. C&C offer some comparison with classic cognitively inspired models of parsing, noting that those models conflict with the constraints of the Now-or-Never bottleneck. However, a close look at the recent NLP and computational linguistics literature (rather than the cognitive science literature) shows a clear trend toward systems and models that fit remarkably well with C&C's framework.
It is worth noting that most NLP research is driven by purely pragmatic, engineering-oriented requirements: The primary goal is not to find models that provide plausible explanations of the properties of language and its processing by humans, but rather to design systems that can parse text and utterances as accurately and efficiently as possible for practical applications like opinion mining, machine translation, or information extraction, among others.
In recent years, the need to develop faster parsers that can work on web-scale data has led to much research interest in incremental, data-driven parsers, mainly under the so-called transition-based (or shift-reduce) framework (Nivre 2008). This family of parsers has been implemented in systems such as MaltParser (Nivre et al. 2007), ZPar (Zhang & Clark 2011), ClearParser (Choi & McCallum 2013), and Stanford CoreNLP (Chen & Manning 2014), and it is increasingly popular because these parsers are easy to train from annotated data and provide a very good trade-off between speed and accuracy.
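To make the transition-based framework concrete, the following minimal Python sketch steps through an arc-eager style derivation for a toy sentence. It is an illustration of the general idea only, not the code of any of the systems cited above: the hand-written oracle that reproduces a known gold tree is a stand-in for the trained classifier that real parsers use to choose each action, and the toy sentence and feature-free setup are assumptions made for brevity.

```python
# Minimal arc-eager transition-based parsing sketch (illustrative only; this is
# not the implementation of MaltParser, ZPar, ClearParser, or Stanford CoreNLP).
# Words are read strictly left to right, and dependency arcs are built as soon
# as possible, so a word stays on the stack only while its head or dependents
# are still pending.

def arc_eager_parse(words, oracle):
    """Parse using transitions chosen by oracle(stack, buffer, arcs)."""
    stack = [0]                               # position 0 is an artificial ROOT
    buffer = list(range(1, len(words) + 1))   # word positions, left to right
    arcs = set()                              # (head, dependent) pairs

    while buffer:
        action = oracle(stack, buffer, arcs)
        if action == "SHIFT":                 # move next input word onto stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":            # stack top depends on next word
            arcs.add((buffer[0], stack.pop()))
        elif action == "RIGHT-ARC":           # next word depends on stack top
            arcs.add((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        elif action == "REDUCE":              # stack top is finished; forget it
            stack.pop()
    return arcs

def static_oracle(gold_arcs):
    """Rule-based oracle that reproduces a known gold tree; in real systems
    this decision is made by a classifier trained on treebank data."""
    def oracle(stack, buffer, arcs):
        s, b = stack[-1], buffer[0]
        if (b, s) in gold_arcs:
            return "LEFT-ARC"
        if (s, b) in gold_arcs:
            return "RIGHT-ARC"
        attached = any(d == s for (_, d) in arcs)
        pending = any((s, w) in gold_arcs or (w, s) in gold_arcs for w in buffer)
        if attached and not pending:
            return "REDUCE"
        return "SHIFT"
    return oracle

# Toy example: "She reads short stories" with (head, dependent) gold arcs.
words = ["She", "reads", "short", "stories"]
gold = {(0, 2), (2, 1), (2, 4), (4, 3)}
print(arc_eager_parse(words, static_oracle(gold)))
# -> {(0, 2), (2, 1), (2, 4), (4, 3)}
```

The same loop runs unchanged when a statistical classifier replaces the oracle, which is essentially what the transition-based parsers cited above do.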
Strikingly, these parsing models present practically all of the characteristics of processing and acquisition that C&C describe as originating from the Now-or-Never bottleneck in human processing:
- Incremental processing (sect. 3.1): A defining feature of transition-based parsers is that they build syntactic analyses incrementally as they receive the input, from left to right. These systems can build analyses even under severe working memory constraints: Although the issue of “stacking up” with right-branching languages mentioned by C&C exists for so-called arc-standard parsers (Nivre 2004), parsers based on the arc-eager model (e.g., Gómez-Rodríguez & Nivre 2013; Nivre 2003), illustrated in the sketch above, do not accumulate right-branching structures in their stack, as they build dependency links as soon as possible. In these parsers, we only need to keep a word on the stack while we wait for its head or its direct dependents, so the time that linguistic units need to be retained in memory is kept to the bare minimum.
- Multiple levels of linguistic structure (sect. 3.2): As C&C mention, the organization of linguistic representation in multiple levels is “typically assumed in the language sciences”; this includes computational linguistics and transition-based parsing models. Traditionally, each of these levels was processed sequentially in a pipeline, contrasting with the parallelism of the Chunk-and-Pass framework. However, the appearance of general incremental processing frameworks spanning various levels, from segmentation to parsing (Zhang & Clark 2011), has led to recent research on joint processing, where the processing of several levels takes place simultaneously and in parallel, passing information between levels (Bohnet & Nivre 2012; Hatori et al. 2012). These models, which improve accuracy over pipeline models, are very close to the Chunk-and-Pass framework.
- Predictive language processing (sect. 3.3): The joint processing models just mentioned are hypothesized to provide accuracy improvements precisely because they allow for a degree of predictive processing. In contrast to pipeline approaches, where information flows only bottom-up, these systems allow top-down information from higher levels “to constrain the processing of the input at lower levels,” just as C&C describe.
- Acquisition as learning to process (sect. 4): Transition-based parsers learn a sequence of processing actions (transitions), rather than a grammar (Gómez-Rodríguez et al. 2014; Nivre 2008), making the learning process simple and flexible.
- Local learning (sect. 4.2): This is also a general characteristic of all transition-based parsers. Because they do not learn grammar rules but processing actions to take in specific situations, adding a new example to the training data will produce only local changes to the learned model. At the implementation level, this typically corresponds to small weight changes in the underlying machine learning model, be it a support vector machine (SVM) classifier (Nivre et al. 2007), a perceptron (Zhang & Clark 2011), or a neural network (Chen & Manning 2014), among other possibilities (see the schematic sketch after this list).
- Online learning and learning to predict (sects. 4.1 and 4.3): Evaluation of NLP systems usually takes place on standard, fixed corpora, so the recent NLP literature has not placed much emphasis on online learning. However, some systems and frameworks do use online, error-driven learning models, like the perceptron (Zhang & Clark 2011); the sketch below illustrates an update of this kind. The recent surge of interest in parsing with neural networks (e.g., Chen & Manning 2014; Dyer et al. 2015) also seems to point future research in this direction.
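To complement the parsing sketch above, the following schematic Python sketch shows online, error-driven learning of parsing actions with a simple multiclass perceptron, in the spirit of (though not copied from) the perceptron-based systems cited above; the feature names used are hypothetical. A wrong prediction triggers a small update that touches only the features active in the current configuration, which is what makes the learning local.

```python
# Schematic online, error-driven perceptron learning of parsing actions
# (an illustration of the idea, not the training code of any cited parser).

from collections import defaultdict

ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "REDUCE"]
weights = {a: defaultdict(float) for a in ACTIONS}   # one weight vector per action

def score(action, features):
    return sum(weights[action][f] for f in features)

def predict(features):
    return max(ACTIONS, key=lambda a: score(a, features))

def update(features, gold_action):
    """Error-driven update: fires only when the prediction is wrong, and only
    touches the features observed in this single parser configuration."""
    predicted = predict(features)
    if predicted != gold_action:
        for f in features:
            weights[gold_action][f] += 1.0   # reinforce the correct action
            weights[predicted][f] -= 1.0     # penalise the mistaken one

# Hypothetical features extracted from one configuration during training:
update(["stack_top=reads", "buffer_front=stories", "stack_top_pos=VERB"],
       gold_action="RIGHT-ARC")
```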
Putting it all together, we can see that researchers whose motivating goal was not psycholinguistic modeling, but only raw computational efficiency, have nevertheless arrived at models that conform to the description in the target article. This fact provides further support for the views C&C express.
A natural question arises about the extent to which this coincidence is attributable to similarities between the efficiency requirements of human and automated processing – or rather to the fact that because evolution shapes natural languages to be easy to process by humans (constrained by the Now-or-Never bottleneck), computational models that mirror human processing will naturally work well on them. Relevant differences between the brain and computers, such as in short-term memory capacity, seem to suggest the latter. Either way, there is clearly much to be gained from cross-fertilization between cognitive science and computational linguistics: For example, computational linguists can find inspiration in cognitive models for designing NLP tools that work efficiently with limited resources, and cognitive scientists can use computational tools as models to test their hypotheses. Bridging the gap between these areas of research is essential to further our understanding of language.
ACKNOWLEDGMENTS
This work was funded by the Spanish Ministry of Economy and Competitiveness/ERDF (grant FFI2014-51978-C2-2-R) and Xunta de Galicia (grant R2014/034). I thank Ramon Ferrer i Cancho for helpful comments on an early version of this commentary.