Human languages have evolved many fascinating solutions to complex communicative problems through the use of words and grammatical structures. And they keep on evolving: language is an open system, a unique ability that brings infinite variety to the ways in which we communicate with others about our experiences in life. How is this possible? Can we understand this linguistic creativity? In my research, I try to answer these questions by developing powerful cognitive language technologies, which can be used to study open-ended and robust language processing, to explore innovative linguistic applications, and to function in large open collaborative communities.
Human language users are capable of proficiently learning new constructions and using a language for everyday communication even if they have only acquired a basic linguistic inventory. This paper argues that such robustness can best be achieved through a constructional processing model in which grammatical structures may emerge spontaneously as a side effect of how constructions are combined with each other. This claim is substantiated by a fully operational precision model for Basic English in Fluid Construction Grammar, which is available for online testing. The precision model is the first ever to incorporate key properties from construction grammar in a large-scale setting, such as argument structure constructions and the surface generalization hypothesis, and is therefore a milestone achievement in the field of construction grammar.
After several decades in scientific purgatory, language evolution has reclaimed its place as one of the most important branches in linguistics. This renewed interest is accompanied by powerful new methods for making empirical observations. At the same time, construction grammar is increasingly embraced in all areas of linguistics as a fruitful way of making sense of all these new data, and it has enthused formal and computational linguists, who have developed sophisticated tools for exploring issues in language processing and learning. Separately, linguists and computational linguists are able to explain which changes take place in language and how these changes are possible. When working together, however, they can also address the question of why language evolves over time and how it emerged in the first place. This special issue therefore brings together key contributions from both fields to put evidence and methods from both perspectives on the table.
Word order, argument structure and unbounded dependencies are among the most important topics in linguistics because they touch upon the core of the syntax-semantics interface. One question is whether “marked” word order patterns, such as “The man I talked to” vs. “I talked to the man”, require special treatment by the grammar or not. Mainstream linguistics answers this question affirmatively: in the marked order, some mechanism is necessary for “extracting” the phrase “the man” from its original argument position, and a special placement rule (e.g. topicalization) is needed for putting the constituent in clause-preceding position. This paper takes an opposing view and argues that such formal complexity is only required for analyses that are based on syntactic trees. A tree is a rigid data structure that only allows information to be shared between local nodes, hence it is inadequate for non-local dependencies and can only allow restricted word order variations. A construction, on the other hand, offers a more powerful representation device that allows word order variations (even unbounded dependencies) to be analyzed as the side effect of how language users combine the same rules in different ways in order to satisfy their communicative needs. This claim is substantiated through a computational implementation of English argument structure constructions in Fluid Construction Grammar that can handle both comprehension and formulation.
Human languages have multiple strategies that allow us to discriminate objects in a vast variety of contexts. Colours have been extensively studied from this point of view. In particular, previous research in artificial language evolution has shown how artificial languages may emerge based on specific strategies to distinguish colours. Still, it has not been shown how several strategies of diverse complexity can be autonomously managed by artificial agents. We propose an intrinsic motivation system that allows agents in a population to create a shared artificial language and progressively increase its expressive power. Our results show that with such a system agents successfully regulate their language development, which indicates a relation between population size and consistency in the emergent communicative systems.
Long-distance dependencies are among the most controversial challenges in linguistics. These patterns seem to contain constituents that have left their original position in a sentence and that have landed in a different place. A typical example is the relative clause “the person I have talked to yesterday”, in which the direct object (“the person”) is not situated in an argument position following the verb, but instead is located at the beginning of the utterance. Upon closer inspection, however, all problems related to long-distance dependencies can be reduced to the limits of phrase structural analyses. A phrase structure tree is a rigid data structure in which information is shared only between local nodes. These analyses therefore need to resort to more complex formal machinery in order to overcome this locality constraint, such as using transformations or positing filler-gap constructions. However, there exists a more intuitive alternative within the tradition of cognitive-functional linguistics in which long-distance dependencies do not require special treatment. Instead, these patterns are simply the side effect of how grammatical constructions combine with each other in order to satisfy the communicative needs of language users. Through a computational implementation in Fluid Construction Grammar, this article demonstrates that it is perfectly feasible to formalize this alternative in a model that is capable of both formulating and comprehending utterances.
Natural languages enable humans to engage in highly complex social and conversational interactions with each other. Alife approaches to the origins and emergence of language typically manage this complexity by carefully staging the learning paths that embodied artificial agents need to follow in order to bootstrap their own communication system from scratch. This paper investigates how these scaffolds introduced by the experimenter can be removed by allowing agents to autonomously set their own challenges when they are driven by intrinsic motivation and have the capacity to self-assess their own skills at achieving their communicative goals. The results suggest that intrinsic motivation not only allows agents to spontaneously develop their own learning paths, but also that they are able to make faster transitions from one learning phase to the next.
Sign languages (SL) require a fundamental rethinking of many basic assumptions about human language processing because instead of using linear speech, sign languages coarticulate facial expressions, shoulder and hand movements, eye gaze and usage of a three-dimensional space. SL researchers have therefore advocated SL-specific approaches that do not start from the biases of models that were originally developed for vocal languages. Unfortunately, there are currently no processing models that adequately achieve both language comprehension and formulation, and the SL-specific developments run the risk of becoming alienated from other linguistic research. This paper explores the hypothesis that a construction grammar architecture offers a solution to these problems because constructions are able to simultaneously access and manipulate information coming from many different sources. This claim is illustrated by a proof-of-concept implementation of a basic grammar for French Sign Language in Fluid Construction Grammar.
One of the most salient hallmarks of construction grammar is its approach to argument structure and coercion: rather than positing many different verb senses in the lexicon, the same lexical construction may freely interact with multiple argument structure constructions. This view has however been criticized from within the construction grammar movement for leading to overgeneration. This paper argues that this criticism falls flat for two reasons: (1) lexicalism, which is the alternative solution proposed by the critics, has already been proven to overgenerate itself, and (2) the argument of overgeneration becomes void if grammar is implemented as a problem-solving model rather than as a generative competence model; a claim that the paper substantiates through a computational operationalization of argument structure and coercion in Fluid Construction Grammar. The paper thus shows that the current debate on argument structure is hiding a much more fundamental rift between practitioners of construction grammar that touches upon the role of grammar itself.
Long-distance dependencies are notoriously difficult to analyze in a formally explicit way because they involve constituents that seem to have been extracted from their canonical position in an utterance. The most widespread solution is to identify a GAP at an EXTRACTION SITE and to communicate information about that gap to its FILLER, as in What_FILLER did you see_GAP? This paper rejects the filler-gap solution and proposes a cognitive-functional alternative in which long-distance dependencies spontaneously emerge as a side effect of how grammatical constructions interact with each other for expressing different conceptualizations. The proposal is supported by a computational implementation in Fluid Construction Grammar that works for both parsing and production.
Computational experiments in cultural language evolution are important because they help to reveal the cognitive mechanisms and cultural processes that continuously shape and reshape the structure and knowledge of language. However, understanding the intricate relations between these mechanisms and processes can be a daunting challenge. This paper proposes to recruit the concept of fitness landscapes from evolutionary biology and computer science for visualizing the “linguistic fitness” of particular language systems. Through a case study on the German paradigm of definite articles, the paper shows how such landscapes can shed a new and unexpected light on non-trivial cases of language evolution. More specifically, the case study falsifies the widespread assumption that the paradigm is the accidental by-product of linguistic erosion. Instead, it has evolved to optimize the cognitive and perceptual resources that language users employ for achieving successful communication.
Fluid Construction Grammar (FCG) is an open-source computational grammar formalism that is becoming increasingly popular for studying the history and evolution of language. This demonstration shows how FCG can be used to operationalise the cultural processes and cognitive mechanisms that underlie language evolution and change.
Construction Grammar has reached a stage of maturity where many researchers are looking for an explicit formal grounding of their work. Recently, there have been exciting developments to cater for this demand, most notably in Sign-Based Construction Grammar (SBCG) and Fluid Construction Grammar (FCG). Unfortunately, like playing a musical instrument, the formalisms used by SBCG and FCG take time and effort to master, and linguists who are unfamiliar with them may not always appreciate the far-reaching theoretical consequences of adopting this or that approach. This paper strips SBCG and FCG down to their bare essentials, and offers a linguist-friendly comparison that looks at how both approaches define constructions, linguistic knowledge and language processing.
The German definite article paradigm, which is notorious for its case syncretism, is widely considered to be the accidental by-product of diachronic changes. This paper argues instead that the evolution of the paradigm has been motivated by the needs and constraints of language usage. This hypothesis is supported by experiments that compare the current paradigm to its Old High German ancestor (OHG; 900–1100 AD) in terms of linguistic assessment criteria such as cue reliability, processing efficiency and ease of articulation. Such a comparison has been made possible by “bringing back alive” the OHG system through a computational reconstruction in the form of a processing model. The experiments demonstrate that syncretism has made the New High German system more efficient for processing, pronunciation and perception than its historical predecessor, without harming the language's strength at disambiguating utterances.
Despite centuries of research, the origins of grammatical case are more mysterious than ever. This paper addresses some unanswered questions through language game experiments in which a multi-agent population self-organizes a morphosyntactic case system. The experiments show how the formal part of grammatical constructions may pressure such emergent systems to become more economical.
Case has fascinated linguists for centuries without however revealing its most important secrets. This paper offers operational explanations for case through language game experiments in which autonomous agents describe real-world events to each other. The experiments demonstrate (a) why a language may develop a case system, (b) how a population can self-organize a case system, and (c) why and how an existing case system may take on new functions in a language.
German case syncretism is often assumed to be the accidental by-product of historical development. This paper contradicts this claim and argues that the evolution of German case is driven by the need to optimize the cognitive effort and memory required for processing and interpretation. This hypothesis is supported by a novel kind of computational experiment that reconstructs and compares attested variations of the German definite article paradigm. The experiments show how the intricate interaction between those variations and the rest of the German “linguistic landscape” may direct language change.
Linguistic utterances are full of errors and novel expressions, yet linguistic communication is remarkably robust. This paper presents a double-layered architecture for open-ended language processing, in which “diagnostics” and “repairs” operate on a meta-level for detecting and solving problems that may occur during habitual processing on a routine layer. Through concrete operational examples, this paper demonstrates how such an architecture can directly monitor and steer linguistic processing, and how language can be embedded in a larger cognitive system.
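The double-layered idea can be illustrated with a toy sketch in Python. It is not the architecture from the paper: the function names, the dictionary lexicon and the "unknown-category" placeholder are all assumptions made for illustration. A routine layer looks up words, a diagnostic flags words without an entry, and a repair adds a tentative entry so that routine processing can be retried.

```python
def routine_parse(utterance, lexicon):
    """Routine layer: pair each word with its category, or None if unknown."""
    return [(word, lexicon.get(word)) for word in utterance.split()]


def diagnose_unknown_words(analysis):
    """Meta-level diagnostic: report one problem per word lacking an entry."""
    return [("unknown-word", word) for word, cat in analysis if cat is None]


def repair_unknown_word(problem, lexicon):
    """Meta-level repair: add a tentative entry (hypothetical placeholder)."""
    _kind, word = problem
    lexicon[word] = "unknown-category"


def process(utterance, lexicon, max_repairs=5):
    """Run routine processing, then diagnose and repair until no problem is left."""
    analysis = routine_parse(utterance, lexicon)
    for _ in range(max_repairs):
        problems = diagnose_unknown_words(analysis)
        if not problems:
            break
        repair_unknown_word(problems[0], lexicon)
        # Restart routine processing with the repaired inventory.
        analysis = routine_parse(utterance, lexicon)
    return analysis


lexicon = {"the": "det", "ball": "noun"}
result = process("the wug ball", lexicon)
```

In this sketch the novel word "wug" does not make processing fail: it receives a provisional entry that later learning could refine, while the routine layer itself stays unchanged.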
Almost all languages in the world have a way to formulate commands. Commands specify actions that the body should undertake (such as “stand up”), possibly involving other objects in the scene (such as “pick up the red block”). Action language involves various competences, in particular (i) the ability to perform an action and recognize which action has been performed by others (the so-called mirror problem), and (ii) the ability to identify which objects are to participate in the action (e.g. “the red block” in “pick up the red block”) and understand what role objects play, for example whether an object is the agent or undergoer of the action, or the patient or target (as in “put the red block on top of the green one”). This chapter describes evolutionary language game experiments with real robots that explore how these competences originate and how they can be carried out and acquired, using a whole systems approach.
Cognitive linguistics has reached a stage of maturity where many researchers are looking for an explicit formal grounding of their work. Unfortunately, most current models of deep language processing incorporate assumptions from generative grammar that are at odds with the cognitive movement in linguistics. This demonstration shows how Fluid Construction Grammar (FCG), a fully operational and bidirectional unification-based grammar formalism, caters for this increasing demand. FCG features many of the tools that were pioneered in computational linguistics in the 70s-90s, but combines them in an innovative way. This demonstration highlights the main differences between FCG and related formalisms.
Language change is increasingly recognized as one of the most crucial sources of evidence for understanding human cognition. Unfortunately, despite sophisticated methods for documenting which changes have taken place, the question of why languages evolve over time remains open for speculation. This paper presents a novel research method that addresses this issue by combining agent-based experiments with deep language processing, and demonstrates the approach through a case study on German definite articles. More specifically, two populations of autonomous agents are equipped with a model of Old High German (500–1100 AD) and Modern High German definite articles respectively, and a set of self-assessment criteria for evaluating their own linguistic performances. The experiments show that inefficiencies detected in the grammar by the Old High German agents correspond to grammatical forms that have actually undergone the most important changes in the German language. The results thus suggest that the question of language change can be reformulated as an optimization problem in which language users try to achieve their communicative goals while allocating their cognitive resources as efficiently as possible.
The question of how a shared vocabulary can arise in a multi-agent population, despite the fact that each agent autonomously invents and acquires words, has been solved. The solution is based on alignment: agents score all associations between words and meanings in their lexicons and update these preference scores based on communicative success. A positive feedback loop between success and use thus arises, which causes the spontaneous self-organization of a shared lexicon. The same approach has been proposed for explaining how a population can arrive at a shared grammar, where the same problem of variation occurs because each agent invents and acquires their own grammatical constructions. However, a problem arises if constructions reuse parts that can also exist on their own. This happens particularly when frequent usage patterns, which are based on compositional rules, are stored as such. The problem is how to maintain systematicity. This paper identifies this problem and proposes a solution in the form of multilevel alignment. Multilevel alignment means that the updating of preference scores is not restricted to the constructions that were used in the utterance but also extends downward and upward in the subsumption hierarchy.
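A minimal Python sketch can make the difference between ordinary and multilevel alignment concrete. It rests on several assumptions that are not taken from the paper: scores live in the range 0 to 1, every update is a fixed step `delta`, the hierarchy is approximated by a simple `parts` list, and propagation goes only one step up and down. The names `Construction`, `related` and `align` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Construction:
    name: str
    score: float = 0.5
    # Names of smaller constructions that this (stored) pattern reuses.
    parts: list = field(default_factory=list)


def related(inventory, name):
    """Constructions one step below (parts) or above (wholes reusing `name`)."""
    down = set(inventory[name].parts)
    up = {c.name for c in inventory.values() if name in c.parts}
    return down | up


def align(inventory, used, success, delta=0.1, multilevel=True):
    """Reward or punish the used constructions after a language game.

    With multilevel=True the same update is also applied to related
    constructions, so that a stored pattern and its parts stay in step.
    """
    targets = set(used)
    if multilevel:
        for name in used:
            targets |= related(inventory, name)
    step = delta if success else -delta
    for name in targets:
        cxn = inventory[name]
        cxn.score = max(0.0, min(1.0, cxn.score + step))


inventory = {
    "red": Construction("red"),
    "ball": Construction("ball"),
    "red-ball": Construction("red-ball", parts=["red", "ball"]),
}
align(inventory, used=["red-ball"], success=True)
```

After this successful game, the stored pattern "red-ball" and its parts "red" and "ball" are all rewarded. With `multilevel=False` only "red-ball" would be rewarded, which is the situation in which the scores of a holistic pattern and of its compositional parts can drift apart.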
Becoming a proficient speaker of a language requires more than just learning a set of words and grammar rules; it also implies mastering the ways in which speakers of that language typically innovate: stretching the meaning of words, introducing new grammatical constructions, introducing a new category, and so on. This paper demonstrates that such meta-knowledge can be represented and applied by reusing representations and processing techniques similar to those needed for routine linguistic processing, which makes it possible for language processing to make use of computational reflection.
This paper compares two prominent approaches in artificial language evolution: Iterated Learning and Social Coordination. More specifically, the paper contrasts experiments in both approaches on how populations of artificial agents can autonomously develop a grammatical case marking system for indicating event structure (i.e. “who does what to whom”). The comparison demonstrates that only the Social Coordination approach leads to a shared communication system in a multi-agent population. The paper concludes with an analysis and discussion of the results, and argues that Iterated Learning in its current form cannot explain the emergence of more complex natural language-like phenomena.
This paper presents a design pattern for handling argument structure and offers a concrete operationalization of this pattern in Fluid Construction Grammar. Argument structure concerns the mapping between “participant structure” (who did what to whom) and instances of “argument realization” (the linguistic expression of participant structures). This mapping is multilayered and indirect, which poses great challenges for grammar design. In the proposed design pattern, lexico-phrasal constructions introduce their semantic and syntactic potential of linkage. Argument structure constructions, then, select from this potential the values that they require and implement the actual linking.
This paper illustrates the use of “feature matrices”, a technique for handling ambiguity and feature indeterminacy in feature structure grammars using unification as the single mechanism for processing. Both phenomena involve forms that can be mapped onto multiple, often conflicting values. This paper illustrates their respective challenges through German case agreement, which has become the litmus test for demonstrating how well a grammar formalism deals with multifunctionality. After reviewing two traditional solutions, the paper demonstrates how complex grammatical categories can be represented as feature matrices instead of single-valued features. Feature matrices allow a free flow of constraints on possible feature-values coming from any part of an utterance, and they postpone commitment to any particular value until sufficient constraints have been identified. All examples in this paper are operationalized in Fluid Construction Grammar, but the design principle can be extended to other unification-grammars as well.
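The principle of postponing commitment can be sketched in Python under a strong simplification: here a matrix is just the set of case/gender-number cells that are still possible, and unification is approximated by set intersection. The actual FCG encoding, which works with variables and unification over feature structures, is not reproduced here. The helper names `matrix`, `full` and `unify` are hypothetical; the cells listed for the article "der" follow the standard New High German paradigm.

```python
CASES = ("nom", "acc", "dat", "gen")
CLASSES = ("m", "f", "n", "pl")  # masculine, feminine, neuter singular; plural


def matrix(*cells):
    """A feature matrix as the set of (case, class) cells still possible."""
    return frozenset(cells)


def full(case=None, cls=None):
    """All cells, optionally restricted to one case and/or one class."""
    return frozenset(
        (c, g)
        for c in CASES
        for g in CLASSES
        if (case is None or c == case) and (cls is None or g == cls)
    )


def unify(a, b):
    """Keep only cells compatible with both matrices; None signals failure."""
    result = a & b
    return result if result else None


# "der": nominative masculine, dative or genitive feminine, genitive plural.
DER = matrix(("nom", "m"), ("dat", "f"), ("gen", "f"), ("gen", "pl"))
FRAU = full(cls="f")  # a feminine singular noun, compatible with any case

der_frau = unify(DER, FRAU)                  # still ambiguous: dat-f or gen-f
resolved = unify(der_frau, full(case="dat"))  # a dative slot settles the case
```

The article alone leaves four cells open, the noun narrows them to two, and only a further constraint from elsewhere in the utterance (here a slot that requires the dative) selects a single value. No step has to guess and backtrack.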
Natural languages are fluid. New conventions may arise and there is never absolute consensus in a population. How can human language users nevertheless have such a high rate of communicative success? And how do they deal with the incomplete sentences, false starts, errors and noise that are common in normal discourse? Fluidity, ungrammaticality and error are key problems for formal descriptions of language and for computational implementations of language processing because these seem to be necessarily rigid and mechanical. This chapter discusses how these issues are approached within the framework of Fluid Construction Grammar. Fluidity is not achieved by a single mechanism but through a combination of intelligent grammar design and flexible processing principles.
Pronouns form a particularly interesting part of speech for evolutionary linguistics because their development often lags behind other changes in their language. Many hypotheses on pronoun evolution exist, both for explaining their initial resilience to change and for why they eventually cave in to evolutionary pressures, but so far no one has proposed a formal model that operationalizes these explanations in a unified theory. This paper therefore presents a computational model of pronoun evolution in a multi-agent population, and argues that pronoun evolution can best be understood as an interplay between the level of language strategies, which are the procedures for learning, expanding and aligning particular features of language, and the level of the specific language systems that instantiate these strategies in terms of concrete words, morphemes and grammatical structures. This claim is supported by a case study on Spanish pronouns, which are currently undergoing an evolution from a case-based to a referential-based system, of which multiple variations exist (called leísmo, laísmo and loísmo depending on the type of change).
Semantic maps have offered linguists an appealing and empirically rooted methodology for describing recurrent structural patterns in language development and the multifunctionality of grammatical categories. Although some researchers argue that semantic maps are universal and given, others provide evidence that there are no fixed or universal maps. This paper takes the position that semantic maps are a useful way to visualize the grammatical evolution of a language (particularly the evolution of semantic structuring) but that this grammatical evolution is a consequence of distributed processes whereby language users shape and reshape their language. So it is a challenge to find out what these processes are and whether they indeed generate the kind of semantic maps observed for human languages. This work takes a design stance towards the question of the emergence of linguistic structure and investigates how grammar can be formed in populations of autonomous artificial “agents” that play “language games” with each other about situations they perceive through a sensori-motor embodiment. The experiments reported here investigate whether semantic maps for case markers could emerge through grammaticalization processes without the need for a universal conceptual space.
Grammatical agreement is one of the most puzzling aspects found in natural language. Its acquisition requires intensive linguistic exposure and capacities to deal with outliers that break regular patterns. Rather than relying on statistical methods to deal with agreement in a computational application, this paper demonstrates how agreement can be learned by artificial agents in a simulated environment in such a way that the open-endedness of natural language can be captured by their language processing mechanisms.
Aspect is undoubtedly the most capricious grammatical category of the Russian language. It has often been described as a mystery accessible only to native speakers, leaving all the others lost in its apparently infinite clutter. Recent work in cognitive linguistics has tried to bring order to the seeming chaos of the Russian aspectual system. But these approaches have not been operationalized so far. This paper demonstrates how the aspectual derivation of Russian verbs can be handled successfully with Fluid Construction Grammar, a computational formalism recently developed for the representation and processing of constructions.
In this paper, we propose a concrete operationalization which incorporates data from the FrameNet database into Fluid Construction Grammar, currently the only computational implementation of construction grammar that can achieve both production and parsing using the same set of constructions. As a proof of concept, we selected an annotated sentence from the FrameNet database and transcribed its frame annotation analysis into an FCG grammar. The paper illustrates the proposed constructions and discusses the value and results of these formalization efforts.
This paper is part of an ongoing research program to understand the cognitive and functional bases for the origins and evolution of spatial language. Following a cognitive-functional approach, we first investigate the cross-linguistic variety in spatial language, with special attention to spatial perspective. Based on these language-typological data, we hypothesize which cognitive mechanisms are needed to explain this variety and argue for an interdisciplinary approach to test these hypotheses. We then explain how experiments in artificial language evolution can contribute to this goal and give a concrete example.
This paper shows how experiments on artificial language evolution can provide highly relevant results for important debates in linguistic theories. It reports on a series of experiments that investigate how semantic roles can emerge in a population of artificial embodied agents and how these agents can build a network of constructions. The experiment also includes a fully operational implementation of how event-specific participant-roles can be fused with the semantic roles of argument-structure constructions and thus contributes to the linguistic debate on how the syntax-semantics interface is organized.