Concept formation as knowledge accumulation: A computational linguistics study

ANDY DONG

doi:10.1017/S0890060406060033

Published online by Cambridge University Press: 10 February 2006

Affiliation: Key Centre of Design Computing and Cognition, University of Sydney, Sydney, Australia

Abstract

Language plays at least two roles in design. First, language serves as a representation of ideas and concepts through linguistic behaviors that represent the structure of thought during the design process. Second, language also performs actions and creates states of affairs. Based on these two perspectives on language use in design, we apply the computational linguistics tools of latent semantic analysis and lexical chain analysis to characterize how design teams engage in concept formation as the accumulation of knowledge represented by lexicalized concepts. The accumulation is described in a data structure comprising a set of links between elemental lexicalized concepts. Folding these two perspectives on language use in design together with the information processing theories of the mind afforded by the computational linguistics tools creates a new means to evaluate concept formation in design teams. The method suggests that analysis at a linguistic level can characterize concept formation even where process-oriented critiques have been limited in their ability to uncover a formal design method that could explain the phenomenon.

Keywords

Concept Formation Design Language Latent Semantic Analysis Lexical Chain Analysis

Type: Research Article
Information: AI EDAM, Volume 20, Issue 1, February 2006, pp. 35–53

DOI: https://doi.org/10.1017/S0890060406060033
Copyright: 2006 Cambridge University Press

1. INTRODUCTION

Representations of design ideas take on forms that could be classified into two broad categories: visual forms (e.g., sketches, drawings, and physical or digital models) and linguistic forms (language-based representations). Contemporary design research amply accounts for how visual forms contribute to the interpretive and reinterpretive cycles that designers experience in practice (e.g., Oxman, 2002; van der Lugt, 2005). Visual reasoning as a process in which designers engage in a “reflective conversation with his or her ideas” (Schön, 1983) is considered a primary cognitive activity in design practice, an activity during which the designed artifact is at once realized and made mental in the designer's mind. Buchanan (1989) extends this notion in declaring that design representations assist in “persuasively presenting and declaring that thought in products” to others. As a complement to studies that model design as a knowledge-based system in which designers operate on visual forms of knowledge representation, this research presents a linguistic view of how language, as a form of representation in design, serves as a mechanism by which the development of design concepts is enacted.

Words as a form of design representation have normally been treated as the way that designers consciously encode their thoughts and make those thoughts accessible to the external world. This first perspective on language use in design is the basis of a rich body of research in design cognition based on verbal protocol analyses of designers thinking aloud. A key premise of design research employing language as the vehicle for understanding human behavior in design is that understanding the structure of designers' thinking processes, as evidenced through language, could illuminate the nature of design. The basic idea of the theory is as follows:

given that verbal communication portrays the cognitive processes of the designers and
given that certain types of knowledge are stored as semantic memory,
then semantic and grammatical structures of language-based communication encode designers' thoughts.

Analyses of individual designers' “think-aloud” sessions and design team discussions using protocol analysis have led to interesting insights into thinking behavior such as how a designer conjectures solutions (Lloyd et al., 1995), shared understanding in design (Valkenburg, 1998), and the basic cognitive processes in design groups (Stempfle & Badke-Schaub, 2002). Computational linguistic analyses using latent semantic analysis modeled the similarity of language use to the distributed cognition process of bridging indirect relations among components of knowledge stored in each designer's mind (Dong, 2005).

Outside the concern of design cognition, design rationale capture systems make use of language as a linguistic residue of thought from the design process to facilitate the capture, archiving, and retrieval of design documentation that records decisions made, who made those decisions, and the rationale behind those decisions (Regli et al., 2000). The linguistic form of documentation recording the design rationale spans the spectrum of formality and structure. Through a formal functional requirements language expressed as pairs of transitive verbs and nouns, Jacobsen et al. (1991) attempted to capture the functional–structural reasoning processes that engineering designers conduct. Garcia and Howard's (1992) active design documents system specifies formal relations between parameters; when the parameters deviate from known ranges, the system requires the designer to document the knowledge, which may be expressed in natural language text. The conceptual design information server (Wood & Agogino, 1996), a case-based information retrieval system, creates indices into design cases represented as semistructured hypertext media. Regardless of the formality with which language is employed to encode design rationale, all of these systems share the perspective of language as representational, a reproduction of designers' thoughts.

Language also performs actions and creates states of affairs in design practice. One such action is coordinating social activity. When designers employ linguistic strategies of persuasion to have design concepts adopted (Brereton et al., 1996), language is used as a vehicle for engaging in and coordinating social activity, not as a way to represent cognitively held design concepts. Language may also be deployed to maintain positive relations between team members, accomplish socialization, and enforce norms of behavior in the group (Poole, 1999).

Language has also operated as a formal design tool, as in the work of the Center for Design Research at Stanford University and Enterprise Integration Technologies. They developed a formal language called Knowledge Interchange Format (KIF) so that designers could communicate and transfer knowledge among different knowledge bases residing at various companies, and thereby request information and services from one another (Cutkosky et al., 1993).

In action, language could also serve as a dialogic vehicle for mental activity. Recognizing that words have been an underemployed source of reasoning, de Vries et al. (2004) developed word graphs to stimulate architects' thinking during design. What is perhaps most intriguing about the word graphs is that linguistic interactions with word graphs and graphical modeling (i.e., visual forms) served equivalent functions in mediating interaction between the design representations and the architects' cognitive structures.

When language operates as an agent for mediated action as described in the four previous cases, language is used as a “tool” in design. Language use does things: it accomplishes reflection, performs actions, enables designers to project possibilities, forms design concepts, and negotiates the value of design concepts. Thinking about language use in design as a tool means seeing language as a mechanism for performing design practice.

In this article, we synthesize these two views of language use in design to facilitate the description of concept formation in synchronous group design. We propose that concept formation in group design could be described as knowledge accumulation driven by language use. Our perspective on concept formation as the accumulation of knowledge representations is not simply about the generation of ideas expressed as noun phrases (as studied by Mabogunje & Leifer, 1997). Elementary design concepts could be expressed as simple lexicalized concepts in the form of noun phrases (e.g., gear, backpack, mountain bike). More developed and fully formed design concepts are accumulated from these elemental ideas.

Computationally, the accumulation could be described in a data structure comprising a set of links between elemental ideas (lexicalized concepts).

Suppose that the following conversation takes place between two (mechanical) design engineers.

Engineer 1: We're going to redesign the front suspension to replace the dated I-beam design. Management wants to reduce the number of parts. So, one idea is that the transverse structural member must support both the transmission and the suspension.

Engineer 2: There's been discussion that we should move toward an upper and lower control arm configuration, with the lower control arm being connected to a torsion bar to control vertical damping of the lower control arm.

Engineer 1: That's the idea for the front suspension. We're retaining the leaf spring design for the rear suspension at least through the next model year.

Engineer 2: So you want to sketch out a basic configuration for the front suspension today?

Engineer 1: Yes, including the brackets to see if we could package it within the allotted space.

To a trained mechanical designer with automotive suspension design experience, it would be clear that these designers would likely continue toward designing the suspension control arms and the connection between the torsion bar and either the vehicle frame or a torsion bar isolation bracket. Linguistically, what holds this conversation together is that each of the key noun phrases (elemental lexicalized concepts) relates to a more general (abstract) concept of structural members or supports. In addition, if we could generate a data structure that accumulates the semantic links between the elemental lexicalized concepts I-beam, transverse structural member, brackets, and torsion bar, we could derive a general indication of the design concept these mechanical designers are generating. Thus, our aim is to reveal, computationally, this type of accumulation of knowledge leading to concept formation in group design.
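As a minimal sketch of such a data structure, the following Python code accumulates links between the elemental lexicalized concepts from the example conversation. It assumes a small, hand-written hypernym table; the table is not WordNet data, and the names `HYPERNYM`, `ancestors`, and `accumulate_links` are hypothetical. Two concepts are linked when they share a more general concept, and each link records that shared concept.

```python
from itertools import combinations

# Hypothetical toy hypernym table (specific concept -> more general concept).
# Illustrative assumption only; it is not read from WordNet.
HYPERNYM = {
    "i-beam": "structural member",
    "transverse structural member": "structural member",
    "torsion bar": "structural member",
    "bracket": "structural member",
    "structural member": "support",
}


def ancestors(concept):
    """Return the hypernyms above a concept, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def accumulate_links(concepts):
    """Link every pair of concepts that shares a hypernym.

    Each link is stored as (concept_a, concept_b, shared_hypernym), where
    shared_hypernym is the nearest hypernym of concept_a that concept_b
    also has.
    """
    links = set()
    for a, b in combinations(concepts, 2):
        shared = [h for h in ancestors(a) if h in ancestors(b)]
        if shared:
            links.add((a, b, shared[0]))
    return links


links = accumulate_links(["i-beam", "transverse structural member", "bracket", "torsion bar"])
```

With this toy table, every pair among the four concepts is linked through "structural member", so the accumulated set of links points toward the shared notion of structural members or supports; a concept missing from the table (e.g., "leaf spring") simply contributes no links.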

We limit our investigation to group design because the cognitive activities of a design group are distributed across social spaces; thus, the way the group structures language to communicate will affect its cognitive properties. This phenomenon may not be evident for an individual designer during a think-aloud session. The teams in the experiments reported are not just talking for the sake of talking. They are engaging in a conversation to design an artifact: a goal-oriented conversation whose goal is to form design concepts.

We will use the machinery of computational linguistics to explore our thesis. The use of computational linguistics is significant. When Shannon (1948) published his theory of communication, he showed that it was possible to model the generation of communication as a probabilistic system based on relatively simple rules governing the statistical co-occurrence of letters in English words. The two computational linguistics tools we will apply, latent semantic analysis and lexical chain analysis, follow this philosophy. Latent semantic analysis (Landauer, 1999) claims that the statistical co-occurrence of words in discourse models the underlying knowledge representation of the communicator and that meaning emerges from this statistical co-occurrence. Our application of lexical chain analysis will suggest that the occurrence of semantic links in discourse reveals the way that ideas are thought of and connected between communicators. Concept formation is driven by the accumulation of knowledge, where the accumulation is evidenced linguistically by the amassing of semantic link connections between lexicalized concepts.

This approach to understanding human behavior in design differs from the cognitivist approaches of researchers in symbolic artificial intelligence. Rather than explaining concept formation by a prescribed cognitive structure, the statistical language processing perspective lets us see concept formation in group design as a bottom-up accumulation and integration of information and ideas. The research described here draws on computational linguistics tools derived from the statistical language processing perspective, which holds that simple rules based on the hierarchy letters → words → semantic links are a useful way to model the production of communication, and folds them together to establish a new computational method to examine concept formation in design through linguistic analysis.

Specifically, we use a statistical natural language processing technique, latent semantic analysis, to assess windows of thematic coherence (that is, periods of time during the designers' conversation when their discussion appears to focus on one set of ideas and the ideas “make sense”) and to assess the overlap of ideas. Using the data on thematic coherence derived through computational linguistics, we then “drill down” into the design conversations using lexical chain analysis to examine the lexicosyntactic structure that enables the designers to offer, interrelate, and develop concepts. We then discuss how these two perspectives on language use in design, coupled with the machinery of computational linguistics, clarify how language both encodes thoughts and enables design practice. Through our analysis, we propose that analyzing language use in design from these perspectives enables us to describe how design teams form concepts through the accumulation of knowledge.

Our approach shares similarities with connectionist models of concept formation in design. One of the basic tenets of connectionist models of the mind, in contrast to symbolic reasoning systems, is that the mind could be described in terms of a network of interconnected units rather than as a symbol processor. Concepts emerge from connections across schemas (Coyne et al., 1993) rather than through a prescribed reasoning process (e.g., Benami & Jin, 2002). A consequence of the connectionist view is to regard the actors in group design not as symbolic processors but rather as individuals in states of connectedness. The connections occur, as we shall describe, as the designers offer and interrelate lexicalized concepts. The designers transmit partial information about design concepts to one another, and in doing so they integrate and accumulate the knowledge represented as lexicalized concepts into fully developed concepts. First, our research will illustrate how our linguistic perspective usefully addresses the emergent nature of concept formation. Second, our linguistic perspective allows us to describe concept formation in design as it occurs in design practice; perhaps of more practical and industrial significance, the computational linguistics tools open the potential to assess concept formation in near real time, which could lead to a new generation of design thinking critique tools.

2. CONCEPT FORMATION: A LINGUISTIC PERSPECTIVE

Designers bring individual knowledge and perspectives to a group design situation. Language and the meaning of words encode their knowledge and perspectives, thereby operating as facilitators that bridge gaps of knowledge between what individual team members know and the larger body of experience held by the team. The hypothesis in this research is that the accumulation of knowledge by design team members through the exchange and negotiation of lexicalized concepts is one way in which internal structures of knowledge for each designer scaffold to support concept formation. In group design, concept formation would manifest linguistically through the accumulation of each designer's knowledge represented as lexicalized concepts. The accumulation is enacted by referring to lexicalized concepts and by connecting lexicalized concepts through propositions that operate at higher levels of abstraction. Thus, the linguistic features by which the accumulation would be evidenced are repetition, anaphora, and hypernymy (a hypernym being a lexical concept that denotes a generic class of concepts). Each of these linguistic features could be distinguished as a type of semantic link between lexicalized concepts. By analyzing the designers' conversation for these types of semantic links, accumulation could then be described through a data structure comprising the semantic links between lexicalized concepts. In this analysis, we do not perform anaphora resolution because anaphora could be considered a type of repetition. This description of concept formation takes into account both perspectives of language use in design: language encoding thoughts, where the lexicalized concepts represent each designer's ideas, and language performing actions, where the designers operate on lexicalized concepts to accumulate knowledge through exchange and negotiation.

A lexicalized concept is a concept (idea) that has been expressed as a word in the vocabulary of a given language. A concept can be lexicalized by more than one word (e.g., “mountain bike”); both “bicycle” and “mountain bike” are elemental lexicalized concepts. The underlying assumption of our technique is that a design concept can be represented by a set of word forms, that is, the accumulation of a set of lexicalized concepts. At the level of granularity of our analysis, lexicalized concepts are not the type of fully developed design concepts that would be found in Pugh charts. The lexicalized concepts in our analysis are chunks of knowledge that, when accumulated by the design team, may form a fully developed design concept. The interest is to analyze the epistemology of concept formation in design teams by examining semantic features of the words that designers say to account for the accumulation of knowledge across designers.

We illustrate the hypothesis in the diagram in Figure 1. Based on the language-encoding thoughts perspective, a knowledge representation is presumed to exist in each designer's mind (the large circles), and what the designer thinks is partially reflected in what the designer says through lexicalized concepts (indicated by the horizontal lines with a different line type for each designer). A conversation produces a collection of lexicalized concepts (the bold circle). However, a collection of words in and of themselves is not sufficient evidence of concept formation; the knowledge that the words (lexicalized concepts) represent needs to be integrated and accumulated by the designers (shown in the dashed line box to indicate the contribution of each designer's lexicalized concepts). We hypothesize that the principal mechanism for concept formation is through accumulation of lexicalized concepts of knowledge stored in each designer's mind. The accumulation is enacted by the designers' cognitive operation of connecting lexicalized concepts directly (repetition) and at higher levels of abstraction (hypernyms). It has been proposed that designers reason at functional, behavioral, and structural levels about an artifact (Gero, 1990), and it is likely that designers engage in this type of reasoning individually even in group design situations. Although we would agree with this assertion, we believe that accumulation is also necessary in group design to bring together relations that do not yet exist in the separate minds of the designers.

Figure 1. A diagram of accumulation.

To investigate and model the linguistic process of accumulation, we first construct lexical chains using lexical chain analysis and then connect the chains through mutual words. The construction of lexical chains is based on the psycholinguistic representation of the organization of semantic concepts in WordNet (Fellbaum, 1998). WordNet is a lexical system that organizes English nouns, verbs, adjectives, and adverbs into lexicalized concepts connected by semantic links. WordNet does not claim that its structure is how people actually organize concepts in their minds; rather, WordNet models semantic links based on the lexicographic definitions of English language words. The intent of WordNet was to represent lexicalized concepts in a way that would enable researchers to study the psychology of how humans think about concepts, make connections between and among them, and use context to ascertain the appropriate sense of a lexicalized concept. To understand the structure of WordNet, it is important to define a few terms.

gloss: the definition of a lexical concept

sense: the idea that is intended by a lexical concept

synset: a set of one or more synonyms

hypernym: a lexical concept that is a generic class of concepts

hyponym: a lexical concept that is a member of a class of concepts

meronym: a lexical concept that designates a concept as a constituent component of another class

For example, the gloss of the lexical concept “gear” as a noun in an engineering sense would be “a mechanical component which transmits rotational motion from one body to another.” The lexical concept gear is a meronym of the concept “planetary gear train,” as would be the terms “sun gear” and “arm,” all of which are constituent components of a planetary gear train. An “epicyclic gear train” or “derailleur gears” would be part of the synset of the lexical concept planetary gear train at the same semantic level. The concept “gear train” is a hypernym of epicyclic gear train, planetary gear train, and derailleur gears, and the concept of a “mechanism” would be a hypernym of gear train. The reverse relations are hyponyms; for example, gear train is a hyponym of mechanism. These relationships are illustrated in Figure 2, where A stands for hypernym, B for synonym, and C for meronym.
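The gear example can be encoded as a toy lookup to make the relations concrete. This is a sketch assuming hand-entered dictionaries that transcribe the relations stated in the text; they are not actual WordNet entries, and `HYPERNYMS`, `PART_OF`, and the helper functions are hypothetical names. It shows hyponyms obtained simply by inverting the hypernym links.

```python
from collections import defaultdict

# Hand-entered relations from the gear example (illustrative, not WordNet).
HYPERNYMS = {  # specific concept -> generic concept (upward link)
    "epicyclic gear train": "gear train",
    "planetary gear train": "gear train",
    "derailleur gears": "gear train",
    "gear train": "mechanism",
}
PART_OF = {  # meronym -> the whole it is a constituent component of
    "gear": "planetary gear train",
    "sun gear": "planetary gear train",
    "arm": "planetary gear train",
}


def hypernym_path(concept):
    """Follow upward (hypernym) links to the top of the toy hierarchy."""
    path = [concept]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path


def hyponyms():
    """Invert the hypernym table: each generic concept maps to its members."""
    inverse = defaultdict(set)
    for specific, generic in HYPERNYMS.items():
        inverse[generic].add(specific)
    return dict(inverse)


def parts_of(whole):
    """List the meronyms (constituent components) of a concept, sorted."""
    return sorted(part for part, holonym in PART_OF.items() if holonym == whole)
```

For instance, `hypernym_path("planetary gear train")` climbs through gear train to mechanism, while `hyponyms()["gear train"]` returns the three kinds of gear train named in the text.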

Figure 2. The lexical relations between concepts.

To match our terminology with the graphical structure of WordNet, we define a hypernym as an upward link, a synonym as a horizontal link, a meronym as a part-of link, and a hyponym as a downward link. Examples of the upward (Fig. 3), horizontal (Fig. 4), part-of (Fig. 5), and downward (Fig. 6) relationships for words (lexicalized concepts) from the experimental data set are given below.

Figure 3. The hypernym (upward) relation between words: the concepts “adjustment” and “move” are each a kind of “change.”

Figure 4. The synonym (horizontal) relation between words: a “base” is synonymous with a “stand,” both related by the concept “support.”

Figure 5. The meronym (part-of) relation between words: a “wheel” is a part of a “bicycle.”

Figure 6. The hyponym (downward) relation between words: a “circumference” is a more specific concept of both “length” and “size.”
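The four link types defined above (upward, horizontal, part-of, and downward) can be illustrated with a small classifier over the example word pairs. This is a sketch under the assumption of a toy lexicon built only from those examples; `HYPERNYMS`, `SYNSETS`, `PART_OF`, and `link_type` are hypothetical names, and a real analysis would query WordNet instead.

```python
# Toy lexicon built from the example word pairs (illustrative only).
HYPERNYMS = {  # word -> set of more generic concepts
    "adjustment": {"change"},
    "move": {"change"},
    "circumference": {"length", "size"},
}
SYNSETS = [{"base", "stand"}]   # groups of synonyms
PART_OF = {"wheel": {"bicycle"}}  # component -> wholes it belongs to


def link_type(a, b):
    """Classify the semantic link from word a to word b, or return None."""
    if b in HYPERNYMS.get(a, set()):
        return "upward"      # b is a hypernym of a
    if a in HYPERNYMS.get(b, set()):
        return "downward"    # b is a hyponym of a
    if any(a in s and b in s for s in SYNSETS):
        return "horizontal"  # a and b are synonyms
    if b in PART_OF.get(a, set()) or a in PART_OF.get(b, set()):
        return "part-of"     # one word names a component of the other
    return None              # no semantic link in the toy lexicon
```

The direction matters for upward and downward links: "adjustment" to "change" is upward, whereas "length" to "circumference" is downward.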

To reveal the accumulation of each designer's ideas, expressed as lexicalized concepts, we need to show how their lexicalized concepts become interrelated and how the expression of a lexicalized concept influences subsequent lexicalized concepts. Then, we need to connect the chains through mutual lexicalized concepts to illustrate accumulation over the course of a design conversation.

3. LEXICAL CHAIN ANALYSIS

One computable technique to generate semantic links is lexical chain analysis. A lexical chain is a sequence of semantically related words in text (or conversation). Lexical chain analysis relies on the semantic connections between words, which are typically derived from large lexical databases such as WordNet. A standard method for constructing lexical chains, such as for text summarization (Silber & McCoy, 2002), is to extract the set of lexicalized concepts from the document set and then iteratively add lexicalized concepts to a lexical chain when a semantic link exists between a lexicalized concept and the lexicalized concepts in an existing chain. However, our intent is to examine whether concept formation could be characterized by the accumulation of knowledge represented through lexicalized concepts at their various levels of meaning. Because we argue that this process happens bottom-up through the designers' ability to analyze information in context, we do not wish to look top-down at what concepts were generated (a text summarization process) but rather to look at concept formation through the emergent expression and accumulation of lexicalized concepts. Thus, our method proceeds consecutively through the utterances. A lexical chain forms and grows as the result of the accumulation of lexicalized concepts through logically consistent semantic links. A chain breaks when the semantic links no longer support the accumulation of lexicalized concepts.
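A consecutive, utterance-by-utterance chaining procedure of this kind might look like the following sketch. It is not the author's implementation: the `related` predicate stands in for a WordNet lookup, and the hypothetical `window` parameter expresses how many utterances back a link may reach, tying the sketch to the assumptions about designers' memory discussed next. A chain that has not grown within the window is treated as broken.

```python
from typing import Callable, List, Sequence


def build_chains(utterances: Sequence[Sequence[str]],
                 related: Callable[[str, str], bool],
                 window: int = 1) -> List[List[str]]:
    """Grow lexical chains consecutively through a conversation.

    utterances: the nouns of each utterance, in conversational order.
    related: predicate saying whether two words share a semantic link
             (a stand-in for a WordNet lookup).
    window: a word may join a chain only if that chain last grew at most
            `window` utterances earlier; otherwise the chain is broken.
    """
    chains: List[List[str]] = []
    last_grew: List[int] = []  # utterance index at which each chain last grew
    for i, nouns in enumerate(utterances):
        for word in nouns:
            for c, chain in enumerate(chains):
                if i - last_grew[c] <= window and any(related(word, w) for w in chain):
                    chain.append(word)  # the word extends an open chain
                    last_grew[c] = i
                    break
            else:
                chains.append([word])   # no open chain accepts it: start a new one
                last_grew.append(i)
    return chains
```

With a predicate that relates "wheel", "bicycle", and "frame", a window of one utterance breaks the bicycle chain before "frame" appears, whereas a wider window lets "frame" extend it.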

As a consequence of the interest in examining the building up (accumulation) of knowledge leading to concept formation, assumptions need to be made about the duration of the conversation during which designers retain a frame about the concepts that have been introduced by the other designers. These assumptions also have consequences for the computational complexity of the lexical chain algorithm (Figs. 7, 8, and 9). The assumptions are that

1. the designer remembers only what has just been said, and lexical chain links can only be established between adjacent utterances;
2. the designer remembers a general framework of everything that has been said, and lexical chain links can be established between an utterance and any prior utterance; and
3. the designer remembers a “small” frame of concepts within a window of thematic coherence, that is, during a period of time in which the topic of conversation is roughly constant; lexical chain links can be established within a window.
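The three link patterns can be sketched as the sets of (earlier, later) utterance pairs between which a lexical chain link may be established. The function name `candidate_links` and the `window` parameter are hypothetical; the sketch only enumerates permitted pairs and says nothing about the cost of building chains over them.

```python
def candidate_links(n_utterances: int, assumption: int, window: int = 1):
    """Return the (earlier, later) utterance pairs that may be linked.

    assumption 1: adjacent utterances only;
    assumption 2: an utterance and any prior utterance;
    assumption 3: utterances at most `window` turns apart.
    """
    if assumption == 1:
        max_gap = 1
    elif assumption == 2:
        max_gap = n_utterances  # no limit: any prior utterance
    elif assumption == 3:
        max_gap = window
    else:
        raise ValueError("assumption must be 1, 2, or 3")
    return {(i, j)
            for j in range(n_utterances)
            for i in range(max(0, j - max_gap), j)}
```

Setting `window=1` under assumption 3 reproduces the pattern of assumption 1 exactly, which mirrors the observation that assumption 1 is subsumed by assumption 3.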

Figure 7. The pattern of links for assumption 1.

Figure 8. The pattern of links for assumption 2.

Figure 9. The pattern of links for assumption 3.

Assumption 2 leads to an O(n!) algorithm and is thus not computationally tractable, so we do not proceed with it. However, assumption 2 is equivalent to connecting individual lexical chains through mutual lexicalized concepts even if a chain was broken during the conversation. Assumption 1 is subsumed by assumption 3 (it is the case of a window of one utterance) and is therefore redundant. Thus, we proceed under the assumption that the designer retains a schematic representation of previously voiced lexicalized concepts; collaborative concept formation is enacted by an accumulation of these lexicalized concepts.
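Connecting chains through mutual lexicalized concepts, as described above, amounts to merging any chains that share a word, transitively. The following is a minimal sketch of that merge under the assumption that chains are plain lists of words; `join_chains` is a hypothetical name, not the author's code.

```python
from typing import List, Sequence


def join_chains(chains: Sequence[Sequence[str]]) -> List[List[str]]:
    """Join lexical chains that share at least one lexicalized concept.

    Joining is transitive: if chain A shares a word with B and B with C,
    all three end up in one accumulated chain. Within a joined chain,
    words keep the order in which they were merged, without duplicates.
    """
    merged: List[List[str]] = []
    for chain in chains:
        words = set(chain)
        touching = [m for m in merged if words & set(m)]        # share a word
        merged = [m for m in merged if not (words & set(m))]    # untouched
        combined: List[str] = []
        for word in [w for m in touching for w in m] + list(chain):
            if word not in combined:
                combined.append(word)
        merged.append(combined)
    return merged
```

For example, a backpack/strap chain and a wheel/frame chain that were broken apart can still be joined once a later chain mentions both "frame" and "buckle" and another links "strap" to "buckle".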

The process for constructing a lexical chain proceeds as follows:

One could manually segment the conversation (as is done in verbal protocol analysis) or segment the conversation by postulating when breaks in the thematic coherence occur. To find the window of thematic coherence, we assumed that the topical focus of the design conversation over time is generally coherent, but that utterances further away from each other would be less likely to be thematically similar. This is similar to a method developed to examine thematic coherence in written text (Foltz et al., 1998). If this assumption holds, there should exist a mostly ordered relation between semantic similarity and the distance between utterance boundaries. An utterance boundary is defined by the turns taken between speakers, that is, when one speaker stops talking and the next speaker begins. This ordered relation can be revealed by examining the coherence between any two utterances (communicative acts) as a function of the “distance” between the utterance boundaries. That is, instead of calculating only the coherence between adjacent utterances, we calculate the average coherence between utterances that are “one” utterance boundary away, the average coherence between utterances that are “two” utterance boundaries away, and so forth, to expose the structuring of language over the entire conversation. Given the perspective of language as expressing ideas, we would argue that the patterns of thematic coherence between utterances portray the accumulated overlap of the designers' lexicalized concepts. The decay of coherence would then indicate when the teams change directions in thinking, which could usefully be applied to segment their conversation for lexical chain analysis.

Coherence is defined in the standard way: the coherence between any two utterances represented by the vectors d_q and d_{q+t} is the dot product of the utterance vectors normalized by the product of their norms,

$$ \mathrm{coh}(d_q, d_{q+t}) = \frac{d_q \cdot d_{q+t}}{\lVert d_q \rVert \, \lVert d_{q+t} \rVert} $$
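The averaging of coherence by distance between utterance boundaries can be sketched in plain Python. The utterance vectors are assumed to be given (for example, from the reduced latent semantic space), and assigning a coherence of 0 when either vector is all zeros is an assumption of this sketch, not part of the described method. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Coherence of two utterance vectors: dot product over product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vector: treated as no coherence


def coherence_by_distance(vectors: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Average coherence between utterances that are t boundaries apart.

    Returns {t: mean of cosine(d_q, d_{q+t}) over all valid q}
    for t = 1 .. n - 1, where n is the number of utterances.
    """
    n = len(vectors)
    result = {}
    for t in range(1, n):
        scores = [cosine(vectors[q], vectors[q + t]) for q in range(n - t)]
        result[t] = sum(scores) / len(scores)
    return result
```

For three utterances with vectors [1, 0], [1, 0], and [0, 1], the average coherence is 0.5 at distance one and 0.0 at distance two.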

We then find the best polynomial curve fit f(t), where t is the distance between utterances and f(t) is the value of the coherence. The point on the curve that is halfway between the maximum coherence (the average coherence between adjacent utterances) and the asymptotic limit of coherence defines the size w of the window of a thematically coherent segment of the conversation. The asymptotic limit of coherence is the point on the curve fit where the slope of the curve fit is nearest to zero; that is, its location t_a is defined by a real-valued minimum:

$$ t_a = \arg\min_{1 \le t \le t_n} \lvert f'(t) \rvert $$

where t_n is the total number of utterances.

A bounded minimization search on the derivative of the curve fit locates t_a. In practice, the choice of the window size w is arbitrary: the higher the value of w, the longer the candidate lexical chains and the higher the number of semantic links. The value of w affects only the speed of the lexical chain analysis, not the validity of the chains found, because the chains can be connected through mutual lexicalized concepts. Thus, if a chain breaks because no semantic links can be found between lexicalized concepts within a window, the chain could nonetheless be joined with another chain if the two chains share mutual lexicalized concepts. The smaller chains thus join to form a longer accumulated chain of lexicalized concepts. However, a window size of one may miss chains when the speakers generate insufficient content in successive turns, such as when they utter “mmm” successively. The automated calculation of the window of thematic coherence enables a fully automated analysis.
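The window computation can be sketched with NumPy as follows. This is a sketch under stated assumptions, not the author's code: a dense grid search stands in for the bounded minimization search, the upper bound of the search is taken as the largest observed distance, and "halfway" is read as the first point where the fitted coherence falls to the midpoint between f(1) and f(t_a). The name `thematic_window` is hypothetical.

```python
import numpy as np


def thematic_window(distances, coherences, degree=3, grid_points=10_000):
    """Estimate t_a and the window size w from an average-coherence curve.

    1. Fit a polynomial f(t) (third order by default) to the coherence data.
    2. Locate t_a, the grid point in [1, max distance] where |f'(t)| is
       smallest (a grid search in place of a bounded minimizer).
    3. Return w, the first t <= t_a at which f(t) has fallen to the
       midpoint between f(1) and f(t_a).
    """
    t = np.asarray(distances, dtype=float)
    f = np.poly1d(np.polyfit(t, np.asarray(coherences, dtype=float), degree))
    slope = f.deriv()
    grid = np.linspace(1.0, t.max(), grid_points)
    t_a = grid[np.argmin(np.abs(slope(grid)))]
    midpoint = (f(1.0) + f(t_a)) / 2.0
    before = grid[grid <= t_a]
    w = before[np.argmax(f(before) <= midpoint)]  # first grid point at or below midpoint
    return float(t_a), float(w)


ts = list(range(1, 20))
cs = [0.1 - 0.0001 * (t - 10) ** 3 for t in ts]  # synthetic curve that flattens at t = 10
t_a, w = thematic_window(ts, cs)
```

On this synthetic curve the slope vanishes at t = 10, so t_a lands very close to 10 and w falls near t ≈ 2.86, where the fitted coherence has dropped halfway toward its flat value.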

4. EXPERIMENTS

Four group design sessions were analyzed. The first design session was a transcript from the mountain bike backpack design problem at the 1994 Delft Protocols Workshop. The team was tasked with designing, and hence creating concepts for, ways to connect a backpack to a bicycle. The other three come from the Bamberg Study (Stempfle & Badke-Schaub, 2002), in which the teams designed a planetary gear train set. Native German speakers with mechanical engineering backgrounds translated the Bamberg Study transcripts into English. We chose these transcripts to enable qualitative comparisons between the results of our analysis methods and previously published studies of these design teams.

The transcripts were parsed and tagged for English parts of speech using the Stanford Java NLP (http://www-nlp.stanford.edu/javanlp/), and a set of nouns was extracted based on the tagged parts of speech. Then, a standard word by document matrix, where each “document” is an utterance, was created. Each entry in the word by document matrix counts the number of times a content-bearing noun appears in the corresponding utterance.
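Once nouns have been extracted, building the word by document matrix is a counting step. The sketch below assumes the part-of-speech tagging has already been done and each utterance is given as its list of nouns; `word_by_document` is a hypothetical name, and sorting the vocabulary alphabetically is an arbitrary choice of this sketch.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def word_by_document(utterance_nouns: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a noun-by-utterance count matrix.

    utterance_nouns: the nouns already extracted from each utterance.
    Rows are nouns (sorted alphabetically), columns are utterances, and
    each entry counts how often the noun occurs in that utterance.
    """
    vocabulary = sorted({noun for nouns in utterance_nouns for noun in nouns})
    counts = [Counter(nouns) for nouns in utterance_nouns]
    matrix = [[c[noun] for c in counts] for noun in vocabulary]
    return vocabulary, matrix


vocab, m = word_by_document([["bike", "backpack", "bike"], ["strap"], ["backpack", "strap"]])
```

Here "bike" occurs twice in the first utterance, so its row begins with 2; each column corresponds to one utterance treated as a "document."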

The Delft transcript contains 2190 “raw” utterances among three designers over a 118-min period. There were 1236 content-bearing utterances, excluding utterances containing invectives and noncontent terms such as “mmmm,” “oh,” and “laughs.” Participants I (Ivan), J (John), and K (Kerry) spoke 34, 39, and 27%, respectively, of the conversation. According to a qualitative profile of this team (Goldschmidt, 1996), John is the ideas person and is the most active in driving the direction of the team. Ivan is the process manager and the timekeeper, who summarizes but only weakly influences the team. Kerry has the most domain knowledge and appears to make specific contributions to the functional specifications.

There are three teams in the Bamberg Study, denoted 1102, 2202, and 2302. Team 1102 consisted of six participants (A–F), who contributed 15, 21, 9, 20, 18, and 16% of the content-bearing utterances; team 2202 consisted of four participants (A–D), who contributed 32, 16, 30, and 21%; and team 2302 consisted of four participants (A–D), who spoke 40, 13, 19, and 28%. For brevity, we use notation such as 1102/D to refer to participant D in Bamberg team 1102.

During the WordNet analysis, only the noun category was searched, across all senses of a lexicalized concept. Although word sense disambiguation is a general issue in computational linguistics, our reading of the transcripts is that the designers tended to stick with a single sense of each word. In addition, our quantitative results indicated that, within the window of thematic coherence, no lexicalized concept had ambiguous semantic links to another lexicalized concept. For the thematic coherence analyses, we processed the word by document matrix with latent semantic analysis, following a standard procedure for analyzing design team communication (Dong, 2005), which includes, for example, retaining dimensions 2–101 for the k-reduced matrix.
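The k-reduction step can be sketched with NumPy's singular value decomposition. This sketch assumes that “dimensions 2–101” counts singular dimensions from 1, so the first (largest) dimension is discarded, and that each utterance is represented by its row of V scaled by the singular values. Any term weighting applied before the SVD in the original procedure is omitted, and the function name is hypothetical.

```python
import numpy as np


def lsa_document_vectors(
    matrix: np.ndarray, first_dim: int = 2, last_dim: int = 101
) -> np.ndarray:
    """Project utterances (columns of a word-by-document matrix) into a reduced LSA space.

    Dimensions are numbered from 1, so the defaults keep singular dimensions
    2 through 101 and discard the first. If the matrix has fewer dimensions,
    all available ones from `first_dim` onward are kept.
    """
    # Thin SVD: matrix = u @ diag(s) @ vt, singular values in descending order.
    _, s, vt = np.linalg.svd(matrix.astype(float), full_matrices=False)
    keep = slice(first_dim - 1, min(last_dim, len(s)))
    # One row per utterance: the retained rows of vt, transposed and scaled by s.
    return vt[keep].T * s[keep]
```

Keeping every dimension reproduces the utterance-by-utterance inner products of the original matrix exactly; discarding dimensions is what lets LSA relate utterances that use different but co-occurring nouns.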

5. RESULTS

Figures 10, 11, 12, and 13 display, for each data set, the average thematic coherence as a function of the distance between utterance boundaries, together with a third-order curve fit of the data. The plots are intriguing in that they appear fairly regular, with a nonzero slope initially, but eventually approach an asymptotic limit. Beyond a certain distance between utterances, the coherence drops off and is highly scattered. The regularity of the curves, and their similarity to curves derived from written text, suggests a logical consistency in the ideas expressed by the designers in each of these teams.
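The quantity on the horizontal and vertical axes of these plots can be sketched as follows. The sketch assumes, in the spirit of Foltz, Kintsch, and Landauer (1998), that the coherence between two utterances is the cosine of their reduced LSA vectors and that the average at distance d runs over all utterance pairs d positions apart; the function names are hypothetical, and taking logarithms of the (positive) averages for the log plots is left to the caller.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_by_distance(
    vectors: List[Sequence[float]], max_distance: int
) -> Dict[int, float]:
    """Average cosine between utterances d positions apart, for d = 1..max_distance.

    Distances with no utterance pairs (d >= number of utterances) are omitted.
    """
    result: Dict[int, float] = {}
    for d in range(1, max_distance + 1):
        pairs = [cosine(vectors[i], vectors[i + d]) for i in range(len(vectors) - d)]
        if pairs:
            result[d] = sum(pairs) / len(pairs)
    return result
```

Because the number of pairs shrinks as d grows, averages at large distances rest on few pairs, which is consistent with the scatter the plots show far from the origin.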

Fig. 10. The log coherence for the Delft team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 11. The log coherence for the Bamberg 1102 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 12. The log coherence for the Bamberg 2202 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 13. The log coherence for the Bamberg 2302 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Using a third-order curve fit and Eq. (2), we calculated the window of coherent conversations for each team as summarized in Table 1.
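The window computation summarized in Table 1 can be sketched under stated assumptions. We read “a bounded minimization search on the derivative of the curve fit” as finding the distance, within a bounded interval, at which the absolute slope of the fitted cubic is smallest (where the curve flattens toward its asymptote), and we use a plain grid scan in place of a dedicated bounded minimizer. Eq. (2), which converts t_a into the window size, is not reproduced here, and the helper names `fit_cubic` and `flattest_point` are hypothetical.

```python
from typing import List, Sequence


def fit_cubic(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y = c0 + c1*t + c2*t**2 + c3*t**3; returns [c0, c1, c2, c3]."""
    n = 4
    # Normal equations (A^T A) c = A^T y, where row i of A is [1, t_i, t_i^2, t_i^3].
    ata = [[sum(ti ** (r + c) for ti in t) for c in range(n)] for r in range(n)]
    aty = [sum(ti ** r * yi for ti, yi in zip(t, y)) for r in range(n)]
    aug = [row + [b] for row, b in zip(ata, aty)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - tail) / aug[r][r]
    return coeffs


def flattest_point(
    coeffs: Sequence[float], lower: float, upper: float, steps: int = 10_000
) -> float:
    """Distance in [lower, upper] where the fitted curve's slope is closest to zero.

    A grid scan of |f'(t)| stands in for a bounded minimizer.
    """
    _, c1, c2, c3 = coeffs

    def slope(x: float) -> float:
        return c1 + 2 * c2 * x + 3 * c3 * x * x

    grid = [lower + (upper - lower) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda x: abs(slope(x)))
```

For a curve that rises and levels off near distance 10, such as y = 3t − 0.3t² + 0.01t³ sampled at t = 1, …, 15, the scan returns a value close to 10. The normal equations are adequate for the short distance ranges involved; for longer ranges, rescaling t before fitting would improve numerical conditioning.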

Table 1. Window of coherent conversation

Based on these window sizes, we calculated the accumulation of lexicalized concepts by the number and type of semantic links. The values reported in Tables 2, 3, 4, and 5 quantify the frequency of occurrence and the types of semantic links that interrelate the lexicalized concepts. In addition to counting the horizontal (synonym), hypernym (a lexical concept that is a generic class of concepts), hyponym (a lexical concept that is a member of a class of concepts), and meronym (a lexical concept that designates a concept as a constituent component of another class) relations, we counted the number of lexicalized concepts generated (a lexicalized concept that has no prior link) and the repetition of lexicalized concepts (Figs. 14, 15, 16, and 17).
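The link counting behind Tables 2–5 can be sketched as follows. WordNet itself is not queried here: a hypothetical miniature lexicon built from the paper's own examples (“adjustment” and “move” as kinds of “change,” “base” and “stand” as synonyms, “wheel” as part of “bicycle,” “circumference” as a kind of “length” and “size”) stands in for it. The rule that a noun with no link to any noun in the preceding w utterances counts as “generated” is our reading of the text, not the author's code, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical miniature lexicon standing in for WordNet noun relations.
HYPERNYMS: Dict[str, Set[str]] = {  # word -> more general concepts
    "adjustment": {"change"},
    "move": {"change"},
    "circumference": {"length", "size"},
}
SYNONYMS: Dict[str, Set[str]] = {"base": {"stand"}, "stand": {"base"}}
PARTS: Dict[str, Set[str]] = {"bicycle": {"wheel"}}  # whole -> parts


def classify(u: str, v: str) -> Optional[str]:
    """Type of semantic link from an earlier concept u to a later concept v, or None."""
    if u == v:
        return "repetition"
    if v in SYNONYMS.get(u, set()):
        return "synonym"
    up_u, up_v = HYPERNYMS.get(u, set()), HYPERNYMS.get(v, set())
    if v in up_u or up_u & up_v:
        return "hypernym"  # v is more general than u, or both are a kind of the same concept
    if u in up_v:
        return "hyponym"  # v is more specific than u
    if v in PARTS.get(u, set()) or u in PARTS.get(v, set()):
        return "meronym"
    return None


def count_links(turns: List[Tuple[str, List[str]]], window: int) -> Dict[str, Counter]:
    """Count link types per speaker against the `window` preceding utterances.

    A noun with no link to any noun in the window is counted as "generated".
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for j, (speaker, nouns) in enumerate(turns):
        earlier = [u for _, prior in turns[max(0, j - window):j] for u in prior]
        for v in nouns:
            links = [link for link in (classify(u, v) for u in earlier) if link is not None]
            if links:
                counts[speaker].update(links)
            else:
                counts[speaker]["generated"] += 1
    return dict(counts)
```

Dividing each speaker's counts by that speaker's total gives percentages of the kind plotted in Figures 14–17.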

Table 2. Number of chain links, Delft backpack design team (w = 3)

Table 3. Number of chain links, Bamberg 1102 team (w = 3)

Table 4. Number of chain links, Bamberg 2202 team (w = 6)

Table 5. Number of chain links, Bamberg 2302 team (w = 22)

Fig. 14. The percentage of semantic links for the Delft backpack team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 15. The percentage of semantic links for the Bamberg 1102 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 16. The percentage of semantic links for the Bamberg 2202 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 17. The percentage of semantic links for the Bamberg 2302 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

In total, and as a percentage of semantic links, the team members repeated words most often (except in Bamberg 2202), which was probably necessary to keep their conversations lexically cohesive. More interesting, though, were the high number and percentage of hypernym relations relative to the other types of semantic links, even in comparison to repetition. If we describe concept formation, in a cognitive sense, as the accumulation of knowledge representations, then the linguistic behavior of connecting lexicalized concepts through hypernyms indicates that the designers were connecting elemental lexicalized concepts through higher levels of abstraction. In each team, at least one person produced more hypernym relations than the others; one could conjecture that this role is crucial in group design. For the Delft team, the high number of hypernym relations is especially pronounced for John, as expected given his known qualitative profile. Aside from 2302/A, who dominated the conversation, these differences cannot be accounted for by how often each team member spoke: the differences in share of utterances are small, whereas the differences in hypernym relations are large. For example, 2202/C spoke two percentage points less often than 2202/A but had 47% more hypernym relations, and 1102/D spoke one percentage point less often than 1102/B but had 51% more hypernym relations. Two other potential explanations exist for this phenomenon. First, John, 1102/D, 2202/C, and 2302/A could be what Sonnenwald (1996) has characterized as “interdisciplinary stars,” by virtue of their ability to abstract knowledge from others and then to accumulate concepts from others. Second, because experts in a domain reason at more abstract levels (Zeitz, 1997), hypernym relations may serve as a proxy for identifying the “expert” reasoner in a design group, if one accepts the assumption that experts tend to engage in abstract reasoning more often than nonexperts.

Because the lexical chains are constructed by successively chaining utterances expressed by different participants in the team, lexicalized concepts flow through and from each participant. To quantify the flow of concepts, we define the following relationships between two lexicalized concepts u and v:

a weak relationship (value = 1) if synset(u) relates to a common word x of synset(v) in a single WordNet classification direction (upward, downward, or horizontal); or
a strong relationship (value = 3) if u and v co-occur or are repeated in adjacent utterances, if v is a part of u, or if u and v share common parts (meronym).

We use these relationships to assess the influence of lexicalized concepts between the designers, or what we term the “strength of ties” between the designers. For illustrative purposes, the strengths of ties for the Delft backpack team and Bamberg 2202 are shown in Figures 18 and 19, respectively. The strengths of ties for the other teams are given in Tables 6 and 7, in which the rows for D and A, respectively, indicate the participant with the strongest total strength of ties. Each team has a central participant with the strongest set of ties to the other participants. John, 1102/D, 2202/C, and 2302/A are central, consistent with their high numbers of hypernym links within their respective teams.
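The strength-of-ties calculation can be sketched as a weighted sum over linked concept pairs uttered by different speakers. The assumptions are ours: links are sought only between a turn and the w turns before it (w = 1 by default, matching “adjacent utterances”), co-occurrence other than repetition is not modeled, ties are treated as undirected, and a participant's total is the sum of that participant's ties. The `relation` callable stands in for the WordNet lookup, and `toy_relation` is a hypothetical example.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

STRONG, WEAK = 3, 1
# Repetition and part-whole links are strong; a single WordNet direction is weak.
WEIGHTS: Dict[str, int] = {
    "repetition": STRONG,
    "meronym": STRONG,
    "synonym": WEAK,   # horizontal
    "hypernym": WEAK,  # upward
    "hyponym": WEAK,   # downward
}
Relation = Callable[[str, str], Optional[str]]


def strength_of_ties(
    turns: List[Tuple[str, List[str]]], relation: Relation, window: int = 1
) -> Dict[FrozenSet[str], int]:
    """Sum link weights between pairs of different speakers (ties treated as undirected)."""
    ties: Dict[FrozenSet[str], int] = defaultdict(int)
    for j, (speaker_v, nouns_v) in enumerate(turns):
        for speaker_u, nouns_u in turns[max(0, j - window):j]:
            if speaker_u == speaker_v:
                continue  # only ties between different participants count
            for u, v in product(nouns_u, nouns_v):
                link = relation(u, v)
                if link is not None:
                    ties[frozenset((speaker_u, speaker_v))] += WEIGHTS[link]
    return dict(ties)


def total_strength(ties: Dict[FrozenSet[str], int]) -> Dict[str, int]:
    """Each participant's total strength of ties to all other participants."""
    totals: Dict[str, int] = defaultdict(int)
    for pair, value in ties.items():
        for speaker in pair:
            totals[speaker] += value
    return dict(totals)


# Hypothetical toy relation standing in for a WordNet lookup.
TOY_LINKS = {("bicycle", "wheel"): "meronym", ("adjustment", "move"): "hypernym"}


def toy_relation(u: str, v: str) -> Optional[str]:
    return "repetition" if u == v else TOY_LINKS.get((u, v))
```

With the turns John: rack, bicycle; Ivan: rack; John: wheel; Kerry: adjustment; John: move, the repeated “rack” gives John and Ivan a tie of 3 and the adjustment–move hypernym gives John and Kerry a tie of 1, so John has the largest total (4) and would be identified as the central participant.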

Fig. 18. The strength of the ties for the Delft backpack team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 19. The strength of the ties for the Bamberg 2202 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Table 6. Strength of ties for Bamberg 1102 team

Table 7. Strength of ties for Bamberg 2302 team

Again, each participant's share of the team's conversation cannot fully account for why some participants have stronger ties than others. In team 2302, one would expect 2302/A to have much stronger ties than the others because this participant dominated the conversation, and, indeed, this is the case. However, 1102/D and 2202/C have a stronger total strength of ties to the other participants despite speaking less often than the most frequent speakers, 1102/B and 2202/A, respectively. Instead, we might characterize these central participants as having the ability to connect with and join ideas from other team members. Further, the evidence suggests that this ability is not necessarily related to assertiveness in terms of speaking more often than others. Whereas the other participants state and name concepts, the central participant appears to recombine them to effectuate a concept. Thus, although concept formation requires each designer to bridge indirect relations among the concepts stored in his or her own mind, the evidence suggests the need for, and existence of, a person in the design team who specializes in assembling concepts.

Because there were six team members in Bamberg 1102, as opposed to four in both Bamberg 2202 and 2302, the flow of the conversation may have been impeded by the extra effort expended in coordinating the conversation or by the formation of a subgroup within the larger group, which may account for the differences in the production of hypernym relations. The data suggest, however, that other factors may be operating. First, each team member in Bamberg 1102 except 1102/C contributed roughly equally to the conversation. Second, and more interestingly, the Bamberg researchers reported that “proceeding in the design task can be labeled from ‘chaotic’ (group 3) to ‘planned’ (group 1),” where group 3 is team 2302 and group 1 is team 1102 (Stempfle & Badke-Schaub, 2002, p. 484). Third, the number of hypernym links for 1102 is at least an order of magnitude lower than for 2202 and 2302. Conversation flow and team size do not appear sufficient to account for this discrepancy.

We conjecture that the planned behavior of team 1102 could account for its higher percentage of repetitions relative to hypernyms compared with the other Bamberg teams and the Delft team. That is, the team's planned design behavior manifests in the linguistic behavior described in Table 3: each team member reproduces and repeats lexicalized concepts. By contrast, the other teams exhibit differentiated repetition of lexicalized concepts, which appears as hypernyms that connect concepts that are similar yet differentiated by a level of abstraction. This behavior of differentiated repetition may signal that language use in design is productive toward the formation of concepts. However, further evidence is needed to verify this hypothesis.

The planned behavior of 1102 also raises a conundrum related to normative design methods. In design research, there is a tendency to recommend that design teams follow prescribed methodologies because those methodologies deliver positive outcomes. However, team 2302, which according to the researchers followed no prescribed design method, was nonetheless successful. This is why the Bamberg researchers developed a model of design thinking based on the teams' actual design practices. The apparent contradiction might be explained by the researchers' observation (Stempfle, 2004) that team 2302 experienced disagreements and challenges of ideas that in the end led to careful analyses, the selection of a design idea, and a positive design outcome. Thus, as Stempfle and Badke-Schaub (2002) argued, the accepted primacy of teaching and recommending normative methods is open to question.

This analysis suggests another way to explain the contradiction, that is, why a design team that followed no prescribed design methodology outperformed a team whose process appeared planned. We dissected the conversations by speaker to examine how each speaker's thematic coherence contributed to the group's thematic coherence for teams 1102 and 2302.

If verbalizations express the designers' thoughts, and if concept formation is about accumulating concepts by constructing knowledge representations contributed by each individual, then the patterns in Figures 20 and 21 should show that the overall thematic coherence of the group is higher than that of each team member. Higher thematic coherence means more overlap in the expressed concepts, and thus more accumulation of lexicalized concepts. That is, we would expect additive behavior: the lexicalized concepts augment and build upon one another, and there is a genuine coconstruction of design knowledge. In Figure 20 (team 1102), each individual's coherence (nonfilled shapes) is “scattered” within a fairly linear band that also bounds the group's overall thematic coherence, shown by the solid (red) dots. Each person in the group appears to be saying (contributing) approximately the same concepts. Conversely, in Figure 21 (team 2302), although each speaker's coherence (nonfilled shapes) is roughly constant, in combination (solid red dots) the speakers increase the thematic coherence of the conversation. Thus, despite team 2302 being labeled chaotic and not following any prescribed design methodology, the statistical patterns of semantic links (e.g., the hypernym relations) show a high level of accumulation of concepts and an additive pattern from individual to group thematic coherence. This additive thematic coherence, we would argue, allows us to characterize this team as a productive design team. The Delft team is similarly productive, as shown in Figure 22. Our characterizations based on the quantitative data are consistent with the characterizations reported in the observational studies of these teams.
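The per-speaker comparison can be sketched as follows. Because the exact procedure is not spelled out here, the sketch assumes that a speaker's coherence is the average cosine between that speaker's own utterances at a given distance, while the group's coherence uses all utterances in conversational order; the function names are hypothetical. Additive behavior in the sense above corresponds to the group value exceeding every individual value.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_coherence(vectors: List[Sequence[float]], distance: int = 1) -> float:
    """Average cosine between utterances `distance` positions apart (NaN if none)."""
    pairs = [cosine(vectors[i], vectors[i + distance]) for i in range(len(vectors) - distance)]
    return sum(pairs) / len(pairs) if pairs else float("nan")


def speaker_vs_group(
    turns: List[Tuple[str, Sequence[float]]], distance: int = 1
) -> Tuple[Dict[str, float], float]:
    """Return each speaker's own coherence and the coherence of the whole conversation."""
    by_speaker: Dict[str, List[Sequence[float]]] = {}
    for speaker, vector in turns:
        by_speaker.setdefault(speaker, []).append(vector)
    individual = {s: mean_coherence(vs, distance) for s, vs in by_speaker.items()}
    group = mean_coherence([vector for _, vector in turns], distance)
    return individual, group


def is_additive(individual: Dict[str, float], group: float) -> bool:
    """True if the group's coherence exceeds every individual's coherence."""
    return all(group > value for value in individual.values())
```

For turns A: (1, 0), B: (1, 1), A: (0, 1), B: (1, 1), speaker A's coherence is 0, speaker B's is 1, and the group's is about 0.71; the group value lies between the individuals, so this toy conversation is not additive.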

Fig. 20. Per-speaker coherence for the Bamberg 1102 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 21. Per-speaker coherence for the Bamberg 2302 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 22. Per-speaker coherence for the Delft team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Finally, the lexical chains were connected through mutual lexicalized concepts, and each was contextualized to its originating utterances. Connecting the lexical chains to their originating utterances allows interrogation (“reading”) of the accumulated knowledge in context. We present sample chains from the Delft backpack team and the Bamberg 2302 team. For illustration, the chosen set of linked utterances for the Delft backpack team in Figure 23 is interwoven, whereas the linked structure of the Bamberg 2302 team in Figure 24 is linear in time. In practice, the structure of the connected chains is complex and has many possible paths; in the examples, a representative path is chosen to expose key utterances in the conversation. In the figures, connections on the top of the circles represent hypernym links, whereas horizontal links represent synonyms or repetitions. A curved arrow denotes the exclusion of interim utterances.

Fig. 23. The connected links from the Delft backpack team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

Fig. 24. The connected links from the Bamberg 2302 team. [A color version of this figure can be viewed online at www.journals.cambridge.org]

The Delft backpack team must deal with attaching a bicyclist's backpack to the rack (Fig. 23). What is interesting, as the utterances show, is that the team attempted to resolve, in parallel, the design concepts for connecting the rack to the bicycle (with various types of brackets) and the design concepts for attaching equipment (with bungee cords, zippers, nets, and snaps) to the backpack. In fact, the choice of how to manufacture the rack (injection molding, vacuum forming) is directly related to the possible type of attachment concept. Valkenburg (1998) noted that the process of coming to agreement about the materials, manufacturing method, and backpack fastening device led to a shared understanding about the agreed-upon manufacturing method: injection molding. Although the lexical chains are not sufficient to ascertain whether the team came to a shared understanding, the data calculated in generating the chains usefully indicate the extent to which these concepts were discussed and how concepts accumulated throughout the design team's conversation.

Figure 24 illustrates a set of linked utterances during which the participants dealt with the issue of positioning the reflected light that the planetary gear mechanism moves. When 2302/D begins talking about a “mechanism,” it makes sense that 2302/A refers to the “reflector,” which is what the mechanism will be physically connected to. The reflector will produce a certain type of “light.” The light may alternatively be produced by a “lamp.”

These connected chains reveal that, in collective design situations, design concepts are constructed through an accumulation process in which new knowledge, expressed as lexicalized concepts, is added in succeeding utterances. In a design team conversation, a designer's choice of a particular concept, or of a way of proposing a concept, seems to trigger related concepts from other designers. Once a concept is lexicalized, its semantics necessarily impose some structure on the subsequent choice of concepts to be lexicalized.

6. CONCLUSION

Designers are skilled in organizing and representing artifacts. Human thinking and the ability to develop knowledge are heavily constrained by, and made possible through, language. This research suggests that language is essential for the production of knowledge (i.e., concept formation) during group design. Using the two perspectives on language use and the machinery of computational linguistics, this paper has demonstrated that the production of knowledge relies on designers' linguistic behavior to construct a composite concept.

Concept formation in group design situations has been inferred through statistical language processing of patterns of word co-occurrence and semantic links. We were able to characterize the accumulation of individual and group knowledge structures through the designers' lexicalization of concepts. These analyses illustrated how language serves as a container for transferring knowledge from one designer to another. The construction of the lexical chains and the accumulation of lexicalized concepts suggest that the lexicalized concepts represented design concepts that were generated on the fly. The formation of design concepts was reflected in the semantic connections between lexicalized concepts.

Three implications arise from the results. First, design researchers have hypothesized the existence of core skills and knowledge, such as planning and abstracting, that are transferable across design domains. The methods laid out in this paper suggest that differences in such skills could manifest as differences in linguistic behavior. We have already seen differences of this type between the Bamberg teams; for example, the density of semantic links in team 1102 is much lower than in teams 2202 and 2302. Although we would caution against prescriptive statements such as “productive design teams should have high numbers of semantic links,” diagrams such as Figures 19 and 21 could perhaps serve as descriptors of how much knowledge the teams accumulated toward concept formation. The high percentage of hypernyms relative to other types of semantic links, together with the constructive thematic coherence in the Delft, Bamberg 2202, and Bamberg 2302 teams, suggests that these factors may indicate productive concept formation.

Second, computational systems that characterize and critically describe a design team's accumulation of knowledge through language-based communication could make a cognitive dimension of design team collaboration visually transparent. Although studies have not yet been conducted to ascertain whether this type of system would actually encourage design teams to reflect more on their thinking processes, our intent is not to claim realism in modeling cognitive performance but rather to encourage an atmosphere in which design teams continually self-monitor their performance. Such a design tool is intended to improve the awareness and effectiveness of the cognitive activities that take place during designing.

Third, the data from the Bamberg study raise an interesting quandary. Design practice is often described and assessed by the proper execution of a formal process, such as the one codified in the German Verein Deutscher Ingenieure (VDI) standard VDI 2221. However, only one of the teams of German-educated students (2202) followed a methodology. Team 1102 appeared “planned,” whereas team 2302 appeared “chaotic.” Why, then, was team 2302 quite successful in terms of design outcome? We would conclude that assessing process alone might not adequately account for design team performance. Although process matters, the content produced by the process could indicate whether the design team successfully accumulated knowledge. This research offers a new, automated means to assess design teams through a rigorous, computable, content-oriented linguistic analysis of their design text, one that can evaluate whether a design process led to concept formation or merely produced content.

The computational analysis described in this paper offers a means to relate a performative aspect of design text to the representational and mediated-action perspectives on language. The linguistic behavior of differentiated repetition, manifested as hypernym links, appears to contribute to the productive formation of a design concept. The connections between the semantic links could be interpreted as constituting performative vectors of power in design text that give rise to the designed: the semantic connections enact the designed, a design concept, in the text. The research thus suggests the need to journey beyond the representational seductions of design text. What goes on after the cataloguing of noun phrases, intent and rationale, and decisions represented in design text? Moreover, if design text can in a sense design, to what extent is designing possible by language alone? Let us think of design text as performative, operating on a semiotic system to give rise to the designed.

ACKNOWLEDGMENTS

This research was supported by an Australian Research Council Grant DP0557346. The author gratefully acknowledges the assistance of Prof. Petra Badke-Schaub for the transcripts of the Bamberg mechanical engineering design students and Dr. Ir. Rianne Valkenburg for her annotated transcript of the Delft mountain bike backpack team.

References

Benami, O. & Jin, Y. (2002). Creative stimulation in conceptual design. Proc. ASME 14th Int. Conf. Design Theory and Methodology, Paper No. DTM-34023, Montreal.

Brereton, M.F., Cannon, D.M., Mabogunje, A., & Leifer, L.J. (1996). Collaboration in design teams: mediating design progress through social interaction. In Analysing Design Activity (Cross, N., Christiaans, H. & Dorst, K., Eds.), pp. 319–341. Chichester: Wiley.

Buchanan, R. (1989). Declaration by design: rhetoric, argument, and demonstration in design practice. In Design Discourse: History, Theory, Criticism (Margolin, V., Ed.), pp. 91–109. Chicago: University of Chicago Press.

Coyne, R.D., Newton, S., & Sudweeks, F. (1993). A connectionist view of creative design reasoning. In Modeling Creativity and Knowledge-Based Creative Design (Gero, J.S. & Maher, M.L., Eds.), pp. 177–209. Hillsdale, NJ: Erlbaum.

Cutkosky, M., Engelmore, R., Fikes, R., Genesereth, M., Gruber, T., Mark, W., Tenenbaum, J., & Weber, J. (1993). PACT: an experiment in integrating concurrent engineering systems. Computer 26(1), 28–37.

de Vries, B., Jessurun, J., Segers, N., & Achten, H. (2004). Word graphs in architectural design. In Design Computing and Cognition '04 (Gero, J.S., Ed.), pp. 541–556. Dordrecht: Kluwer Academic.

Dong, A. (2005). The latent semantic approach to studying design team communication. Design Studies 26(5), 445–461.

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Foltz, P.W., Kintsch, W., & Landauer, T.K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes 25(2–3), 285–307.

Garcia, A.C.B. & Howard, H.C. (1992). Acquiring design knowledge through design decision justification. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 6(1), 59–71.

Gero, J.S. (1990). Design prototypes: a knowledge representation schema for design. AI Magazine 11(4), 26–36.

Goldschmidt, G. (1996). The designer as a team of one. In Analysing Design Activity (Cross, N., Christiaans, H. & Dorst, K., Eds.), pp. 65–91. Chichester: Wiley.

Jacobsen, K., Sigurjónsson, J., & Jakobsen, Ø. (1991). Formalized specification of functional requirements. Design Studies 12(4), 221–224.

Landauer, T.K. (1999). Latent semantic analysis: a theory of the psychology of language and mind. Discourse Processes 27(3), 303–310.

Lloyd, P., Lawson, B., & Scott, P. (1995). Can concurrent verbalization reveal design cognition? Design Studies 16(2), 237–259.

Mabogunje, A. & Leifer, L. (1997). Noun phrases as surrogates for measuring early phases of the mechanical design process. Proc. ASME 9th Int. Conf. Design Theory and Methodology, New York.

Oxman, R. (2002). The thinking eye: visual re-cognition in design emergence. Design Studies 23(2), 135–164.

Poole, M.S. (1999). Group communication theory. In The Handbook of Group Communication Theory and Research (Frey, L.R., Gouran, D. & Poole, M.S., Eds.), pp. 88–165. Thousand Oaks, CA: Sage.

Regli, W.C., Hu, X., Atwood, M., & Sun, W. (2000). A survey of design rationale systems: Approaches, representation, capture and retrieval. Engineering with Computers 16(3–4), 209–235.

Schön, D.A. (1983). The Reflective Practitioner: How Professionals Think in Action. New York: Basic Books.

Shannon, C.E. (1948). A mathematical theory of communication. The Bell System Technical Journal 27, 379–423 and 623–656.

Silber, H.G. & McCoy, K.F. (2002). Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics 28(4), 487–496.

Sonnenwald, D.H. (1996). Communication roles that support collaboration during the design process. Design Studies 17(3), 277–301.

Stempfle, J. (2004). Personal communication to Andy Dong, Sydney [e-mail].

Stempfle, J. & Badke-Schaub, P. (2002). Thinking in design teams—an analysis of team communication. Design Studies 23(5), 473–496.

Valkenburg, R.C. (1998). Shared understanding as a condition for team design. Automation in Construction 7(2–3), 111–121.

van der Lugt, R. (2005). How sketching can affect the idea generation process in design group meetings. Design Studies 26(2), 101–122.

Wood, W.H., III & Agogino, A.M. (1996). Case-based conceptual design information server for concurrent engineering. Computer-Aided Design 28(5), 361–369.

Zeitz, C.M. (1997). Some concrete advantages of abstraction: how experts' representations facilitate reasoning. In Expertise in Context (Feltovich, P.J., Ford, K.M. & Hoffman, R.R., Eds.), pp. 43–65. Menlo Park, CA: American Association for Artificial Intelligence.

A diagram of accumulation. [A color version of this figure can be viewed online at www.journals.cambridge.org]

The lexical relations between concepts.

The hypernym (upward) relation between two words: the concepts “adjustment” and “move” are a kind of “change.”

For synonyms, a “base” is synonymous with a “stand” related by the concept “support.”

The meronym (part-of) relation between words: a “wheel” is a part of a “bicycle.”

The hyponym (downward) relation between words: a “circumference” is a more specific concept of both “length” and “size.”

The pattern of links for assumption 1. [A color version of this figure can be viewed online at www.journals.cambridge.org]

The pattern of links for assumption 2. [A color version of this figure can be viewed online at www.journals.cambridge.org]

The pattern of links for assumption 3. [A color version of this figure can be viewed online at www.journals.cambridge.org]
