1 Introduction
Computer-Assisted Language Learning (CALL) is known to facilitate second language acquisition (SLA) in several ways (Chapelle, 2003). First, it can provide enhanced, more salient input through modification and elaboration to help learners comprehend input and thus attend to target linguistic features, leading to acquisition (Robinson, 1995; Schmidt, 1990). Secondly, CALL facilitates interaction among language learners synchronously (Yilmaz & Grañena, 2010) and asynchronously (Hoshi, 2003; Itakura, 2004). Lastly, it provides enhanced opportunities for corrective feedback on learners’ linguistic productions (Heift, 2003). In addition, web-based technology enhances second language (L2) testing practices and qualities by making interactive features, authentic input, and automated delivery and scoring possible (Shin, 2012). Internet-based CALL thus has great potential to provide valuable resources for task construction as well as data collection and analysis (Chapelle, 2003), which makes online tools, particularly electronic portfolios (E-portfolios), useful means for investigating the effects of contextual features on student performance across different tasks and over time. However, to date, there has been little or no guidance on how best to utilize specific online resources such as E-portfolios as research and assessment tools, despite the importance of contextual factors in SLA research and L2 assessment practices more generally. This study first discusses the role of context in SLA and L2 assessment, then reviews existing E-portfolios, and finally suggests a framework for developing E-portfolios as potential research and assessment tools to better understand contextual features in SLA and assessment research.
2 The role of context in SLA
It has been claimed that utterances produced by L2 learners will vary in a systematic and predictable manner as they attempt to perform different tasks in the target language (Tarone, 1985, 1988; Tarone & Liu, 1995; Young, 1988). Such variability in linguistic form in L2 learner data was termed interlanguage (IL) variation (Tarone, 1998), which is known to be triggered by either internal or external factors (Romaine, 2003). Internal variation occurs when it is constrained by linguistic factors (i.e., the phonetic or syntactic environment in which a target form is used). On the other hand, external variation takes place when it is conditioned by external variables, such as other language users, topic, degree of attention paid to the task, and communication pressure (Tarone, 1998). These linguistic and external contexts have been shown to have an effect on phonological (Dickerson, 1975; Beebe, 1980), morphological (Bardovi-Harlig, 1998; Tarone, 1985), and syntactic features (Tarone & Liu, 1995) of the language produced by L2 learners.
In particular, different task features often lead to different levels of linguistic accuracy. This relationship is shaped by the extent to which tasks predispose learners to focus either on form or on meaning. For example, Dickerson and Dickerson (1977) found that Japanese learners of English produced the English consonant /r/ most accurately in a word-list reading task, in which learners focus their attention on form, and least accurately in free speech, where their focus is mainly on meaning. Similarly, Eisenstein and Starbuck (1989) found that grammatical accuracy rates were higher when learners were more attentive to form than to meaning; ten English as a Second Language (ESL) learners in their study were less syntactically accurate in terms of English tense usage and meaning, and verb formation and meaning, when they talked about a topic of interest to them rather than one of little or neutral interest. However, the relationship between linguistic accuracy and the degree of attention to form dictated by task requirements is not always simple and linear. Tarone (1985) showed that some grammatical forms, such as direct object pronouns and articles closely linked to textual cohesion, were more accurately produced in oral communication tasks, which required more attention to cohesive discourse, than on a written grammar test, even though the grammar test was specifically designed to direct learners’ attention to grammatical forms. Such complex, interactive effects of contextual features have been found on linguistic behaviors as well as on accuracy: different tasks elicited diverse speech styles (vernacular vs. careful) (Tarone, 1982), and different degrees of conversational dominance, operationalized as the number of interruptions and questions and the amount of talk, were observed when interlocutors’ degrees of topical knowledge varied within native/non-native dyads (Zuengler, 1989).
All in all, there is general agreement that much variability is commonly found in L2 learner data, but there is a long-standing and continuing debate in the field of SLA over how to interpret L2 learners’ variable performance across tasks. Gregg (1990) argued that linguistic competence is constant, not variable; what varies is performance, which can be viewed as a collection of ‘slips’ or the result of a lack of attention to form. On the other hand, Tarone (1990) and Ellis (1990) emphasized the role of context in SLA, insisting that linguistic ability cannot exist in a vacuum: it should be understood as an underlying ability to use linguistic knowledge in the real world, not just knowledge itself. Regardless of the position taken on the relationship between competence and performance, the sources and degree of variability conditioned by different tasks have not yet been clearly identified. This paper argues that a full account of variability in learner language can only be obtained through rich data collected across diverse tasks over a period of time.
3 The role of context in language assessment
There have also been differing views about the role of context in language testing. Proponents of the social interactional perspective contend that context should be taken as part of the construct of language ability and that the construct should be defined contextually (Chalhoub-Deville, 2003). Conversely, those who subscribe to the cognitive-oriented language ability perspective (Bachman, 2007) argue that context does not belong to the construct to be assessed. Rather, much like “test method facets” (Bachman, 1990: 111), context should be considered a characteristic of the setting in which language use takes place, because conceptualizing performance on the assessment task as a direct outcome of the construct makes it difficult to distinguish the language abilities we want to measure from the method factors used to elicit language use samples, and to make generalizations across contexts (Bachman, 2007). Despite their many differences, proponents of both perspectives agree that understanding the effects of varying contexts on test takers’ performance is a key issue in language testing, because the way we define the context of assessment may affect how we conceptualize two important assessment qualities: reliability and validity.
Different conceptualizations of what causes variability in test takers’ performance fundamentally affect the way we estimate reliability (Deville & Chalhoub-Deville, 2006), in that reliability is essentially an estimate of the proportion of observed score variance attributable to the construct, as opposed to error variance arising from random sources irrelevant to the construct. From the perspective of the cognitive-oriented language ability proponents, performance variability due to different tasks is relegated to error or noise variance. For interactionists, on the other hand, variability related to the task effects of interest would be a relevant dimension of the construct to be assessed. Such different views of context/task variability also inevitably affect aspects of validity, in that researchers’ interpretations of the meanings of test scores will be either context-independent or context-dependent, and the generalizability of those interpretations will be affected as well. Advocates of the cognitive view are more interested in generalizing beyond test takers’ performance on the assessment itself to a particular target language use (TLU) domain and its associated tasks (Bachman & Palmer, 2010). Interactionists, however, would be more concerned with predictive validity, in that they believe a delineated and contextualized construct would provide a more accurate prediction of how well test takers use language in the TLU domain (Chalhoub-Deville, 1997).
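In classical test theory terms, the contrast between the two camps can be stated compactly. The following is a minimal sketch using the standard true-score model; it is not spelled out in the sources cited above, but it follows directly from the definition of reliability given here:

```latex
% Classical test theory: an observed score X decomposes into a
% true (construct-related) score T and an error component E:
%   X = T + E
% Reliability is the proportion of observed-score variance that is
% construct-related rather than error:
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2}
           \;=\; \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
% Cognitive view: task-induced variability is assigned to sigma_E^2,
% so it lowers the reliability estimate.
% Interactionist view: task-related variability belongs in sigma_T^2,
% because the contextualized construct includes it.
```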
Despite the importance of conceptualizing and operationalizing contextual features in language testing, the relationship between different contexts or tasks and learners’ or test takers’ performance is not yet clear. In particular, most quantitative research on the effect of context or task on learner performance has limited external validity, because researchers typically must control for other variables in order to understand the effect of a specific contextual feature of interest; for example, we may not be able to examine topic effects while simultaneously controlling for task difficulty in terms of cognitive complexity. Additionally, most previous studies that have examined this issue are based on cross-sectional data with a focus on between-group differences rather than within-individual differences across different contexts (Larsen-Freeman & Cameron, 2008).
Such limits on cross-sectional data often make it difficult for researchers to understand individual developmental trends in interlanguage as well as to interpret contextual constraints on the learner's spoken and written language data. Thus, a portfolio, particularly an E-portfolio, can be a promising and practical research and assessment tool to understand IL variations and test task effects in that it can include and store a variety of language use samples across multiple tasks. This allows us to generalize from the results of performance on a portfolio to a broader domain of context and to examine the effect of different contexts on production for individual learners.
The following section provides an overview of E-portfolios in general and of existing E-portfolios such as the European Language Portfolio (ELP) and LinguaFolio, including their common features, potential uses, and possible adaptation as IL research and assessment instruments.
4 Electronic portfolios
4.1 What is a portfolio?
Brown (2005: 62) defined a portfolio as “a procedure that requires students to collect samples of their second language use (e.g., compositions, audio recordings, video clips, etc.) into a box or folder for examination at some time in the future by peers, parents, outsiders, etc.” Thus, at a minimum, a portfolio should include samples of students’ work that demonstrate their progress and achievements and that can be shared and reviewed periodically according to a clearly stated set of criteria, by others and by the students themselves (O'Malley & Valdez Pierce, 1996). Among other things, a portfolio can serve as an informative instrument that provides an ongoing, cumulative record of language development, insight into individual progress, and tangible, sharable evidence. Moreover, it can help students reflect on their learning and promote ownership of and responsibility for their learning (Genesee & Upshur, 1996).
4.2 What is an electronic portfolio (E-portfolio)?
An E-portfolio is a purposeful collection of learners’ language use samples which are usually uploaded online or sometimes copied onto a CD/DVD-ROM (Al Kahtani, 1999). As opposed to a paper-based portfolio, which requires physical space for storage, an E-portfolio allows language teachers and researchers to collect and store language samples in multiple media types (e.g., audio and video files, graphics, and text) under different thematic and chronological folders. Additionally, hypertext links make it much easier for learners to share their progress toward goals and their work samples with classmates and teachers (Cummins & Davesne, 2009), and to communicate more efficiently with various stakeholders outside the classroom. Once implemented, an E-portfolio can easily be accessed by both students and teachers as a tool for monitoring the students’ language learning process.
There are three currently available E-portfolio models used in Europe and the US: the European Language Portfolio (ELP); LinguaFolio (LF), an American version catering to K-12 contexts; and its university-level counterpart, the Global Language Portfolio (GLP) (Cummins & Davesne, 2009). Each is modeled on a different scale. The ELP (Little, 2002) is based on the Common European Framework of Reference (CEFR), which describes L2 proficiency in relation to five communicative activities (listening, reading, spoken interaction, spoken production, and writing) at six possible levels, ranging from Level A: basic user (A1 and A2), through Level B: independent user (B1 and B2), to Level C: proficient user (C1 and C2). Its American counterpart, LF (Cummins, 2007), was constructed on the basis of the American Council on the Teaching of Foreign Languages (ACTFL) proficiency guidelines, which range from novice through intermediate and advanced to superior levels and have been expanded to cover three communication modes (interpretive, presentational, and interpersonal) and the national standards’ “Five Cs” of language learning (communication, cultures, connections, comparisons, and communities). The GLP can be referenced to either the CEFR or the ACTFL scales (Cummins & Davesne, 2009).
Although each employs different proficiency scales for their evaluation rubrics, they all have the following three elements in common: (a) a language passport, which is a summary of an L2 learner's linguistic identity and experiences of language learning, usually including official test score results and a brief introduction to each proficiency guideline; (b) a language biography, which is a self-evaluation section built around can-do statements constructed based on either the CEFR or ACTFL guidelines; and (c) a dossier, which contains a selection of the work of the L2 learners, chosen to represent their progress in terms of language skills and cultural competence development.
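For readers who think in terms of data structures, the three shared elements might be modeled as follows. This is a purely illustrative Python sketch; the class and field names are invented for exposition and do not reflect the internal schema of the ELP, LF, or GLP:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LanguagePassport:
    """Summary of the learner's linguistic identity and learning history."""
    languages: List[str]
    official_test_scores: List[str]   # e.g., "TOEFL iBT: 92" (hypothetical entry)
    learning_experiences: List[str]

@dataclass
class BiographyEntry:
    """One can-do statement with the learner's self-evaluation."""
    can_do_statement: str             # worded per CEFR or ACTFL guidelines
    scale: str                        # "CEFR" or "ACTFL"
    level: str                        # e.g., "B1" or "Intermediate Mid"
    self_rating: str                  # learner's own judgment of attainment

@dataclass
class Dossier:
    """Work samples chosen to represent progress in language skills
    and cultural competence development."""
    work_samples: List[str] = field(default_factory=list)  # file paths or URLs

@dataclass
class EPortfolio:
    passport: LanguagePassport
    biography: List[BiographyEntry]
    dossier: Dossier
```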
5 Usability of existing E-portfolios as research and assessment tools
As can be seen above, the summary and storage functions currently available in E-portfolio systems might enable SLA researchers to compare and monitor L2 learners’ linguistic progress based on between-group and within-individual data collected across diverse tasks over a long period of time. In addition, E-portfolios have the potential to serve as an online assessment tool, providing valuable formative information for students and teachers about their instructional goals and progress toward those goals, with tangible evidence of whether those aims have been achieved.
However, there are several obstacles and limitations that need to be overcome in order to utilize currently existing E-portfolios as research and assessment tools to investigate the IL variations and test task effects addressed earlier in this paper. Thus, the potential problems in using the current E-portfolios as research and assessment tools, and what could be done to make them more usable for this specific intended purpose will now be discussed.
The major limitations of an E-portfolio as both a research and an assessment tool relate to its heavy reliance on learners’ self-evaluation. Self-evaluation itself might facilitate learning by making learners more aware of their own learning goals and of their progress and performance toward those goals (O'Malley & Valdez Pierce, 1996). Several ELP pilot projects offer support for this positive impact of learners’ self-reflection on their learning process in both elementary and secondary school settings (Nováková & Davidová, 2003; Päkkilä, 2003), showing that the ELP's self-assessment function helps learners to engage with their own learning, monitor progress, set follow-up goals, and develop self-awareness.
Despite these positive pedagogical functions of self-evaluation, there are concerns about its lack of validity as an assessment tool, due to its inherent subjectivity (Butler & Lee, 2010). For example, on the basis of a meta-analysis of validation studies on self-assessment, Ross (1998) showed that adult learners tend to assess their speaking and writing skills less accurately than their reading and listening skills. Moreover, the validity of self-assessment can be affected by a number of other factors, including learners’ proficiency levels (Patri, 2002), motivation (Dörnyei, 2001), age (Butler & Lee, 2006) and, crucially, the way items are constructed and delivered (Butler & Lee, 2006).
Specifically, several researchers have expressed reservations about the reliability and validity of the can-do statement grids used in the ELP. Leaving aside the abstractness and vagueness that affect their appropriacy as the basis of test construction (Alderson, Figueras, Nold, North, Takala & Tardieu, 2006), Hulstijn (2007) criticized these statements at a more theoretical level on the grounds that they confound quantity, quality, and type of task. He doubted that learners would necessarily perform the specific tasks described in the grid in the way the researchers intend:
The question then arises whether it is necessarily true that a learner who is placed at the B2 level of overall production must also have attained the B2 level on all the linguistic competence scales, or whether it is possible for a learner to be situated at different levels on different scales. In principle, one could conceive of three types of L2 users: (a) L2 users who can do only few things in terms of quantity but whose performance is characterized by high linguistic quality, (b) L2 users who can do many things in terms of quantity but whose performance is characterized by low linguistic quality, and (c) L2 users whose quantity range matches their performance quality, as suggested by the CEFR scales of the mixed type. (op. cit.:664)
In short, the can-do statement grids confound the test tasks with the individual test taker's language ability. The fundamental problem with this approach is that the construct of underlying language ability is mixed up with the task methods: a proficiency level is determined by a particular combination of personal traits and task features. Such conflation makes it difficult to understand contextual influences on learners’ performance, and to document their progress over time, because mutually unrelated contextual features are lumped together into sets of can-do grids. In addition, there are problems with the meaningfulness of the holistic scores and feedback that learners receive according to their CEFR or ACTFL levels; it is not clear what take-home message learners can draw from their self-evaluation processes. L2 learners and teachers may be more interested in how learners can develop and improve their language knowledge and skills, such as grammatical, lexical, textual, and pragmatic knowledge, than in whether or not they are able to complete a specific task.
Another limitation in using currently existing E-portfolios as research and assessment devices for investigating contextual effects on learners’ performance is that the language use samples stored to represent learners’ proficiency levels are not usually collected under controlled conditions. Learners are usually allowed to select their “best works” as evidence to show others what they can do with language. Since it is quite possible that students rehearse the language, such performance may not be representative of their underlying ability or of systematic variation across different tasks; it has been well documented that language use samples elicited from planned performance are more accurate and fluent than those produced in spontaneous tasks (Ellis, 1987; Foster & Skehan, 1996; Mehnert, 1998). Therefore, in order to determine whether a learner's spoken and written samples stored in an E-portfolio are novel or practised, it would be necessary to develop a system that enables a teacher to indicate whether the learner was given time to plan before responding to the task. Alternatively, two separate folders could be installed in the E-portfolio system for the different needs and purposes of users: a growth (or working) folder, where learners upload all their work produced under controlled conditions, which would serve as a data source for IL variation research and summative assessment; and a showcase folder, in which students self-select their best and favorite work, as is common in many ELP projects in Europe (Lenz, 2004).
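One way such a system might be realized is to attach condition metadata to each uploaded sample and to separate the growth and showcase collections at the data level. The sketch below is a hypothetical design, with invented names such as planned and teacher_verified, rather than a feature of any existing E-portfolio:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Folder(Enum):
    GROWTH = "growth"      # all work, collected under controlled conditions
    SHOWCASE = "showcase"  # learner-selected best work

@dataclass
class Sample:
    file_path: str
    task_id: str
    folder: Folder
    planned: bool           # did the learner have planning time beforehand?
    teacher_verified: bool  # teacher confirmed the elicitation conditions

def research_corpus(samples: List[Sample]) -> List[Sample]:
    """Return only samples usable for IL variation research:
    growth-folder work whose elicitation conditions are documented."""
    return [s for s in samples
            if s.folder is Folder.GROWTH and s.teacher_verified]

# Example: planned and spontaneous work from the growth folder can be
# separated for comparison, while showcase items are excluded altogether.
corpus = research_corpus([
    Sample("essay1.pdf", "t1", Folder.GROWTH, planned=True, teacher_verified=True),
    Sample("talk1.mp3", "t2", Folder.GROWTH, planned=False, teacher_verified=True),
    Sample("best.mp4", "t3", Folder.SHOWCASE, planned=True, teacher_verified=False),
])
spontaneous = [s for s in corpus if not s.planned]
```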
6 Recommendations
As stated above, there are a number of inherent problems with the use of existing E-portfolio systems for IL variation research and assessment purposes. Nevertheless, in my view, it is possible to make constructive changes to these systems to address the limitations. To begin with, the diverse contextual features contained in each level descriptor of the CEFR or ACTFL scales, which determine E-portfolio tasks, should be systematically categorized. With a more organized means of describing the relevant features of a task, learners’ language use performance stored and reported in an E-portfolio would permit more transparent interpretation. Several frameworks for describing task characteristics proposed by SLA researchers (Pica, Kanagy & Falodun, 1993; Skehan & Foster, 2001) and assessment researchers (Bachman & Palmer, 2010) could be used to describe and develop the diverse task features employed in E-portfolios. Among these existing frameworks, Bachman and Palmer's (2010: 66) “framework of language task characteristics” appears to be the most promising because it is the most comprehensive, and because it can also be used to describe and characterize real-life tasks, enabling us to establish the authenticity of our tasks by the degree to which the characteristics of real-life and test tasks match (Bachman, 1990).
Based on Bachman and Palmer's (2010) framework of language task characteristics, a task feature grid for developing and analyzing tasks implemented into E-portfolios is proposed (see Table 1).
The grid will help to systematically organize the large number of language tasks that learners will perform. It could thus enable L2 teachers and researchers to choose and construct tasks in a principled manner, and to interpret the observed linguistic performance of learners as a function of various task features over an extended period of time.
In this grid, task features are described in terms of three main dimensions: input, outcome, and the conditions under which tasks are performed. Input refers to what L2 learners are expected to process and respond to in a task, and outcome comprises the linguistic responses that learners are to produce in response to the input. As in Bachman and Palmer's (2010: 74–76) task characteristic framework, input can be divided into item, prompt, or input for interpretation, depending on the length and purposes of the task: an item contains a fairly limited chunk of language, as commonly used in a multiple-choice test that asks a learner to choose from among several options; a prompt consists of short utterances or sentences in a directive form to elicit an extended response from a learner, as in essay writing or sustained monologue tasks; and input for interpretation includes lengthy oral and written passages for learners to process, as is common in a typical reading or listening test. The format of the input and outcome can be further characterized in terms of channel (aural, visual, or both) and language (L1 or L2).

With regard to the conditions dimension, four components are included: task type, participants, time pressure, and goals. Task type refers to whether the task is performed individually, in pairs or small groups, online or offline. Participants provides information about the main characteristics of the other individuals who take part in pair or group work, including their gender, degree of familiarity with the learner, proficiency level, and so on (in other words, who is paired or grouped with whom, and who they are). Time pressure indicates whether a task is completed in a controlled or an uncontrolled environment. Lastly, goals denotes the general purposes of each task in which learners are engaged; each language task can be characterized by a number of goals to be achieved, since L2 learners might use language for a variety of purposes, including announcement, narration, description, discussion, debate, conversation, and so on. For example, as described in the CEFR scale rubrics (Council of Europe, 2001: 58–84), individual speaking activities can be divided into sustained monologue for describing experience and putting a case, making public announcements, and addressing audiences. Similarly, written tasks can be classified into creative writing, writing personal or business letters, and writing reports and essays. Likewise, spoken interaction can serve formal or informal discussion, debate, or interview, and learners can also interact in written form through correspondence by letter or e-mail, or exchange information online asynchronously.
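To make the grid concrete, its three dimensions could be encoded as a simple record type, one instance per task. The sketch below is one possible rendering of the proposed grid in Python, not a published schema; the enumeration values simply paraphrase the description above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class InputType(Enum):
    ITEM = "item"                      # limited chunk, e.g., multiple choice
    PROMPT = "prompt"                  # short directive eliciting an extended response
    INPUT_FOR_INTERPRETATION = "text"  # lengthy oral/written passage to process

class Channel(Enum):
    AURAL = "aural"
    VISUAL = "visual"
    BOTH = "both"

class TimePressure(Enum):
    CONTROLLED = "controlled"
    UNCONTROLLED = "uncontrolled"

@dataclass
class TaskFeatures:
    # Input: what learners process and respond to
    input_type: InputType
    input_channel: Channel
    input_language: str          # "L1" or "L2"
    # Outcome: the linguistic response learners produce
    outcome_channel: Channel
    outcome_language: str
    # Conditions under which the task is performed
    task_type: str               # individual / pair / group; online / offline
    participants: str            # who works with whom, and their characteristics
    time_pressure: TimePressure
    goals: List[str]             # e.g., ["narration"], ["formal discussion"]
```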
No existing E-portfolio has yet been specifically constructed to collect longitudinal and cross-sectional data as a research or assessment tool. However, this framework for task description can be applied to actual E-portfolio examples. For instance, one French secondary school used the ELP as a class project to plan and evaluate the writing of a detective story (Mullois, 2003). This project consisted of several sub-tasks which learners completed individually and in groups: learners first independently read the book Detectives from Scotland Yard and responded to open-ended reading comprehension questions for homework (task 1); secondly, they discussed in groups how to proceed in writing their own detective story and produced posters outlining their plan (task 2); thirdly, each group of four students drew the characters of their story and provided a written description of each (task 3); fourthly, for homework, students wrote a self-introduction letter to be sent to their English pen-friend (task 4); fifthly, they summarized their story based on their planned plot (task 5); and lastly, they produced dialogues between the detectives and the suspects (task 6). These different tasks from a single class project can be systematically compared using the proposed task characteristic framework, as shown in Tables 2 and 3.
The tasks in Tables 2 and 3 are the same in terms of the channel and language of both input and outcome. However, they differ with regard to the types of input and outcome, time pressure, and goals. In task 4, each learner individually wrote a personal letter under relatively uncontrolled conditions, whereas the summary task, task 5, required learners to process and summarize lengthy input with other learners under controlled conditions. Since these two tasks differ on more than two features, it would be difficult for researchers to identify any particular contextual effect. Nonetheless, researchers may be able to investigate task effects given more samples collected over time, as long as they manipulate task features using this kind of task characteristic framework. For example, if a teacher asked students to write an additional personal letter in class, a researcher might be able to assess potential time-pressure effects on learners. Practically speaking, a drop-down menu with context or task entries could be implemented in the user interface of an E-portfolio system, so that learners and teachers could better understand which specific characteristics affect learners’ performance on each task, and how.
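The comparison logic that such a menu would support can be illustrated with plain sample metadata: the system would look for pairs of samples from the same learner whose task tags differ on exactly one feature, so that a performance difference can tentatively be attributed to that feature. The following function is a hypothetical sketch, not part of any existing E-portfolio system:

```python
from typing import Dict, List, Tuple

FEATURES = ["input_type", "outcome_type", "time_pressure", "goal"]

def minimal_pairs(samples: List[Dict]) -> List[Tuple[Dict, Dict, str]]:
    """Return (sample_a, sample_b, differing_feature) triples for pairs of
    samples by the same learner that differ on exactly one task feature."""
    pairs = []
    for i, a in enumerate(samples):
        for b in samples[i + 1:]:
            if a["learner"] != b["learner"]:
                continue
            diffs = [f for f in FEATURES if a[f] != b[f]]
            if len(diffs) == 1:
                pairs.append((a, b, diffs[0]))
    return pairs

# Example: task 4 rewritten in class would differ from the original
# personal letter only in time pressure, isolating that feature.
samples = [
    {"learner": "s1", "input_type": "prompt", "outcome_type": "extended",
     "time_pressure": "uncontrolled", "goal": "personal letter"},
    {"learner": "s1", "input_type": "prompt", "outcome_type": "extended",
     "time_pressure": "controlled", "goal": "personal letter"},
]
for a, b, feature in minimal_pairs(samples):
    print(f"Comparable pair differing only in: {feature}")
```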
Although this grid is not a fully comprehensive framework, it would equip an E-portfolio with a system for describing task features and the conditions under which tasks are performed. With this systematic approach, many tasks could be generated, compared, and replicated by other institutions, which would eventually help to accumulate knowledge in SLA and assessment research. It could therefore provide a useful mechanism for choosing and developing tasks, and for identifying task features as a way of accounting for different aspects of learners’ linguistic performance.
Even when tasks in an E-portfolio are developed and organized in a principled manner, learners’ performance cannot be systematically compared without considering what we want to look for in their stored work. Thus, another recommendation is to provide a framework for systematic evaluation, particularly when E-portfolios are intended to be used as a formative assessment instrument. When teachers evaluate the language samples that learners have produced, or learners reflect on their own work, as in most current E-portfolio systems, it would be preferable to focus on the communicative language competences displayed on each task, rather than on functional abilities that apply only to a certain kind of task. For instance, when a learner is asked to perform a CEFR A2-level speaking task, “giving a simple description or presentation of people, living or working conditions, daily routines, and likes/dislikes” (Council of Europe, 2001: 58), close attention should be given to the learner's linguistic competences, including richness and accuracy of vocabulary, grammatical accuracy, phonological control, textual competence, and fluency. Relevant rubrics are well articulated in the Council of Europe documents (2001: 101–130), the use of which would allow particular contextual effects on language use to be captured more clearly. In addition, it would be fruitful to allow learners to report how they perceive the varying contexts in their self-evaluation sections, because each learner may interpret and react to external contextual features differently (Douglas, 1998). Future research could also usefully evaluate the extent to which teachers’ and/or peers’ feedback predicts, or even affects, learners’ subsequent performance in E-portfolios.
To conclude, it has long been recognized by both SLA researchers and language testers that the way we elicit or assess learners’ language samples affects how they perform on specific tasks. Such task effects have also generated heated controversy over what it means to know and use language in both SLA and language testing. Given the importance of understanding context/task effects on language use in the theory and practice of both fields, it is crucial to document and monitor the variation occurring in different contexts over a period of time. An E-portfolio has the potential to isolate task effects in that it collects and documents various language samples from the same learners across different conditions over time. However, currently available E-portfolios have a number of limitations to overcome before they can be used as research and assessment tools. To make them more usable for those intended purposes, we should revise the way we assess language proficiency, define the characteristics of the portfolio's contents, and compartmentalize those contents according to their purposes, in order to provide evidence of the state of a learner's interlanguage and the progress they make.