Stephen van Vlack Sookmyung Women’s University Graduate School of TESOL
Second Language Learning Theories Spring 2010 Week 3 - Answers

M&M, Chapter 2: The recent history of second language research

1. What are some of the underlying ideas of behaviorism? Are they valid today?

Behaviorism is a general theory of learning based on the belief that people learn by remembering things they were exposed to and forming associations between what they learn and the conditions of learning. For behaviorists, language is learned through a series of exchanges between others (usually the parents or principal caregivers) and children, through which the child internalizes (remembers) all that they have heard (and said) and builds a system out of this. This works through simple remembering and then mimicking, on the part of the child, of what was encountered in the environment. Associations are made between the bits (linguistic forms) the child is exposed to and has memorized and features of the environment, both external and internal, which occur along with the linguistic exposure. On this view, learning is environmentally driven and the learner is a rather passive vehicle on a ride through the environment. It is the environment which determines learning. Later versions of behaviorism claimed that once enough associations had been made, the child could break down the utterances they had memorized and arrange them into certain templates that could be used to create new utterances. Some of these ideas still exist today in certain theories of language acquisition, and even generative linguists admit that there are some behavioristic elements in language acquisition, particularly in phonological and lexical acquisition, although in more recent publications (Chomsky, 2000) innate universal elements are claimed to play a central role in even these aspects of the acquisition process. In order to get a better feeling for how behaviorism works, let us now take a more detailed look at the concept of conditioning.
The model discussed below is a more modern and decidedly more cognitive version of conditioning called the S-S (stimulus-stimulus) Model (Terry, 2006). It is a model which I feel has the potential to describe very simply the processes and outcomes of associative learning. Classical conditioning is a simple example of what is called associative learning. This is learning where one learns to associate two concepts, ideas, or stimuli not related in the physical world, for example rain and a raincoat. It should be obvious that we are following the stimulus-stimulus (S-S) model of classical conditioning here. Classical conditioning, otherwise known as Pavlovian conditioning, is when an organism forms a simple association between two previously unrelated stimuli. One of the stimuli is important to the organism and is called the unconditioned stimulus. The other stimulus is the conditioned stimulus. The association made between the two is seen as learned because there is a demonstrable change in behavior as a result of the association having been made. As a result of the association, the organism will have transferred its natural responses to the unconditioned stimulus to the conditioned stimulus, even though prior to their paired and temporally controlled exposure the two were not at all related. Thus, for language learning, hearing the word `wind` as the wind blows enables an association to be built between these two things. In a
nutshell, this is what classical conditioning is: associations being made on the basis of co-occurrence. Although for the purposes of controlling the process and for ethical reasons classical conditioning experiments are usually carried out in a laboratory and with organisms other than humans, it is possible to see how this can be applied to the outside world and to the human world in general. Conditioning is a simple type of learning which takes habituation (the diminishing of a response to something we get used to, such as a loud noise with no follow-up consequence) just one baby step further. In conditioning we take what was learned through habituation and extend it to another stimulus which just so happens to co-occur in the world. Such a type of learning is obviously largely responsible for our uncanny ability to survive, for it means that we can extend our behaviors beyond the simple fight-or-flight responses modulated by habituation. It may also be the very underpinning of our capacity for language. The defining feature of language is its arbitrariness, as proposed by de Saussure (1959); that is, the arbitrariness of the actual structures (but not necessarily the meaning). We might then wonder: if there is no logical reason why some structural units should be expected to be found adjacent to others (to think so would be ludicrous), then how did they come to co-occur as they do? This is a question of linguistic evolution and unfortunately cannot concern us here. What does concern us here, however, is the question of how a child can learn these patterns of form. There must be a type of learning which supports the relatively quick learning of language. Conditioning might, at least partially, explain how learning of this type actually ensues. Think about context. Context is the glue that ties all learning together, and those who can see it and differentiate its parts are good learners.
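To make the mechanics concrete, here is a minimal sketch in Python of how repeated co-occurrence could build an association, and how its absence could weaken one, under an S-S-style view. The delta-rule update and the particular parameter values are my own illustrative assumptions for exposition, not a formalization of the model in Terry (2006).

```python
# A minimal sketch of S-S associative learning: repeated pairing of a
# conditioned stimulus (CS) with an unconditioned stimulus (US) raises
# the associative strength between them; presenting the CS alone lowers
# it again (extinction). The update rule and parameters are illustrative
# assumptions (a simple delta rule), not a claim about the exact model.

class Association:
    def __init__(self, learning_rate=0.2, max_strength=1.0):
        self.learning_rate = learning_rate
        self.max_strength = max_strength
        self.strength = 0.0  # starts completely unassociated

    def trial(self, us_present):
        # Move strength toward the outcome: max_strength when the US
        # co-occurs with the CS, 0.0 when the CS appears alone.
        target = self.max_strength if us_present else 0.0
        self.strength += self.learning_rate * (target - self.strength)

# e.g. the word "wind" (CS) heard as the wind actually blows (US)
wind = Association()
for _ in range(10):  # repeated co-occurrence strengthens the link
    wind.trial(us_present=True)
acquired = wind.strength       # approaches 1.0 after ten pairings
for _ in range(10):  # CS alone: the association weakens (extinction)
    wind.trial(us_present=False)
extinguished = wind.strength   # decays back toward 0.0
```

The sketch captures the two properties the discussion above relies on: associations form gradually from co-occurrence rather than in a single exposure, and they are maintained by the environment rather than by the learner's intentions.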
Here are a few ideas of my own as to how classical conditioning, as a generalized theory of learning, might be seen as aiding the language learning process. Some of these ideas come from cognitive linguistics (Fauconnier, 1997; Barlow & Kemmer, 2000) and cognitive science (Lepore & Pylyshyn, 1999; Gazzaniga, 2000) and were never considered within a behaviorist framework for classical conditioning. Rather they represent, I would like to believe, a more modern and complete model of how classical conditioning might be used to describe aspects of the language learning process.

Feature-hopping

The basic idea in the feature-hopping sequence is that all linguistic units (words, for example) are composed of sets of underlying features. Some of these features are direct attributes of the word while others are less direct and take the form of links to other words and concepts. Now, when two stimuli (and in this case what we might really be talking about are language forms and functions) are linked and a new stimulus comes in, the two stimuli are associated. It is in this association that features hop, or are mapped, from one stimulus onto the other. It is therefore possible for us, as language teachers with a strong interest in language learning, to devise a test to see how the features of one linguistic stimulus are mapped onto another. For us as second-language teachers this would of course be done in relation to feature mapping across the two languages. We would assume that the mapping would occur from the dominant language to the less dominant language, at least in the initial stages. This can be done simply by having lexical items co-occur and then testing to see if the features of one map onto the other.

Co-occurrence
The idea behind co-occurrence comes from the basic observation that in order for two stimuli to be associated they need to co-occur, and not just once but probably several times, depending on the type of stimuli and the intervals between the co-occurrences. One thing we can do as a way of testing this is to have some lexical items co-occur and then, after a certain period of time, test to see whether the students or subjects also produce them together. This is a simple pattern of associative learning.

Feature generalization

The idea behind feature generalization is that because linguistic stimuli (including words, phrases, and sentences) are composed of features, an association can actually be extended from the original associated link to other things or concepts which have similar features. So, for example, if somebody associates the actual animal dog with the lexical item dog, then the attributes of the animal will be mapped onto the lexical item. When the learner encounters a word similar to dog, such as pooch or canine, we would expect them to map the same types of features onto it on the basis of the common features. This could of course be turned into a classroom experiment. There are several other factors that have a strong effect on conditioning: prior exposure (familiarity), compound stimuli, surprise, relevance, and inhibition. Prior exposure is a sword that cuts both ways. Prior exposure in which the two stimuli did co-occur will serve to reinforce or heighten the conditioned response. Prior exposure in which the two stimuli did not co-occur, however, will increase the chances that a conditioned response will not occur. When compound conditioned stimuli are presented, several different effects occur depending on the timing and salience of the conditioned stimulus in relation to the unconditioned stimulus.
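Returning to the feature generalization idea above, here is a toy sketch: if we treat lexical items as sets of features, the degree to which an association generalizes from one item to another can be modeled as proportional to their feature overlap. The feature sets and the Jaccard overlap measure are purely hypothetical illustrations, not claims about how learners actually represent words.

```python
# A toy sketch of feature generalization: attributes associated with one
# lexical item extend to another item in proportion to their feature
# overlap. The feature sets and the overlap measure (Jaccard similarity)
# are illustrative assumptions, not a worked-out psychological model.

def feature_overlap(a, b):
    """Jaccard similarity between two feature sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b)

# Hypothetical feature sets for three lexical items.
features = {
    "dog":   {"animal", "four-legged", "barks", "pet"},
    "pooch": {"animal", "four-legged", "barks", "pet", "informal"},
    "cloud": {"weather", "white", "in-sky"},
}

def generalization(source, target, strength=1.0):
    # The associative strength carried over from a learned item to a
    # new item is scaled by how many features the two items share.
    return strength * feature_overlap(features[source], features[target])

dog_to_pooch = generalization("dog", "pooch")  # high: most features shared
dog_to_cloud = generalization("dog", "cloud")  # zero: no shared features
```

On this sketch, an association learned for dog carries over strongly to pooch but not at all to cloud, which is the prediction a classroom test of feature generalization would try to confirm.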
Most of this is intuitive and, therefore, does not require further explanation here, nor do the effects of surprise and blocking when two or more conditioned stimuli are used. The degree of relevance of the two stimuli to each other seems to have a fairly large effect not only on what will be associated but also on how quickly the association will take place. An inhibitory response is one in which the absence of the unconditioned stimulus causes a conditioned response. What is interesting for us is that an inhibitory response can only be generated based on an understanding (knowledge) of the components that go into making up the unconditioned stimulus. In order to know that something will not happen, you need to be able to predict under what conditions it would happen and to know which elements are present and which are missing. Trying to integrate elements of this S-S model into our daily teaching is stunningly easy and probably no different from what you are already doing. The basic idea is that teachers need to carefully regulate the input that the students are exposed to. A real kind of social situation must also be developed in the classroom for this to work, as we will see below in our brief discussion of form-to-function mapping. This attention to the forms presented, and the overriding importance of the situational/linguistic context in which these forms must be embedded, also serves to link this simple S-S
Model to dialogical models of SLA. Finally, we might posit that classical conditioning has an effect on language learning in that it can be used to explain the phenomenon of form-to-function mapping. According to some theories of language, the basis of language itself is the mapping of forms onto functions. A good indicator of this is the fact that all language has to be uttered, or exist, in a functional framework. This means that we always say things for a particular reason, which has to be clear not only to the speaker but also to the listener. Therefore, we can see that functions are the basis and groundwork of all language. As a person learns and speaks a language, they have functions as their basic stimulus, so when they hear a piece of language which co-occurs within the context of a function, they associate the two. Thus forms and functions are associated and their features are mapped onto each other. This is form-to-function mapping and it is the basis of language. Form-to-function mapping is based, at least in part, on the type of learning which we see in classical conditioning. There is an association between a specific function, which we notice in the world and which acts, at least initially, like an unconditioned stimulus, and the conditioned stimulus, which is a piece of language itself. Associations are formed initially between functions and chunks of language. Because we know that stimuli that share features will associate with other things, such as the initial unconditioned stimulus (in this case the function), we can easily see how similar forms will come to be associated with a single function. And it works the other way around as well: similar functions will be mapped onto a single form simply because the functions have similar features.

2. How has the body of research into First Language Acquisition (FirLA) affected theories of Second Language Acquisition (SLA) and linguistics in general?
The effect of FirLA theories on SLA has been tremendous. In fact, it would be safe to say that almost all modern theories of SLA are based, to varying degrees, on theories of FirLA. This supports the much debated idea that FirLA and SLA are related, if not the same (or at least similar, in some views), processes. In the 1970s and even into the 80s, based on similarities in results from both first and second language acquisition studies, it was assumed that first language and second language acquisition were really part of the same process. It was through this attachment to FirLA that SLA was able to develop into a field of study in its own right. It should not be surprising, then, that the next step in the development of SLA was to break away from the shadow of FirLA studies. I am mentioning this because it should be clear to you, as a potential researcher in this field, that theories are all products of the political, cultural, and historical environment in which they are created. They are accepted or rejected, created and avoided, not necessarily on their merit alone but on the prevailing political/social situation. The development of SLA as a field of study is a good example of this. After initial studies closely linking SLA to FirLA, such as the morpheme order studies of Dulay and Burt, studies began to emerge focusing on differences, and most of these differences were negative. The prevailing focus then became how to account for why FirLA is a process which is universally successful (as if!) while SLA is a process which ultimately ends in universal failure. This is still the predominant position in UG-focused SLA studies, but exceptions are emerging out of the field of bilingualism, which sees things very differently. In more recent theories of SLA, however, it is FirLA which is primary and SLA which is seen as a departure from FirLA. In effect, researchers working in the UG tradition no longer see FirLA and SLA as being the same, although they might
share certain similarities. The amount and type of these differences are major points of contention among SLA researchers today. Few today argue that SLA operates totally differently from FirLA, as both are seen as the product of the same innate mechanisms (to varying degrees, according to the theoretical perspective). What is strange, though, is that many teaching methods, and certainly most learners and teachers, still treat SLA as something totally different. This may be because they do not remember or fully understand the FirLA process and certainly are not up on the latest models of UG linguistics.

3. What are some of the major results of the theories developed in the 1970s?

The major results of the theories of the 1970s are three. The first of these is that language acquisition, all language acquisition, is rule-governed and systematic. Thus, children or students do not make errors per se; they only make errors when compared to the adult or native-speaker norm. This leads us to the two additional results, which are error analysis and interlanguage. Error analysis, developed by Corder, claims that errors do not come primarily from L1 interference, but rather from an incomplete L2 system. As the learner learns more and internalizes more of the system, the nature of their `errors` will change. Selinker was the first to come up with a systematic description of the stages that learners go through in L2 acquisition. He coined the term interlanguage, and a new way of looking at SLA was born. Interlanguage makes use of the same basic idea of sequential (serial) development as the generative model's view of FirLA. Learners are seen as moving through a series of stages in development, each of which is a self-contained and acceptable system at its moment of use, but further changes impel the system forward into a new stage, and so on, until a stage of permanent equilibrium (fossilization) is attained.
Thus, interlanguage is systematic in all its stages. It was later acknowledged, however, that the SLA process, despite the underlying systematicity of interlanguage, is still a highly variable process. Based on this, a new area of research was born in which the causes and effects of variability within interlanguage were carefully investigated.

4. Which of the five components of Krashen's Monitor Model seem to make the most sense?

The most annoying thing about Krashen's Monitor Model is that while parts of it make intuitive sense, much of it is theoretically vacuous in that it is very difficult to formulate or validate. Especially for us, as both teachers and learners, we can feel that many of the ideas have some degree of validity in our own learning/teaching process, but the model does not have a sound enough theoretical basis for us to actually be able to apply any of these ideas. The ideas presented are extremely vague and not linked to the wide range of research which actually existed at the time Krashen was writing. The five components of Krashen's Monitor Model are as follows, and each has problems:

- The acquisition-learning hypothesis
- The monitor hypothesis
- The natural order hypothesis
- The input hypothesis
- The affective filter hypothesis

Each of these appeals strongly to our honed intuitions about SLA, but they all lack substance. They are vague and hard to pin down exactly. Likewise, it is hard to realize
them as a coherent theory in actuality. The main problem, as I see it, is that due to the lack of any theoretical backing or specificity, these theories are rendered impossible to apply. How can a teacher try to get her students to monitor their utterances effectively if we do not know how the monitor works or what it actually is? Similar questions can be applied to all the other aspects of Krashen's model. And then some things, like the input hypothesis, simply don't make very much sense in actual teaching. Looking at it this way, the Monitor Model really underscores some of the basic problems of cognitive models in general. Learners are seen more like machines than as cognizant beings. They all seem to take on the shape of little black boxes, where we can't really go inside to see what's going on. This makes it very hard to effectively apply any of these ideas in teaching. His research, however, was a turning point in SLA research, and many research areas, such as affect, were developed out of the Monitor Model.

5. Do you believe that acculturation and pidginization can really be linked? How does this reflect on a common problem of the theories discussed above?

Although it might seem that, within the area of SLA, the transformation of a pidgin into a creole and second language learning are theoretically similar, to argue that there are strong correlations between levels of pidginization and acculturation would seem to be too simplistic. This brings us to a common problem among many SLA theories, namely that they are all ESL-based theories and, moreover, theories based on immigration to the inner circle of the English-speaking world. Yet most English learning that is going on in the world is EFL, or ESL in the outer circle. The goal of English learning for most learners is not to be indistinguishable from educated native speakers from an inner circle country (for example, Canada, Australia, the US, the UK, Ireland, New Zealand).
Most English learners wish to be able to use English to meet personal or national goals. As a result, it is hard to simply transfer theories like acculturation and pidginization into our situation. At the same time, as mentioned in class in relation to the idea of pragmatic transfer, to be a competent user of a language one does in fact need to be familiar with some of the cultural norms of the target language community; otherwise the language they produce will be pragmatically inappropriate. Extending the discussion to practicality, it would seem that many of these theories do not apply to our situation here for a variety of reasons, the most important of which is the amount and type of input the learners here are exposed to. To try to get our students to acculturate, we would need to provide a huge amount of input. Exposure would need to be intensive and well planned, meaning all teachers of any given learner would somehow need to cooperate and work together to ensure that everything that student did would lead them towards this end goal. Although some students in Korea do get rather extensive exposure to English at times, it is hard to coordinate all their exposure, as the teachers and institutions vary. Pidginization is similar in many ways to interlanguage in that it is formulated as a stepwise, serial type of development. It is not as limited or negative in its view of the developmental process in that it focuses not only on errors as a way of determining a learner's spot on the continuum, but looks as well at what the learner can do. In a way it is functional and not just structural, and this means that we need to teach functionally, something that is not currently the norm in South Korea.
Johnson, Chapter 2: Behaviorism and second language learning
6. What is contrastive analysis and how does it work?

The basic idea of Contrastive Analysis is that errors are seen as a highly negative aspect of the second language learning process. This is because contrastive analysis, based on the ideas of behaviorism, sees errors as misbehaviors. On this view, the presence of errors could lead to long-term and permanent problems for the second language learner. In the basic behaviorist view, errors need to be stamped out as quickly as possible before they become an established routine or behavior. Likewise, errors were seen as a main difference between the first language and second language acquisition processes. The main purpose of contrastive analysis was to develop a technique which would allow teachers to somehow deal with errors and eradicate them. In the Contrastive Analysis Hypothesis (CAH), the learner's overall behavior, and certainly their linguistic behavior, is controlled/determined/dominated by the L1. Therefore, the learning of the L2 is an uphill battle in which a new system needs to be built in the overriding presence of a dominant system. Errors, for their part, arise from the influence of the L1 on the L2. From this point of view, then, errors can either be prevented a priori or explained a posteriori through a careful analysis of the structural systems of both the L1 and the L2. The pedagogical implications of contrastive analysis were strongly felt, particularly with the publication of Lado (1957). Based on a detailed comparison of all the different structural systems of the two languages, teachers tried to predict where their students would have problems and where there would be none; thus, the teacher would know exactly what to teach and what they did not need to teach. Materials were designed with this approach in mind, and all was well until it became clear that the model did not work, not in its strong form anyway. There are several problems with the CAH.
This view is related to behaviorism in that both really only analyze the surface level of the language. No attention is paid, or payable in the model, to what is actually happening in the mind of the learner. Only behavior is analyzed, hence the name behaviorism. Additionally, in order to follow this approach, the teacher needs to be an expert (really a linguist) in both languages and able to carry out a detailed linguistic inquiry, which we know is neither practical nor possible. As a result, a later, weaker version of the CAH came into being. In this weaker version, the goal was not to use the CAH to predict errors and stamp them out before they occurred, but to use an understanding of grammar to determine where existing errors were coming from so they could be eradicated quickly before they fossilized. It was a diagnosis-and-treatment model, not unlike how practitioners of medicine work. Despite some differences in actual implementation, the strong and weak versions of contrastive analysis both adhere to the same underlying principles. The advent of so-called cognitive models focusing on internal processing in the brain, as well as the failure of contrastive analysis to do what it proposed to do, hastened the demise of the model. Interestingly, despite the vicious stamping out of behaviorism and denials on all fronts from teachers that they are in any way behavioristic in their approach, people still seem to use
elements of the CAH and behaviorism in their teaching. It seems to be an idea which has not entirely died yet.

7. What is error analysis and do you think there is a role for this in modern theories of SLA?

When the idea of innateness started to be taken seriously by second language researchers, the weak version of contrastive analysis soon developed into error analysis. In this way we can see error analysis as a development out of the weak version of contrastive analysis. The main difference between error analysis and contrastive analysis was that in the former, errors were seen as coming not from first language interference but from an incomplete system (knowledge base) in the target language. Based on this, it is not hard to see how the idea of interlanguage developed directly out of error analysis. What error analysis basically claimed is that people make mistakes because they don't have full knowledge of the target linguistic system. Errors are based on problems with (incomplete) competence. In looking at errors, however, the researcher or teacher needs to make a distinction between those errors which are random and those which are systematic. Random (performance-based) errors were called mistakes, while systematic (competence-based) errors were referred to as errors. This was an important step forward because researchers were finally able to address cognitive processes in language learners. At the same time, errors were no longer seen as evil properties of warped minds, but rather as a natural step in the innate process of language acquisition. While error analysis has also largely fallen by the wayside, there are some elements of this kind of analysis which we still might find useful today. The most important of these is that errors are not a bad thing. In fact, we know today that errors are a natural event in the process of language learning.
The distinction between errors and mistakes is also one which is very useful today, particularly when giving feedback. By focusing on errors and not worrying about mistakes, teachers get a better idea of what their students can actually do with the language. One major problem with error analysis, however, and one which unfortunately is still often present in foreign language teaching today, is that it focuses too much attention on the deficits students have, drawing attention away from their positive attributes. Recently, however, error analysis has made a sort of comeback. James (1998) offers a more modern and up-to-date version of error analysis as it fits in with modern theories of SLA. The main problem I have with error analysis is that it not only draws the teacher's attention away from the student's global performance, placing it only on what the student is doing poorly, but it also seems to be a rather vacuous endeavor. In my experience working with students researching error analysis, it seems virtually impossible to determine the actual origin of many different errors. I am not convinced that we can ever clearly say whether errors come from first language interference, from an incomplete second language system, or are really the result of some other processing going on. We kind of unintentionally stumbled across this in class last night, but in looking at things like error analysis we are clearly confronted
with the issue of separation and integration of different linguistic systems. This is not generally something that SLA researchers engage with, as it is usually thought to be the realm of bilingualism. From the theories above it should be clear that SLA researchers necessarily adhere to a strong separation between different linguistic systems. Such a separation is evident in the different theories which they propose, and without such separation error analysis wouldn't exist. Therefore, when we think about error analysis at all, we need first to try to determine for ourselves our position on the separation issue.

8. What effect have the morpheme order studies of Dulay and Burt had on the field of SLA?

The morpheme order studies of the 1970s had a profound effect on mainstream second language acquisition theory. The most enduring of these effects was the idea that second language learning is indeed an innate, biologically determined process. This idea is still prevalent today. The other major immediate effect of these morpheme order studies was that second language acquisition came to be seen as equivalent to first language acquisition. While this idea is no longer prevalent, it allowed researchers to begin to conduct studies on SLA in ways quite different from those conducted under behaviorism. Mainly, studies began to focus on the cognitive processes underlying second language acquisition, and little attention was paid to the environment in which learning was conducted. For a modern, comprehensive view of morpheme order studies, see Goldschneider and DeKeyser (2005).
References

Barlow, M. & Kemmer, S. (Eds.) (2000). Usage-based models of language. Stanford, CA: CSLI Publications.
Chomsky, N. (2000). New horizons in the study of language and mind. Cambridge: CUP.
de Saussure, F. (1959). Course in general linguistics. New York: McGraw-Hill.
Fauconnier, G. (1997). Mappings in thought and language. Cambridge: CUP.
Gazzaniga, M. (Ed.) (2000). Cognitive neuroscience. London: Blackwell.
Goldschneider, J. & DeKeyser, R. (2005). Explaining the "natural order of L2 morpheme acquisition" in English: A meta-analysis of multiple determinants. Language Learning 55, Supplement 1, pp. 27-77.
James, C. (1998). Errors in language learning and language use. Harlow: Longman.
Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.
Lepore, E. & Pylyshyn, Z. (Eds.) (1999). What is cognitive science? London: Blackwell.
Terry, W. S. (2006). Learning and memory (3rd edition). Boston: Allyn and Bacon.