
 

 

Examining the Soundness of Two Collaborative Assessment Practices in Teacher Education Courses

 

John V. Shindler, Ph.D.

Division of Curriculum and Instruction

Charter College of Education

California State University, Los Angeles

jshindl@calstatela.edu

 

A paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA, April, 2002

 

Abstract

 

Most often, new teachers default to the pedagogical practices that they themselves were exposed to as teacher candidates.  This point was emphasized in a 1997 report by NCATE (National Council for Accreditation of Teacher Education), which stated, “Today’s teacher candidates will teach tomorrow as they are taught today” (p. 1).  This methodological reproduction suggests an elevated need for those of us in teacher education to model practice that is both sound and innovative.  While the field of educational assessment has produced much innovation in the past decade, most assessment in teacher education is still primarily individualistic. If teacher education programs are to promote the value of collaboration within their candidates, they must teach and model collaborative pedagogy within their programs. The reluctance to use more collaboratively structured assessment methods may stem from the perception that they are less sound.

This study is a qualitative examination of the soundness of two forms of collaborative assessment within teacher education courses. The forms of assessment being investigated are 1) collaborative or group exams, and 2) a system of collaborative, interactive roundtable presentations.  The construct of soundness is defined within a four-dimensional framework consisting of validity, reliability, efficiency, and effect on the learner. Subjects (N=45, 46, 248) were members of required methods courses.  Data consisted of participant surveys, focus group interviews, and instructor participant observation.  The results of the study suggest that these collaborative assessment methods compared favorably on all four dimensions of soundness.  While conventional wisdom would call into question these methods’ ability to achieve reliable measurements and differentiation of student performances, as well as their ability to be performed as efficiently as more traditional methods of assessment, participant surveys rated collaborative methods slightly higher in each of these areas.  Moreover, the data suggested that the benefits experienced by the participants taking part in the collaborative methods were significant.  Participants experienced a greater degree of critical thinking, motivation to prepare, enjoyment of the assessment process, and relationship with classmates, while reporting that they learned more in the collaborative assessment conditions.  A discussion of findings and directions for how collaborative assessment might be implemented in a course are included in the paper.

 

 

Examining the Soundness of Two Collaborative Assessment Practices in Teacher Education Courses

 

Most often, new teachers default to the pedagogical practices that they themselves were exposed to as teacher candidates.  This point was emphasized in a 1997 report by NCATE (National Council for Accreditation of Teacher Education), which stated, “Today’s teacher candidates will teach tomorrow as they are taught today” (p. 1).  This methodological reproduction suggests an elevated need for those of us in teacher education to model practice that is both sound and innovative.  While the field of educational assessment has produced much innovation in the past decade, most assessment in teacher education is still primarily individualistic.  Current standards from the leading professional societies in teacher education, including NCATE, INTASC, and NBPTS, hold collaboration skills and dispositions as critical to a well-prepared teacher.  For example, INTASC Principle #7, Disposition #3, states, “The teacher values planning as a collegial activity.”  If teacher education programs are to promote the value of collaboration within their candidates, they must teach and model collaborative pedagogy within their programs. The reluctance to use more collaboratively structured assessment methods may stem from the perception that they are less sound.

 

This study is a qualitative examination of the soundness of two forms of collaborative assessment within graduate teacher education courses at two large state universities with large teacher education programs.  The forms of assessment being investigated are 1) collaborative or group exams, and 2) a system of collaborative interactive roundtable presentations.  The construct of soundness is defined within a four-dimensional framework consisting of validity, reliability, efficiency, and effect on the learner.  Collaborative assessment is rarely used in teacher education and even less outside of education (Antony, 1994).  The reluctance to use it is likely a result of both its unfamiliarity and the fear that it is not as sound as more traditional forms.  This study examines each of these concerns, explores the technical requirements of collaborative assessment usage, and compares its soundness to that of more common methods.

 

In their limited application, collaborative exams have been shown to improve content retention, promote higher level thinking (Stearns, 1996; Yuretich, Khan, & Leckie, 2001), and increase the overall enjoyment of the course (Stearns, 1996).  Interactive presentation formats have been shown to have a similar set of effects (Hermann, 1995; MacDonald, 1989; Schumm, 1995).  The collaborative element of the assessments seems to promote a more thoughtful level of processing and more creative work (Bohde, 1996).  Moreover, both methods seem to provide a potentially more authentic context, inasmuch as “good teachers” have a greater tendency to plan collaboratively (Fullan, 1993).


THEORETICAL FRAMEWORK FOR SOUNDNESS

 

This study incorporates a four-dimensional theoretical framework for soundness that has been shown to be conceptually as well as practically robust (Shindler, Yang, Nephew & Keen, 2000).  Within this framework, any assessment practice can be considered sound to the degree that it possesses validity, reliability, and efficiency, and has a positive effect on its users.  Validity is defined by the degree to which a method measures the most important concepts, matches the content covered, and is the best-suited form of methodology to capture the desired learning.  Reliability could be characterized by the degree to which a method can obtain an accurate representation of the learning, both among raters (or hypothetical raters) and across multiple performances.  Efficiency deals with how “doable” an assessment method is, and how well it can be performed without taking time away from other teaching and/or other learning.  The area related to the effect on the learner could also be considered what has been termed “consequential validity,” but is dealt with as a separate consideration here.  This dimension includes the motivational, psychological and epistemological effects the assessment has on any learner and/or the class as a whole. (See Appendix A for the working definition of soundness provided to students.)

 

METHODS

 

The Two Study Assessment Conditions

 

1. Cooperative Group Exams

 

Assessment Procedure:

Condition A: In this exam format, students are allowed to work together to develop their response to written exam prompts, but each student’s exam is evaluated individually. Students are allowed to choose their own groups, and because there should have been a great deal of cooperative class work to this point, they are familiar with one another and are in a good position to purposefully select a team. Opting to work alone is allowed at any point in the process, but is not encouraged. Prompts consist of items that require an extensive amount of course content synthesis and application. Prior to the exam period, exam guidelines and rubrics are provided outlining the target requirements for content and degree of development necessary for maximum credit. Actual questions are not provided until the date of the exam. The intention of the task is to achieve an exam performance that is as close to an applied behavioral performance as can be obtained with pen and paper.

 

Condition B: This format differs only in that groups submit only 1 set of responses as a collective, and therefore each receives the same grade.

           

2. Roundtable Interactive Peer Feedback Presentation Assessment:

 

Assessment Procedure: This presentation format varies from the traditional presentation in that students present their ideas to a series of smaller groups of peers in an interactive roundtable format as opposed to standing in front of the entire class and presenting with little or no interaction.  Each roundtable session lasts about 15 minutes.  Students are asked to provide a brief introduction and then peer groups are permitted to ask questions of the presenter.  A rubric outlining what constitutes a quality presentation is included in the course syllabus (Appendix C).  Teacher assessment is obtained within one of the peer group sessions.  In this session, the teacher is often required to ask questions that elicit evidence of both the content of the presentation and the student’s digestion of the critical issues related to their topic.  Given that the presenters move from group to group, roughly the same amount of time is required as that for traditional presentations.

 

Study Methods

 

Participants consisted of students from 2 graduate education courses for each study condition (collaborative exam condition A: N=21, 25; condition B: N=122, 126; roundtable presentation: N=22, 23). Participants in all groups were surveyed after taking part in their respective assessment condition. Surveys were constructed to obtain a measure of students’ perceptions within each of the four dimensions of the construct for soundness.  Following each exercise, volunteers were recruited for participation in focus group interviews.  In these focus group interviews, 5-8 students were asked to discuss their experiences in more depth. For collaborative exam condition B, focus group samples of 12 were selected for each section.  Because the participants for each condition consisted of the entire population of 2 required courses, the survey sample was considered fairly representative of all students admitted to these graduate certification programs. Moreover, the sample for the collaborative exams was obtained from universities in two separate geographical regions of the U.S.

 

RESULTS

 

Results from the survey and focus group data analysis (see data display below) in some respects confirmed previous research, yet were surprising in other respects.  In general, the collaborative method conditions received much better ratings than the traditional individualistic method conditions across all dimensions of soundness for both treatment groups.  The only exception was collaborative exam condition B, which received higher marks on 3 of the 4 dimensions but fell below traditional methods on reliability.

 

Initially, when considering implementing each of the conditions, researchers had little concern with their fundamental validity, but did question their ability to obtain reliable measures of performance.  While participants had mixed feelings about the reliability of the collaborative exam, they generally rated the reliability of both collaborative methods equal to or higher than that of traditional methods. This finding suggests that the primary concern about using such practices, that students would feel that their grade was unfairly obtained, was not generally borne out by these participants.

 

Possibly the most significant study findings for teacher educators were the participants’ strongly positive feelings related to each assessment method’s “effect on the learner.”  These findings support previous research.  For both methods, students felt strongly that it “promoted critical thinking” and “positive relationships among class members.”  For the roundtable method, participants overwhelmingly felt that it was “more enjoyable as an audience member,” and that “they learned more about the other members’ presentation.”  For the collaborative exam, participants reported “learning more in the process,” and being “more motivated to study.”  The fact that in a collaborative condition students tried harder is something of a surprise, given that many instructors would assume that students would take the opportunity to “ride on each other’s coattails.”  From the majority of accounts this was not the case with either group of participants.  In fact, participants suggested they prepared more rigorously so that they would not “let their group mates down.”

 

Data Display

 

Study data is displayed in this section 1) by survey mean for each of the four areas of soundness, then 2) with a representative sample of participant comments from the focus group interviews and survey comment sections, and finally 3) with the participant observations of the instructor.  Survey means for reliability and validity are amalgamated from 3 items each.  The efficiency rating, the effect on the learner rating, and overall soundness rating each reflect one item.
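As an illustration of how a three-item rating could be amalgamated into a single mean, the following is a minimal sketch rather than the procedure actually used in the study.  It assumes that the five scale labels were coded from +2 (Much Better) to -2 (Much Worse), which is inferred from the scale displays and not stated in the paper, and that the amalgamated value is the mean of the item means.  The function names and the sample responses are hypothetical.

```python
from statistics import mean

# Assumed numeric coding of the five scale labels. The paper shows the labels
# and signed means but does not state the coding explicitly.
SCALE = {
    "Much Better": 2,
    "Better": 1,
    "Even": 0,
    "Worse": -1,
    "Much Worse": -2,
}


def item_mean(responses):
    """Mean coded rating for one survey item."""
    return mean(SCALE[r] for r in responses)


def amalgamated_mean(items):
    """Mean of the item means for a dimension measured by several items."""
    return mean(item_mean(responses) for responses in items)


# Hypothetical responses from four participants to three reliability items.
reliability_items = [
    ["Better", "Even", "Even", "Better"],
    ["Even", "Even", "Better", "Worse"],
    ["Much Better", "Even", "Better", "Even"],
]

print(round(amalgamated_mean(reliability_items), 2))
```

Averaging item means gives each item equal weight even if some participants skipped an item; pooling all responses before averaging would be an equally plausible reading of "amalgamated."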

 

1. Reliability

 

Reliability – Roundtable: X= +0.4

+--------------+-------{}----I--------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “It is about the same [in response to the question, do you think this format is as reliable?]”

 

·         “Because [the instructor] could ask questions it made us have to be more prepared. I did not want to look stupid.  If I just presented, I could talk about what I knew, but with the roundtable I had to be ready for people asking me hard implementation questions, so I had to be more prepared.”

 

 

Reliability – Collaborative Exam Condition A: X= +0.5

 +--------------+-----{}-----I--------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         Students generally agreed with the statement that this format could produce a reliable measure.

 

·         Most students did not have a strong feeling one way or another.  A few thought that, theoretically, there should be a lack of reliability, but none reported personally experiencing a problem.

 

Reliability – Collaborative Exam Condition B: X=3

+--------------+-------------I-{}-----------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “I do not want to be mean, but there were a couple of people in my group that did not contribute at all.”

 

·         “I think it was reliable because it was a good way to see what we actually knew as opposed to a multiple-choice test like the midterm.”

 

·         “Some might have done all the work for the group.”

 

Instructor Participant Observation - Reliability

As the participants suggested, there was little difference in the reliability of the roundtables.  In either case, the instructor would have used a clearly developed rubric (see Appendix C) and would be present for each presenter.  The difference between the two cases would be the instructor’s ability to ask questions and listen to group-generated questions.  This characteristic of the roundtable puts more control in the hands of the examiners and forces the presenter to defend and explain their ideas.  In this sense, it could be suggested that there is generally greater reliability, given the ability in this condition to determine what the presenter knows through something of a cross-examination.

 

In the case of the collaborative exams, condition A demonstrated an unexpectedly good ability to determine the abilities of exam takers and, as expected, condition B showed little such ability to discriminate. Because groups turned in one set of responses, condition B fell prey to students who “rode the coattails” of their peers.  However, in the cases where all group members either performed well or performed poorly, the exam did provide a representative assessment of knowledge, preparation, or performance.

 

Nonetheless, in condition A, given the ability to assess individual papers independently, there was a fairly good ability to discriminate between the quality of each participant’s contribution.  The responses of those who were more prepared were clearly distinguishable from those less prepared, in most cases.  However, in groups where each member transcribed the group answer, members became indistinguishable. This can be reduced to some degree by instructions against using this strategy. Yet, overall, in condition A, students attempting to ride coattails or fake their way through were quite clearly exposed.

 

In condition B, the area of reliability was a definite liability.  It was impossible to discriminate one student’s contribution from another.  It was clear that there were some 20 percent of the students that were of little help to their group and may not have prepared to any great extent.

 

2. Validity

 

Validity – Roundtable: X= +0.8

 +--------------+--{}---------I-------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “I learned more about the projects. Asking questions enabled us to help the presenter with ideas or problems they had.”

 

·         “There was just more discussion and processing.”

 

·         “I still like the control of the (traditional) presentation.”

 

 

 

Validity – Collaborative Exam Condition A: X= +1.3

 +---------{}-----+-------------I-------------+-------------+

Much Better                Better                      Even                     Worse                  Much Worse

 

 

Participant Comments:

·         “The state standards strongly suggest that teachers help create critical thinkers who can work well with others. I think that if teachers themselves do this, the students can model their behavior.”

 

·         “[This format] provides the real world experience of working as a team (teachers, T.A.s, Principals).”

 

·         “Although the group and individual outcomes may both be valid, I think the individual on his/her own would arrive at a different solution if not influenced by group dynamics.  The best way to assess an individual’s knowledge is individually.”

 

 

Validity – Collaborative Exam Condition B:  X= +0.9

+------------+{}-------------I--------------+---------------+

Much Better              Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “We got to practice what we preach.”

 

·         “An understanding of the content was clearer.”

 

·         “At first I was hesitant about the exam, but after doing it I found that we could not have come up with a test this good alone. So I learned a lot and got a lot of encouragement for my ideas from the others.  It was validating.”

 

·         “I am more comfortable doing things on my own – like I thought, just let me work by myself – this was uncomfortable. But I thought too that in real life you have to work with others like this and so I could see the value.”

 

Instructor Participant Observation - Validity

In all 3 cases, participants felt that a collaborative format was more valid than an individual format.  This could be seen not only from the survey responses but from the verbal responses.  Participants enthusiastically expressed their delight with the methods. As the comments suggest, participants found collaboration to be much more authentic.  The roundtable format provided a venue to better process more complex aspects of the assignment than a stand up presentation.  Participants felt that ideas in education are less often developed in a vacuum and more often are the result of collaborative discussion.  The process could be observed to be more organic inasmuch as it was interactive and iterative.  Products grew out of a generative process.  This created a higher quality of product as well as a more satisfied producer.

 

In the exam conditions, students were generally surprised at what they found.  They expected to have to compromise, which happened to some degree, but what they did not expect was how much better the quality of the ideas ultimately generated was.  Had they predicted their ratings on the validity items before the exam, those predictions would not have been as high as their actual post-exam responses.  As students came up to turn in their exams they tended to be smiling.  They felt very accomplished, especially those who worked collaboratively on each item and did not divide the labor.  And it should be noted that exams from groups that were more collaborative were better overall than those from groups that reported having certain members focus on certain sections and then pasting together a finished product.

 

3. Efficiency

 

Efficiency – Roundtable: X= +0.3

 +--------------+--------{}---I-------------+--------------+

Much Better              Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “Smaller groups. Questions from classmates promoted discussion.”

 

·         “It helped you write your paper (and with your idea) you could sit down with people and discuss it and find problems and get ideas so you could go back home and make changes.” 

 

·         “I think the fact that we had started out the class working in cooperative groups helped make this work.”

 

·         “Maybe you could have a person designated as the facilitator for each session, that way you could keep people from wandering.”

 

·         “I think the fact that I missed a couple people still bugs me.”

 

 

Efficiency – Collaborative Exam (both conditions) X= +1.0

 

 +--------------{}-------------I-------------+--------------+

Much Better              Better                   Even                       Worse                  Much Worse

 

Participant Comments:

·         “[Great way to get] feedback on your ideas.”

 

·         “This is a great way to assess student’s knowledge of material especially where there is so much material.”

 

·         “I think this format takes a certain amount of [discipline]. I could see my 6th graders talking about everything but what they were supposed to be talking about at their roundtable when I was not at their table.  And we did that too. . .”

 

·         “If the class was not supportive like this one was, I don’t know if I would have been comfortable doing this.  I could not imagine presenting like this with the people in my high school [when I was a student].”

 

Instructor Participant Observation - Efficiency

In each condition, the amount of work and coordination was about the same as that for the types of assessment with which they are being compared.  The roundtable takes the same amount of time as regular presentations.  The instructor gets the same total time with each participant in the collaborative condition as they would in the traditional condition.  But the fact that there are only 5-7 members in a group makes the opportunity to ask questions much more convenient.  So with respect to getting at what the student knew, and to actually being of use in the thinking/writing process, the roundtable was more effective. The drawback is that no matter how one handles the logistics, some students will not hear other students’ presentations.  In the end, students can hear the introduction to all the presentations, and can take part in the roundtable portion for all but about 10-15 percent of their peers.

 

The collaborative exam condition A, where each student turned in a set of responses, is about the same logistically, after the exam, as if one had assigned the same essay items to individuals.  Before the exam, there is a need to get students into groups and provide a set of study guidelines (see Appendix B), but this also has the benefit of structuring the exam preparation. So it is hard to tell if the amount of time is greater or lesser.

 

The primary reason that one would consider using the exam format in condition B (having groups produce one set of responses per group), it would seem, has to do precisely with efficiency, or the sheer quantity of work involved for the instructor. Clearly, reading a set of responses by a whole class of students is a lot of work.  It takes about 10-30 minutes apiece to read exams completely.  Making the choice between using collaborative exams and traditional essay exams with a manageably sized class did not pose any conflict between areas of soundness.  However, assessing 120 students poses a dilemma.  Assessing 120 sets of essay responses is unreasonable whether they were completed within a collaborative format or an independent format.  So the choice is to do a collaborative exam where groups turn in one set of answers (producing about 25 exams to grade), or to give an objective test.  In this case, the choice was based on the notion that soundness would be best served if a collaborative exam were used, knowing that reliability was the price for gaining the other benefits desired.
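The size of this dilemma can be made concrete with the figures given above.  The following is a rough sketch, assuming that the 10-30 minute reading time applies equally to an individual exam and to a group-submitted exam, which the paper does not state.  The function name is hypothetical.

```python
def grading_hours(num_exams, minutes_per_exam):
    """Total reading time in hours for a stack of exams."""
    return num_exams * minutes_per_exam / 60


MINUTES_LOW, MINUTES_HIGH = 10, 30  # reading time per exam, from the text
INDIVIDUAL_EXAMS = 120              # one set of responses per student
GROUP_EXAMS = 25                    # approximate number of group submissions

for label, n in [("individual", INDIVIDUAL_EXAMS), ("group", GROUP_EXAMS)]:
    low = grading_hours(n, MINUTES_LOW)
    high = grading_hours(n, MINUTES_HIGH)
    print(f"{label}: {n} exams, {low:.1f} to {high:.1f} hours")
```

Under these assumptions, reading 120 individual exams takes roughly 20 to 60 hours, while reading about 25 group exams takes roughly 4 to 12.5 hours.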

 

4. Effect on Learner

 

Effect on Learner - Roundtable Aggregate X=1.2

 

 +-------------{}-+-----------I-------------+--------------+

Much Better              Better                   Even                       Worse                  Much Worse

 

Survey Item Means:

Enjoyed as an audience member                                           +1.6

Promoted positive relationships                                             +1.5

Caused more critical thinking                                                   +1.2

Learned more about other presentations                               +1.2

Helped in writing process                                                       +0.5

Motivational                                                                             +0.3

 

Participant Comments:

·         “Smaller groups. Questions from classmates promoted discussion.”

 

·         “Held small audience better. Had to respond to Q and A you might not have thought of.”

 

·         “Socially I think you would get to know people better.”

 

·         “[if you have an interactive mechanism] It helps you think about your topic better.”

 

·         “I liked the familiarity of this format over the other, because it promoted a different mindset.”

 

·         “I could not imagine doing this with a class that was not supportive like this one was.  If it was a hostile class, then I can’t imagine. . .”

 

 

 

Effect on Learner - Collaborative Exam (both conditions aggregate) X=1.2

 

 +------------{}--+-----------I-------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Survey Item Means:

Promoted positive relationships                                 +1.6

Caused more critical thinking                                       +1.6

Learned more in process                                           +1.0

Motivational                                                                 +0.5

 

Participant Comments:

·         “Helped me think the questions out more, explain my thinking and therefore clarify my answers more.”

 

·         “Ownership of the material (peer pressure) more likely to be prepared in order to not let the group down.”

 

·         “The process reinforced my confidence in my knowledge of the content.”

 

·         “Fosters teamwork. Allows for peer teaching.”

 

·         “Exchange of ideas. Reminder of things/concepts learned, but temporarily forgotten.”

 

·         “The material was discussed, debated, and then written, allowing students to develop a deeper understanding.”

 

·         “Helped me understand how to do a very worthwhile alternative assessment method.”

 

·         “[This format promoted many] levels of skills, cognition, organization- group is bigger than sum of its parts.”

 

·         “6 months down the road if you tested us again, I think we would know the material better after going through this process.  I really think we will remember it better.”

 

·         “It lets you know how the children feel when you ask them to work collaboratively.”

 

·         “The [exam] seemed secondary to the feelings I got working with the group.”

 

 

Instructor Participant Observation – Effect on Learner

The most notable observation regarding how the collaborative conditions benefited the students was that they did not foresee beforehand how the process would affect them.  Before the exam took place, most students were either mildly optimistic or somewhat indifferent to the thought of being assessed using a collaborative structure, but a good number were uncomfortable with the idea.  This discomfort seemed to be most related to the methods being “different” and odd, and also to the fact that they required one to work outside of his/her comfort zone, especially in the case of the collaborative exam.  It was not uncommon to hear comments such as “why are we doing it this way?” or “I don’t see the purpose of doing this.”  But in most cases this attitude changed after they took part in the activity.  It was not uncommon to hear the comment after the exam, “I did not think this was going to work, but it really did help me ___.”  Not all students were sold on the idea after taking part in the assessment, but as the survey data suggests, they walked away with a very positive impression of what they had done.  I would guess that if this survey had been given to the participants before they had done it, and if they had been asked to predict their feelings about the methods, they would not have expressed nearly as positive attitudes toward the idea of working collaboratively.

 

The best analogy I can find to characterize most students’ feelings after completing the collaborative exam (each condition), is that of being part of a “winning team.”  Succeeding as part of a team, it could be said, may be more satisfying than succeeding as an individual.  Participants typically expressed a very vivid sense of accomplishment after completing the collaborative exam.  This observation reflects what could be seen as a stand-alone benefit of using such a system, but it may also explain the homogeneously positive rating most participants typically gave to the collaborative condition in general.  That is to suggest, the feeling of “winning” may potentially have influenced the objectivity of participants on their survey ratings.

 

In terms of the motivational influence, the roundtable appeared to be more motivational due to the sense of accountability and responsibility.  The collaborative exam also seemed to be more motivational to most in each condition.  But there were a very few in condition A that “slacked” a bit (maybe 5%) because they knew the others in their group would be prepared. However, in condition B, there were maybe 20-30 percent that did not prepare as rigorously.  An observation that was made by this instructor and many students was that, on the one hand, a collaborative outcome is motivating to students with a high sense of group responsibility and, on the other hand, it can be an opportunity for students with a low sense of group responsibility to ride on the coattails of the better prepared.

 


 

5. Overall Soundness Rating

 

Participant Survey Ratings for Overall Soundness:

 

Overall Soundness - Roundtable X= +0.6

 +--------------+----{}-------I-------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Summary of Overall Survey Results – Roundtable Presentations

Reliability =   Not a significant concern (as might have been expected).

Validity =        More authentic.

                        Helped in writing process.

                        More engaging and educational for audience.

Efficiency =    About the same with improvement suggestions.

Benefits =      Students worked just as hard or harder.

                        Promoted more collegial environment.

                        Promoted higher levels of critical thinking.

 

Overall Soundness – Collaborative Exam (both conditions)  X= +0.7

 +--------------+---{}--------I-------------+--------------+

Much Better             Better                   Even                       Worse                  Much Worse

 

Summary of Overall Survey Results – Collaborative Exams

Reliability =    A hypothetical concern of some, but not tangibly experienced by participants in condition A. Inability to detect “slackers” was a significant problem in condition B.

Validity =        More authentic given nature of teacher work.

Efficiency =    No real difference.

Benefits =      Students worked just as hard or harder.

                        Promoted better interpersonal relationships.

                        Promoted higher levels of critical thinking.