Examining the Soundness of Two Collaborative Assessment Practices in Teacher Education Courses
John V. Shindler, Ph.D.
Division of Curriculum and Instruction
Charter College of Education
California State University, Los Angeles
jshindl@calstatela.edu
A paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA, April 2002
Abstract
New teachers most often default to the pedagogical practices to which they themselves were exposed as teacher candidates. This point was emphasized in a 1997 report by NCATE (National Council for Accreditation of Teacher Education), which stated, “Today’s teacher candidates will teach tomorrow as they are taught today” (p. 1). This methodological reproduction suggests an elevated need for those of us in teacher education to model practice that is both sound and innovative. While the field of educational assessment has produced much innovation in the past decade, most assessment in teacher education is still primarily individualistic. If teacher education programs are to promote the value of collaboration among their candidates, they must teach and model collaborative pedagogy within their programs. The reluctance to use more collaboratively structured assessment methods may stem from the perception that they are less sound.
This study is a qualitative examination of the soundness of two forms of collaborative assessment within teacher education courses. The forms of assessment being investigated are 1) collaborative or group exams, and 2) a system of collaborative, interactive roundtable presentations. The construct of soundness is defined within a four-dimensional framework consisting of validity, reliability, efficiency, and effect on the learner. Subjects (N = 45, 46, and 248 across the three assessment conditions) were students enrolled in required methods courses. Data consisted of participant surveys, focus group interviews, and instructor participant observation. The results of the study suggest that these collaborative assessment methods compared favorably on all 4 dimensions of soundness. While conventional wisdom would question these methods’ ability to achieve reliable measurement and differentiation of student performances, as well as their ability to be carried out as efficiently as more traditional methods of assessment, participants rated the collaborative methods slightly higher in each of these areas. Moreover, the data suggested that the benefits experienced by the participants taking part in the collaborative methods were significant. Participants experienced a greater degree of critical thinking, motivation to prepare, enjoyment of the assessment process, and relationship with classmates, while reporting that they learned more in the collaborative assessment conditions. The paper also discusses the findings and offers directions for implementing collaborative assessment in a course.
Examining the Soundness of Two Collaborative Assessment Practices in Teacher Education Courses
New teachers most often default to the pedagogical practices to which they themselves were exposed as teacher candidates. This point was emphasized in a 1997 report by NCATE (National Council for Accreditation of Teacher Education), which stated, “Today’s teacher candidates will teach tomorrow as they are taught today” (p. 1). This methodological reproduction suggests an elevated need for those of us in teacher education to model practice that is both sound and innovative. While the field of educational assessment has produced much innovation in the past decade, most assessment in teacher education is still primarily individualistic. Current standards from the leading professional organizations in teacher education, including NCATE, INTASC, and NBPTS, treat collaboration skills and dispositions as critical to a well-prepared teacher. For example, INTASC Principle #7, Disposition #3, states, “The teacher values planning as a collegial activity.” If teacher education programs are to promote the value of collaboration among their candidates, they must teach and model collaborative pedagogy within their programs. The reluctance to use more collaboratively structured assessment methods may stem from the perception that they are less sound.
This study is a qualitative examination of the soundness of two forms of collaborative assessment within graduate teacher education courses at two large state universities, each with a large teacher education program. The forms of assessment being investigated are 1) collaborative or group exams, and 2) a system of collaborative, interactive roundtable presentations. The construct of soundness is defined within a four-dimensional framework consisting of validity, reliability, efficiency, and effect on the learner. Collaborative assessment is rarely used in teacher education and even less often outside of education (Antony, 1994). This reluctance likely stems both from unfamiliarity and from the fear that collaborative assessment is not as sound as more traditional forms. This study examines each of these concerns, explores the technical requirements of using collaborative assessment, and compares its soundness with that of more common methods.
Although they have seen only limited application, collaborative exams have been shown to improve content retention, promote higher level thinking (Stearns, 1996; Yuretich, Khan, & Leckie, 2001), and increase overall enjoyment of the course (Stearns, 1996). Interactive presentation formats have been shown to have a similar set of effects (Hermann, 1995; MacDonald, 1989; Schumm, 1995). The collaborative element of the assessments seems to promote more thoughtful processing and more creative work (Bohde, 1996). Moreover, both methods seem to provide a potentially more authentic context, inasmuch as “good teachers” have a greater tendency to plan collaboratively (Fullan, 1993).
This study incorporates a four-dimensional theoretical framework for soundness that has been shown to be conceptually as well as practically robust (Shindler, Yang, Nephew & Keen, 2000). Within this framework, any assessment practice can be considered sound to the degree that it possesses validity, reliability, and efficiency, and has a positive effect on its users. Validity is defined by the degree to which a method measures the most important concepts, matches the content covered, and is the best-suited form of methodology to capture the desired learning. Reliability can be characterized by the degree to which a method obtains an accurate representation of the learning, both among raters (or hypothetical raters) and across multiple performances. Efficiency deals with how “doable” an assessment method is, and how well it can be carried out without taking time away from other teaching or learning. The dimension related to the effect on the learner could also be considered what has been termed “consequential validity,” but it is treated as a separate consideration here. This dimension includes the motivational, psychological, and epistemological effects the assessment has on any learner and/or on the class as a whole. (See Appendix A for the working definition of soundness provided to students.)
The Two Study Assessment Conditions
1. Cooperative Group Exams
Assessment Procedure:
Condition A: In this exam format, students are allowed to work together to develop their responses to written exam prompts, but each student’s exam is evaluated individually. Students are allowed to choose their own groups, and because there should have been a great deal of cooperative class work up to this point, they are familiar with one another and are in a good position to select a team purposefully. Opting to work alone is allowed at any point in the process, but is not encouraged. Prompts consist of items that require extensive synthesis and application of course content. Prior to the exam period, students receive exam guidelines and rubrics outlining the target requirements for content and the degree of development necessary for maximum credit. The actual questions are not provided until the day of the exam. The intention of the task is to achieve an exam performance as close to an applied behavioral performance as can be obtained with pen and paper.
Condition B: This format differs only in that each group submits a single set of responses as a collective, so every member of the group receives the same grade.
2. Roundtable Interactive Peer Feedback Presentation Assessment:
Assessment Procedure: This presentation format differs from the traditional presentation in that students present their ideas to a series of smaller groups of peers in an interactive roundtable format, rather than standing in front of the entire class and presenting with little or no interaction. Each roundtable session lasts about 15 minutes. Students are asked to provide a brief introduction, and then the peer group is permitted to ask the presenter questions. A rubric outlining what constitutes a quality presentation is included in the course syllabus (Appendix C). The teacher’s assessment takes place during one of the peer group sessions. In this session, the teacher often needs to ask questions that elicit evidence of both the content of the presentation and the student’s digestion of the critical issues related to the topic. Because the presenters move from group to group, roughly the same amount of time is required as for traditional presentations.
Participants consisted of students from 2 graduate education courses for each study condition (collaborative exam condition A: N = 21 and 25; condition B: N = 122 and 126; roundtable presentation: N = 22 and 23). Participants in all groups were surveyed after taking part in the respective assessment condition. Surveys were constructed to obtain a measure of students’ perceptions within each of the four dimensions of the construct of soundness. Following each exercise, volunteers were recruited to participate in focus group interviews, in which 5-8 students were asked to discuss their experiences in more depth. For collaborative exam condition B, focus group samples of 12 were selected from each section. Because the participants for each condition consisted of the entire enrollment of 2 required courses, the survey sample was considered fairly representative of all students admitted to these graduate certification programs. Moreover, the sample for the collaborative exams was obtained from universities in two separate geographical regions of the U.S.
Results from the survey and focus group data analysis (see data display below) confirmed previous research in some respects, yet were surprising in others. In general, the collaborative method conditions received much better ratings than the traditional individualistic method conditions across all dimensions of soundness for both treatment groups. The only exception was collaborative exam condition B, which received higher marks on 3 of the 4 dimensions but fell below on reliability.
Initially, when considering implementing each of the conditions, the researchers had little concern about their fundamental validity but did question their ability to obtain reliable measures of performance. Although participants had mixed feelings about the reliability of the collaborative exam, they generally rated the reliability of both collaborative methods as equal to or higher than that of traditional methods. This finding suggests that the primary reason for not using such practices, the worry that students would feel their grades were unfairly obtained, was not a feeling most of these participants reported.
Possibly the most significant study findings for teacher educators were the participants’ strongly positive feelings related to each assessment method’s “effect on the learner.” These findings support previous research. For both methods, students felt strongly that it “promoted critical thinking” and “positive relationships among class members.” For the roundtable method, participants overwhelmingly felt that it was “more enjoyable as an audience member” and that “they learned more about the other members’ presentation.” For the collaborative exam, participants reported “learning more in the process” and being “more motivated to study.” The fact that students tried harder in a collaborative condition is something of a surprise, given that many instructors would assume that students would take the opportunity to “ride on each others’ coattails.” By most accounts, this was not the case with either group of participants. In fact, participants suggested they prepared more rigorously so that they would not “let their group mates down.”
Study data are displayed in this section 1) by survey mean for each of the four areas of soundness, then 2) with a representative sample of participant comments from the focus group interviews and survey comment sections, and finally 3) with the participant observations of the instructor. Survey means for reliability and validity are each composites of 3 items. The efficiency rating, the effect on the learner rating, and the overall soundness rating each reflect a single item.
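The paper does not state how the three reliability or validity items were combined or how the rating scale was scored. A minimal sketch of one plausible approach follows, assuming the five response options are coded +2 (Much Better) through -2 (Much Worse), which would be consistent with the signed item means reported later, and that the composite is the plain average of the item means. The `SCALE` coding, the `composite_mean` function, and the sample responses are all hypothetical and are not study data.

```python
from statistics import mean

# Hypothetical numeric coding of the five response options (an assumption;
# the paper does not describe its scoring).
SCALE = {"Much Better": 2, "Better": 1, "Even": 0, "Worse": -1, "Much Worse": -2}


def composite_mean(responses_by_item: list[list[str]]) -> float:
    """Average the per-item means of several survey items into one composite.

    responses_by_item holds one list of response labels per survey item.
    """
    item_means = [mean(SCALE[label] for label in item) for item in responses_by_item]
    return mean(item_means)


# Hypothetical responses from four participants to three reliability items.
reliability_items = [
    ["Better", "Even", "Better", "Much Better"],
    ["Even", "Even", "Better", "Better"],
    ["Better", "Worse", "Even", "Better"],
]
print(round(composite_mean(reliability_items), 2))
```

Averaging item means (rather than pooling all responses) weights each item equally even if some participants skipped an item; whether the study did this is not known.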
1. Reliability
+--------------+-------{}----I--------------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “It is about the same [in response to the question, do you think this format is as reliable?]”
· “Because [the instructor] could ask questions it made us have to be more prepared. I did not want to look stupid. If I just presented, I could talk about what I knew, but with the roundtable I had to be ready for people asking me hard implementation questions, so I had to be more prepared.”
+--------------+-----{}-----I--------------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· Students generally agreed with the statement that this format could produce a reliable measure.
· Most students did not have a strong feeling one way or another. A few thought that, in theory, the format should lack reliability, but none said they personally experienced a problem.
+--------------+-------------I-{}-----------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “I do not want to be mean, but there were a couple of people in my group that did not contribute at all.”
· “I think it was reliable because it was a good way to see what we actually knew as opposed to a multiple-choice test like the midterm.”
· “Some might have done all the work for the group.”
Instructor Participant Observation - Reliability
As the participants suggested, there was little difference in reliability between the roundtables and traditional presentations. In either case, the instructor would use a clearly developed rubric (See Appendix C) and would be present for each presenter. The difference between the two cases is the instructor’s ability to ask questions and to listen to group-generated questions. This characteristic of the roundtable puts more control in the hands of the examiners and forces presenters to defend and explain their ideas. In this sense, the roundtable could be said to offer generally greater reliability, given its ability to determine what the presenter knows through something of a cross-examination.
In the case of the collaborative exams, condition A demonstrated an unexpectedly good ability to distinguish among the abilities of exam takers, while condition B, as expected, showed little such ability to discriminate. Because groups turned in one set of responses, condition B fell prey to students who “rode the coattails” of their peers. However, in cases where all group members performed well, or all performed poorly, the exam did provide a representative assessment of knowledge, preparation, or performance.
In condition A, by contrast, because individual papers could be assessed independently, there was a fairly good ability to discriminate among the quality of each participant’s contribution. In most cases, the responses of those who were more prepared were clearly distinguishable from those of the less prepared. However, in groups where each member transcribed the group answer, members became indistinguishable. This problem can be reduced to some degree by instructing students not to use this strategy. Overall, though, students in condition A who attempted to ride coattails or fake their way through were fairly readily exposed.
In condition B, reliability was a definite liability. It was impossible to discriminate one student’s contribution from another’s. It was clear that some 20 percent of the students were of little help to their group and may not have prepared to any great extent.
2. Validity
+--------------+--{}---------I-------------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “I learned more about the projects. Asking questions enabled us to help the presenter with ideas or problems they had.”
· “There was just more discussion and processing.”
· “I still like the control of the (traditional) presentation.”
+---------{}-----+-------------I-------------+-------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “The state standards strongly suggest that teachers help create critical thinkers who can work well with others. I think that if teachers themselves do this, the students can model their behavior.”
· “[This format] provides the real world experience of working as a team (teachers, T.A.s, Principals).”
· “Although the group and individual outcomes may both be valid, I think the individual on his/her own would arrive at a different solution if not influenced by group dynamics. The best way to assess an individual’s knowledge is individually.”
+------------+{}-------------I--------------+---------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “We got to practice what we preach.”
· “An understanding of the content was clearer.”
· “At first I was hesitant about the exam, but after doing it I found that we could not have come up with a test this good alone. So I learned a lot and got a lot of encouragement for my ideas from the others. It was validating.”
· “I am more comfortable doing things on my own – like I thought, just let me work by myself – this was uncomfortable. But I thought too that in real life you have to work with others like this and so I could see the value.”
Instructor Participant Observation - Validity
In all 3 cases, participants felt that a collaborative format was more valid than an individual format. This could be seen not only in the survey responses but also in participants’ verbal responses. Participants enthusiastically expressed their delight with the methods. As the comments suggest, participants found collaboration to be much more authentic. The roundtable format provided a better venue than a stand-up presentation for processing the more complex aspects of the assignment. Participants felt that ideas in education are rarely developed in a vacuum and are more often the result of collaborative discussion. The process could be observed to be more organic, inasmuch as it was interactive and iterative. Products grew out of a generative process. This created a higher quality of product as well as a more satisfied producer.
In the exam conditions, students were generally surprised at what they found. They expected to have to compromise, which happened to some degree, but they did not expect the ideas they ultimately generated to be of so much higher quality. Had they been asked before the exam to guess their responses to the validity items, their guesses would not have been as high as their actual ratings. As students came up to turn in their exams, they tended to be smiling. They felt very accomplished, especially those who worked collaboratively on each item and did not divide the labor. It should be noted that exams from groups that were more collaborative were better overall than those from groups that reported having certain members focus on certain sections and then paste together a finished product.
3. Efficiency
+--------------+--------{}---I-------------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “Smaller groups. Questions from classmates promoted discussion.”
· “It helped you write your paper (and with your idea) you could sit down with people and discuss it and find problems and get ideas so you could go back home and make changes.”
· “I think the fact that we had started out the class working in cooperative groups helped make this work.”
· “Maybe you could have a person designated as the facilitator for each session, that way you could keep people from wandering.”
· “I think the fact that I missed a couple people still bugs me.”
+--------------{}-------------I-------------+--------------+
Much Better Better Even Worse Much Worse
Participant Comments:
· “[Great way to get] feedback on your ideas.”
· “This is a great way to assess student’s knowledge of material especially where there is so much material.”
· “I think this format takes a certain amount of [discipline] I could see my 6th graders talking about everything but what they were supposed to be talking about at their roundtable when I was not at their table. And we did that too. . .”
· “If the class was not supportive like this one was, I don’t know if I would have been comfortable doing this. I could not imagine presenting like this with the people in my high school [when I was a student].”
In each condition, the amount of work and coordination was about the same as that for the type of assessment with which it is being compared. The roundtable takes the same amount of time to do as regular presentations. The instructor gets the same total time with each participant in the collaborative condition as in the traditional condition. But the fact that there are only 5-7 members in a group makes asking questions much more convenient. So with respect to getting at what the student knew, and to actually being of use in the thinking and writing process, the roundtable was more effective. The drawback is that no matter how one arranges the logistics, some students will not hear other students’ presentations. In the end, students can hear the introductions to all the presentations and can take part in the roundtable portion for all but about 10-15 percent of their peers.
The collaborative exam in condition A, where each student turned in a set of responses, is about the same logistically, after the exam, as assigning the same essay items to individuals. Before the exam, the instructor needs to get students into groups and provide a set of study guidelines (see Appendix B), but this also has the benefit of structuring exam preparation. So it is hard to tell whether the total time required is greater or less.
The primary reason one would consider the exam format in condition B (having each group produce one set of responses), it would seem, is efficiency: the sheer quantity of work involved for the instructor. Clearly, reading a set of responses from every student in a class is a lot of work. It takes about 10-30 minutes apiece to read exams completely. With a manageable-sized class, choosing between collaborative exams and traditional essay exams did not pose any conflict between areas of soundness. However, assessing 120 students poses a dilemma. Assessing 120 sets of essay responses is unreasonable whether they are completed in a collaborative format or an independent format. So the choice is between a collaborative exam where groups turn in one set of answers (producing about 25 exams to grade) and an objective test. In this case, the choice was based on the notion that soundness would be best served by a collaborative exam, knowing that reliability was the price for gaining the other desired benefits.
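The workload argument above can be made concrete with a little arithmetic. The sketch below is a rough estimate only, assuming every exam falls somewhere in the stated 10-30 minute reading range and ignoring any other grading overhead; the function name `grading_hours` is hypothetical. It compares about 120 individually submitted exams with the roughly 25 group exams that condition B produces for the same section.

```python
# Rough grading-time estimate for one section, using the figures stated in
# the text: about 120 students, about 25 group exams under condition B, and
# 10-30 minutes to read one exam completely.
MINUTES_PER_EXAM = (10, 30)  # (low, high) reading time per exam, from the text


def grading_hours(num_exams: int,
                  minutes_per_exam: tuple[int, int] = MINUTES_PER_EXAM) -> tuple[float, float]:
    """Return the (low, high) total reading time in hours for num_exams exams."""
    low, high = minutes_per_exam
    return num_exams * low / 60, num_exams * high / 60


individual = grading_hours(120)  # one essay exam per student
group = grading_hours(25)        # condition B: one exam per group

print(f"Individual exams: {individual[0]:.1f}-{individual[1]:.1f} hours")
print(f"Group exams (condition B): {group[0]:.1f}-{group[1]:.1f} hours")
```

For these inputs the estimate is roughly 20-60 hours of reading for individual exams versus roughly 4-12.5 hours for group exams, which is the scale of the trade-off weighed against the loss of reliability in condition B.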
4. Effect on Learner
+-------------{}-+-----------I-------------+--------------+
Much Better Better Even Worse Much Worse
Survey Item Means:
Enjoyed as an audience member +1.6
Promoted positive relationships +1.5
Caused more critical thinking +1.2
Helped in writing process +0.5
Motivational +0.3
Participant Comments:
· “Smaller groups. Questions from classmates promoted discussion.”
· “Held small audience better. Had to respond to Q and A you might not have thought of.”
· “Socially I think you would get to know people better.”
· “[if you have an interactive mechanism] It helps you think about your topic better.”
· “I liked the familiarity of this format over the other, because it promoted a different mindset.”
· “I could not imagine doing this with a class that was not supportive like this one was. If it was a hostile class, then I can’t imagine. . .”
+------------{}--+-----------I-------------+--------------+
Much Better Better Even Worse Much Worse
Survey Item Means:
Promoted positive relationships +1.6
Caused more critical thinking +1.6
Learned more in process +1.0
Motivational +0.5
Participant Comments:
· “Helped me think the questions out more, explain my thinking and therefore clarify my answers more.”
· “Ownership of the material (peer pressure) more likely to be prepared in order to not let the group down.”
· “The process reinforced my confidence in my knowledge of the content.”
· “Fosters teamwork. Allows for peer teaching.”
· “Exchange of ideas. Reminder of things/concepts learned, but temporarily forgotten.”
· “The material was discussed, debated, and then written, allowing students to develop a deeper understanding.”
· “Helped me understand how to do a very worthwhile alternative assessment method.”
· “[This format promoted many] levels of skills, cognition, organization- group is bigger than sum of its parts.”
· “6 months down the road if you tested us again, I think we would know the material better after going through this process. I really think we will remember it better.”
· “It lets you know how the children feel when you ask them to work collaboratively.”
· “The [exam] seemed secondary to the feelings I got working with the group.”
The most notable observation regarding how the collaborative conditions benefited the students was that students did not foresee how the process would affect them. Before the exam took place, most students were either mildly optimistic or somewhat indifferent to the thought of being assessed using a collaborative structure, but a good number were uncomfortable with the idea. This discomfort seemed to be most related to the methods being “different” and odd, and to their requiring students to work outside their comfort zones, especially in the case of the collaborative exam. It was not uncommon to hear questions such as “Why are we doing it this way?” or “I don’t see the purpose of doing this.” But in most cases this attitude changed after students took part in the activity. It was not uncommon to hear the comment after the exam, “I did not think this was going to work, but it really did help me ___.” Not all students were sold on the idea after taking part in the assessment, but, as the survey data suggest, most walked away with a very positive impression of what they had done. I would guess that if this survey had been given to the participants beforehand, and they had been asked to predict their feelings about the methods, they would not have expressed nearly as positive an attitude toward the idea of working collaboratively.
The best analogy I can find to characterize most students’ feelings after completing the collaborative exam (in either condition) is that of being part of a “winning team.” Succeeding as part of a team, it could be said, may be more satisfying than succeeding as an individual. Participants typically expressed a very vivid sense of accomplishment after completing the collaborative exam. This observation reflects what could be seen as a stand-alone benefit of using such a system, but it may also explain the uniformly positive ratings most participants gave to the collaborative conditions in general. That is, the feeling of “winning” may have influenced the objectivity of participants’ survey ratings.
In terms of motivational influence, the roundtable appeared to be more motivational because of the sense of accountability and responsibility it created. The collaborative exam also seemed to be more motivational for most students in each condition. But a very few in condition A (maybe 5%) “slacked” a bit because they knew the others in their group would be prepared. In condition B, however, maybe 20-30 percent did not prepare as rigorously. An observation made by this instructor and by many students was that, on the one hand, a collaborative outcome is motivating to students with a high sense of group responsibility, and on the other hand, it can be an opportunity for students with a low sense of group responsibility to ride on the coattails of the better prepared.
5. Overall Soundness Rating
Participant Survey Ratings for Overall Soundness:
+--------------+----{}-------I-------------+--------------+
Much Better Better Even Worse Much Worse
Summary of Overall Survey Results – Roundtable Presentations
Reliability = Not a significant concern (as might have been expected).
Validity = More authentic. Helped in writing process. More engaging and educational for audience.
Efficiency = About the same, with suggestions for improvement.
Benefits = Students worked just as hard or harder. Promoted a more collegial environment. Promoted higher levels of critical thinking.
+--------------+---{}--------I-------------+--------------+
Much Better Better Even Worse Much Worse
Summary of Overall Survey Results – Collaborative Exams
Reliability = A hypothetical concern of some, but not tangibly experienced by participants in condition A. Inability to detect “slackers” was a significant problem in condition B.
Validity = More authentic given the nature of teacher work.
Efficiency = No real difference.
Benefits = Students worked just as hard or harder. Promoted better interpersonal relationships. Promoted higher levels of critical thinking.