Theoretical Considerations in the Development of "Authentic Assessment"
· processes
· content skills
· applied learning
· social skills
· problem solving
· products

· objective
· valid and reliable
· doable/efficient
· desirable consequences
· translate to a grade
· define what "good" is
· have a clear "task analysis" of your objective
· develop a sound instrument for assessment
· develop a clear and effective process to gather data
1. Define the performance
· integrate the desired performance with course instructional objectives.
· "operationalize" the task and clearly define the concept of a "quality" performance.
· determine if the performance should be naturally occurring or manufactured.
2. Select the most appropriate form of assessment scale
· checklist (i.e., performance is characterized by can/can't type outcomes)
· combination of primary traits (i.e., performance is characterized by a number of component parts, defining its critical attributes)
· rubric scale (i.e., performance is characterized by a hierarchical progression of quality and complexity as the performance is mastered)
3. Create the assessment criteria and scale to represent it
· develop a framework that any and all performances can be placed within for clear and reliable scoring.
· determine the type of score/feedback most suitable:
  * holistic (i.e., one score representing the complex elements of the performance)
  * primary traits (i.e., a series of scores relating to each component of the performance)
  * narrative (i.e., anecdotal feedback relating to salient aspects of the performance)
· communicate assessment criteria with participants, and/or have the participants take part in the development of the criteria.
4. Prepare for sampling and technical considerations
· how much of a student's work is necessary to represent the whole?
· how can the procedure best be carried out efficiently?
5. Address issues of reliability and bias
3 Conceptual Scale Types for use in Performance Assessment
CHECKLISTS
YES/did   NO/didn't
_____     _____     task 1
_____     _____     task 2
_____     _____     task 3
_____     _____     etc...
Best for performances that are defined by did-or-didn't, there-or-not-there characteristics. These tasks need to be observably evident and cannot require interpretation.
PRIMARY TRAIT SCALES
Best for performances and products that have a complex series of traits. If the definition of a "good . . ." cannot be reduced to one holistic scale, separate traits must be determined, and this scale type is necessary.
        | Trait A | Trait B | Trait C
Level 3 |         |         |
Level 2 |         |         |
Level 1 |         |         |
HOLISTIC RUBRIC SCALE
This scale is best for assessing performances and products that require an interpretation of quality, and can be reduced to progressive levels of caliber. The scale should represent clear and concrete behaviors defining distinct levels.
Conceptual Design for Holistic Rubric Scale
[Figure: Levels 1 through 4 shown as an ascending progression]
Rules for rubric construction:
1. each level should be stated in positive, behavioral terms
2. each progressive level should be inclusive of the last
3. each level should be clear and distinct from the last
4. each line should represent specific defining behavior(s)
5. avoid negative behaviors unless absolutely necessary
6. the number of levels should reflect the nature of the task
7. label levels according to the needs of the student group
Which Type of Rubric is Best? Exploring Various Structural Options for Performance Assessment Scale Design
(Forum Journal of Teacher Education, V12, n2, 2002)
Introduction:
In recent years, the field of education has incorporated an increasing amount of performance assessment. As a result, there has been a proliferation and legitimization of the use of assessment scales often called scoring rubrics. Training in rubric design has become common, and even many parents are becoming familiar with the practice. This growing enthusiasm is not surprising; the use of well-designed performance assessment procedures opens many new assessment possibilities to today's teacher.
The benefits of rubrics to students can be significant. Quality rubrics can provide students with clear targets (Stiggins, 1994; Huffman, 1998). They can help students become more self-directed and reflective (Luft, 1998), and feel a greater sense of ownership for their learning (Branch, 1998). Perhaps most importantly, given that rubrics have the capacity to capture complex performances, they provide the opportunity to assess many more student outcomes than traditional objective methods – outcomes that are in many cases very relevant, authentic, and tied to real-world applications.
Yet not all performance assessment scales (or rubrics) are created equal. A quality scale must, first, incorporate the best design option for the given task and, second, be constructed "soundly." There are a few basic principles to consider when constructing or choosing a pre-made scale. This article may be helpful to teachers who want to be able to develop just the right scale for the situation, and who want to be confident that their rubrics are having the educational benefits that they desire.
An Operational Definition of Soundness
Assessment, in a very real and material way, defines success for our students. This is especially true for rubrics. The soundness of our rubrics will affect how instructional, fair, and motivational they are. For an assessment procedure to be considered sound, it must possess high degrees of validity and reliability, be able to be carried out efficiently, and have a generally positive academic and psychological effect on students.
Validity deals generally with the degree to which any assessment method suits the job. The question validity asks is, "Does this form of assessment capture the most relevant, essential, and inclusive set of outcomes for a particular learning performance?" For this reason, rubrics are often, theoretically, the most valid way to assess complex performances when a reliable qualitative measurement is required. However, there are many forms of rubrics, and they function differently and produce different results.
Issues of reliability seem to be the primary focus of the academic community's examination of rubric usage (Crehan, 1998; Popham, 1997), yet reliability is just one aspect of holistic soundness (Myford, 1996; Shindler, 1999; Stiggins, 1994). Reliability generally deals with how well any assessment method can obtain similar results over separate applications, from one performance to the next and from one student to the next. Greater reliability is usually generated by greater specificity of content: the more concrete, precise, and observable the language, the more reliable the rubric will be. However, the effort to gain specificity can lead to an overemphasis on quantification, which in the extreme can cause problems with validity. Many times the most important and essential qualities of a performance do not lend themselves to quantification, and the conception of the quality whole can be lost in a list of amounts.
Given the already unreasonable amount of work required of teachers, practices that are not efficient will not be sustained for long. Teachers are simply too burdened by too many needs. Constructing rubrics and then using them to assess individual performances does take time. But usually the time spent constructing a quality rubric pays for itself many times over in time saved clarifying and re-teaching the desired performance tasks. Still, the teacher must ask the question, "When I am doing my performance assessment, what am I not doing?" The cost must be worth the benefit. For that reason, regardless of grade level, having students help develop and then use the rubric for peer assessment can make assessment very cost-effective in the long run. It not only relieves the teacher of being the sole instrument of evaluation; having students assess themselves and one another throughout the performance, product, or skill development process has benefits beyond mere efficiency.
Any assessment procedure that claims to be educationally sound must not only be valid, reliable, and efficient, but must also have an overall positive influence on the student. We have traditionally viewed assessment as "measuring what went on during the learning." In this view, assessment is a value-free abstraction. The reality is that every assessment practice has an effect: it either improves or detracts from students' sense of motivation, control, worth, and belonging within the group. Assessment, in a very real way, defines the epistemological reality of the classroom. It tells us what knowledge is and which learning is important. Assessment can empower or erode each student's basic psychology of achievement.
Rubrics as a practice can be viewed as neither homogeneously sound nor unsound. In one case, an assessment procedure using a rubric could rate high on all four of these areas of soundness; in another case, it could fail on all fronts. These four areas of soundness will be examined within the context of a discussion of rubric design and construction considerations.
General Guidelines for Scale Development:
To begin the process of developing a performance assessment scale, it is best to start with the desired outcome or learning objective. These can come from state or district learning outcomes, or from the teacher's own curriculum. Given what we want our students to learn, what task would best reflect that learning? Quite often teachers are not ambitious enough at this point; with the right tool, there is very little that we could not assess fairly soundly. It might be useful to ask oneself, "What is the most authentic and meaningful way that I could see my students learn ____, or show they have learned ____?" This learning outcome can take the form of a project, lab, product, report, presentation, piece of writing, or the process of getting to an outcome. The performance task can be done individually or in groups, and the assessment can be applied to individuals or to groups.
To a great extent, the soundness of any performance assessment will be predicated on how well it can be "operationalized." If a task, product, or process can be broken down into a well-defined set of parts, it can be assessed. If the assessor cannot define the "good performance" before assigning the work or beginning to assess it, the assessment will be unsound. If the qualities of a good performance cannot be clearly outlined to students before they begin the work, the assessment will not be useful to their learning, it will not be perceived as fair, and it will not be reliable across multiple performances. It will be no better than a subjectively determined mark: the kind we remember at the top of many of our papers over the years, the kind that was of little help to our learning, and that most of us experienced as something of a personal gift or punishment.
So let's say the authentic task we chose to demonstrate the essential learning was some kind of project. We would need to define very clearly each quality, component, and quantity that would need to be included in a fully successful performance. Again, the more specific our language, the more our students can concentrate on the content and the less they have to guess what we want. The less students have to guess, the more they are in control of their learning, and consequently the more motivated they will be, especially the typically low-performing ones. Moreover, I have to ask myself, "If I cannot tell them specifically what I am looking for in the assignment, why did I assign it?"
At this definition stage of the process, it can be effective on many levels to bring the students in on the rubric construction. It can be as easy as asking the class, "What needs to be included in a quality _____?" or "What should a good _____ have in it?" Student involvement in the process engenders a sense of ownership, which leads to a greater investment of care and effort. The process also reinforces in each student's mind a definition of the concept of a "quality performance." Having this advance organizer introduced early in the performance construction process can be a substantial learning and motivational tool.
The next consideration is the choice of format, or type of scale design, that should be used. This is the step where too often we make casual choices that can keep us from achieving the soundest results. A design poorly matched to the task can perform awkwardly at best and be counterproductive at worst. If ultimately we need to attach a reliable, justifiable, informative grade to the performance we are assessing, then we must use some form of scale: either a checklist, a holistic rubric, or a primary trait rubric. It should be noted that not all performance tasks should be formally graded, but if they are, they should be graded with a well-designed, sound scale. As mentioned earlier, if we do not give the task a grade, we make an implicit statement that it is less important learning than that which we do grade.
Three Scale Design Type Options: Checklists, Holistic Rubrics, and Primary Trait Rubrics
CHECKLISTS
The simplest and most easily interpreted scale design is the checklist. Checklists are best for performances that are defined by "did or didn't" or "there or not there" characteristics. When the performance is defined by a series of procedural steps, or a set of concrete components that need to be included, and/or the list of possible behaviors is vast and cannot easily be reduced to a theme, a checklist is usually the most appropriate scale choice.
YES/did   NO/didn't
_____     _____     task/component 1
_____     _____     task/component 2
_____     _____     task/component 3
_____     _____     etc...
_____     _____     total score
Developing a checklist can be as straightforward as the figure above suggests, but checklists also pose potential dilemmas. For instance, if the tasks and/or components are all listed uniformly and given the same value, this may belie their relative or differential importance. If certain tasks are more important to the overall quality of the performance, then it is necessary either to assign differential point values or to use another type of scale.
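A minimal sketch of the differential-point-value idea, in Python, follows. The item names and point values are hypothetical illustrations, not part of the article; the point is simply that each "yes" item contributes its own weight to the total.

```python
# Sketch of a checklist scale with differential point values.
# Item names and point values are hypothetical.

checklist_points = {
    "title page included": 1,
    "all data tables included": 2,
    "procedure steps followed in order": 3,
    "sources cited": 1,
}

def score_checklist(observed: dict[str, bool], points: dict[str, int]) -> int:
    """Sum the point values of the items the assessor marked 'yes' (did / there)."""
    return sum(value for item, value in points.items() if observed.get(item, False))

# Example: one component missing, out of a possible 7 points.
observed = {
    "title page included": True,
    "all data tables included": True,
    "procedure steps followed in order": True,
    "sources cited": False,
}
print(score_checklist(observed, checklist_points))  # -> 6
```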
Checklists are very popular due to their ease of construction, but they are limited. The fundamental limitation is that by nature they cannot contain items that involve gradations of quality. For example, if one of our desired outcomes were creativity, since creativity does not exist as an absolute and material reality, we could not judge any performance to be either entirely creative or entirely not creative. In that case, we would need to break creativity down into concrete, observable sub-components, drop it from our checklist, or use another type of scale design.
HOLISTIC RUBRIC SCALES
A holistic rubric
functions to capture a complex performance and then express it in ascending
grades of attainment. This type of scale
is best for assessing performances and products that require an interpretation
of quality and represent a “whole performance,” which is essentially greater
than the sum of its parts.
The 3 Structural Designs for Holistic Rubric Scales
Holistic rubrics assume that all qualities in a product or performance can be reduced to a single score. That score reflects the level of the quality of that performance or product on an ascending scale – from lowest to highest. Given these assumptions, the following three structures can be used in the design of a holistic rubric.
Figure 2A: Option A: Proportion of Desired Qualities

Level 5: all 5 traits
Level 4: 4 of 5 traits
Level 3: 3 of 5 traits
Level 2: 2 of 5 traits
Level 1: 1 of 5 traits

The assessment scale outlines a number of essential qualities or traits that need to be in a "fully successful" product/performance. If all of the qualities are shown in the work, the performance is scored the highest possible. If some are missing, then the score reflects the number missing; the score reflects the qualities evident.

Advantages: It is the most cut-and-dried format. Easy to use and for students to understand.
Disadvantages: It can't discriminate between the quality of the various traits. In most cases, you might as well use a checklist.
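Scoring under Option A reduces to counting how many of the desired traits are evident in the work. The sketch below makes that explicit; the trait names are invented for illustration, and since the figure's scale runs from Level 1 to Level 5, a work showing none of the traits falls below the scale.

```python
# Minimal sketch of Option A scoring: the level equals the number of desired traits evident.
# Trait names are hypothetical placeholders, not from the article.

DESIRED_TRAITS = ["clear thesis", "supporting evidence", "organization", "mechanics", "voice"]

def proportion_level(traits_evident: set[str]) -> int:
    """Count how many of the five desired traits appear in the work (0-5)."""
    return sum(1 for trait in DESIRED_TRAITS if trait in traits_evident)

print(proportion_level({"clear thesis", "supporting evidence",
                        "organization", "mechanics"}))  # -> 4 (Level 4)
```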
Figure 2B: Option B: Good News / Bad News

Level 4 (++++++): all desired traits there, none of the undesired
Level 3 (++++--): mostly desired traits, few undesired
Level 2 (++----): some desired traits, many undesired
Level 1 (+-----): mostly undesired traits

This assessment scale outlines all likely outcomes and then arranges them into a scale where the desired ones are more toward the top and the undesired ones more toward the bottom. If a product/performance has only positive traits, it receives the highest score. If the performance possesses some of the prescribed negative traits, it gets a lower score. The chart above depicts this all good, mostly good, mostly bad, all bad structure.

Advantages: It includes the traits that characterize unwanted components in the performance.
Disadvantages: Why include the unwanted aspects of a performance? It can reinforce negative behaviors that you are trying to un-teach.
Figure 2C: Option C: Each Level Inclusive of the Last

Level 5: all-inclusive
Level 4: includes the qualities from levels 1, 2, and 3, plus more
Level 3: includes the qualities in levels 1 and 2, plus more
Level 2: includes the qualities in level 1, plus more
Level 1: evidence of minimal quality

The assessment outlines all of the traits that are present in a quality performance and then places them in order of ascending importance. Only positive qualities of a "good" product are included.

Advantages: Gives students a good image of how to progress up the levels of quality. Creates a psychological mindset for success (moving up).
Disadvantages: Problems with ordering the desired traits. A "good" performance may violate a lower-level requirement. It does not include what you don't want included.
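Because each level subsumes the qualities of the levels below it, scoring an Option C rubric amounts to finding the highest level whose cumulative requirements are all met. A sketch of that logic follows; the quality statements are invented, and the disadvantage noted above (a good performance that violates a lower-level requirement) shows up as the early exit from the loop.

```python
# Minimal sketch of Option C scoring: each level's qualities include those of every lower level.
# Quality statements are hypothetical placeholders.

LEVEL_QUALITIES = {
    1: ["complete draft submitted"],
    2: ["main idea clearly stated"],
    3: ["claims supported with evidence"],
    4: ["counterarguments addressed"],
    5: ["sources synthesized into an original argument"],
}

def inclusive_level(qualities_evident: set[str]) -> int:
    """Return the highest level whose qualities, and all lower levels' qualities, are present."""
    achieved = 0
    for level in sorted(LEVEL_QUALITIES):
        if all(q in qualities_evident for q in LEVEL_QUALITIES[level]):
            achieved = level
        else:
            break  # a missing lower-level quality caps the score
    return achieved

print(inclusive_level({"complete draft submitted",
                       "main idea clearly stated",
                       "claims supported with evidence"}))  # -> 3
```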
PRIMARY TRAIT SCALES
A primary trait scale is best for performances, processes, and products that have a complex series of traits and/or components. If the definition of a "good performance" cannot be reduced to one holistic entity, then a scale that contains separate traits must be used. The question to ask here is, "Is it possible for a student to do very well on one aspect of the performance and very poorly on another?" If so, a holistic rubric may be technically impossible to construct and deficient in providing the specificity of feedback that a separate-trait scale is capable of.
When constructing a primary trait rubric, it is useful to think of each of the separate traits as its own holistic scale. Therefore, any of the three types of thinking regarding holistic scales (e.g., proportion of components, wanted/unwanted content, or ascending quality) can be applied to a particular trait. Yet, as always, soundness requires concrete, specific, and observable language, and a well-tailored design.
A generic example of a primary trait scale is provided in Figure 3 below. In this example, the hypothetical designer chose organization, content, and presentation as the categorical "traits" judged to be the fundamental and essential areas that would thoroughly define the qualities of a successful whole.
Figure 3: Generic Primary Trait Scale

Trait   | Organization | Content | Presentation
Level 4 | Concrete, specific qualities defining all the desired aspects of a well-organized performance | Concrete, specific qualities defining all the desired content necessary for an excellent performance | Concrete, specific qualities defining all the desired aspects of a fully successful presentation
Level 3 | Specific qualities that define a level below level 4 and greater than level 2 | Specific qualities that define a level below level 4 and greater than level 2 | Specific qualities that define a level below level 4 and greater than level 2
Level 2 | Level 1 plus additional specific components or qualities | Level 1 plus additional specific components or qualities | Level 1 plus additional specific components or qualities
Level 1 | Concrete specifics defining a minimum effort | Concrete specifics defining a minimum effort | Concrete specifics defining a minimum effort
Level 0 | Unacceptable | Unacceptable | Unacceptable
Given this design, there is flexibility in assigning relative importance to each trait. Separate traits can be given differing weights and thus point values. For example, in the scale above, organization may be worth 4, 3, 2, 1, or 0 points depending on the level achieved, whereas content may be given the corresponding values of 8, 6, 4, 2, and 0. This weighting would make the statement to students that content, in this case, was twice as meaningful as organization.
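Put arithmetically, the weighting simply multiplies each trait's achieved level by its relative value before summing. A minimal sketch follows; the presentation weight of 1 is an added assumption, since the example above only specifies point values for organization and content.

```python
# Minimal sketch of weighted primary-trait scoring.
# Organization counts once and content twice, matching the example above;
# the presentation weight of 1 is an assumption.

WEIGHTS = {"organization": 1, "content": 2, "presentation": 1}

def weighted_total(levels: dict[str, int]) -> int:
    """Multiply each trait's achieved level (0-4) by its weight and sum the results."""
    return sum(WEIGHTS[trait] * level for trait, level in levels.items())

# A performance at level 3 organization, level 4 content, level 2 presentation.
print(weighted_total({"organization": 3, "content": 4, "presentation": 2}))  # -> 3 + 8 + 2 = 13
```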
Each of the scale designs outlined above could be combined and/or modified to suit the occasion. Yet, in most cases, our assessment needs lead us to some form of one of them. It bears repeating that our task, the manner in which we feel students could best demonstrate their learning, should drive our choice. This is why beginning with a clear idea of our objective and then sufficiently operationalizing the performance task are such critical steps in the process. For those new to rubric construction, this process may seem too complicated to attempt, but I guarantee it gets easier each time you try it. Many teachers find that working with a partner can promote both confidence and soundness.
Use of Rubrics:
While the popularity of rubrics is primarily due to their ability to provide a fair and reliable means of measuring complex end-product outcomes, they have additional benefits that bear mentioning. First, as stated earlier, they are as much teacher as test. A soundly designed rubric is not only an accurate tool for assessing the outcome; it can guide and motivate the learner along the way. Second, rubrics can be designed to assess process. It could be said that only a rubric can do so. Any system that attempts to make a judgment as to the quality of a learner's effort, progress, incremental growth, affect, behavior, peer interactions, or developmental stage requires some form of rubric. While there has always been a well-justified apprehension about assessing process and behavior, these are powerful areas, and assessed soundly they can have powerful benefits. When asked to list the outcomes that we want our students to obtain from their education, the ones most critical for a good life, most of us mention quite a few that are in the domain of processes and behaviors. Operationalizing processes can certainly be difficult. However, if we are able to do so, we can achieve benefits that no other educational process can. What one finds when one assesses process and behavior (or anything, for that matter) is that if you assess it, you get more of it, and of better quality. For example, teachers who have a sound system for assessing the quality of cooperation find they have more cooperative students. The other benefit of assessing process is psychological, and its fruits can often only be seen over time. When we assess outcomes that are 100% within the control of students (i.e., choices, behavior, application to the process), it develops a cause-and-effect relationship in their minds between what they put into a task and how they are rewarded. This is not generally true of traditional methods of assessment. When a student begins to trust that relationship between effort and outcome, the enhanced sense of internal locus of control is very motivating, and they develop the habit of being self-responsible for their learning.
Conclusion:
Assessment is such a critical factor
in the instructional design process.
What we assess and attach a grade to in a very real and material way
defines success in our classes.
Well-constructed performance assessment rubrics can provide the capacity
to assess more meaningful and authentic outcomes. If we develop sound rubrics and procedures
for our assessment, we can profoundly affect student achievement.
References:
Arter, J. (1993). Designing Scoring Rubrics for Performance Assessment: Getting to the Heart of the Matter. Paper presented at the Annual Meeting of the American Educational Research Association.
Branch, M., Grafelman, B., & Hurelbrink, K. (1998). Increasing Student Ownership and Responsibility through Collaborative Assessment Process. Unpublished report (ERIC Reproduction Service No. ED424284).
Crehan, K., & Hudson, R. (1998). A Comparison of Two Scoring Strategies for Performance Assessment. Paper presented at the National Council on Measurement in Education.
Goodrich, H. (1997). Understanding Rubrics. Educational Leadership, 54(4), 14-17.
Huffman, E. (1998). Authentic Rubrics. Art Education, 51(1), 64-68.
Jensen, K. (1995). Effective Rubric Design: Making the Most of This Powerful Tool. Science Teacher, 62(5), 72-75.
Myford, C. (1996). Constructing Scoring Rubrics: Using "Facets" to Study Design Features of Descriptive Rating Scales. Paper presented at the Annual Meeting of the American Educational Research Association.
Popham, W. (1997). What's Wrong – and What's Right – with Rubrics? Educational Leadership, 55(3), 72-75.
Taggart, G., Phifer, S., Nixon, J., & Wood, M. (Eds.). (1998). Rubrics: A Handbook for Construction and Use. Technomic Publishing.
Wiggins, G. (1999). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. Jossey-Bass.