Smarter Balanced Confuses Fairness and Validity

Over the past two weeks, we’ve looked at the ETS guidelines for fair assessments that PARCC adopted, as well as a sample item from PARCC. Now let’s turn to the “Bias and Sensitivity Guidelines” ETS developed for Smarter Balanced. While I can’t say that ETS’s guidelines for Smarter Balanced contradict those adopted by PARCC, they are different.

In the introduction, validity and fairness are equated: “if an item were intended to measure the ability to comprehend a reading passage in English, score differences between groups based on real differences in comprehension of English would be valid and, therefore, fair…. Fairness does not require that all groups have the same average scores. Fairness requires any existing differences in scores to be valid” (p. 6).

By this logic, since youth from higher-income homes have, on average, more academic and common knowledge than youth from lower-income homes, a test that conflates reading comprehension ability with opportunity to learn is perfectly fair. Valid I can agree with. Fair I cannot.

A couple of pages later, further explanation is offered (p. 8):

Exposure to information

Stimuli for English language arts items have to be about some topic…. Which topics and contexts are fair to include in the Smarter Balanced assessments? One fairness concern is that students differ in exposure to information through their life experiences outside of school. For example, some students experience snow every winter, and some have never experienced snow. Some students swim in the ocean every summer, and some have never seen an ocean. Some students live in houses, some live in apartments, some live in mobile homes, and some are homeless.

Even though curricula differ, the concepts to which students are exposed in school tend to be much more similar than are their life experiences outside of school. If students have become familiar with concepts through exposure to them in the classroom, the use of those concepts as topics and contexts in test materials is fair, even if some students have not been exposed to the concepts through their life experiences. For example, a student in grade 4 should know what an ocean is through classroom exposure to the concept, even if he or she has never actually seen an ocean. A student does not have to live in a house to know what a house is, if there has been classroom exposure to the term. Similarly, a student does not have to be able to run in a race to know what a race is. Mention of snow does not make an item unacceptable for students living in warmer parts of the country if they have been exposed to the concept of snow in school.

Let’s pause here: “Even though curricula differ, the concepts to which students are exposed in school tend to be much more similar than are their life experiences outside of school.” Maybe. Maybe not.

It might be the case that all elementary schools teach snow, oceans, houses, races, and deserts. But does Smarter Balanced really test such banal topics? No. As far as I can tell from its sample items, practice tests, and activities for grades three to five, Smarter Balanced (like PARCC) tests a mix of common and not-so-common knowledge. Passage topics include Babe Ruth, recycling water in space, how gravity strengthens muscles, papermaking, the Tuskegee Airmen, tree frogs, murals, and much more.

The sample items strike me as comprehensible for third to fifth graders with broad knowledge, but I am highly skeptical that we can safely assume that children are acquiring such broad knowledge in their elementary schools.

As Ruth Wattenberg explained in “Complex Texts Require Complex Knowledge” (which was published in Fordham’s Knowledge at the Core: Don Hirsch, Core Knowledge, and the Future of the Common Core), students in the elementary grades have minimal opportunities to acquire knowledge in history and science. Reviews of basal readers in 1983 and 2003 revealed that they contained very little content. That would be merely a lost opportunity, not a serious problem, were it not for the fact that elementary schools tend to devote a substantial amount of time to ELA instruction and very little to social studies and science instruction. Wattenberg’s table (p. 35) should be shocking:

Grade and subject	1977	2000	2012
K–3 social studies	21	21	16
4–6 social studies	34	33	21
K–3 science	17	23	19
4–6 science	28	31	24

Even worse, Wattenberg found that “When elementary teachers were asked during what time period struggling students received extra instruction in ELA or math, 60 percent said that they were pulled from social studies class; 55 percent said from science class.”

In their home environments, the schools they attend, and the curriculum to which they are exposed, lower-income children do not have an equal opportunity to learn. As the Smarter Balanced guidelines state, the assessment is fair “if students have become familiar with concepts through exposure to them in the classroom.” That’s a big if.

Making matters worse, Smarter Balanced (like PARCC) asserts that it’s just fine for some kids to have to learn during the test. Returning to the “Bias and Sensitivity Guidelines” (p. 8):

Information in the stimulus

A major purpose of reading is to learn about new things. Therefore, it is fair to include material that may be unfamiliar to students if the information necessary to answer the items is included in the tested material. For example, it is fair to test the ability of a student who has never been in a desert to comprehend an appropriate reading passage about a desert, as long as the information about deserts needed to respond to the items is found in the passage.

Last week, we explored how difficult it is to learn from one passage and how greatly such test items advantage students who already know the content that the passage is purportedly teaching. Smarter Balanced clearly disagrees with me. Here’s the introduction to its fourth-grade Animal World activity:

The Classroom Activity introduces students to the context of a performance task, so they are not disadvantaged in demonstrating the skills the task intends to assess. Contextual elements include: an understanding of the setting or situation in which the task is placed, potentially unfamiliar concepts that are associated with the scenario; and key terms or vocabulary students will need to understand in order to meaningfully engage with and complete the performance task.

Please take a look at the activity; it assumes an enormous amount of knowledge. Even if it did not, the notion of learning and immediately demonstrating ability flies in the face of well-established research on humans’ limited working memory capacity. There’s no getting around it: students with relevant prior knowledge have a huge advantage.

One (sort of) positive note: I am cautiously optimistic that Smarter Balanced’s computer adaptive testing will help—a little. Here’s how it’s described:

Based on student responses, the computer program adjusts the difficulty of questions throughout the assessment. For example, a student who answers a question correctly will receive a more challenging item, while an incorrect answer generates an easier question. By adapting to the student as the assessment is taking place, these assessments present an individually tailored set of questions to each student and can quickly identify which skills students have mastered…. providing more accurate scores for all students across the full range of the achievement continuum.
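The rule described in that passage can be sketched as a simple up-and-down "staircase." This is a minimal illustration under stated assumptions, not Smarter Balanced’s actual algorithm: the 1-to-10 difficulty scale and the function names are hypothetical, and operational adaptive tests choose items using statistical models of item difficulty rather than a fixed one-step rule.

```python
def next_difficulty(current: int, correct: bool, lowest: int = 1, highest: int = 10) -> int:
    """Hypothetical rule: step difficulty up after a correct answer, down after a miss."""
    step = 1 if correct else -1
    # Keep the level inside the (assumed) 1-to-10 scale.
    return max(lowest, min(highest, current + step))


def run_session(responses: list[bool], start: int = 5) -> list[int]:
    """Return the difficulty level of each item presented, given right/wrong answers."""
    levels = []
    level = start
    for correct in responses:
        levels.append(level)  # the item shown at the current level
        level = next_difficulty(level, correct)
    return levels


if __name__ == "__main__":
    # A student who answers the first three items correctly, then misses two.
    print(run_session([True, True, True, False, False]))  # [5, 6, 7, 8, 7]
```

Note what the sketch takes for granted: every item carries one fixed difficulty number. That assumption works well when skills build on one another, and it becomes shaky when a passage’s difficulty depends on whether the reader happens to know its topic.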

In a hierarchical subject like math, the benefits of this adaptation are obvious. In reading, adaptation might help, but it might also mislead. Once a student has mastered decoding, what makes one passage “easier” to comprehend than another is driven primarily by the topic and how much the reader already knows about it. If the student knows a lot about the topic, then factors like rare vocabulary (which isn’t rare to the reader with the relevant knowledge) and complex sentence structure are of little import. If a student does not know about the topic, then making the vocabulary and sentence structure easier will only help a little. The main way in which adaptive testing might be helpful is in varying the topics: “easier” passages would consist of more common topics, while more “challenging” passages would consist of less common, more academic topics. Then, if we examined the results carefully, we might see that a child lacks essential (and teachable) academic knowledge.

Still, I am only cautiously optimistic because the knowledge that drives reading comprehension is accumulated more haphazardly than hierarchically. One can have some academic knowledge while missing some common knowledge. A student whose grandparents lived most of their lives in Greece may know a great deal about ancient and modern Greece and be ready for a highly sophisticated passage comparing and contrasting the two. That same student may have no knowledge of China, gravity, Harlem’s Jazz Age, or other topics that might appear on the test. Without assessing topics that have been taught, I see no way to truly gauge a student’s comprehension ability (or what the teacher or school has added).

To reinforce the most important message—that comprehension depends on knowledge, and thus schools must systematically build knowledge—the tests need to be tied to the content taught or the high stakes need to be removed so schools will no longer take time out of regular instruction for test preparation.