| Title | Abstract | Author(s) | Date |
| Bulletin #25: College Readiness Indicators | This paper outlines current student-level indicators at the high school and middle school levels that predict college success. In this bulletin, indicators are divided into three categories: assessment scores (e.g., SAT® exam scores), transcript attributes (e.g., course rigor), and additional indicators (e.g., attendance) that impact achievement. | Cromwell, Ashley M.
Larsen McClarty, Katie
Larson, Sarah J. | 05-2013 |
| Bulletin #24: Learning Progressions: An Overview of Current Validation Methods | Learning progressions represent a set of skills or pieces of knowledge ordered sequentially from least to most complex. This sequence can guide instruction as well as assessment content. This paper describes several useful methods for validating learning progressions including validating the relationship between the progression and student knowledge as well as between the progression and associated assessments. | Soto, Amanda C.
Taylor, Melinda A. | 05-2013 |
| Bulletin #23: What is College and Career Readiness? State Requirements for High School Graduation and State Public University Admissions | This paper compares the minimum requirements for high school graduation in each state with admission requirements for the state’s main (or “flagship”) university campus. In 80% of the states, the high school graduation requirements do not meet the minimum standards necessary for admission to their own state universities. | Conforti, Peter A. | 05-2013 |
| Bulletin #22: What is College and Career Readiness? A Summary of State Definitions | This paper outlines each state’s definition of college and career readiness and shows whether they associate with the Common Core State Standards Initiative (CCSSI), ACT College and Career Readiness System (ACCRS), or both. The general definition that emerges can not only facilitate interstate discussions on multiple levels, but also provide schools, districts, educators, students, and other stakeholders with clear objectives to effectively prepare students for postsecondary endeavors. | Conforti, Peter A. | 05-2013 |
| A Universal Design for Learning-based Framework for Designing Accessible Technology-Enhanced Assessments | The increased capabilities of digital technologies offer new opportunities to evaluate students’ deeper knowledge and skills and to assess constructs that are difficult to measure using traditional methods. Such assessments can incorporate tools and interfaces that improve accessibility for diverse students, but they can also inadvertently introduce new accessibility barriers. Designing these technology-enhanced tasks according to universal design principles is one way to address these accessibility concerns, but it requires a grounded understanding of students’ diverse abilities and the ways they interact with the tasks. A thorough consideration of the factors that impact construct validity, with an emphasis on identifying and eliminating sources of construct-irrelevant variance, is essential. | Dolan, Robert P.
Burling, Kelly
Harms, Michael
Strain-Seymour, Ellen
Way, Walter (Denny)
Rose, David H. | 04-2013 |
| Computer-based Assessment of Collaborative Problem-Solving Skills: Human-to-Agent versus Human-to-Human Approach | Collaborative problem solving (CPS) is a critical competency for college and career readiness. Students emerging from schools into the workforce and public life will be expected to have CPS skills as well as the ability to perform that collaboration in various group compositions and environments. However, structuring standardized computer-based assessment of CPS skills, specifically for large-scale assessment programs, is challenging. The aim of this study was to explore patterns in student CPS performance and motivation in human-to-agent (H-A) settings compared to human-to-human (H-H) settings. One hundred seventy-nine 14-year-old students from the United States, Singapore, and Israel participated in the study. | Rosen, Yigal, Ph.D. Tager, Maryam | 03-2013 |
| Halo Effects and Analytic Scoring: A Summary of Two Empirical Studies | This document summarizes the results of a research study that examines rater halo and how much unique information is provided by multiple analytic scores. Specifically, that study investigated whether unique information is provided by analytic scores assigned to student writing beyond what is depicted by holistic scores and to what degree multiple analytic scores assigned by a single rater display evidence of a halo effect. The authors analyze scores assigned to middle-school student responses to an expository writing prompt that were assigned by six groups of raters—four groups assigned single analytic scores, one group assigned multiple analytic scores, and one group assigned holistic scores. The results suggest that there is evidence of a halo effect when raters assign multiple analytic | Lai, Emily R. Wolfe, Edward W. Vickers, Daisy H. | 03-2013 |
| An Adaptive-within-Testlet Item Selection Method with Both Testlet Level and Test Level Content Balancing in CAT | The purpose of this study is to propose a heuristic item selection procedure that selects a testlet, and a subset within the selected testlet, to be administered, taking content balancing into account while remaining adaptive within the testlet. | Chien, Yuehmei Shin, Chingwei David | 01-2013 |
| Research Services Quarterly Newsletter - Vol 5, No 3 | | | 11-2012 |
| Halo Effects and Analytic Scoring: A Summary of Two Empirical Studies | We address the issue of whether unique information is provided by analytic scores assigned to student writing, beyond what is depicted by holistic scores, and to what degree multiple analytic scores assigned by a single rater display evidence of a halo effect. | Lai, Emily R. Wolfe, Edward W. Vickers, Daisy H. | 11-2012 |
| Where is the Value in Value-Added Modeling? | This paper first provides an overview of value-added modeling (VAM), including definitions and descriptions of three general types of value-added models. The paper then summarizes the rationales for and against incorporating VAM into an accountability framework. Finally, we provide a recommendation that VAM estimates be used in conjunction with other measures to form a composite. | Dan Murphy | 10-2012 |
| Online Scoring vs. Materials Scoring for Portfolio Assessments: An Exploration of Score Stability | The study is designed to investigate whether scores assigned to portfolio submissions are comparable between materials-based scoring and online scoring conditions, and to evaluate how scorers perceive the ease of using the online scoring platform and in facilitating the scoring process. | Hua Wei | 10-2012 |
| Establishing an Evidence-based Validity Argument for Performance Assessment | Recent initiatives have proposed to use performance tasks in ambitious new ways, including monitoring student growth and evaluating teacher effectiveness. | Lai, Emily R. Wei, Hua Hall, Erika L. Fulkerson, Dennis | 09-2012 |
| Responses to Claims Raised by Walter Stroup | In response to claims made by Walter Stroup about the Texas statewide assessments, experts from Pearson's assessment team serving the Texas program authored this brief to enumerate the flaws in Dr. Stroup's conclusions, as well as highlight the strengths of Texas' system of standards and assessments. For media inquiries, contact Susan Aspey at communications@pearsoned.com or 800-745-8489 | Test, Measurement & Research Services | 08-2012 |
| Research Services Quarterly Newsletter - Vol 5, No 2, 2012 | | | 07-2012 |
| A Literature Review of Gaming in Education | This research report provides an overview of the theoretical and empirical evidence behind five key claims about the use of digital games in education. | McClarty, Katie Larsen
Orr, Aline
Frey, Peter M.
Dolan, Robert P.
Vassilev, Victoria
McVay, Aaron | 06-2012 |
| Value-Added Models in the Evaluation of Teacher Effectiveness: A Comparison of Models and Outcomes | This study compared five value-added models and illustrated the impact of model choice on the estimates of teacher effectiveness. | Wei, Hua Hembry, Tracey Murphy, Daniel L. McBride, Yuanyuan | 06-2012 |
| A Comparison of Three Content Balancing Methods for Fixed and Variable Length Computerized Adaptive Tests | The purpose of this study is to compare the WPM method to the WDM method under various conditions including the simple and complicated content constraint structure, different CAT settings such as item pool, item exposure control specification, and theta estimation options for both fixed- and variable-length CAT tests. | Shin, Chingwei David Chien, Yuehmei Way, Walter Denny | 04-2012 |
| Improving Text Complexity Measurement through the Reading Maturity Metric | The purposes of this paper are to describe how Word Maturity has been incorporated into Pearson’s text complexity measure, to present initial comparisons between this new measure of text complexity and traditional readability measures, and to address measurement issues in the development and use of text complexity measurements. | Landauer, Tom
Way, Walter D. | 04-2012 |
| The Case for Performance-Based Tasks without Equating | This paper proposes a model for performance-based assessments that assumes random selection of performance-based tasks (PBTs) from a large pool, and that assumes tasks are comparable without equating PBTs. | Way, Walter D.
Murphy, Daniel
Powers, Sonya
Keng, Leslie | 04-2012 |
| Assessing 21st Century Skills: Integrating Research Findings | This paper synthesizes research evidence pertaining to several so-called 21st century skills: critical thinking, creativity, collaboration, metacognition, and motivation. | Lai, Emily R.
Viering, Michaela | 04-2012 |
| Creating Curriculum-Embedded, Performance-Based Assessments for Measuring 21st Century Skills in K-5 Students | This paper will share the author’s experiences working with a large and diverse school district to design curriculum-embedded, performance-based assessments (PBAs) that measure 21st century skills in K-5 students. | Lai, Emily R. | 04-2012 |
| Research Services Quarterly Newsletter - Vol 5, No 1, 2012 | | | 04-2012 |
| Linking Two Assessment Systems Using Common-Item IRT Method and Equipercentile Linking Method | When states move from one assessment system to another, it is often necessary to establish a concordance between the two assessments for accountability purposes. The purpose of this study is to model two alternative approaches to transitioning performance standards, both of which can be executed using data from regularly scheduled operational administrations. | Kirkpatrick, Rob Turhan, Ahmet Lin, Jie A | 04-2012 |
| The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model | This study investigates the impact of IPC in the context of operational testing programs that employ the 3PL model, alternative equating procedures, and different item re-use policies. | Meyers, Jason L.
Murphy, Stephen
Goodman, Joshua
Turhan, Ahmet | 04-2012 |
| Putting Ducks in a Row: Methods for Empirical Alignment of Performance Scoring | Using historical state data, this report evaluates nine different methods of aligning performance standards and discusses the effects of selecting different methods as well as the potential implications for interpretations of student progress and school success. | McClarty, Katie Larsen Murphy, Daniel Keng, Leslie Turhan, Ahmet Tong, Ye | 04-2012 |
| Population Invariance of Vertical Scaling Results | In this report, the population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3–10 and a state mathematics test spanning grades 3–8. | Powers, Sonya Turhan, Ahmet Binici, Salih (Florida State University) | 04-2012 |
| Connecting English Language Learning and Academic Performance: A Prediction Study | The purpose of this study was to investigate the use of English language proficiency and academic reading assessment scores to predict the future academic success of English learner (EL) students. | Kong, Jadie Powers, Sonya Starr, Laura Williams, Natasha | 04-2012 |
| Bulletin #21: Evidence Based Standard Setting: Establishing Cut Scores by Integrating Research Evidence with Expert Content Judgments | In this bulletin, we describe the processes and practices associated with Evidence Based Standard Setting, which draw directly from the concept of evidence-based medicine. | Beimers, Jennifer N. Way, Walter D. McClarty, Katie Larsen Miles, Julie A. | 01-2012 |
| Research Services Quarterly Newsletter - Vol 4, No 4, 2012 | | | 01-2012 |
| Pearson Feedback: Development of Open Technology Standards | This report includes Pearson’s feedback to the recent post by the U.S. Department of Education (ED) concerning the development of open technology standards for managing and delivering student assessments and assessment results. | Jon S. Twing, PhD | 11-2011 |
| Research Services Quarterly Newsletter - Vol 4, No 3, 2011 | | | 10-2011 |
| Overview of Student Growth Models | This paper provides an overview of student growth modeling and describes how states use student growth models in the federal accountability system. | O'Malley, Kimberly J., Ph.D.
Murphy, Stephen, Ph.D.
Larsen McClarty, Katie, Ph.D.
Murphy, Daniel, Ph.D.
McBride, Yuanyuan, Ph.D. | 09-2011 |
| Research Services Quarterly Newsletter - Vol 4, No 2, 2011 | | | 07-2011 |
| Critical Thinking: A Literature Review | Critical thinking includes the component skills of analyzing arguments, making inferences using inductive or deductive reasoning, judging or evaluating, and making decisions or solving problems. | Lai, Emily R. | 06-2011 |
| Collaboration: A Literature Review | Collaboration is the “mutual engagement of participants in a coordinated effort to solve a problem together.” | Lai, Emily R. | 06-2011 |
| Pearson's Text Complexity Measure | Pearson's Knowledge Technologies group has developed a new measure of text complexity that is fundamentally different from current readability measures. | Landauer, Thomas K. | 05-2011 |
| Bulletin #20: Performance-based Assessment: Some New Thoughts on an Old Idea | The purpose of this bulletin is to review arguments for and against the use of performance-based assessments to assess student achievement in light of proposed test score uses. | Lai, Emily R. | 05-2011 |
| Top Ten: Transitioning English Language Arts Assessment | This document is designed to identify the top ten considerations that states and consortia will need to address as they plan the transition of their ELA assessments. | Becker, Delise
Bay-Borelli, Michael
Brinkerhoff, Lee
Crain, Kellie
Davis, Laurie
Fuhrken, Charles
Hartmann, Tiffany
Larkin, Jay
O’Malley, Kimberly
Trevvett, Suzanne
| 05-2011 |
| Pearson's Automated Scoring of Writing, Speaking, and Mathematics | This document describes several examples of current item types that Pearson has designed and fielded successfully with automatic scoring. | Streeter, Lynn
Bernstein, Jared
Foltz, Peter
DeLand, Donald
| 05-2011 |
| Impact of Group Differences on Equating Accuracy and the Adequacy of Equating Assumptions | This study compared four curvilinear equating methods including frequency estimation, chained equipercentile, IRT true score, and IRT observed score equating. | Powers, Sonya | 04-2011 |
| Motivation: A Literature Review | Motivation refers to reasons that underlie behavior that is characterized by willingness and volition. | Lai, Emily R. | 04-2011 |
| Comparing Methods for Detecting Unstable Anchor Items with Net DIF and Global DIF Conceptions | This study compares different approaches for detecting misbehaving anchor items in IRT equating using Rasch and partial credit models. | Lau, C. Allen
Arce, Alvaro J. | 04-2011 |
| Expanding the Model of Item-Writing Expertise: Cognitive Processes and Requisite Knowledge Structures | In this paper, we expand the cognitive model of item writing to not only include cognitive processes but to also include requisite knowledge structures used by item writers. | Fulkerson, Dennis (Pearson)
Nichols, Paul (Center for Assessment)
Snow, Eric (SRI International)
| 04-2011 |
| Bulletin #19: Making Sense of the Metrics: Student Growth, Value-added Models and Teacher Effectiveness | The goal of this bulletin is to define student growth, value-added models, and teacher effectiveness, three terms often confused. | O'Malley, Kimberly
McClarty, Katie
Magda, Tracey
Burling, Kelly
| 04-2011 |
| Metacognition: A Literature Review | Metacognition is defined most simply as "thinking about thinking." | Lai, Emily R. | 04-2011 |
| Through-Course Common Core Assessments in the United States: Can Summative Assessment Be Formative? | In this paper, we present a design for enhancing the formative uses of summative through-course assessments. | Way, Walter D.
Larsen McClarty, Katie
Murphy, Dan
Keng, Leslie
Fuhrken, Charles | 04-2011 |
| Research Services Quarterly Newsletter - Vol 4, No 1, 2011 | | | 04-2011 |
| Application of Latent Trait Models to Identifying Substantively Interesting Raters | This study demonstrates how existing latent trait modeling procedures can identify groups of raters who may be of substantive interest to those studying the experiential, cognitive, and contextual aspects of ratings. | Wolfe, Edward W.
McVay, Aaron | 04-2011 |
| Investigating Content and Construct Representation of a Common-item Design When Creating a Vertically Scaled Test | This study investigated how well the guideline of content and construct representation was maintained while evaluating two stability assessment criteria (Robust z and 0.3-logit difference). | Hardy, M. Assunta (BYU)
Young, Michael J. (Pearson)
Yi, Qing (Pearson)
Sudweeks, Richard R. (BYU)
Bahr, Damon L. (BYU) | 04-2011 |
| Statistical Properties of 3PL Robust Z: An Investigation with Real and Simulated Data Sets | The purpose of this paper was to inspect statistical properties of the robust z approach in the context of 3PL equating with the common item non-equivalent group design. | Arce, Alvaro J.
Lau, C. Allen | 04-2011 |
| Comparison of Asymptotic and Bootstrap Item Fit Indices in Identifying Misfit to the Rasch Model | In this study, our results indicate that bootstrap critical values allow for greater statistical power in diagnosing item misfit caused by varying item slopes and lower asymptotes. | Wolfe, Edward W.
McGill, Michael T. | 04-2011 |
| Does Size Matter? A Study on the Use of Netbooks in K-12 Assessments. | In this paper, we analyze a study conducted during the spring 2010 administration of the Texas End-of-Course (EOC) assessments to evaluate the feasibility of using netbooks in the context of K-12 assessments. | Keng, Leslie
Kong, Xiaojing Jadie
Bleil, Bryan
| 04-2011 |
| Investigating Common-Item Screening Procedures in Developing a Vertical Scale | Creating a vertical scale involves several decisions on assessment designs and statistical analyses to determine the most appropriate vertical scale. | Johnson, Marc
Yi, Qing | 04-2011 |
| Considerations for Performance Scoring When Designing and Developing Next Generation Assessments | This white paper explores the interactions between test design and scoring approach, and the implications for performance scoring quality, cost, and efficiency in next generation assessments. | Jones, Marianne
Vickers, Daisy | 03-2011 |
| Cognitive Lab Evaluation of Innovative Items in Mathematics and English Language Arts Assessment of Elementary, Middle, and High School Students | This research report examines a study in which a set of prototype items were developed to align with specific Common Core State Standards and administered to students in a series of cognitive labs. The report details results and offers implications and recommendations for future use. | Dolan, Robert P.
Goodman, Joshua
Strain-Seymour, Ellen
Adams, Jeremy
Sethuraman, Sheela | 03-2011 |
| Assessment Technology Standards | Pearson’s response to the United States Department of Education’s Request for Information. | Twing, Jon | 01-2011 |
| Research Services Quarterly Newsletter - Vol 3, No 4, 2011 | | | 01-2011 |
| Considerations For Developing Test Specifications For Common Core Assessments | The purpose of this paper is to describe the role that test specifications play in supporting the development of valid and reliable large-scale summative academic achievement assessments. | Bay-Borelli, Michael
Rozunick, Christine
Way, Walter "Denny"
Weisman, Eric | 12-2010 |
| Bulletin #18: An Investigation of an Assessment-Centered Learning Environment with Formative Use | Learning environments centered on assessments provide opportunities for feedback that yield information with potential benefit for improving learning and instruction. | Arce, Alvaro J. | 12-2010 |
| Thoughts on an Assessment of Common Core Standards | ETS, Pearson, and the College Board have collaborated in this paper to raise key assessment design questions and discuss some ideas for a systematic high-level assessment design that satisfies many of the needs expressed by stakeholders. | Camara, Wayne
Lazer, Stephen
Mazzeo, John
Sweeney, Kevin
Twing, Jon
Way, Walter "Denny" | 10-2010 |
| Research Services Quarterly Newsletter - Vol 3, No 3, 2010 | | | 10-2010 |
| Rater Effects as a Function of Rater Training Context | This study examined the influence of rater training and scoring context on the manifestation of rater effects in a group of trained raters. | Wolfe, Edward W.
McVay, Aaron | 10-2010 |
| A Cognitive Lab Report for the American Diploma Project Algebra I End-of-Course Exam | This cognitive lab study was an exploratory study that allowed for an in-depth investigation of students’ familiarity with ADP Algebra I Exam items as well as the strategies that students are engaging in when attempting to solve the problems. | Test, Measurement & Research Services | 10-2010 |
| A Cognitive Lab Report for the American Diploma Project Algebra II End-of-Course Exam | This cognitive lab study was an exploratory study that allowed for an in-depth investigation of students’ familiarity with ADP Algebra II exam items as well as the strategies that students are engaging in when attempting to solve the problems. | Test, Measurement & Research Services | 10-2010 |
| Next-Generation Assessment Interoperability Standards | The intent of this document is to elevate awareness and understanding of the importance of assessment interoperability standards and to begin addressing the necessary evolution of these standards to support next-generation assessments. | Dolan, Bob
Strain-Seymour, Ellen
Deokar, Ashman
Ostler, Wayne | 10-2010 |
| Universally Designed Computer Based Testing: UD-CBT Guidelines | This report's table of contents is hyperlinked for your convenience. These Universally Designed Computer Based Testing Guidelines aim to help item and test developers understand the cognitive processes involved in interacting with different item stimuli and response methods and, thereby, help identify sources of construct irrelevant variance. | Burling, Kelly
Dolan, Bob
Hanna, Elizabeth
Harms, Michael
Nichols, Amy
Strain-Seymour, Ellen
Way, Denny
In collaboration with CAST | 10-2010 |
| An Exploratory Teacher Survey Related to the American Diploma Project Algebra I End-of-Course Exam | A survey-based exploratory study was conducted to better understand the gaps between curriculum and instruction, and what content knowledge is expected of students on the American Diploma Project (ADP) Algebra I End-of-Course Exam. | Test, Measurement & Research Services | 09-2010 |
| An Exploratory Teacher Survey Related to the American Diploma Project Algebra II End-of-Course Exam | A survey-based exploratory study was conducted to better understand the gaps between curriculum and instruction, and what content knowledge is expected of students on the American Diploma Project (ADP) Algebra II End-of-Course Exam. | Test, Measurement & Research Services | 09-2010 |
| Bulletin #17: A Comparison of Distributed and Regional Scoring | Distributed scoring provides access to a wider pool of readers than those that could be included through regional scoring alone, thereby allowing for a larger number of readers to be recruited and permitting greater selectivity in reader recruitment. This has the potential for increased efficiency in training time for readers and could facilitate shorter turnaround times in performance scoring, which would, in turn, shorten the time between test administration and the reporting of test scores. | Keng, Leslie
Davis, Laurie L.
Ragland, Shelley | 09-2010 |
| American Diploma Project Algebra II End-of-Course Exam: Standard Setting Briefing Book | The Briefing Book includes an overview of 1) the American Diploma Project, 2) the ADP Algebra II End-of-Course Exam, 3) the standard setting process, and 4) the validity studies conducted to inform standard setting. Concurrent, cross sectional, and judgment studies are also included. | Pearson | 07-2010 |
| Research Services Quarterly Newsletter - Vol 3, No 2, 2010 | | | 07-2010 |
| Bulletin #16: Pearson’s Automated Scoring | Pearson’s automated scoring technology, the Intelligent Essay Assessor (IEA), delivers fast, accurate, and valid assessment scores. | Knowledge Technologies | 07-2010 |
| Automated Scoring for the Assessment of Common Core Standards | This paper discusses automated scoring as a means for helping to achieve valid and efficient measurement of abilities that are best measured by constructed-response (CR) items. | Williamson, David M.
Bennett, Randy E.
Lazer, Stephen
Bernstein, Jared
Foltz, Peter W.
Landauer, Thomas K.
Rubin, David P.
Way, Walter D.
Sweeney, Kevin | 07-2010 |
| Bulletin #15: Capturing Item Writers’ Expertise | The efficient development of quality innovative items may be hindered by inexperienced item writers who are not familiar with the challenges and nuances of innovative item types. The study of expert item writers offers the possibility of capturing and “bottling” the knowledge and skills acquired by these experts over years of hard work. | Fulkerson, Dennis Nichols, Paul | 06-2010 |
| Investigating Approaches to Estimate an Individual's Strand/objective Score Profile Reliability: A Monte Carlo Study | The paper studies performance of generalizability and classical test theory reliability approaches to estimate reliability of an individual's strand/objective score profile. | Arce-Ferrer, Alvaro J. | 05-2010 |
| Bulletin #14: What Impact Does Calculator Use Have On Test Results? | Calculators are commonly used in mathematics and science instruction. In fact, over 20 years ago, two studies commissioned by the College Board (Kupin & Whittington, 1988; Pfeiffenberger & Zolandz, 1989) indicated that, at that time, nearly all math and science college instructors permitted use of calculators for all types of course work and at least some tests | Wolfe, Edward W. | 05-2010 |
| Thoughts on Linking and Comparing Assessments of Common Core Standards | The purpose of this paper is to discuss the types of comparisons that can and cannot be made among students who take different assessments supposedly developed to measure a single set of standards. | Lazer, Stephen
Mazzeo, John
Way, Walter D.
Twing, Jon S.
Camara, Wayne
Sweeney, Kevin | 05-2010 |
| Performance of Ability Estimation Methods for Writing Assessments under Conditions of Multidimensionality | An increasing number of large scale assessments contain constructed response items such as essays for the advantages they offer over traditional multiple-choice measures. Writing assessments in particular often contain a mixture of multiple-choice and essay items. These mixed-format assessments pose many technical challenges for psychometricians. This study directly builds upon the Meyers et al. (2009) study by investigating how ability estimation, essay scoring approach, measurement model, and proportion of points allocated to multiple choice items and the essay item on mixed-format assessments interact to recover ability and item parameter estimates under different degrees of multidimensionality. | Meyers, Jason L.
Turhan, Ahmet
Fitzpatrick, Steven J. | 05-2010 |
| What Item Writers Think When Writing Items: Towards a Theory of Item Writing Expertise | The study of expert item writers offers the possibility of “bottling” the knowledge and skills acquired by these experts over years of hard work. The descriptions of the identified conceptual knowledge and skills of expert item writers could be incorporated into item writing workshops in order to equip new item writers with the tools necessary to produce quality figural response items. | Fulkerson, Dennis
Nichols, Paul
Mittelholtz, David | 05-2010 |
| A Multi-level Modeling Approach to Predicting Performance on a State ELA Assessment | The purpose of this study was to examine, on a State English Language Proficiency Examination for grades K-12, (a) the performance of students in low SES environments vs. high SES environments as measured by school Title I participation, (b) the performance of males vs. females, (c) the effect of ethnicity (Hispanic vs. non-Hispanic students), and (d) any interaction effects. | Brown, Raymond S.
Nguyen, T.
Stephenson, A. | 05-2010 |
| Comparisons of Test Characteristic Curve Alignment Criteria of the Anchor Set and the Total Test: Maintaining Test Scale and Impacts on Student Performance | The current paper investigates a tenet of the traditional view on the psychometric characteristics of such anchor sets. Specifically, the traditional guideline states, without further detail, that the test characteristic curve (TCC) of the anchor set and that of the total test should closely overlap. | Karkee, Thakur B., Ph.D.
Fatica, Kevin
Murphy, Stephen T., Ph.D. | 05-2010 |
| The Impact of Different Anchor Stability Methods on Equating Results and Student Performance | The key objective of this study is to demonstrate a methodological procedure or strategy for examining the different anchor stability procedures and the accompanying results and to evaluate the impact on the final RSSS tables and reported cut scores (i.e., performance levels). For our study we did not include the bivariate plots for the old and new parameter values. | Murphy, Stephen
Little, Ian
Fan, Meichu
Lin, Chow-Hong
Kirkpatrick, Rob | 05-2010 |
| Improving the Post-Smoothing of Test Norms with Kernel Smoothing | The traditional methodology of post-smoothing to develop norms used on educational and clinical products is to hand-smooth the scale scores or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgment. Different persons, or the same person at different times, will make significantly different judgments. By contrast, the kernel smoothing method is a nonparametric approach, which is more flexible, less subjective, and easier to replicate. | Lin, Anli Yi, Qing Young, Michael J. | 05-2010 |
| The Modified Briefing Book Standard Setting Method: Using Validity Data as a Basis for Setting Cut Scores | This paper focuses on two aspects of the modified briefing book standard setting process developed to meet this need: 1) the validity research conducted to support the standard setting; and 2) the standard setting itself, through which the validity research and associated pertinent information were organized and presented to the panelists, and the resulting process through which these data were used to elicit cut score judgments. | Miles, Julie A. Beimers, Jennifer N. Way, Walter D. | 05-2010 |
| Impact of Non-representative Anchor Items on Scale Stability | This study attempts to fill this gap by simulating item response data over multiple administrations under the common-item nonequivalent groups design and examining the effects of non-representative anchor items on scale stability. | Wei, Hua | 05-2010 |
| The Hazards of Newness: A Portrait of Challenges Faced by New High School English Teachers | This paper reports findings of a survey study designed to examine how high school English
teachers are assigned to teach particular grades and track levels, whether these teachers have
their own classrooms, and how they and their students perceive one another. | Bieler, Deborah Holmes, Stephen Wolfe, Edward W. | 05-2010 |
| IRT Proficiency Estimators and Their Impact | In the current study, we further examined the statistical properties of the various
IRT estimators, especially focusing on their practical impact on the reported scores. We
also investigated a few practical scenarios, where the testing focus is on assessing college
readiness, assessing students’ minimal competency, or providing estimates for students
who have failed a previous exam (retesters). | Tong, Ye Kolen, Michael J. | 05-2010 |
| Correlates of Mathematics Achievement in Developed and Developing Countries: An HLM Analysis of TIMSS 2003 Eighth-grade Mathematics Scores | The purpose of this study was to investigate correlates of math achievement in both developed and developing countries. Specifically, two developed countries and two developing countries that participated in the TIMSS 2003 eighth-grade math assessment were selected for this study. For each country, contextual factors at both the student and the teacher/school levels were used to construct models that yield country-specific findings related to students’ math performance. | Phan, Ha Sentovich, Christina Kromrey, Jeffrey Dedrick, Robert Ferron, John | 05-2010 |
| Autocorrelation in the COFM: The Effects of Autocorrelation on the Curve-of-Factors Growth Model | This simulation study examined the performance of the curve-of-factors model (COFM) when autocorrelation and growth processes were present in the first-level factor structure. In addition to the standard curve-of-factors growth model, two new models were examined: one COFM that included a first-order autoregressive autocorrelation parameter, and a second model that included first-order autoregressive and moving average autocorrelation parameters. | Murphy, Daniel J.
Beretvas, S. Natasha
Pituch, Keenan A. | 05-2010 |
| Distractor Rationale Taxonomy: Diagnostic Assessment of Reading with Ordered Multiple-Choice Items | The distractor rationale taxonomy (DRT) examined in this study is an understanding-level-driven distractor analysis system for multiple-choice items. The DRT purposely creates distractors at different comprehension levels to pinpoint sources of misunderstanding. | Lin, Jie
Lee Chu, Kwang
Meng, Ying | 05-2010 |
| Designing and Operating a Common High School Assessment System | This paper attempts to lay out some of the important issues to be faced by states and consortia as they consider implementation of a high school assessment system within the current Race to the Top framework. | Camara, Wayne
Sweeney, Kevin
Twing, Jon S.
Way, Walter D.
Lazer, Stephen
Mazzeo, John | 04-2010 |
| Informing Design Patterns Using Research on Item Writing Expertise (Large-Scale Assessment Technical Report 9) | This technical report presents a study in which verbal reports from expert item writers were collected and analyzed. Findings from the study are used to suggest modifications to the development of design patterns. Note: This PDF is downloaded directly from ECDLarge. | Fulkerson, Dennis
Nichols, Paul | 03-2010 |
| Leveraging Evidence-Centered Design in Large-Scale Test Development (Large-Scale Assessment Technical Report 4) | This report depicts ECD as a series of integrated layers describing an assessment design process that includes analyzing and modeling domains, specifying arguments in terms of student, task and evidence models, and implementing the assessment and executing operational processes. Note: This PDF is downloaded directly from ECDLarge. | Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers | 03-2010 |
| Narrative Structures in the Development of Scenario-Based Science Assessments (Large-Scale Assessment Technical Report 3) | A study was conducted to determine if the explication of Narrative Structures in storyboard development improves the quality and efficiency of storyboard writing. Research and evaluative findings suggest that Narrative Structure recognition and use may aid in the storyboard writing process. Note: This PDF is downloaded directly from ECDLarge. | Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers | 03-2010 |
| Bulletin #13: Comparability of Computerized Adaptive and Paper-Pencil Tests | When a traditional Paper-Pencil Test (PPT) is delivered by computer, two types of computerization can be implemented. One is a linear Computer-Based Test (CBT) in which the paper version of the test is presented and administered via computers. The other type of computerization is the Computerized Adaptive Testing (CAT) in which not only the medium of administration changes from paper to computer but also the test delivery algorithm turns from linear to adaptive. | Hong Wang, University of Pittsburgh
Chingwei David Shin, Pearson | 03-2010 |
| Research Services Quarterly Newsletter - Vol 3, No 1, 2010 | | | 03-2010 |
| Bulletin #12: What is a learning progression? | Learning progressions describe in words and examples what it means to move over time toward more expert understanding. This bulletin discusses research, assessment development, and the Pearson Foundation support for ongoing efforts relating to learning progressions. | Nichols, Paul D. | 02-2010 |
| Some Considerations Related to the Use of Adaptive Testing for the Common Core Assessments | In this paper ETS, Pearson, and the College Board discuss some important considerations related to the use of adaptive testing within a common core assessment system, particularly as used for summative purposes. | Camara, Wayne
Lazer, Stephen
Mazzeo, John
Sweeney, Kevin
Twing, Jon S.
Way, Walter D. | 02-2010 |
| Recommendations Related to the Operational Implementation of Performance Assessments Within Ohio’s K-12 Assessment System | The purpose of this paper was to provide discussion and recommendations related to the operational implementation of performance assessments within Ohio’s assessment system. | Burling, Kelly Shasby
Dolan, Robert P.
Frank, Jeri
Full, David
LaMarche, Wesley E.
Nichols, Paul
Niyogi, Nivedita Shilpi
Rogahn, Kurt
Vickers, Daisy
Way, Walter “Denny”
Williams, Natasha J. | 01-2010 |
| Bulletin #11: What is a Balanced Assessment System? | Under President Obama’s education reform agenda, the concept of a balanced assessment system has
received increasing attention. | Nichols, Paul D. | 01-2010 |
| Research Services Quarterly Newsletter - Vol 2, No 4, 2009 | | | 12-2009 |
| Bulletin #10: Methods of Comparability Studies for Computerized and Paper-Based Tests | In recent years, tests have begun being administered by computer. | Wan, Lei
Keng, Leslie
McClarty, Katie
Davis, Laurie | 12-2009 |
| Derivation of a Profile Reliability Index for an Individual: A Multi-Factor Congeneric Approach with Guttman Error Type Structures | The paper discusses results and proposes research to substantiate current supporting evidence for the operational use of the profile reliability approach. | Arce-Ferrer, Alvaro J. | 11-2009 |
| Bulletin #9: Computer-Based & Paper-Pencil Test Comparability Studies | In some testing applications, Computer-Based Test (CBT) delivery is gaining popularity over the traditional Paper-Pencil Test (PPT)
delivery due to the several potential advantages that it offers, such as immediate scoring and reporting of results, more | Wang, Hong Shin, Chingwei David | 11-2009 |
| Bulletin #8: The Use of Separate Answer Sheets | As early as 1932, separate answer sheets for standardized tests were hailed as a financial boon for educational testing. | Ragland, Shelley | 11-2009 |
| Research Services Quarterly Newsletter - Vol 2, No 3, 2009 | | | 09-2009 |
| Bulletin #7: Online Scorer Training | Increasingly, technology is being employed to improve the effectiveness and efficiency of delivery, scoring, and reporting of large-scale assessments. | Wolfe, Edward W., PhD | 08-2009 |
| Research Services Quarterly Newsletter - Vol 2, No 2, 2009 | | | 07-2009 |
| A Comparison of Training & Scoring in Distributed & Regional Contexts—Reading | This study examined the influence of rater training and scoring context on the following outcomes: (a) training time, (b) scoring time, (c) qualifying rate, (d) quality of ratings, and (e) rater perceptions. | Wolfe, Edward W. | 07-2009 |
| A Comparison of Training & Scoring in Distributed & Regional Contexts—Writing | This study examined the influence of rater training and scoring context on the following outcomes: (a) training time, (b) scoring time, (c) qualifying rate, (d) quality of ratings, and (e) rater perceptions. | Wolfe, Edward W. | 07-2009 |
| Strategies and Processes for Developing Innovative Items in Large-Scale Assessments | In this paper we describe processes for developing high-quality innovative items in the context of large-scale assessments. | Strain-Seymour, Ellen Way, Walter "Denny" Dolan, Robert P. | 06-2009 |
| A Design Pattern for Observational Investigation Assessment Tasks (Large-Scale Assessment Technical Report 2) | Drawing on research development in assessment design, this paper provides a design pattern to help assessment designers create tasks assessing students’ complex scientific reasoning skills in observational investigation. Note: This PDF is downloaded directly from ECDLarge. | Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers | 05-2009 |
| PADI Online Assessment Design System and Minnesota Science Assessment Glossary of Terms (Large-Scale Assessment Technical Report 1) | This paper presents a Glossary of terms about features of the Application of Evidence-Centered Design to State Large-Scale Science Assessment project, which is partnering with the state of Minnesota. Note: This PDF is downloaded directly from ECDLarge. | Fulkerson, Dennis
In collaboration with industry peers | 05-2009 |
| Weighted Penalty Model for Content Balancing in CAT | This research report proposes a new model called the Weighted Penalty Model (WPM) for content balancing in computer adaptive testing. | Chien, Yuehmei Shin, Chingwei David Swanson, Len Way, Walter Denny | 04-2009 |
| Growth, Precision, and CAT: An Examination of Gain Score Conditional SEM | Monitoring the growth of student learning is a critically important component of modern education. Such growth is typically monitored using gain scores representing differences between two testing occasions, such as prior to and following a year of instruction. | Thompson, Tony D. | 12-2008 |
| Growth, Precision, and CAT: An Examination of Gain Score Conditional SEM | Measurement of student growth is an important topic for K-12 state testing programs, both in terms of school accountability as well as for reporting progress of individual students. | Thompson, Tony D. | 06-2008 |
| Review of Student Growth Models Used by States | Summary: After Secretary Spellings announced the United States Department of Education (USDE) growth pilot program in 2005, nine states have been approved to use student growth in their calculations of Adequate Yearly Progress (AYP): | O'Malley, Kimberly | 06-2008 |
| Effects of Different Training and Scoring Approaches on Human Constructed Response Scoring | This paper summarizes and discusses research studies related to the human scoring of constructed response items that have been conducted recently at a large scale testing company. | Nichols, Paul Vickers, Daisy Way, Walter D. | 04-2008 |
| Person-fit of English Language Learners (ELL) in K-12 High-Stakes Assessments | The No Child Left Behind Act holds states using federal funds accountable for student academic achievement. | Wan, Lei Wu, Brad | 04-2008 |
| Bulletin #6: What Role Does the Consequences of Testing Play in Validity? | Currently, the field of educational measurement appears to have reached broad consensus that validity is a judgment of the degree to which arguments support the interpretations and uses of test scores (Kane, 2006). | Nichols, Paul D. Williams, Natasha | 04-2008 |
| Maintaining Score Equivalence as Tests Transition Online: Issues, Approaches and Trends | The purpose of this paper is to summarize a number of studies that Pearson has conducted with K-12 state departments of education using a particular analysis method referred to as Matched Samples Comparability Analyses (MSCA). | Kong, Jadie
Lin, Chow-Hong
Way, Walter D. | 03-2008 |
| Evidence of Test Score Use in Validity: Roles and Responsibilities | This paper has three goals. | Nichols, Paul D. Williams, Natasha | 03-2008 |
| Score Reporting, Off-the-Shelf Assessments and NCLB: Truly an Unholy Trinity | One consequence resulting from NCLB, particularly as instructional time becomes more precious, is the desire to be more efficient in assessing learning. | Twing, Jon S., PhD | 03-2008 |
| Applying a User-Centered Design Approach to Data Management: Paper and Computer Testing | This paper discusses the application of a user-centered design (UCD) approach to a web-based application system that supports data management components of the high-stakes assessment lifecycle. | Wilson, Jeffrey R., PhD | 03-2008 |
| Maintenance of Vertical Scales | Vertical scaling refers to the process of placing scores of tests that measure similar domains but at different educational levels onto a common scale, a vertical scale. | Kolen, Michael J. Tong, Ye | 03-2008 |
| User-Centered Assessment Design | In this paper, we introduce user-centered assessment design (UCAD), an approach to test design intended to produce assessments that deliver to teachers the kind of complex information on student learning and knowledge that they can combine with sound pedagogical practice to improve student achievement. | Adams, Jeremy Mittelholtz, David Nichols, Paul Van Duesen, Robert | 03-2008 |
| A Tale of Two Modes: A Case Study in User-centered Design’s Role in Comparability and Construct Validity | One merit of user-centered assessment design (UCAD) as defined by Nichols et al. (2008) is its broadened view of test development. | Strain-Seymour, Ellen, PhD | 03-2008 |
| A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment Data | Equating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably (Kolen & Brennan, 2004), even though the test forms consist of different items. | Tong, Ye Wu, Sz-Shyan Xu, Ming | 03-2008 |
| Field Testing and Equating Designs for State Educational Assessments | The educational accountability movement has spawned unprecedented numbers of new assessments. For example, the No Child Left Behind Act of 2002 (NCLB) required states to test students in grades 3 through 8 and at one grade in high school each year. | Kirkpatrick, Rob Way, Walter D. | 03-2008 |
| An Investigation of the Changes in Item Parameter Estimates for Items Re-field Tested | Large-scale state testing programs typically rely upon a large bank of items to select from when building assessments. | Kong, Xiaojing Jadie McClarty, Katie Larsen Meyers, Jason L. | 03-2008 |
| Perspective™-Integrated Assessment and Instructional Resources System | The Learning Locator is the mechanism that connects students with appropriate learning materials based on their assessment performance. | Meyers, Jason L. Nichols, Paul Shin, David | 03-2008 |
| Usability and Design Considerations for Computer-based Learning and Assessment | The overall success of computer-based products and systems is dependent to a significant extent on their usability and usefulness in the intended context. | Adams, Jeremy Harms, Michael | 03-2008 |
| The Validity Case for Assessing Direct Writing by Computer | Technology continues to provide opportunities for changing how teachers give instruction and how students learn. | Davis, Laurie L., Ph.D. Strain-Seymour, Ellen, Ph.D. Way, Walter D., Ph.D. | 01-2008 |
| Bulletin #5: What is Formative Assessment? | What is formative assessment? Looking across the evolution of the term "formative assessment," the common thread is that a formative assessment is defined by more than the assessment itself. | Burling, Kelly; Meyers, Jason; Nichols, Paul D. | 01-2008 |
| Incremental Validity of Numerical Reasoning over Critical Thinking | This study was conducted to evaluate the incremental validity of numerical reasoning over critical thinking in predicting job performance and overall potential as measured by supervisors’ ratings. | Ejiogu, Kingsley C. Rose, Mark Trent, John Yang, Zhiming | 08-2007 |
| Evidence of Test Score Use in Validity: Roles and Responsibilities | This paper has three goals. | Nichols, Paul D. | 08-2007 |
| Bulletin #4: Alternate Assessment - the 1% Rule; Modified Assessment - the 2% Rule | Since the 2001 reauthorization of the Elementary and Secondary Education Act of 1965, commonly known as No Child Left Behind (NCLB), states must include all students in public school in the statewide accountability system, and are accountable for the achievement of all students. | Burling, Kelly | 06-2007 |
| A Comparison of Methods of Estimating Subscale Scores for Mixed-Format Test | Because the world is complex and resources are often limited, test scores often serve to both rank individuals and provide diagnostic feedback (Wainer, Vevea, Camacho, Reeve, Rosa, Nelson, Swygert, and Thissen, 2000). | Shin, David | 05-2007 |
| Bulletin #3: Griddable Items: Beyond Multiple Choice | Multiple-choice test items along with "fill-in-the-bubble" answer sheets are mainstay formats in testing, formats that are prevalent for practical reasons. | Nichols, Paul, PhD | 03-2007 |
| Bulletin #2: Quality Assurance in Essay Scoring | Applying a score to a student essay is much different than scoring a math test or multiple choice vocabulary quiz. | Twing, Jon S., PhD Vickers, Daisy | 10-2006 |
| An Empirical Investigation of Growth Models | With the recent legislation of NCLB, there has been an increasing interest to measure students’ growth over the course of their schooling. | O'Malley, Kimberly Tong, Ye | 08-2006 |
| Bulletin #1: Universal Design | Developers of large-scale assessments have, for quite some time, stressed the need for participation of populations with unique educational needs, varying cultural experiences, diverse linguistic backgrounds, and numerous special needs. | Harms, Michael, PhD Nichols, Paul, PhD Walsh, Chris | 06-2006 |
| Adolescent/Adult Sensory Profile | The Adolescent/Adult Sensory Profile enables clients from 11 through 65+ years to use a Self-Questionnaire for evaluating their behavioral responses to everyday sensory experiences. | | 06-2006 |
| Understanding the Relationship Between Critical Thinking and Job Performance | This study was conducted to evaluate the relationship between a measure of critical thinking ability and job performance as measured by supervisors’ ratings. | Ejiogu, Kingsley C. Rose, Mark Trent, John Yang, Zhiming | 05-2006 |
| Practical Questions in Introducing Computerized Adaptive Testing for K-12 Assessments | In this paper, a number of practical questions related to introducing CAT for K-12 assessments are discussed. | Way, Walter D. | 04-2006 |
| Score Comparability of Online and Paper Administrations of the Texas Assessment of Knowledge and Skills | The comparability studies presented in this paper illustrate how responsible and psychometrically defensible comparability analyses can be incorporated within the constraints of a high-stakes, operational testing program like TAKS. | Fitzpatrick, Steven
Laughlin Davis, Laurie
Way, Walter D. | 04-2006 |
| Miller Analogies Research Report | Once a mainstay of college admission tests, many educators have come to see the analogy format as obsolete, something like a manual typewriter in the age of word processing and text messaging. | Meagher, Don | 03-2006 |
| Understanding Analogies | Once a mainstay of college admission tests, many educators have come to see the analogy format as obsolete, something like a manual typewriter in the age of word processing and text messaging. | Meagher, Don, EdD | 03-2006 |
| Administering Alternate Assessments | In the United States, a series of federal laws have been enacted that require all students with disabilities to be included in state accountability assessments. | Kreusel, Sheree | 12-2005 |
| A Primer on Assessing the Visually Impaired | The No Child Left Behind Act of 2001 (NCLB)—which is the most recent reauthorization of the Elementary and Secondary Education Act of 1965 (ESEA)—requires that in order to get federal funds, states must hold schools and districts accountable for the achievement of their students as measured by standardized achievement tests. | Case, Betsy J., PhD Jeffries, Janis L. Zucker, Sasha | 12-2005 |
| Accommodations for the Deaf | Raising academic standards for all students and measuring student achievement to hold schools accountable for educational progress are central strategies for promoting educational excellence and equity in our schools. | Case, Betsy J., PhD | 11-2005 |
| Accountability and Educational Progress for Students with Disabilities | The No Child Left Behind Act of 2001 (NCLB), the most recent reauthorization of the Elementary and Secondary Education Act of 1965 (ESEA), has significantly impacted state educational systems and local school districts. | Case, Betsy J., PhD | 09-2005 |
| The Age of Accountability | With the goal of supporting global education, Pearson Inc. (Pearson) was a sponsoring agency of the 2005 China-U.S. Conference on Educational Assessment held in Beijing. | Case, Betsy J., PhD | 09-2005 |
| Transadaptation | In recent decades, the nation's classrooms have seen an increase in the number of students who are not native speakers of English, a group referred to by the No Child Left Behind Act of 2001 (NCLB) as students with limited English proficiency (LEP). | Alaniz, Linda G. Guzman, Luis Miska, Margarita Zucker, Sasha | 09-2005 |
| Inclusive Design for Maximum Accessibility: A Practical Approach to Universal Design | The purpose of this article is to briefly review the literature related to Universal Design for Learning (UDL) and Universal Design for Assessment (UDA), and outline an approach for combining these two philosophies in evaluating large-scale assessment programs. | Hanna, Elizabeth I. | 08-2005 |
| Recent Trends in Comparability Studies | The purpose of this paper is to review the research addressing the comparability of computer-delivered tests and pencil-and-paper tests, and particularly the research since 1993. | Paek, Pamela | 08-2005 |
| Comparing Standards-based Item Banks and Pre-built Tests for Classroom Assessment | The current era of accountability and standards-based reform in education has had many effects on teachers’ approaches to instruction in the classroom. Recently, the importance of assessment has been bolstered by the No Child Left Behind Act of 2001 (NCLB). | Pearson, Inc. | 08-2005 |
| Curriculum Narrowing | The current era of education reform in the United States can be traced to the passage of the Elementary and Secondary Education Act of 1965 (ESEA), which, among its provisions, required states to monitor and assess the educational progress of students. | King, Kelly V. Zucker, Sasha | 08-2005 |
| Systematic Feedback for More Effective Teaching and Learning | The value of taking a test is in getting the results back quickly and in a manner that communicates them clearly. | Jorgensen, Margaret A., Ph.D. | 08-2005 |
| Aligning ELP Assessments to ELP Standards | Intensified attention to alignment between state English language proficiency (ELP) assessments and state ELP standards has primarily been driven by the requirements of the No Child Left Behind Act of 2001 (NCLB). | Johnson, Diane F. | 07-2005 |
| Early Mathematics and EMDA™ | With the advent of standards-based reform and the No Child Left Behind Act of 2001 (NCLB), educational systems are being held accountable to high levels of student achievement as never before. | Pearson, Inc. | 07-2005 |
| Horizontal and Vertical Alignment | Alignment is typically understood as the agreement between a set of content standards and an assessment used to measure those standards. | Case, Betsy, PhD Zucker, Sasha | 07-2005 |
| Methodologies for Alignment | Alignment can be broadly defined as the degree to which the components of an education system work together to achieve the desired goals of stakeholders. | Case, Betsy, PhD Zucker, Sasha | 07-2005 |
| CELF-4/WISC-IV Technical Report | Children who participated in research studies with the Wechsler Intelligence Scales for Children-Fourth Edition Integrated (Wechsler, Kaplan, Fein, Kramer, Morris, Delis, & Maerlender; 2004) were administered the Clinical Evaluation of Language Fundamentals-Fourth Edition (Semel, Wiig, & Secord; 2003). | | 06-2005 |
| Boehm-3 | The Boehm Test of Basic Concepts, Third Edition (Boehm-3) is a group administered assessment for students in kindergarten through second grade. | Boehm, Ann E., Ph.D. | 06-2005 |
| Boehm-3 Preschool | The Boehm Test of Basic Concepts, Third Edition (Boehm-3) is a group administered assessment for students in kindergarten through second grade. | Boehm, Ann E., Ph.D. | 06-2005 |
| Bracken | The Bracken Basic Concept Scale—Revised (BBCS–R) is used to assess the basic concept development of children ages 2 years, 6 months through 7 years, 11 months. | Bracken, Bruce A. | 06-2005 |
| WIAT-II Technical Report: Interpreting Performance on the Reading Comprehension Subtest | Interpreting performance on the Reading Comprehension subtest of the Wechsler Individual Achievement Test®–Second Edition (WIAT®–II, Update 2005) is challenging for some examiners, particularly when a student must reverse to a preceding item set. | Breaux, Kristina C., Ph.D. | 06-2005 |
| School Function Assessment | School professionals recognize that effective school performance depends on a student’s ability to perform a variety of functional tasks that enable him or her to participate in various learning activities. | Coster, Wendy Deeney, Theresa Haley, Stephan Haltiwanger, Jane | 06-2005 |
| Sensory Profile | The Sensory Profile™ provides a standard method for professionals to measure the sensory processing abilities of children 5 to 10 years old (separate cut scores for 3 and 4 year olds are provided in the manual) and to profile the effects of sensory processing on functional performance in the children’s daily lives. | Dunn, Winnie | 06-2005 |
| SCAN-A | SCAN-A: A Test for Auditory Processing Disorders in Adolescents and Adults enables professionals to obtain central auditory test results in approximately 20 minutes for adolescents and adults. | Keith, Robert W. | 06-2005 |
| SCAN-C | The SCAN-C Test for Auditory Processing Disorders in Children-Revised is an individually administered test used to identify children between ages 5 years, 0 months and 11 years, 11 months who have auditory processing disorders. | Keith, Robert W. | 06-2005 |
| WAIS-III Technical Report: Response to Flynn | In Tethering the Elephant: Capital Cases, IQ, and the Flynn Effect (Flynn, 2006), Dr. Flynn states that the WAIS-III© standardization sample is substandard and a 2.34-point adjustment to the FSIQ score is required in post-conviction capital murder cases. | Weiss, Lawrence G., Ph.D. | 06-2005 |
| PLS-4 Technical Report | The Preschool Language Scale, Fourth Edition (PLS-4) is an individually administered test for identifying children from birth through 6 years, 11 months who have a language disorder or delay. | Pond, Roberta Evatt, M.A. Steiner, Violette G., B.S. Zimmerman, Irla Lee, Ph.D. | 06-2005 |
| RBANS Supplement #1 Report | This supplement provides subtest means and SDs for the normal standardization sample, comments on general issues in interpreting performance on the RBANS, additional information on test-retest interpretation, further information on “cortical–subcortical deviation” scores, and updated clinical validity information. | Randolph, Christopher, Ph.D. | 06-2005 |
| Bayley-III Technical Report 1 | The Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III; Bayley, 2006) is designed to measure the developmental status of young children, ages 1 to 42 months. | | 06-2005 |
| Bayley-III Technical Report 2 | The Bayley Scales of Infant and Toddler Development–Third Edition (Bayley–III; Bayley, 2006) measures cognitive, language, motor, social-emotional, and adaptive development and is a revision of its predecessor, the Bayley Scales of Infant Development—Second Edition (BSID–II; Bayley, 1993). | | 06-2005 |
| CELF-4 Technical Report | The Clinical Evaluation of Language Fundamentals®–Fourth Edition (CELF®–4) is an individually administered test for determining if a student (ages 5 through 21 years) has a language disorder or delay. | | 06-2005 |
| DELV Technical Report | The Diagnostic Evaluation of Language Variation (DELV) family of assessments is unique because it is the only language assessment series that accounts for the diversity in American English and identifies children who are at risk for or show signs of a speech or language disorder. | | 06-2005 |
| Infant Toddler Sensory Profile | The Infant/Toddler Sensory Profile provides a standard method for professionals to measure a child’s sensory processing abilities and to profile the effect of sensory processing on functional performance in the child’s daily life. | | 06-2005 |
| ABAS Technical Supplement | The Adaptive Behavior Assessment System (ABAS; Harrison & Oakland, 2000) uses a behavior-rating format to assess adaptive behavior and related skills for individuals 5 through 89 years of age. | | 06-2005 |
| ECHOS Technical Data Report | The Early Childhood Observation System (ECHOS) is a web-based, ongoing observational assessment tool for children in Pre-Kindergarten (ages 3-5) and Kindergarten through Grade 2. | | 06-2005 |
| Early Reading and the Early Reading Diagnostic Assessment, Second Edition (ERDA Second Edition) | For virtually all students, learning to read and write begins long before kindergarten. It is a complex and dynamic process. | Jordan, Dr. R. Rosalie King, Kelly Kirk, David J. | 04-2005 |
| Strategies for Controlling Item Exposure in Computerized Adaptive Testing with the Partial Credit Model | Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). | Dodd, Barbara G. (University of Texas at Austin) Laughlin Davis, Laurie | 03-2005 |
| Evidence for the Interpretation and Use of Scores from an Automated Essay Scorer | This paper examined validity evidence for the scores based on the Intelligent Essay Assessor (IEA), an automated essay-scoring engine developed by Pearson Knowledge Technologies. | Nichols, Paul | 03-2005 |
| WISC-IV Technical Report #4: GAI | This technical report is the fourth in a series intended to introduce the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003). | Coalson, Diane, Ph.D. Raiford, Susan E., Ph.D. Rolfhus, Eric, Ph.D. Weiss, Lawrence G., Ph.D. | 01-2005 |
| WISC-IV Technical Report #4.1: GAI with Canadian Norms | This technical report is the fourth in a series intended to introduce the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003). | Coalson, Diane, Ph.D. Raiford, Susan E., Ph.D. Rolfhus, Eric, Ph.D. Saklofske, Donald H., Ph.D. Weiss, Lawrence G., Ph.D. Zhu, Jianjun, Ph.D. | 01-2005 |
| Alignment in Educational Assessment | In the context of education, alignment can be broadly defined as the degree to which the components of an education system--such as standards, curricula, assessments, and instruction--work together to achieve desired goals (Ananda, 2003; Resnick, Rothman, Slattery, and Vranek, 2003; Webb, 1997b). | Case, Betsy J., PhD Jorgensen, Margaret A., PhD Zucker, Sasha | 12-2004 |
| MicroCog Assessment of Cognitive Functioning | MicroCog™: Assessment of Cognitive Functioning (MicroCog) is a computer-administered cognitive screening instrument that was originally developed for a disk operating system (DOS). | Drozdick, Lisa Holdnack, James Lane, Andre | 12-2004 |
| Toward Complete Assessment | In the United States, there is a wide range of educational assessment systems used for many different purposes. | Jorgensen, Margaret A., Ph.D. Zucker, Sasha | 11-2004 |
| Value Added | “Value-added,” a term originally used in business and economics, has become widely used to describe certain educational assessment and accountability systems. | | 11-2004 |
| Augmentation | Since the introduction of the Stanford Achievement Test in 1923—the first of its kind—large-scale standardized norm-referenced assessments have served a variety of important purposes in education. | Zucker, Sasha Christensen, Ray Ellis, Roy T. Harris, Herb Manning, Duane | 08-2004 |
| A Performance Comparison of Native and Non-native Speakers of English on an English Language Proficiency Test | According to the U.S. Census Bureau, the population of the United States grew by approximately 12.5 percent from 1995 to 2000. Crawford (1997) reports that language diversity has increased dramatically throughout the nation. | Jiao, Hong Stephenson, Agnes, Ph.D. Wall, Nathan | 08-2004 |
| SDRT 4/SDMT 4 Administration Mode Comparability Study | The widespread availability of computers in schools has focused the assessment community on the use of computer-based testing solutions in the classroom. | Brooks, Thomas E., Ph.D. Wang, Shudong, Ph.D. Young, Michael J., Ph.D. | 08-2004 |
| The Distractor Rationale Taxonomy: Enhancing Multiple-Choice Items in Reading and Mathematics | Recent education reform legislation, especially the No Child Left Behind Act of 2001 (NCLB), has highlighted discussion concerning the relationship between assessment and classroom instruction. | Gardner, Doug A. Jorgensen, Margaret A., PhD King, Kelly V. Zucker, Sasha | 07-2004 |
| Effective Schools | Why do some public schools that educate students from disadvantaged backgrounds make a difference while others fail? | Jones, Terry L. Kirk, David J. | 07-2004 |
| Online or Paper: Does Delivery Affect Results? | The Stanford Diagnostic Reading Test, Fourth Edition (SDRT 4) and Stanford Diagnostic Mathematics Test, Fourth Edition (SDMT 4) were adapted to a computer-based, online format in 2003. | Wang, Shudong, PhD | 06-2004 |
| Accountability and Educational Progress: Including Students with Disabilities and/or Culturally and Linguistically Diverse Students | Just as the People’s Republic of China has its nine-year Compulsory Education Law of 1985, the United States has a pre-eminent law as well. The U.S. law is known as the No Child Left Behind Act of 2001 (NCLB). | Case, Betsy J., PhD | 06-2004 |
| Alternate Assessments for Students with Significant Cognitive Disabilities | Accountability systems are based on measuring the progress of all students. The reform movements and current laws are designed to ensure that all students have opportunities to learn to high standards. | Almond, Patricia J., PhD Case, Betsy J., PhD | 06-2004 |
| New Visions New Futures | Self-determination for students with disabilities began to get federal attention in the mid-1980s. | Case, Betsy J., Ph.D. | 06-2004 |
| Administration Practices for Standardized Assessments | Pearson Inc. (Pearson) develops and distributes a variety of assessments for educational and clinical purposes. To meet the goal of producing highly valid, reliable results for test users, each of these products is developed according to strict guidelines. | Zucker, Sasha Galindo, Margarita Grainger, Elaine Severance, Nancy | 04-2004 |
| Scientifically Based Research | A significant aspect of the No Child Left Behind Act of 2001 (NCLB) is the use of the phrase "scientifically based research" well over 100 times throughout the text of the law. | Zucker, Sasha | 03-2004 |
| The Standards-Referenced Interpretive Framework: Using Assessments for Multiple Purposes | From the outset of the development of a large scale assessment system, a central activity in the design process concerns determining the assessment’s interpretive framework—the way that its results are understood to convey meaningful information. | Young, Michael J., PhD Zucker, Sasha | 03-2004 |
| The Value of the Stanford Scale as a Common Metric | As states and school districts report on the adequate yearly progress (AYP) of student performance as required by the No Child Left Behind Act of 2001 (NCLB), there will be increasing emphasis on documenting student progress along a developmental continuum within selected subject areas. | Jorgensen, Margaret A., PhD | 03-2004 |
| Assessing Young Children | Today’s educational climate of standards and accountability extends even to preschool programs (Bowman, Donovan, and Burns, 2001). | Case, Betsy J., PhD Guddemi, Marcy, PhD | 02-2004 |
| Cognitive Labs | A cognitive lab is a method of studying the mental processes one uses when completing a task such as solving a mathematics problem or interpreting a passage of text. | Case, Betsy J., Ph.D. Sassman, Christy Zucker, Sasha | 02-2004 |
| Fundamentals of Standardized Testing | Tests are a familiar part of classroom instruction. Each week, teachers use a wide variety of tests, such as spelling tests, mathematics pop quizzes, and essay tests. | Zucker, Sasha | 12-2003 |
| Assessing English Language Proficiency: Using Valid Results to Optimize Instruction | The No Child Left Behind Act of 2001 (NCLB) has focused increased attention on the appropriate assessment of English language learners (ELL students) in U.S. public schools. | Johnson, Diane F. Jorgensen, Margaret A., Ph.D. Stephenson, Agnes, Ph.D. Young, Michael J., Ph.D. | 11-2003 |
| Establishing Performance Levels for the Stanford English Language Proficiency Test (Stanford ELP) | The Stanford English Language Proficiency Test (Stanford ELP) measures the English proficiency of students in kindergarten through grade 12 whose first language is not English. | Stephenson, Agnes, Ph.D. | 10-2003 |
| Academic & Social English for ELL Students | To better identify the academic language (AL) that should be assessed with an English language proficiency (ELP) test, we need to refocus on the main purpose of an ELP test: to assess students' general English language ability. | Johnson, Diane F. | 09-2003 |
| History of the No Child Left Behind Act of 2001 (NCLB) | In August of 1981, the National Commission on Excellence in Education was chartered under the authority of 20 U.S.C. 1233a to, among other purposes and functions, "review and synthesize the data and scholarly literature on the quality of learning and teaching in the nation's schools, colleges, and universities, both public and private, with special concern for the educational experience of teen-age youth" (U.S. Department of Education, 1983a). | Hoffmann, Jenny Jorgensen, Margaret A., PhD | 08-2003 |
| Maximizing Equity and Access in Test Construction | President Bush’s education agenda (The White House, 2001) reinforced what some states were already addressing in their policies aimed at improving education. | Emrick Massad, Carolyn, Ph.D. | 07-2003 |
| Universal Design | The concept of universal design has its roots in the field of architecture. | Case, Betsy, PhD | 06-2003 |
| DAS Technical Report: Ability-Achievement Discrepancy Table for use with the WIAT-II | The Wechsler Individual Achievement Test®—Second Edition (WIAT–II; Psychological Corporation, 2002) and the Differential Ability Scales® (DAS; Elliott, 1990) were administered to 100 children, 6 to 16 years old. | Diehl, Kim Elliott, Colin O'Donnell, Louise Rolfhus, Eric Weiss, Larry | 06-2003 |
| Timed Versus Untimed Testing Conditions and Student Performance | The adherence to fixed time limits during group administrations of norm-referenced standardized tests has been an accepted practice from the early part of the 20th century (Anastasi, 1976; pp. 32-34). | Brooks, Thomas E., PhD Case, Betsy J., PhD Young, Michael J., PhD | 06-2003 |
| Augmentation: An Implementation Strategy for the No Child Left Behind Act of 2001 | With the passage of the No Child Left Behind Act of 2001 (NCLB), states have the opportunity to redefine and enhance their assessment programs to support the diagnostic needs of students, educators, and parents. Pearson Education, Inc. (Pearson) has developed a strategy that combines the best features of norm-referenced and standards-based tests, designed to ensure that states comply with the annual reporting requirements of NCLB. | Hicks-Herr, Stacy Hoffmann, Jenny Jorgensen, Margaret A., PhD | 06-2003 |
| WISC-IV Technical Report #1: Theoretical Model and Test Blueprint | This technical report is the first in a series intended to introduce the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV). | Rolfhus, Eric, Ph.D. Weiss, Lawrence G., Ph.D. Williams, Paul E., Psy.D. | 06-2003 |
| WISC-IV Technical Report #2: Psychometric Properties | This technical report is the second in a series intended to introduce the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV). | Rolfhus, Eric, Ph.D. Weiss, Lawrence G., Ph.D. Williams, Paul E., Psy.D. | 06-2003 |
| WISC-IV Technical Report #3: Clinical Validity | This is the third in a series of technical reports on WISC-IV. | Rolfhus, Eric, Ph.D. Weiss, Lawrence G., Ph.D. Williams, Paul E., Psy.D. | 06-2003 |
| Calculator Use on Stanford Series Mathematics Tests | Calculator use has, over the years, become increasingly integrated into mathematics instruction and testing. | Brooks, Thomas E., Ph.D. Case, Betsy J., Ph.D. Cerrillo, Tracy, Ph.D. Severance, Nancy Wall, Nathan Young, Michael J., Ph.D. | 05-2003 |
| It's About Time: Stanford Achievement Test Series, Tenth Edition (Stanford 10) | Pearson decided to make the Stanford Achievement Test Series, Tenth Edition (Stanford 10) an untimed test for several compelling reasons. | Case, Betsy J., PhD | 04-2003 |
| Accommodations for METROPOLITAN8 | The trend toward the inclusion of students with disabilities and limited English proficient (LEP) students in state and district-wide assessment programs became a requirement in 1997 with the reauthorization of the Individuals with Disabilities Education Act (IDEA). | Case, Betsy J., PhD | 03-2003 |
| Accommodations for Stanford 9 | For more than 25 years, federal law has guaranteed a free and appropriate public education to children with disabilities. | Case, Betsy J., PhD Slawski, Edward J. | 03-2003 |
| The New Norm-Referenced Test Model | Traditionally, there have been distinct differences in the purpose, use, and development guidelines for a norm-referenced test (NRT) and for a criterion-referenced test (CRT). | Jorgensen, Margaret A., PhD McBee, Maridyth, PhD | 03-2003 |
| Color Blindness | Color vision is determined by the discrimination of three qualities of color: hue (such as red vs. green), saturation (that is, pure vs. blended colors), and brightness (that is, vibrant vs. dull reflection of light) (Arditi, 1999a). | Case, Betsy J., PhD | 02-2003 |
| Accommodations on Stanford 10 for Limited English Proficient (LEP) Students | Title I of the Elementary and Secondary Education Act (ESEA) was amended by the No Child Left Behind Act (NCLB) in January 2002. Under NCLB, all students are to be included in the measurement of progress toward state achievement standards. | Case, Betsy J., PhD | 02-2003 |
| Accommodations on Stanford 10 for Students with Disabilities | Requirements for including all students with disabilities (SWD) in assessments stem from a number of federal laws, including Section 504 of the Vocational Rehabilitation Act of 1973 (Section 504), Title II of the Americans with Disabilities Act of 1990 (ADA), Title I of the Elementary and Secondary Education Act (ESEA), and the Individuals with Disabilities Education Act of 1997 (IDEA). | Case, Betsy J., PhD | 02-2003 |
| Test and Answer Document Design and Layout | Pearson Inc. (Pearson) drew from a variety of sources to create and validate its innovative test and answer document design for the Stanford 10 Assessment Series (Stanford 10). | Case, Betsy J., PhD | 02-2003 |
| Exploring the Use of Item Bank Information to Improve IRT Item Parameter Estimation | On occasion, the sample of students available for calibrating a set of assessment items may not be optimal. | Ansley, Timothy Hall, Erika | |
| WISC-IV Technical Report #5: WISC-IV and Children's Memory Scale | | Drozdick, Lisa W. Holdnack, James Rolfhus, Eric Weiss, Larry | |
| Response Probability Criterion and Subgroup Performance | In the standard setting literature, there has been much debate about the most appropriate response probability (RP) to use in an item mapping procedure such as the Bookmark Standard Setting Procedure. | Egan, Karla Mueller, Canda D. Schneider, M. Christina | |
| WISC-IV Technical Report #7: Extended Norms | | Cayton, Tom, PhD Gabel, Amy, PhD Weiss, Larry, PhD Zhu, Jianjun, PhD | |
| A Comparison of Item and Testlet Selection Procedures in Computerized Adaptive Testing | Testlet response theory (TRT) is a measurement model that can capture local dependency in testlet-based tests. | Chen, Tzu-An Ann Dodd, Barbara G. Ho, Tsung-Han Keng, Leslie | |
| WISC-IV Technical Report #6: Using the Cognitive Proficiency Index in Psychoeducational Assessment | The Cognitive Proficiency Index (CPI) summarizes performance on the WISC-IV working memory and processing speed indices in a single score. | Gabel, Amy Dilworth, Ph.D. Weiss, Lawrence G., PhD | |
| ABAS-Second Edition | The Adaptive Behavior Assessment System-Second Edition (ABAS-II) provides a comprehensive norm-referenced assessment of the adaptive skills of individuals ages birth to 89 years. | Harrison, Patti L. Oakland, Thomas | |
| DEAP Technical Report | The Diagnostic Evaluation of Articulation and Phonology, U.S. Edition (DEAP) is a comprehensive, individually administered, norm-referenced battery designed to provide differential diagnoses of speech disorders in children ages 3.0-8.11 years. | Crosbie, Sharon Dodd, Barbara Holm, Alison Ozanne, Anne | |
| A Generalization of Stratified α that Allows for Correlated Measurement Errors between Subtests | This paper presents a generalization of Stratified α that allows for correlated measurement errors between some subtest scores that make up a composite score. | Keng, Leslie Miller, G. Edward O'Malley, Kimberly Turhan, Ahmet | |
| The VIP (Validity Indicator Profile) Test in Court | VIP users may be called upon to present their clinical findings and conclusions, either in person or by report, to a judge, jury, disability board, or agency that makes decisions about remunerative awards. | | |
| Research Services Quarterly Newsletter - Vol 1, No 1, 2008 | | | |
| Research Services Quarterly Newsletter - Vol 1, No 2, 2008 | | | |
| Research Services Quarterly Newsletter - Vol 1, No 3, 2008 | | | |
| Research Services Quarterly Newsletter - Vol 2, No 1, 2009 | | | |