Pearson
Always Learning

Research



Pearson's research publications for educators, parents, students, researchers and policy makers.



Title | Abstract | Author(s) | Date
Bulletin #25: College Readiness Indicators
This paper outlines current student-level indicators at the high school and middle school levels that predict college success. In this bulletin, indicators are divided into three categories: assessment scores (e.g., SAT® exam scores), transcript attributes (e.g., course rigor), and additional indicators (e.g., attendance) that impact achievement.
Cromwell, Ashley M.
Larsen McClarty, Katie
Larson, Sarah J.
05-2013
Bulletin #24: Learning Progressions: An Overview of Current Validation Methods
Learning progressions represent a set of skills or pieces of knowledge ordered sequentially from least to most complex. This sequence can guide instruction as well as assessment content. This paper describes several useful methods for validating learning progressions including validating the relationship between the progression and student knowledge as well as between the progression and associated assessments.
Soto, Amanda C.
Taylor, Melinda A.
05-2013
Bulletin #23: What is College and Career Readiness? State Requirements for High School Graduation and State Public University Admissions
This paper compares the minimum requirements for high school graduation in each state with admission requirements for the state’s main (or “flagship”) university campus. In 80% of the states, the high school graduation requirements do not meet the minimum standards necessary for admission to their own state universities.
Conforti, Peter A.
05-2013
Bulletin #22: What is College and Career Readiness? A Summary of State Definitions
This paper outlines each state’s definition of college and career readiness and shows whether they associate with the Common Core State Standards Initiative (CCSSI), ACT College and Career Readiness System (ACCRS), or both. The general definition that emerges can not only facilitate interstate discussions on multiple levels, but also provide schools, districts, educators, students, and other stakeholders with clear objectives to effectively prepare students for postsecondary endeavors.
Conforti, Peter A.
05-2013
A Universal Design for Learning-based Framework for Designing Accessible Technology-Enhanced Assessments
The increased capabilities of digital technologies offer new opportunities to evaluate students’ deeper knowledge and skills, as well as constructs that are difficult to measure using traditional methods. Such assessments can incorporate tools and interfaces that improve accessibility for diverse students, but they can also inadvertently introduce new accessibility barriers. Designing these technology-enhanced tasks according to universal design principles is one way to address these accessibility concerns, but requires a grounded understanding of students’ diverse abilities and the ways they interact with the tasks. A thorough consideration of the factors that impact construct validity, with an emphasis on identifying and eliminating sources of construct-irrelevant variance, is essential.
Dolan, Robert P.
Burling, Kelly
Harms, Michael
Strain-Seymour, Ellen
Way, Walter (Denny)
Rose, David H.
04-2013
Computer-based Assessment of Collaborative Problem-Solving Skills: Human-to-Agent versus Human-to-Human Approach
Collaborative problem solving (CPS) is a critical competency for college and career readiness. Students emerging from schools into the workforce and public life will be expected to have CPS skills as well as the ability to perform that collaboration in various group compositions and environments. However, structuring standardized computer-based assessment of CPS skills, specifically for large-scale assessment programs, is challenging. The aim of this study was to explore patterns in student CPS performance and motivation in human-to-agent (H-A) settings compared to human-to-human (H-H) settings. One hundred seventy-nine 14-year-old students from the United States, Singapore, and Israel participated in the study.
Rosen, Yigal, Ph.D.
Tager, Maryam
03-2013
Halo Effects and Analytic Scoring: A Summary of Two Empirical Studies
This document summarizes the results of a research study that examines rater halo and how much unique information is provided by multiple analytic scores. Specifically, that study investigated whether unique information is provided by analytic scores assigned to student writing beyond what is depicted by holistic scores and to what degree multiple analytic scores assigned by a single rater display evidence of a halo effect. The authors analyze scores assigned to middle-school student responses to an expository writing prompt that were assigned by six groups of raters—four groups assigned single analytic scores, one group assigned multiple analytic scores, and one group assigned holistic scores. The results suggest that there is evidence of a halo effect when raters assign multiple analytic
Lai, Emily R.
Wolfe, Edward W.
Vickers, Daisy H.
03-2013
An Adaptive-within-Testlet Item Selection Method with Both Testlet Level and Test Level Content Balancing in CAT
The purpose of this study is to propose a heuristic item selection procedure, which selects a testlet and a subset within the selected testlet to be administered with consideration of content balancing and being adaptive within the testlet.
Chien, Yuehmei
Shin, Chingwei David
01-2013
Research Services Quarterly Newsletter - Vol 5, No 3
11-2012
Halo Effects and Analytic Scoring: A Summary of Two Empirical Studies
We address the issue of whether unique information is provided by analytic scores assigned to student writing, beyond what is depicted by holistic scores, and to what degree multiple analytic scores assigned by a single rater display evidence of a halo effect.
Lai, Emily R.
Wolfe, Edward W.
Vickers, Daisy H.
11-2012
Where is the Value in Value-Added Modeling?
This paper first provides an overview of value-added modeling (VAM), including definitions and descriptions of three general types of value-added models. The paper then summarizes the rationales for and against incorporating VAM into an accountability framework. Finally, we provide a recommendation that VAM estimates be used in conjunction with other measures to form a composite.
Dan Murphy
10-2012
Online Scoring vs. Materials Scoring for Portfolio Assessments: An Exploration of Score Stability
The study is designed to investigate whether scores assigned to portfolio submissions are comparable between materials-based scoring and online scoring conditions, and to evaluate how scorers perceive the online scoring platform's ease of use and its role in facilitating the scoring process.
Hua Wei
10-2012
Establishing an Evidence-based Validity Argument for Performance Assessment
Recent initiatives have proposed to use performance tasks in ambitious new ways, including monitoring student growth and evaluating teacher effectiveness.
Lai, Emily R.
Wei, Hua
Hall, Erika L.
Fulkerson, Dennis
09-2012
Responses to Claims Raised by Walter Stroup

In response to claims made by Walter Stroup about the Texas statewide assessments, experts from Pearson's assessment team serving the Texas program authored this brief to enumerate the flaws in Dr. Stroup's conclusions, as well as highlight the strengths of Texas' system of standards and assessments.

For media inquiries, contact Susan Aspey at communications@pearsoned.com or 800-745-8489

Test, Measurement & Research Services
08-2012
Research Services Quarterly Newsletter - Vol 5, No 2, 2012
07-2012
A Literature Review of Gaming in Education
This research report provides an overview of the theoretical and empirical evidence behind five key claims about the use of digital games in education.
McClarty, Katie Larsen
Orr, Aline
Frey, Peter M.
Dolan, Robert P.
Vassilev, Victoria
McVay, Aaron
06-2012
Value-Added Models in the Evaluation of Teacher Effectiveness: A Comparison of Models and Outcomes
This study compared five value-added models and illustrated the impact of model choice on the estimates of teacher effectiveness.
Wei, Hua
Hembry, Tracey
Murphy, Daniel L.
McBride, Yuanyuan
06-2012
A Comparison of Three Content Balancing Methods for Fixed and Variable Length Computerized Adaptive Tests
The purpose of this study is to compare the WPM method to the WDM method under various conditions including the simple and complicated content constraint structure, different CAT settings such as item pool, item exposure control specification, and theta estimation options for both fixed- and variable-length CAT tests.
Shin, Chingwei David
Chien, Yuehmei
Way, Walter Denny
04-2012
Improving Text Complexity Measurement through the Reading Maturity Metric
The purposes of this paper are to describe how Word Maturity has been incorporated into Pearson’s text complexity measure, to present initial comparisons between this new measure of text complexity and traditional readability measures, and to address measurement issues in the development and use of text complexity measurements.
Landauer, Tom
Way, Walter D.
04-2012
The Case for Performance-Based Tasks without Equating
This paper proposes a model for performance-based assessments that assumes random selection of performance-based tasks (PBTs) from a large pool, and that assumes tasks are comparable without equating PBTs.
Way, Walter D.
Murphy, Daniel
Powers, Sonya
Keng, Leslie
04-2012
Assessing 21st Century Skills: Integrating Research Findings
This paper synthesizes research evidence pertaining to several so-called 21st century skills: critical thinking, creativity, collaboration, metacognition, and motivation.
Lai, Emily R.
Viering, Michaela
04-2012
Creating Curriculum-Embedded, Performance-Based Assessments for Measuring 21st Century Skills in K-5 Students
This paper will share the author’s experiences working with a large and diverse school district to design curriculum-embedded, performance-based assessments (PBAs) that measure 21st century skills in K-5 students.
Lai, Emily R.
04-2012
Research Services Quarterly Newsletter - Vol 5, No 1, 2012
04-2012
Linking Two Assessment Systems Using Common-Item IRT Method and Equipercentile Linking Method
When states move from one assessment system to another, it is often necessary to establish a concordance between the two assessments for accountability purposes. The purpose of this study is to model two alternative approaches to transitioning performance standards, both of which can be executed using data from regularly scheduled operational administrations.
Kirkpatrick, Rob
Turhan, Ahmet
Lin, Jie A
04-2012
The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model
This study investigates the impact of item position change (IPC) in the context of operational testing programs that employ the 3PL model, alternative equating procedures, and different item re-use policies.
Meyers, Jason L.
Murphy, Stephen
Goodman, Joshua
Turhan, Ahmet
04-2012
Putting Ducks in a Row: Methods for Empirical Alignment of Performance Scoring
Using historical state data, this report evaluates nine different methods of aligning performance standards and discusses the effects of selecting different methods as well as the potential implications for interpretations of student progress and school success.
McClarty, Katie Larsen
Murphy, Daniel
Keng, Leslie
Turhan, Ahmet
Tong, Ye
04-2012
Population Invariance of Vertical Scaling Results
In this report, the population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3–10 and a state mathematics test spanning grades 3–8.
Powers, Sonya
Turhan, Ahmet
Binici, Salih (Florida State University)
04-2012
Connecting English Language Learning and Academic Performance: A Prediction Study
The purpose of this study was to investigate the use of English language proficiency and academic reading assessment scores to predict the future academic success of English learner (EL) students.
Kong, Jadie
Powers, Sonya
Starr, Laura
Williams, Natasha
04-2012
Bulletin #21: Evidence Based Standard Setting: Establishing Cut Scores by Integrating Research Evidence with Expert Content Judgments
In this bulletin, we describe the processes and practices associated with Evidence Based Standard Setting, which draw directly from the concept of evidence-based medicine.
Beimers, Jennifer N.
Way, Walter D.
McClarty, Katie Larsen
Miles, Julie A.
01-2012
Research Services Quarterly Newsletter - Vol 4, No 4, 2012
01-2012
Pearson Feedback: Development of Open Technology Standards
This report includes Pearson’s feedback to the recent post by the U.S. Department of Education (ED) concerning the development of open technology standards for managing and delivering student assessments and assessment results.
Jon S. Twing, PhD
11-2011
Research Services Quarterly Newsletter - Vol 4, No 3, 2011
10-2011
Overview of Student Growth Models
This paper provides an overview of student growth modeling and describes how states use student growth models in the federal accountability system.
O'Malley, Kimberly J., Ph.D.
Murphy, Stephen, Ph.D.
Larsen McClarty, Katie, Ph.D.
Murphy, Daniel, Ph.D.
McBride, Yuanyuan, Ph.D.
09-2011
Research Services Quarterly Newsletter - Vol 4, No 2, 2011
07-2011
Critical Thinking: A Literature Review
Critical thinking includes the component skills of analyzing arguments, making inferences using inductive or deductive reasoning, judging or evaluating, and making decisions or solving problems.
Lai, Emily R.
06-2011
Collaboration: A Literature Review
Collaboration is the “mutual engagement of participants in a coordinated effort to solve a problem together.”
Lai, Emily R.
06-2011
Pearson's Text Complexity Measure
Pearson's Knowledge Technologies group has developed a new measure of text complexity that is fundamentally different from current readability measures.
Landauer, Thomas K.
05-2011
Bulletin #20: Performance-based Assessment: Some New Thoughts on an Old Idea
The purpose of this bulletin is to review arguments in favor of and against the use of performance-based assessments to assess student achievement in light of proposed test score uses.
Lai, Emily R.
05-2011
Top Ten: Transitioning English Language Arts Assessment
This document is designed to identify the top ten considerations that states and consortia will need to address as they plan the transition of their ELA assessments.
Becker, Delise
Bay-Borelli, Michael
Brinkerhoff, Lee
Crain, Kellie
Davis, Laurie
Fuhrken, Charles
Hartmann, Tiffany
Larkin, Jay
O’Malley, Kimberly
Trevvett, Suzanne
05-2011
Pearson's Automated Scoring of Writing, Speaking, and Mathematics
This document describes several examples of current item types that Pearson has designed and fielded successfully with automatic scoring.
Streeter, Lynn
Bernstein, Jared
Foltz, Peter
DeLand, Donald
05-2011
Impact of Group Differences on Equating Accuracy and the Adequacy of Equating Assumptions
This study compared four curvilinear equating methods including frequency estimation, chained equipercentile, IRT true score, and IRT observed score equating.
Powers, Sonya
04-2011
Motivation: A Literature Review
Motivation refers to reasons that underlie behavior that is characterized by willingness and volition.
Lai, Emily R.
04-2011
Comparing Methods for Detecting Unstable Anchor Items with Net DIF and Global DIF Conceptions
This study compares different approaches for detecting misbehaving anchor items in IRT equating using Rasch and partial credit models.
Lau, C. Allen
Arce, Alvaro J.
04-2011
Expanding the Model of Item-Writing Expertise: Cognitive Processes and Requisite Knowledge Structures
In this paper, we expand the cognitive model of item writing to not only include cognitive processes but to also include requisite knowledge structures used by item writers.
Fulkerson, Dennis (Pearson)
Nichols, Paul (Center for Assessment)
Snow, Eric (SRI International)
04-2011
Bulletin #19: Making Sense of the Metrics: Student Growth, Value-added Models and Teacher Effectiveness
The goal of this bulletin is to define student growth, value-added models, and teacher effectiveness, three terms often confused.
O'Malley, Kimberly
McClarty, Katie
Magda, Tracey
Burling, Kelly
04-2011
Metacognition: A Literature Review
Metacognition is defined most simply as "thinking about thinking."
Lai, Emily R.
04-2011
Through-Course Common Core Assessments in the United States: Can Summative Assessment Be Formative?
In this paper, we present a design for enhancing the formative uses of summative through-course assessments.
Way, Walter D.
Larsen McClarty, Katie
Murphy, Dan
Keng, Leslie
Fuhrken, Charles
04-2011
Research Services Quarterly Newsletter - Vol 4, No 1, 2011
04-2011
Application of Latent Trait Models to Identifying Substantively Interesting Raters
This study demonstrates how existing latent trait modeling procedures can identify groups of raters who may be of substantive interest to those studying the experiential, cognitive, and contextual aspects of ratings.
Wolfe, Edward W.
McVay, Aaron
04-2011
Investigating Content and Construct Representation of a Common-item Design When Creating a Vertically Scaled Test
This study investigated how well the guideline of content and construct representation was maintained while evaluating two stability assessment criteria (Robust z and 0.3-logit difference).
Hardy, M. Assunta (BYU)
Young, Michael J. (Pearson)
Yi, Qing (Pearson)
Sudweeks, Richard R. (BYU)
Bahr, Damon L. (BYU)
04-2011
Statistical Properties of 3PL Robust Z: An Investigation with Real and Simulated Data Sets
The purpose of this paper was to inspect statistical properties of the robust z approach in the context of 3PL equating with the common item non-equivalent group design.
Arce, Alvaro J.
Lau, C. Allen
04-2011
Comparison of Asymptotic and Bootstrap Item Fit Indices in Identifying Misfit to the Rasch Model
In this study, our results indicate that bootstrap critical values allow for greater statistical power in diagnosing item misfit caused by varying item slopes and lower asymptotes.
Wolfe, Edward W.
McGill, Michael T.
04-2011
Does Size Matter? A Study on the Use of Netbooks in K-12 Assessments
In this paper, we analyze a study conducted during the spring 2010 administration of the Texas End-of-Course (EOC) assessments to evaluate the feasibility of using netbooks in the context of K-12 assessments.
Keng, Leslie
Kong, Xiaojing Jadie
Bleil, Bryan
04-2011
Investigating Common-Item Screening Procedures in Developing a Vertical Scale
Creating a vertical scale involves several decisions on assessment designs and statistical analyses to determine the most appropriate vertical scale.
Johnson, Marc
Yi, Qing
04-2011
Considerations for Performance Scoring When Designing and Developing Next Generation Assessments
This white paper explores the interactions between test design and scoring approach, and the implications for performance scoring quality, cost, and efficiency in next generation assessments.
Jones, Marianne
Vickers, Daisy
03-2011
Cognitive Lab Evaluation of Innovative Items in Mathematics and English Language Arts Assessment of Elementary, Middle, and High School Students
This research report examines a study in which a set of prototype items was developed to align with specific Common Core State Standards and administered to students in a series of cognitive labs. The report details results and offers implications and recommendations for future use.
Dolan, Robert P.
Goodman, Joshua
Strain-Seymour, Ellen
Adams, Jeremy
Sethuraman, Sheela
03-2011
Assessment Technology Standards
Pearson’s response to the United States Department of Education’s Request for Information.
Twing, Jon
01-2011
Research Services Quarterly Newsletter - Vol 3, No 4, 2011
01-2011
Considerations For Developing Test Specifications For Common Core Assessments
The purpose of this paper is to describe the role that test specifications play in supporting the development of valid and reliable large-scale summative academic achievement assessments.
Bay-Borelli, Michael
Rozunick, Christine
Way, Walter "Denny"
Weisman, Eric
12-2010
Bulletin #18: An Investigation of an Assessment-Centered Learning Environment with Formative Use
Learning environments centered on assessments provide opportunities for feedback that yield information with potential benefit for improving learning and instruction.
Arce, Alvaro J.
12-2010
Thoughts on an Assessment of Common Core Standards
ETS, Pearson, and the College Board have collaborated in this paper to raise key assessment design questions and discuss some ideas for a systematic high-level assessment design that satisfies many of the needs expressed by stakeholders.
Camara, Wayne
Lazer, Stephen
Mazzeo, John
Sweeney, Kevin
Twing, Jon
Way, Walter "Denny"
10-2010
Research Services Quarterly Newsletter - Vol 3, No 3, 2010
10-2010
Rater Effects as a Function of Rater Training Context
This study examined the influence of rater training and scoring context on the manifestation of rater effects in a group of trained raters.
Wolfe, Edward W.
McVay, Aaron
10-2010
A Cognitive Lab Report for the American Diploma Project Algebra I End-of-Course Exam
This cognitive lab study was an exploratory study that allowed for an in-depth investigation of students’ familiarity with ADP Algebra I Exam items as well as the strategies that students are engaging in when attempting to solve the problems.
Test, Measurement & Research Services
10-2010
A Cognitive Lab Report for the American Diploma Project Algebra II End-of-Course Exam
This cognitive lab study was an exploratory study that allowed for an in-depth investigation of students’ familiarity with ADP Algebra II exam items as well as the strategies that students are engaging in when attempting to solve the problems.
Test, Measurement & Research Services
10-2010
Next-Generation Assessment Interoperability Standards
The intent of this document is to elevate awareness and understanding of the importance of assessment interoperability standards and to begin addressing the necessary evolution of these standards to support next-generation assessments.
Dolan, Bob
Strain-Seymour, Ellen
Deokar, Ashman
Ostler, Wayne
10-2010
Universally Designed Computer Based Testing: UD-CBT Guidelines
This report's table of contents is hyperlinked for your convenience. These Universally Designed Computer Based Testing Guidelines aim to help item and test developers understand the cognitive processes involved in interacting with different item stimuli and response methods and, thereby, help identify sources of construct irrelevant variance.
Burling, Kelly
Dolan, Bob
Hanna, Elizabeth
Harms, Michael
Nichols, Amy
Strain-Seymour, Ellen
Way, Denny
In collaboration with CAST
10-2010
An Exploratory Teacher Survey Related to the American Diploma Project Algebra I End-of-Course Exam
A survey-based exploratory study was conducted to better understand the gaps between curriculum and instruction, and what content knowledge is expected of students on the American Diploma Project (ADP) Algebra I End-of-Course Exam.
Test, Measurement & Research Services
09-2010
An Exploratory Teacher Survey Related to the American Diploma Project Algebra II End-of-Course Exam
A survey-based exploratory study was conducted to better understand the gaps between curriculum and instruction, and what content knowledge is expected of students on the American Diploma Project (ADP) Algebra II End-of-Course Exam.
Test, Measurement & Research Services
09-2010
Bulletin #17: A Comparison of Distributed and Regional Scoring
Distributed scoring provides access to a wider pool of readers than those that could be included through regional scoring alone, thereby allowing for a larger number of readers to be recruited and permitting greater selectivity in reader recruitment. This has the potential for increased efficiency in training time for readers and could facilitate shorter turnaround times in performance scoring, which would, in turn, shorten the time between test administration and the reporting of test scores.
Keng, Leslie
Davis, Laurie L.
Ragland, Shelley
09-2010
American Diploma Project Algebra II End-of-Course Exam: Standard Setting Briefing Book
The Briefing Book includes an overview of 1) the American Diploma Project, 2) the ADP Algebra II End-of-Course Exam, 3) the standard setting process, and 4) the validity studies conducted to inform standard setting. Concurrent, cross sectional, and judgment studies are also included.
Pearson
07-2010
Research Services Quarterly Newsletter - Vol 3, No 2, 2010
07-2010
Bulletin #16: Pearson’s Automated Scoring
Pearson’s automated scoring technology, the Intelligent Essay Assessor (IEA), delivers fast, accurate, and valid assessment scores.
Knowledge Technologies
07-2010
Automated Scoring for the Assessment of Common Core Standards
This paper discusses automated scoring as a means for helping to achieve valid and efficient measurement of abilities that are best measured by constructed-response (CR) items.
Williamson, David M.
Bennett, Randy E.
Lazer, Stephen
Bernstein, Jared
Foltz, Peter W.
Landauer, Thomas K.
Rubin, David P.
Way, Walter D.
Sweeney, Kevin
07-2010
Bulletin #15: Capturing Item Writers’ Expertise
The efficient development of quality innovative items may be hindered by inexperienced item writers who are not familiar with the challenges and nuances of innovative item types. The study of expert item writers offers the possibility of capturing and “bottling” the knowledge and skills acquired by these experts over years of hard work.
Fulkerson, Dennis
Nichols, Paul
06-2010
Investigating Approaches to Estimate an Individual's Strand/objective Score Profile Reliability: A Monte Carlo Study
The paper studies performance of generalizability and classical test theory reliability approaches to estimate reliability of an individual's strand/objective score profile.
Arce-Ferrer, Alvaro J.
05-2010
Bulletin #14: What Impact Does Calculator Use Have On Test Results?
Calculators are commonly used in mathematics and science instruction. In fact, over 20 years ago, two studies commissioned by the College Board (Kupin & Whittington, 1988; Pfeiffenberger & Zolandz, 1989) indicated that, at that time, nearly all math and science college instructors permitted use of calculators for all types of course work and at least some tests.
Wolfe, Edward W.
05-2010
Thoughts on Linking and Comparing Assessments of Common Core Standards
The purpose of this paper is to discuss the types of comparisons that can and cannot be made among students who take different assessments supposedly developed to measure a single set of standards.
Lazer, Stephen
Mazzeo, John
Way, Walter D.
Twing, Jon S.
Camara, Wayne
Sweeney, Kevin
05-2010
Performance of Ability Estimation Methods for Writing Assessments under Conditions of Multidimensionality
An increasing number of large scale assessments contain constructed response items such as essays for the advantages they offer over traditional multiple-choice measures. Writing assessments in particular often contain a mixture of multiple-choice and essay items. These mixed-format assessments pose many technical challenges for psychometricians. This study directly builds upon the Meyers et al. (2009) study by investigating how ability estimation, essay scoring approach, measurement model, and proportion of points allocated to multiple choice items and the essay item on mixed-format assessments interact to recover ability and item parameter estimates under different degrees of multidimensionality.
Meyers, Jason L.
Turhan, Ahmet
Fitzpatrick, Steven J.
05-2010
What Item Writers Think When Writing Items: Towards A Theory of Item Writing Expertise
The study of expert item writers offers the possibility of “bottling” the knowledge and skills acquired by these experts over years of hard work. The descriptions of the identified conceptual knowledge and skills of expert item writers could be incorporated into item writing workshops in order to equip new item writers with the tools necessary to produce quality figural response items.
Fulkerson, Dennis
Nichols, Paul
Mittelholtz, David
05-2010
A Multi-level Modeling Approach to Predicting Performance on a State ELA Assessment
The purpose of this study was to examine on a State English Language Proficiency Examination for grades K-12 (a) the performance of students in low SES environments vs. high SES environments as measured by school Title I participation, (b) the performance of males vs. females, (c) the effect of ethnicity (Hispanic vs. non-Hispanic students), and (d) any interaction effects.
Brown, Raymond S.
Nguyen, T.
Stephenson, A.
05-2010
Comparisons of Test Characteristic Curve Alignment Criteria of the Anchor Set and the Total Test: Maintaining Test Scale and Impacts on Student Performance
The current paper investigates a tenet of the traditional view on the psychometric characteristics of such anchor sets. Specifically, the traditional guideline, without any specificity, states that the test characteristic curve (TCC) of the anchor set and the total test should be closely overlapped.
Karkee, Thakur B., Ph.D.
Fatica, Kevin
Murphy, Stephen T., Ph.D.
05-2010
The Impact of Different Anchor Stability Methods on Equating Results and Student Performance
The key objective of this study is to demonstrate a methodological procedure or strategy for examining the different anchor stability procedures and the accompanying results and to evaluate the impact on the final RSSS tables and reported cut scores (i.e., performance levels). For our study we did not include the bivariate plots for the old and new parameter values.
Murphy, Stephen
Little, Ian
Fan, Meichu
Lin, Chow-Hong
Kirkpatrick, Rob
05-2010
Improving the Post-Smoothing of Test Norms with Kernel Smoothing
The traditional methodology of post-smoothing to develop norms used on educational and clinical products is to hand-smooth the scale scores or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgment. Different persons, or the same person at different times, will make significantly different judgments. By contrast, the kernel smoothing method is a nonparametric approach, which is more flexible, less subjective, and easier to replicate.
Lin, Anli
Yi, Qing
Young, Michael J.
05-2010
The Modified Briefing Book Standard Setting Method: Using Validity Data as a Basis for Setting Cut Scores
This paper focuses on two aspects of the modified briefing book standard setting process developed to meet this need: 1) the validity research conducted to support the standard setting; and 2) the standard setting itself, through which the validity research and associated pertinent information was organized and presented to the panelists, and resulting process through which these data were used to elicit cut score judgments.
Miles, Julie A.
Beimers, Jennifer N.
Way, Walter D.
05-2010
Impact of Non-representative Anchor Items on Scale Stability
This study attempts to fill this gap by simulating item response data over multiple administrations under the common-item nonequivalent groups design and examining the effects of non-representative anchor items on scale stability.
Wei, Hua
05-2010
The Hazards of Newness: A Portrait of Challenges Faced by New High School English TeachersThis paper reports findings of a survey study designed to examine how high school English teachers are assigned to teach particular grades and track levels, whether these teachers have their own classrooms, and how they and their students perceive one another.Bieler, Deborah
Holmes, Stephen
Wolfe, Edward W.
05-2010
IRT Proficiency Estimators and Their ImpactIn the current study, we further examined the statistical properties of the various IRT estimators, especially focusing on their practical impact on the reported scores. We also investigated a few practical scenarios, where the testing focus is on assessing college readiness, assessing students’ minimal competency, or providing estimates for students who have failed a previous exam (retesters).Tong, Ye
Kolen, Michael J.
05-2010
Correlates of Mathematics Achievement in Developed and Developing Countries: An HLM Analysis of TIMSS 2003 Eighth-grade Mathematics ScoresThe purpose of this study was to investigate correlates of math achievement in both developed and developing countries. Specifically, two developed countries and two developing countries that participated in the TIMSS 2003 eighth-grade math assessment were selected for this study. For each country, contextual factors at both the student and the teacher/school levels were used to construct models that yield country-specific findings related to students’ math performance.Phan, Ha
Sentovich, Christina
Kromrey, Jeffrey
Dedrick, Robert
Ferron, John
05-2010
Autocorrelation in the COFM. The effects of Autocorrelation on the Curve-of-factors Growth ModelThis simulation study examined the performance of the curve-of-factors model (COFM) when autocorrelation and growth processes were present in the first-level factor structure. In addition to the standard curve-of-factors growth model, two new models were examined: one COFM that included a first-order autoregressive autocorrelation parameter, and a second model that included first-order autoregressive and moving average autocorrelation parameters.Murphy, Daniel J.
Beretvas, S. Natasha
Pituch, Keenan A.
05-2010
Distractor Rationale Taxonomy: Diagnostic Assessment of Reading with Ordered Multiple-Choice ItemsThe distractor rationale taxonomy (DRT) examined in this study is an understanding-level-driven distractor analysis system for multiple-choice items. The DRT purposely creates distractors at different comprehension levels to pinpoint sources of misunderstanding.Lin, Jie
Lee Chu, Kwang
Meng, Ying
05-2010
Designing and Operating a Common High School Assessment SystemThis paper attempts to lay out some of the important issues to be faced by states and consortia as they consider implementation of a high school assessment system within the current Race to the Top framework.Camara, Wayne
Sweeney, Kevin
Twing, Jon S.
Way, Walter D.
Lazer, Stephen
Mazzeo, John
04-2010
Informing Design Patterns Using Research on Item Writing Expertise (Large-Scale Assessment Technical Report 9)This technical report presents a study in which verbal reports from expert item writers were collected and analyzed. Findings from the study are used to suggest modifications to the development of design patterns. Note: This PDF is downloaded directly from ECDLarge.Fulkerson, Dennis
Nichols, Paul
03-2010
Leveraging Evidence-Centered Design in Large-Scale Test Development (Large-Scale Assessment Technical Report 4)This report depicts ECD as a series of integrated layers describing an assessment design process that includes analyzing and modeling domains, specifying arguments in terms of student, task and evidence models, and implementing the assessment and executing operational processes. Note: This PDF is downloaded directly from ECDLarge.Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers
03-2010
Narrative Structures in the Development of Scenario-Based Science Assessments (Large-Scale Assessment Technical Report 3)A study was conducted to determine if the explication of Narrative Structures in storyboard development improves the quality and efficiency of storyboard writing. Research and evaluative findings suggest that Narrative Structure recognition and use may aid in the storyboard writing process. Note: This PDF is downloaded directly from ECDLarge.Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers
03-2010
Bulletin #13: Comparability of Computerized Adaptive and Paper-Pencil TestsWhen a traditional Paper-Pencil Test (PPT) is delivered by computer, two types of computerization can be implemented. One is a linear Computer-Based Test (CBT) in which the paper version of the test is presented and administered via computers. The other type of computerization is the Computerized Adaptive Testing (CAT) in which not only the medium of administration changes from paper to computer but also the test delivery algorithm turns from linear to adaptive.Hong Wang, University of Pittsburgh
Chingwei David Shin, Pearson
03-2010
Research Services Quarterly Newsletter - Vol 3, No 1, 2010  03-2010
Bulletin #12: What is a learning progression?Learning progressions describe in words and examples what it means to move over time toward more expert understanding. This bulletin discusses research, assessment development, and the Pearson Foundation support for ongoing efforts relating to learning progressions.Nichols, Paul D.02-2010
Some Considerations Related to the Use of Adaptive Testing for the Common Core AssessmentsIn this paper ETS, Pearson, and the College Board discuss some important considerations related to the use of adaptive testing within a common core assessment system, particularly as used for summative purposes.Camara, Wayne
Lazer, Stephen
Mazzeo, John
Sweeney, Kevin
Twing, Jon S.
Way, Walter D.
02-2010
Recommendations Related to the Operational Implementation of Performance Assessments Within Ohio’s K-12 Assessment SystemThe purpose of this paper was to provide discussion and recommendations related to the operational implementation of performance assessments within Ohio’s assessment system.Burling, Kelly Shasby
Dolan, Robert P.
Frank, Jeri
Full, David
LaMarche, Wesley E.
Nichols, Paul
Niyogi, Nivedita Shilpi
Rogahn, Kurt
Vickers, Daisy
Way, Walter “Denny”
Williams, Natasha J.
01-2010
Bulletin #11: What is a Balanced Assessment System?Under President Obama’s education reform agenda, the concept of a balanced assessment system has received increasing attention.Nichols, Paul D.01-2010
Research Services Quarterly Newsletter - Vol 2, No 4, 2009  12-2009
Bulletin #10: Methods of Comparability Studies for Computerized and Paper-Based TestsIn recent years, tests have begun being administered by computer.Wan, Lei
Keng, Leslie
McClarty, Katie
Davis, Laurie
12-2009
Derivation of a Profile Reliability Index for an Individual: A Multi-Factor Congeneric Approach with Guttman Error Type StructuresThe paper discusses results and proposes research to substantiate current supporting evidence for the operational use of the profile reliability approach.Arce-Ferrer, Alvaro J.11-2009
Bulletin #9: Computer-Based & Paper-Pencil Test Comparability StudiesIn some testing applications, Computer-Based Test (CBT) delivery is gaining popularity over the traditional Paper-Pencil Test (PPT) delivery due to the several potential advantages that it offers, such as immediate scoring and reporting of results, moreWang, Hong
Shin, Chingwei David
11-2009
Bulletin #8: The Use of Separate Answer SheetsAs early as 1932, separate answer sheets for standardized tests were hailed as a financial boon for educational testing.Ragland, Shelley11-2009
Research Services Quarterly Newsletter - Vol 2, No 3, 2009  09-2009
Bulletin #7: Online Scorer TrainingIncreasingly, technology is being employed to improve the effectiveness and efficiency of delivery, scoring, and reporting of large-scale assessments.Wolfe, Edward W., PhD08-2009
Research Services Quarterly Newsletter - Vol 2, No 2, 2009  07-2009
A Comparison of Training & Scoring in Distributed & Regional Contexts—ReadingThis study examined the influence of rater training and scoring context on the following outcomes: (a) training time, (b) scoring time, (c) qualifying rate, (d) quality of ratings, and (e) rater perceptions.Wolfe, Edward W.07-2009
A Comparison of Training & Scoring in Distributed & Regional Contexts—WritingThis study examined the influence of rater training and scoring context on the following outcomes: (a) training time, (b) scoring time, (c) qualifying rate, (d) quality of ratings, and (e) rater perceptions.Wolfe, Edward W.07-2009
Strategies and Processes for Developing Innovative Items in Large-Scale AssessmentsIn this paper we describe processes for developing high-quality innovative items in the context of large-scale assessments.Strain-Seymour, Ellen
Way, Walter "Denny"
Dolan, Robert P.
06-2009
A Design Pattern for Observational Investigation Assessment Tasks (Large-Scale Assessment Technical Report 2)Drawing on research development in assessment design, this paper provides a design pattern to help assessment designers create tasks assessing students’ complex scientific reasoning skills in observational investigation. Note: This PDF is downloaded directly from ECDLarge.Fulkerson, Dennis
Nichols, Paul
In collaboration with industry peers
05-2009
PADI Online Assessment Design System and Minnesota Science Assessment Glossary of Terms (Large-Scale Assessment Technical Report 1)This paper presents a Glossary of terms about features of the Application of Evidence-Centered Design to State Large-Scale Science Assessment project, which is partnering with the state of Minnesota. Note: This PDF is downloaded directly from ECDLarge.Fulkerson, Dennis
In collaboration with industry peers
05-2009
Weighted Penalty Model for Content Balancing in CATThis research report proposes a new model called the Weighted Penalty Model (WPM) for content balancing in computer adaptive testing.Chien, Yuehmei
Shin, Chingwei David
Swanson, Len
Way, Walter Denny
04-2009
Growth, Precision, and CAT: An Examination of Gain Score Conditional SEMMonitoring the growth of student learning is a critically important component of modern education. Such growth is typically monitored using gain scores representing differences between two testing occasions, such as prior to and following a year of instruction.Thompson, Tony D.12-2008
Growth, Precision, and CAT: An Examination of Gain Score Conditional SEMMeasurement of student growth is an important topic for K-12 state testing programs, both in terms of school accountability as well as for reporting progress of individual students.Thompson, Tony D.06-2008
Review of Student Growth Models Used by StatesSummary: After Secretary Spellings announced the United States Department of Education (USDE) growth pilot program in 2005, nine states have been approved to use student growth in their calculations of Adequate Yearly Progress (AYP):O'Malley, Kimberly06-2008
Effects of Different Training and Scoring Approaches on Human Constructed Response ScoringThis paper summarizes and discusses research studies related to the human scoring of constructed response items that have been conducted recently at a large scale testing company.Nichols, Paul
Vickers, Daisy
Way, Walter D.
04-2008
Person-fit of English Language Learners (ELL) in K-12 High-Stakes AssessmentsThe No Child Left Behind Act holds states using federal funds accountable for student academic achievement.Wan, Lei
Wu, Brad
04-2008
Bulletin #6: What Role Does the Consequences of Testing Play in Validity?Currently, the field of educational measurement appears to have reached broad consensus that validity is a judgment of the degree to which arguments support the interpretations and uses of test scores (Kane, 2006).Nichols, Paul D.
Williams, Natasha
04-2008
Maintaining Score Equivalence as Tests Transition Online: Issues, Approaches and TrendsThe purpose of this paper is to summarize a number of studies that Pearson has conducted with K-12 state departments of education using a particular analysis method referred to as Matched Samples Comparability Analyses (MSCA).Kong, Jadie
Lin, Chow-Hong
Way, Walter D.
03-2008
Evidence of Test Score Use in Validity: Roles and ResponsibilitiesThis paper has three goals.Nichols, Paul D.
Williams, Natasha
03-2008
Score Reporting, Off-the-Shelf Assessments and NCLB: Truly an Unholy TrinityOne consequence resulting from NCLB, particularly as instructional time becomes more precious, is the desire to be more efficient in assessing learning.Twing, Jon S., PhD03-2008
Applying a User-Centered Design Approach to Data Management: Paper and Computer TestingThis paper discusses the application of a user-centered design (UCD) approach to a web-based application system that supports data management components of the high-stakes assessment lifecycle.Wilson, Jeffrey R., PhD03-2008
Maintenance of Vertical ScalesVertical scaling refers to the process of placing scores of tests that measure similar domains but at different educational levels onto a common scale, a vertical scale.Kolen, Michael J.
Tong, Ye
03-2008
User-Centered Assessment DesignIn this paper, we introduce user-centered assessment design (UCAD), an approach to test design intended to produce assessments that deliver to teachers the kind of complex information on student learning and knowledge that they can combine with sound pedagogical practice to improve student achievement.Adams, Jeremy
Mittelholtz, David
Nichols, Paul
Van Duesen, Robert
03-2008
A Tale of Two Modes: A Case Study in User-centered Design’s Role in Comparability and Construct ValidityIntroduction: UCD’s Role within User-centered Assessment Design One merit of user-centered assessment design (UCAD) as defined by Nichols et al (2008) is its broadened view of test development.Strain-Seymour, Ellen, PhD03-2008
A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment DataEquating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably (Kolen & Brennan, 2004), even though the test forms consist of different items.Tong, Ye
Wu, Sz-Shyan
Xu, Ming
03-2008
Field Testing and Equating Designs for State Educational AssessmentsThe educational accountability movement has spawned unprecedented numbers of new assessments. For example, the No Child Left Behind Act of 2002 (NCLB) required states to test students in grades 3 through 8 and at one grade in high school each year.Kirkpatrick, Rob
Way, Walter D.
03-2008
An Investigation of the Changes in Item Parameter Estimates for Items Re-field TestedLarge-scale state testing programs typically rely upon a large bank of items to select from when building assessments.Kong, Xiaojing Jadie
McClarty, Katie Larsen
Meyers, Jason L.
03-2008
Perspective™-Integrated Assessment and Instructional Resources SystemThe Learning Locator is the mechanism that connects students with appropriate learning materials based on their assessment performance.Meyers, Jason L.
Nichols, Paul
Shin, David
03-2008
Usability and Design Considerations for Computer-based Learning and AssessmentThe overall success of computer-based products and systems is dependent to a significant extent on their usability and usefulness in the intended context.Adams, Jeremy
Harms, Michael
03-2008
The Validity Case for Assessing Direct Writing by ComputerTechnology continues to provide opportunities for changing how teachers give instruction and how students learn.Davis, Laurie L., Ph.D.
Strain-Seymour, Ellen, Ph.D.
Way, Walter D., Ph.D.
01-2008
Bulletin #5: What is Formative Assessment?What is formative assessment? Looking across the evolution of the term "formative assessment," the common thread is that a formative assessment is defined by more than the assessment itself.Burling, Kelly
Meyers, Jason
Nichols, Paul D.
01-2008
Incremental Validity of Numerical Reasoning over Critical ThinkingThis study was conducted to evaluate the incremental validity of numerical reasoning over critical thinking in predicting job performance and overall potential as measured by supervisors’ ratings.Ejiogu, Kingsley C.
Rose, Mark
Trent, John
Yang, Zhiming
08-2007
Evidence of Test Score Use in Validity: Roles and ResponsibilitiesThis paper has three goals.Nichols, Paul D.08-2007
Bulletin #4: Alternate Assessment - the 1% Rule; Modified Assessment - the 2% RuleSince the 2001 reauthorization of the Elementary and Secondary Education Act of 1965, commonly known as No Child Left Behind (NCLB), states must include all students in public school in the statewide accountability system, and are accountable for the achievement of all students.Burling, Kelly06-2007
A Comparison of Methods of Estimating Subscale Scores for Mixed-Format TestBecause the world is complex and resources are often limited, test scores often serve to both rank individuals and provide diagnostic feedback (Wainer, Vevea, Camacho, Reeve, Rosa, Nelson, Swygert, and Thissen, 2000).Shin, David05-2007
Bulletin #3: Griddable Items: Beyond Multiple ChoiceMultiple-choice test items along with "fill-in-the-bubble" answer sheets are mainstay formats in testing, formats that are prevalent for practical reasons.Nichols, Paul, PhD03-2007
Bulletin #2: Quality Assurance in Essay ScoringApplying a score to a student essay is much different than scoring a math test or multiple choice vocabulary quiz.Twing, Jon S., PhD
Vickers, Daisy
10-2006
An Empirical Investigation of Growth ModelsWith the recent legislation of NCLB, there has been an increasing interest to measure students’ growth over the course of their schooling.O'Malley, Kimberly
Tong, Ye
08-2006
Bulletin #1: Universal DesignDevelopers of large-scale assessments have, for quite some time, stressed the need for participation of populations with unique educational needs, varying cultural experiences, diverse linguistic backgrounds, and numerous special needs.Harms, Michael, PhD
Nichols, Paul, PhD
Walsh, Chris
06-2006
Adolescent/Adult Sensory ProfileThe Adolescent/Adult Sensory Profile enables clients from 11 through 65+ years to use a Self-Questionnaire for evaluating their behavioral responses to everyday sensory experiences. 06-2006
Understanding the Relationship Between Critical Thinking and Job PerformanceThis study was conducted to evaluate the relationship between a measure of critical thinking ability and job performance as measured by supervisors’ ratings.Ejiogu, Kingsley C.
Rose, Mark
Trent, John
Yang, Zhiming
05-2006
Practical Questions in Introducing Computerized Adaptive Testing for K-12 AssessmentsIn this paper, a number of practical questions related to introducing CAT for K-12 assessments are discussed.Way, Walter D.04-2006
Score Comparability of Online and Paper Administrations of the Texas Assessment of Knowledge and SkillsThe comparability studies presented in this paper illustrate how responsible and psychometrically defensible comparability analyses can be incorporated within the constraints of a high-stakes, operational testing program like TAKS.Fitzpatrick, Steven
Laughlin Davis, Laurie
Way, Walter D.
04-2006
Miller Analogies Research ReportOnce a mainstay of college admission tests, many educators have come to see the analogy format as obsolete, something like a manual typewriter in the age of word processing and text messaging.Meagher, Don03-2006
Understanding AnalogiesOnce a mainstay of college admission tests, many educators have come to see the analogy format as obsolete, something like a manual typewriter in the age of word processing and text messaging.Meagher, Don, EdD03-2006
Administering Alternate AssessmentsIn the United States, a series of federal laws have been enacted that require all students with disabilities to be included in state accountability assessments.Kreusel, Sheree12-2005
A Primer on Assessing the Visually ImpairedThe No Child Left Behind Act of 2001 (NCLB)—which is the most recent reauthorization of the Elementary and Secondary Education Act of 1965 (ESEA)—requires that in order to get federal funds, states must hold schools and districts accountable for the achievement of their students as measured by standardized achievement tests.Case, Betsy J., PhD
Jeffries, Janis L.
Zucker, Sasha
12-2005
Accommodations for the DeafRaising academic standards for all students and measuring student achievement to hold schools accountable for educational progress are central strategies for promoting educational excellence and equity in our schools.Case, Betsy J., PhD11-2005
Accountability and Educational Progress for Students with DisabilitiesThe No Child Left Behind Act of 2001 (NCLB), the most recent reauthorization of the Elementary and Secondary Education Act of 1965 (ESEA), has significantly impacted state educational systems and local school districts.Case, Betsy J., PhD09-2005
The Age of AccountabilityWith the goal of supporting global education, Pearson Inc. (Pearson) was a sponsoring agency of the 2005 China-U.S. Conference on Educational Assessment held in Beijing.Case, Betsy J., PhD09-2005
TransadaptationIn recent decades, the nation's classrooms have seen an increase in the number of students who are not native speakers of English, a group referred to by the No Child Left Behind Act of 2001 (NCLB) as students with limited English proficiency (LEP).Alaniz, Linda G.
Guzman, Luis
Miska, Margarita
Zucker, Sasha
09-2005
Inclusive Design for Maximum Accessibility: A Practical Approach to Universal DesignThe purpose of this article is to briefly review the literature related to Universal Design for Learning (UDL) and Universal Design for Assessment (UDA), and outline an approach for combining these two philosophies in evaluating large-scale assessment programs.Hanna, Elizabeth I.08-2005
Recent Trends in Comparability StudiesThe purpose of this paper is to review the research addressing the comparability of computer-delivered tests and pencil-and-paper tests, and particularly the research since 1993.Paek, Pamela08-2005
Comparing Standards-based Item Banks and Pre-built Tests for Classroom AssessmentThe current era of accountability and standards-based reform in education has had many effects on teachers’ approaches to instruction in the classroom. Recently, the importance of assessment has been bolstered by the No Child Left Behind Act of 2001 (NCLB).Pearson, Inc.08-2005
Curriculum NarrowingThe current era of education reform in the United States can be traced to the passage of the Elementary and Secondary Education Act of 1965 (ESEA), which, among its provisions, required states to monitor and assess the educational progress of students.King, Kelly V.
Zucker, Sasha
08-2005
Systematic Feedback for More Effective Teaching and LearningThe value of taking a test is in getting the results back quickly and in a manner that communicates them clearly.Jorgensen, Margaret A., Ph.D.08-2005
Aligning ELP Assessments to ELP StandardsIntensified attention to alignment between state English language proficiency (ELP) assessments and state ELP standards has primarily been driven by the requirements of the No Child Left Behind Act of 2001 (NCLB).Johnson, Diane F.07-2005
Early Mathematics and EMDA™With the advent of standards-based reform and the No Child Left Behind Act of 2001 (NCLB), educational systems are being held accountable to high levels of student achievement as never before.Pearson, Inc.07-2005
Horizontal and Vertical AlignmentAlignment is typically understood as the agreement between a set of content standards and an assessment used to measure those standards.Case, Betsy, PhD
Zucker, Sasha
07-2005
Methodologies for AlignmentAlignment can be broadly defined as the degree to which the components of an education system work together to achieve the desired goals of stakeholders.Case, Betsy, PhD
Zucker, Sasha
07-2005
CELF-4/WISC-IV Technical ReportChildren who participated in research studies with the Wechsler Intelligence Scales for Children-Fourth Edition Integrated (Wechsler, Kaplan, Fein, Kramer, Morris, Delis, & Maerlender; 2004) were administered the Clinical Evaluation of Language Fundamentals-Fourth Edition (Semel, Wiig, & Secord; 2003). 06-2005
Boehm-3The Boehm Test of Basic Concepts, Third Edition (Boehm-3) is a group administered assessment for students in kindergarten through second grade.Boehm, Ann E., Ph.D.06-2005
Boehm-3 PreschoolThe Boehm Test of Basic Concepts, Third Edition (Boehm-3) is a group administered assessment for students in kindergarten through second grade.Boehm, Ann E., Ph.D.06-2005
BrackenThe Bracken Basic Concept Scale—Revised (BBCS–R) is used to assess the basic concept development of children ages 2 years, 6 months through 7 years, 11 months.Bracken, Bruce A.06-2005
WIAT-II Technical Report: Interpreting Performance on the Reading Comprehension SubtestInterpreting performance on the Reading Comprehension subtest of the Wechsler Individual Achievement Test®–Second Edition (WIAT®–II, Update 2005) is challenging for some examiners, particularly when a student must reverse to a preceding item set.Breaux, Kristina C., Ph.D.06-2005
School Function AssessmentSchool professionals recognize that effective school performance depends on a student’s ability to perform a variety of functional tasks that enable him or her to participate in various learning activities.Coster, Wendy
Deeney, Theresa
Haley, Stephan
Haltiwanger, Jane
06-2005
Sensory ProfileThe Sensory Profile™ provides a standard method for professionals to measure the sensory processing abilities of children 5 to 10 years old (separate cut scores for 3 and 4 year olds are provided in the manual) and to profile the effects of sensory processing on functional performance in the children’s daily lives.Dunn, Winnie06-2005
SCAN-ASCAN-A: A Test for Auditory Processing Disorders in Adolescents and Adults enables professionals to obtain central auditory test results in approximately 20 minutes for adolescents and adults.Keith, Robert W.06-2005
SCAN-CThe SCAN-C Test for Auditory Processing Disorders in Children-Revised is an individually administered test used to identify children between ages 5 years, 0 months and 11 years, 11 months who have auditory processing disorders.Keith, Robert W.06-2005
WAIS-III Technical Report: Response to FlynnIn Tethering the Elephant: Capital Cases, IQ, and the Flynn Effect (Flynn, 2006), Dr. Flynn states that the WAIS-III© standardization sample is substandard and a 2.34 point adjustment to the FSIQ score is required in post conviction capital murder cases.Weiss, Lawrence G., Ph.D.06-2005
PLS-4 Technical ReportThe Preschool Language Scale, Fourth Edition (PLS-4) is an individually administered test for identifying children from birth through 6 years, 11 months who have a language disorder or delay.Pond, Roberta Evatt, M.A.
Steiner, Violette G., B.S.
Zimmerman, Irla Lee, Ph.D.
06-2005
RBANS Supplement #1 ReportThis supplement provides
* subtest means and SDs for the normal standardization sample,
* comments on general issues in interpreting performance on the RBANS,
* additional information on test-retest interpretation,
* further information on “cortical–subcortical deviation” scores, and
* updated clinical validity information.
Randolph, Christopher, Ph.D.06-2005
Bayley-III Technical Report 1The Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III; Bayley, 2006) is designed to measure the developmental status of young children, ages 1 to 42 months. 06-2005
Bayley-III Technical Report 2The Bayley Scales of Infant and Toddler Development–Third Edition (Bayley–III; Bayley, 2006) measures cognitive, language, motor, social-emotional, and adaptive development and is a revision of its predecessor, the Bayley Scales of Infant Development—Second Edition (BSID–II; Bayley, 1993). 06-2005
CELF-4 Technical ReportThe Clinical Evaluation of Language Fundamentals®–Fourth Edition (CELF®–4) is an individually administered test for determining if a student (ages 5 through 21 years) has a language disorder or delay. 06-2005
DELV Technical ReportThe Diagnostic Evaluation of Language Variation (DELV) family of assessments is unique because it is the only language assessment series that accounts for the diversity in American English and identifies children who are at risk for or show signs of a speech or language disorder. 06-2005
Infant Toddler Sensory ProfileThe Infant/Toddler Sensory Profile provides a standard method for professionals to measure a child’s sensory processing abilities and to profile the effect of sensory processing on functional performance in the child’s daily life. 06-2005
ABAS Technical SupplementThe Adaptive Behavior Assessment System (ABAS; Harrison & Oakland, 2000) uses a behavior-rating format to assess adaptive behavior and related skills for individuals 5 through 89 years of age. 06-2005
ECHOS Technical Data ReportThe Early Childhood Observation System (ECHOS) is a web-based, ongoing observational assessment tool for children in Pre-Kindergarten (ages 3-5) and Kindergarten through Grade 2. 06-2005
Early Reading and the Early Reading Diagnostic Assessment, Second Edition (ERDA Second Edition)For virtually all students, learning to read and write begins long before kindergarten. It is a complex and dynamic process.Jordan, Dr. R. Rosalie
King, Kelly
Kirk, David J.
04-2005
Strategies for Controlling Item Exposure in Computerized Adaptive Testing with the Partial Credit ModelExposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT).Dodd, Barbara G. (University of Texas at Austin)
Laughlin Davis, Laurie
03-2005
Evidence for the Interpretation and Use of Scores from an Automated Essay ScorerThis paper examined validity evidence for the scores based on the Intelligent Essay Assessor (IEA), an automated essay-scoring engine developed by Pearson Knowledge Technologies.Nichols, Paul03-2005
WISC-IV Technical Report #4: GAIThis technical report is the fourth in a series intended to introduce the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003).Coalson, Diane, Ph.D.
Raiford, Susan E., Ph.D.
Rolfhus, Eric, Ph.D.
Weiss, Lawrence G., Ph.D.
01-2005
WISC-IV Technical Report #4.1: GAI with Canadian NormsThis technical report is the fourth in a series intended to introduce the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV; Wechsler, 2003).Coalson, Diane, Ph.D.
Raiford, Susan E., Ph.D.
Rolfhus, Eric, Ph.D.
Saklofske, Donald H., Ph.D.
Weiss, Lawrence G., Ph.D.
Zhu, Jianjun, Ph.D.
01-2005
Alignment in Educational AssessmentIn the context of education, alignment can be broadly defined as the degree to which the components of an education system--such as standards, curricula, assessments, and instruction--work together to achieve desired goals (Ananda, 2003; Resnick, Rothman, Slattery, and Vranek, 2003; Webb, 1997b).Case, Betsy J., PhD
Jorgensen, Margaret A., PhD
Zucker, Sasha
12-2004
MicroCog Assessment of Cognitive FunctioningMicroCog™: Assessment of Cognitive Functioning (MicroCog) is a computer-administered cognitive screening instrument that was originally developed for a disk operating system (DOS).Drozdick, Lisa
Holdnack, James
Lane, Andre
12-2004
Toward Complete AssessmentIn the United States, there is a wide range of educational assessment systems used for many different purposes.Jorgensen, Margaret A., Ph.D.
Zucker, Sasha
11-2004
Value Added“Value-added,” a term originally used in business and economics, has become widely used to describe certain educational assessment and accountability systems. 11-2004
AugmentationSince the introduction of the Stanford Achievement Test in 1923—the first of its kind—large-scale standardized norm-referenced assessments have served a variety of important purposes in education.Zucker, Sasha
Christensen, Ray
Ellis, Roy T.
Harris, Herb
Manning, Duane
08-2004
A Performance Comparison of Native and Non-native Speakers of English on an English Language Proficiency TestAccording to the U.S. Census Bureau, the population of the United States grew by approximately 12.5 percent from 1995 to 2000. Crawford (1997) reports that language diversity has increased dramatically throughout the nation.Jiao, Hong
Stephenson, Agnes, Ph.D.
Wall, Nathan
08-2004
SDRT 4/SDMT 4 Administration Mode Comparability StudyThe widespread availability of computers in schools has focused the assessment community on the use of computer-based testing solutions in the classroom.Brooks, Thomas E., Ph.D.
Wang, Shudong, Ph.D.
Young, Michael J., Ph.D.
08-2004
The Distractor Rationale Taxonomy: Enhancing Multiple-Choice Items in Reading and MathematicsRecent education reform legislation, especially the No Child Left Behind Act of 2001 (NCLB), has highlighted discussion concerning the relationship between assessment and classroom instruction.Gardner, Doug A.
Jorgensen, Margaret A., PhD
King, Kelly V.
Zucker, Sasha
07-2004
Effective SchoolsWhy do some public schools that educate students from disadvantaged backgrounds make a difference while others fail?Jones, Terry L.
Kirk, David J.
07-2004
Online or Paper: Does Delivery Affect Results?The Stanford Diagnostic Reading Test, Fourth Edition (SDRT 4) and Stanford Diagnostic Mathematics Test, Fourth Edition (SDMT 4) were adapted to a computer-based, online format in 2003.Wang, Shudong, PhD06-2004
Accountability and Educational Progress: Including Students with Disabilities and/or Culturally and Linguistically Diverse StudentsJust as the People’s Republic of China has its nine-year Compulsory Education Law of 1985, the United States has a pre-eminent law as well. The U.S. law is known as the No Child Left Behind Act of 2001 (NCLB).Case, Betsy J., PhD06-2004
Alternate Assessments for Students with Significant Cognitive DisabilitiesAccountability systems are based on measuring the progress of all students. The reform movements and current laws are designed to ensure that all students have opportunities to learn to high standards.Almond, Patricia J., PhD
Case, Betsy J., PhD
06-2004
New Visions New FuturesSelf-determination for students with disabilities began to get federal attention in the mid-1980s.Case, Betsy J., Ph.D.06-2004
Administration Practices for Standardized AssessmentsPearson Inc. (Pearson) develops and distributes a variety of assessments for educational and clinical purposes. To meet the goal of producing highly valid, reliable results for test users, each of these products is developed according to strict guidelines.Zucker, Sasha
Galindo, Margarita
Grainger, Elaine
Severance, Nancy
04-2004
Scientifically Based ResearchA significant aspect of the No Child Left Behind Act of 2001 (NCLB) is the use of the phrase "scientifically based research" well over 100 times throughout the text of the law.Zucker, Sasha03-2004
The Standards-Referenced Interpretive Framework: Using Assessments for Multiple PurposesFrom the outset of the development of a large scale assessment system, a central activity in the design process concerns determining the assessment’s interpretive framework—the way that its results are understood to convey meaningful information.Young, Michael J., PhD
Zucker, Sasha
03-2004
The Value of the Stanford Scale as a Common MetricAs states and school districts report on the adequate yearly progress (AYP) of student performance as required by the No Child Left Behind Act of 2001 (NCLB), there will be increasing emphasis on documenting student progress along a developmental continuum within selected subject areas.Jorgensen, Margaret A., PhD03-2004
Assessing Young ChildrenToday’s educational climate of standards and accountability extends even to preschool programs (Bowman, Donovan, and Burns, 2001).Case, Betsy J., PhD
Guddemi, Marcy, PhD
02-2004
Cognitive LabsA cognitive lab is a method of studying the mental processes one uses when completing a task such as solving a mathematics problem or interpreting a passage of text.Case, Betsy J., Ph.D.
Sassman, Christy
Zucker, Sasha
02-2004
Fundamentals of Standardized TestingTests are a familiar part of classroom instruction. Each week, teachers use a wide variety of tests, such as spelling tests, mathematics pop quizzes, and essay tests.Zucker, Sasha12-2003
Assessing English Language Proficiency: Using Valid Results to Optimize InstructionThe No Child Left Behind Act of 2001 (NCLB) has focused increased attention on the appropriate assessment of English language learners (ELL students) in U.S. public schools.Johnson, Diane F.
Jorgensen, Margaret A., Ph.D.
Stephenson, Agnes, Ph.D.
Young, Michael J., Ph.D.
11-2003
Establishing Performance Levels for the Stanford English Language Proficiency Test (Stanford ELP)The Stanford English Language Proficiency Test (Stanford ELP) measures the English proficiency of students in kindergarten through grade 12 whose first language is not English.Stephenson, Agnes, Ph.D.10-2003
Academic & Social English for ELL StudentsTo better identify the AL that should be assessed with an ELP test, we need to refocus on the main purpose of an ELP test: to assess students' general English language ability.Johnson, Diane F.09-2003
History of the No Child Left Behind Act of 2001 (NCLB)In August of 1981, the National Commission on Excellence in Education was chartered under the authority of 20 U.S.C. 1233a to, among other purposes and functions, "review and synthesize the data and scholarly literature on the quality of learning and teaching in the nation's schools, colleges, and universities, both public and private, with special concern for the educational experience of teen-age youth" (U.S. Department of Education, 1983a).Hoffmann, Jenny
Jorgensen, Margaret A., PhD
08-2003
Maximizing Equity and Access in Test ConstructionPresident Bush’s education agenda (The White House, 2001) reinforced what some states were already addressing in their policies aimed at improving education.Emrick Massad, Carolyn, Ph.D.07-2003
Universal DesignThe concept of universal design has its roots in the field of architecture.Case, Betsy, PhD06-2003
DAS Technical Report: Ability-Achievement Discrepancy Table for use with the WIAT-IIThe Wechsler Individual Achievement Test®—Second Edition (WIAT–II; Psychological Corporation, 2002) and the Differential Ability Scales® (DAS; Elliott, 1990) were administered to 100 children, 6 to 16 years old.Diehl, Kim
Elliott, Colin
O'Donnell, Louise
Rolfhus, Eric
Weiss, Larry
06-2003
Timed Versus Untimed Testing Conditions and Student PerformanceThe adherence to fixed time limits during group administrations of norm-referenced standardized tests has been an accepted practice from the early part of the 20th century (Anastasi, 1976; pp. 32-34).Brooks, Thomas E., PhD
Case, Betsy J., PhD
Young, Michael J., PhD
06-2003
Augmentation: An Implementation Strategy for the No Child Left Behind Act of 2001With the passage of the No Child Left Behind Act of 2001 (NCLB), states have the opportunity to redefine and enhance their assessment programs to support the diagnostic needs of students, educators, and parents. Pearson Education, Inc. (Pearson) has developed a strategy that combines the best features of norm-referenced and standards-based tests designed to ensure that states comply with the annual reporting requirements of NCLB.Hicks-Herr, Stacy
Hoffmann, Jenny
Jorgensen, Margaret A., PhD
06-2003
WISC-IV Technical Report #1: Theoretical Model and Test BlueprintThis technical report is the first in a series intended to introduce the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV).Rolfhus, Eric, Ph.D.
Weiss, Lawrence G., Ph.D.
Williams, Paul E., Psy.D.
06-2003
WISC-IV Technical Report #2: Psychometric PropertiesThis technical report is the second in a series intended to introduce the Wechsler Intelligence Scale for Children - Fourth Edition (WISC-IV).Rolfhus, Eric, Ph.D.
Weiss, Lawrence G., Ph.D.
Williams, Paul E., Psy.D.
06-2003
WISC-IV Technical Report #3: Clinical ValidityThis is the third in a series of technical reports on WISC-IV.Rolfhus, Eric, Ph.D.
Weiss, Lawrence G., Ph.D.
Williams, Paul E., Psy.D.
06-2003
Calculator Use on Stanford Series Mathematics TestsCalculator use has, over the years, become increasingly integrated into mathematics instruction and testing.Brooks, Thomas E., Ph.D.
Case, Betsy J., Ph.D.
Cerrillo, Tracy, Ph.D.
Severance, Nancy
Wall, Nathan
Young, Michael J., Ph.D.
05-2003
It's About Time: Stanford Achievement Test Series, Tenth Edition (Stanford 10)Pearson decided to make the Stanford Achievement Test Series, Tenth Edition (Stanford 10) an untimed test for several compelling reasons.Case, Betsy J., PhD04-2003
Accommodations for METROPOLITAN8The trend toward the inclusion of students with disabilities and limited English proficient (LEP) students in state and district-wide assessment programs became a requirement in 1997 with the reauthorization of the Individuals with Disabilities Education Act (IDEA).Case, Betsy J., PhD03-2003
Accommodations for Stanford 9For more than 25 years, federal law has guaranteed a free and appropriate public education to children with disabilities.Case, Betsy J., PhD
Slawski, Edward J.
03-2003
The New Norm-Referenced Test ModelTraditionally, there have been distinct differences in the purpose, use, and development guidelines for a norm-referenced test (NRT) and for a criterion-referenced test (CRT).Jorgensen, Margaret A., PhD
McBee, Maridyth, PhD
03-2003
Color BlindnessColor vision is determined by the discrimination of three qualities of color: hue (such as red vs. green), saturation (that is, pure vs. blended colors), and brightness (that is, vibrant vs. dull reflection of light) (Arditi, 1999a).Case, Betsy J., PhD02-2003
Accommodations on Stanford 10 for Limited English Proficient (LEP) StudentsTitle I of the Elementary and Secondary Education Act (ESEA) was amended by the No Child Left Behind Act (NCLB) in January 2002. Under NCLB, all students are to be included in the measurement of progress toward state achievement standards.Case, Betsy J., PhD02-2003
Accommodations on Stanford 10 for Students with DisabilitiesRequirements for including all students with disabilities (SWD) in assessments stem from a number of federal laws, including Section 504 of the Vocational Rehabilitation Act of 1973 (Section 504), Title II of the Americans with Disabilities Act of 1990 (ADA), Title I of the Elementary and Secondary Education Act (ESEA), and the Individuals with Disabilities Education Act of 1997 (IDEA).Case, Betsy J., PhD02-2003
Test and Answer Document Design and Layout The New Norm-Referenced Test ModelPearson Inc. (Pearson) drew from a variety of sources to create and validate its innovative test and answer document design for the Stanford 10 Assessment Series (Stanford 10).Case, Betsy J., PhD02-2003
Exploring the Use of Item Bank Information to Improve IRT Item Parameter EstimationOn occasion, the sample of students available for calibrating a set of assessment items may not be optimal.Ansley, Timothy
Hall, Erika
 
WISC-IV Technical Report #5: WISC-IV and Children's Memory Scale Drozdick, Lisa W.
Holdnack, James
Rolfhus, Eric
Weiss, Larry
 
Response Probability Criterion and Subgroup PerformanceIn the standard setting literature, there has been much debate about the most appropriate response probability (RP) to use in an item mapping procedure such as the Bookmark Standard Setting Procedure.Egan, Karla
Mueller, Canda D.
Schneider, M. Christina
 
WISC-IV Technical Report #7: Extended Norms Cayton, Tom, PhD
Gabel, Amy, PhD
Weiss, Larry, PhD
Zhu, Jianjun, PhD
 
A Comparison of Item and Testlet Selection Procedures in Computerized Adaptive TestingTestlet response theory (TRT) is a measurement model that can capture local dependency in testlet-based tests.Chen, Tzu-An Ann
Dodd, Barbara G.
Ho, Tsung-Han
Keng, Leslie
 
WISC-IV Technical Report #6: Using the Cognitive Proficiency Index in Psychoeducational AssessmentThe Cognitive Proficiency Index (CPI) summarizes performance on the WISC-IV working memory and processing speed indices in a single score.Gabel, Amy Dilworth, Ph.D.
Weiss, Lawrence G., PhD
 
ABAS-Second EditionThe Adaptive Behavior Assessment System-Second Edition (ABAS-II) provides a comprehensive norm-referenced assessment of the adaptive skills of individuals ages birth to 89 years.Harrison, Patti L.
Oakland, Thomas
 
DEAP Technical ReportThe Diagnostic Evaluation of Articulation and Phonology, U.S. Edition (DEAP) is a comprehensive, individually administered, norm-referenced battery designed to provide differential diagnoses of speech disorders in children ages 3.0-8.11 years.Crosbie, Sharon
Dodd, Barbara
Holm, Alison
Ozanne, Anne
 
A Generalization of Stratified α that Allows for Correlated Measurement Errors between SubtestsThis paper presents a generalization of Stratified α that allows for correlated measurement errors between some subtest scores that make up a composite score.Keng, Leslie
Miller, G. Edward
O'Malley, Kimberly
Turhan, Ahmet
 
The VIP (Validity Indicator Profile) Test in CourtVIP users may be called upon to present their clinical findings and conclusions, either in person or by report, to a judge, jury, disability board, or agency that makes decisions about remunerative awards.  
Research Services Quarterly Newsletter - Vol 1, No 1, 2008   
Research Services Quarterly Newsletter - Vol 1, No 2, 2008   
Research Services Quarterly Newsletter - Vol 1, No 3, 2008   
Research Services Quarterly Newsletter - Vol 2, No 1, 2009