Population Invariance of Vertical Scaling Results

In this report, the population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3–10 and a state mathematics test spanning grades 3–8.

Powers, Sonya, Turhan, Ahmet, Binici, Salih (Florida State University) 04-01-2012
The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model

This study investigates the impact of item position change (IPC) in the context of operational testing programs that employ the 3PL model, alternative equating procedures, and different item re-use policies.

Meyers, Jason L., Murphy, Stephen, Goodman, Joshua, Turhan, Ahmet 04-01-2012
Impact of Group Differences on Equating Accuracy and the Adequacy of Equating Assumptions

This study compared four curvilinear equating methods including frequency estimation, chained equipercentile, IRT true score, and IRT observed score equating.

Powers, Sonya 04-30-2011
Comparing Methods for Detecting Unstable Anchor Items with Net DIF and Global DIF Conceptions

This study compares different approaches for detecting misbehaving anchor items in IRT equating using Rasch and partial credit models.

Lau, C. Allen, Arce, Alvaro J. 04-11-2011
Comparison of Asymptotic and Bootstrap Item Fit Indices in Identifying Misfit to the Rasch Model

In this study, our results indicate that bootstrap critical values allow for greater statistical power in diagnosing item misfit caused by varying item slopes and lower asymptotes.

Wolfe, Edward W., McGill, Michael T. 04-01-2011
Statistical Properties of 3PL Robust Z: An Investigation with Real and Simulated Data Sets

The purpose of this paper was to inspect statistical properties of the robust z approach in the context of 3PL equating with the common item non-equivalent group design.

Arce, Alvaro J., Lau, C. Allen 04-01-2011
Investigating Common-Item Screening Procedures in Developing a Vertical Scale

Creating a vertical scale involves several decisions on assessment designs and statistical analyses to determine the most appropriate vertical scale.

Johnson, Marc, Yi, Qing 04-01-2011
Investigating Content and Construct Representation of a Common-item Design When Creating a Vertically Scaled Test

This study investigated how well the guideline of content and construct representation was maintained while evaluating two stability assessment criteria (Robust z and 0.3-logit difference).

Hardy, M. Assunta (BYU), Young, Michael J. (Pearson), Yi, Qing (Pearson), Sudweeks, Richard R. (BYU), Bahr, Damon L. (BYU) 04-01-2011
Improving the Post-Smoothing of Test Norms with Kernel Smoothing

The traditional methodology of post-smoothing to develop norms used in educational and clinical products is to hand-smooth the scale scores or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgment. Different people, or the same person at different times, will make significantly different judgments. By contrast, the kernel smoothing method is a nonparametric approach, which is more flexible, less subjective, and easier to replicate.

Lin, Anli, Yi, Qing, Young, Michael J. 05-01-2010
IRT Proficiency Estimators and Their Impact

In the current study, we further examined the statistical properties of the various IRT estimators, especially focusing on their practical impact on the reported scores. We also investigated a few practical scenarios, where the testing focus is on assessing college readiness, assessing students’ minimal competency, or providing estimates for students who have failed a previous exam (retesters).

Tong, Ye, Kolen, Michael J. 05-01-2010
Field Testing and Equating Designs for State Educational Assessments

The educational accountability movement has spawned unprecedented numbers of new assessments. For example, the No Child Left Behind Act of 2002 (NCLB) required states to test students in grades 3 through 8 and at one grade in high school each year.

Kirkpatrick, Rob, Way, Walter D. 03-01-2008
An Investigation of the Changes in Item Parameter Estimates for Items Re-field Tested

Large-scale state testing programs typically rely upon a large bank of items to select from when building assessments.

Kong, Xiaojing Jadie, McClarty, Katie Larsen, Meyers, Jason L. 03-01-2008
A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment Data

Equating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably (Kolen & Brennan, 2004), even though the test forms consist of different items.

Tong, Ye, Wu, Sz-Shyan, Xu, Ming 03-01-2008