Similar literature
 Found 20 similar documents; search time: 0 ms
1.
Randomly selected fifth, seventh, ninth, and eleventh graders (sixty from each grade) were given an ability test. The score and the time taken were used to test the hypotheses of no negative linear relationship and no curvilinear relationship between test score and test time. Although no significant linear relationships were found, significant curvilinear regressions of time on score were found in grades seven and nine. The strength of these significant relationships was relatively low in both grades.

2.
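Based on a study of function S-rough sets and dynamic programming algorithms, the concepts of similarity and credibility are proposed, and a scheme and procedure for scoring non-standardized test items are given, in which the key steps are transfer processing and computing the length of the longest common subsequence. The paper mainly describes transfer processing based on function S-rough sets, analyzes the structure of the solution and the method for computing the longest-common-subsequence length, and finally gives source code for the transfer function and for the function that computes the longest-common-subsequence length.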
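
The abstract names the longest-common-subsequence (LCS) length as a key scoring step but does not reproduce the paper's source code. The block below is only a minimal sketch of the standard dynamic-programming recurrence for LCS length, not the authors' program. The function names `lcs_length` and `similarity`, and the choice to normalize by the reference answer's length, are assumptions made for illustration; the paper's own definitions of similarity and credibility are not given in this listing.

```python
def lcs_length(reference: str, answer: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    # previous[j] = LCS length of the reference prefix processed so far and answer[:j]
    previous = [0] * (len(answer) + 1)
    for ref_char in reference:
        current = [0]  # LCS with an empty answer prefix is always 0
        for j, ans_char in enumerate(answer, start=1):
            if ref_char == ans_char:
                # Matching characters extend the LCS of both shorter prefixes
                current.append(previous[j - 1] + 1)
            else:
                # Otherwise drop one character from either sequence
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity(reference: str, answer: str) -> float:
    """Hypothetical similarity score: LCS length divided by reference length.

    This normalization is an assumption for illustration only.
    """
    if not reference:
        return 0.0
    return lcs_length(reference, answer) / len(reference)


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(similarity("abcd", "abd"))
```

Keeping only two rows of the table holds memory at the length of the student answer, which is sufficient when only the length, and not the subsequence itself, is needed.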

3.
4.
Current thinking on validity suggests that educational institutions and individuals should evaluate their uses of test scores in the context of their fundamental goals. Regression coefficients and other traditional criterion-related validity statistics provide relevant information, but often do not, by themselves, address the fundamental reasons for using test scores. Formal decision theory models provide a logically rigorous way to do this, but they are difficult to implement in practice. This article considers a simplification of formal decision theory models, in which one estimates the proportion of examinees for whom positive outcomes result from a use of test scores. For uses involving selection, the proportion of examinees with positive outcomes can be calculated by applying traditional regression coefficients to the marginal distribution of scores in the unselected population. The incremental usefulness of using a particular variable can be judged by comparing its proportion to that associated with no selection and to that associated with using another variable, either alone or jointly. Examples, related to college admission and retention, are given to illustrate these ideas.

5.
Previous studies have shown that several key variables influence student achievement in geometry, but no research has been conducted to determine how these variables interact. A model of achievement in geometry was tested on a sample of 102 high school students. Structural equation modeling was used to test hypothesized relationships among variables linked to successful problem solving in geometry. These variables, including motivation, achievement emotions, pictorial representation, and categorization skills, were examined for their influence on geometry achievement. Results indicated that the model fit well. Achievement emotions, specifically boredom and enjoyment, had a significant influence on student motivation. Student motivation influenced students’ use of pictorial representations and achievement. Pictorial representation also directly influenced achievement. Categorization skills had a significant influence on pictorial representations and student achievement. The implications of these findings for geometry instruction and for future research are discussed.

6.
Are variations in test-preparation practices from school to school undermining the meaningfulness of achievement test results? Is there pressure to raise achievement test scores by the use of educationally unsound practices? What uses of achievement test scores are most common? Do teachers and administrators have reasonably accurate views of test score uses?

7.
In the service of educational accountability, student achievement tests are being used to measure constructs quite unlike those envisioned by test developers. Scores are compared to cut points to create classifications like “proficient”; scores are combined over time to measure growth; student scores are aggregated to measure the effectiveness of teachers, schools, and school districts; indices are created to measure college and career readiness. These and other new uses rely on derived scores created to measure new constructs. The field of educational and psychological measurement has largely ignored these significant, consequential measurement applications. The conceptual frameworks and analytical tools of educational and psychological measurement should be used to study such derived scores and the validity of their uses and interpretations.

8.
《Educational Assessment》2013,18(4):377-399
A group's average test score is often used to evaluate different educational approaches, curricula, teachers, and schools. Studies of group test scores over time often try to measure "value-added" by holding constant certain student characteristics such as race, parents' education, or socioeconomic status; however, the important statistical phenomenon of regression to the mean is often ignored. There is a substantial literature on the importance of regression to the mean in a variety of contexts, including individual test scores. Here, we look at regression to the mean in group averages. If this regression is not taken into account, changes in a group's average test score over time may be misinterpreted as changes in the group's average ability rather than natural and expected fluctuations in scores about ability. California Academic Performance Index scores are used to illustrate this argument.
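
To make the regression-to-the-mean argument concrete, the sketch below applies the standard linear-regression prediction for a repeated measurement: the expected second score moves back toward the population mean in proportion to one minus the score's reliability. This is an illustration under stated assumptions (equal means and variances on both occasions, a known reliability for the group average), not the article's own analysis. The function name `expected_retest_score` and all numbers are hypothetical and are not taken from California Academic Performance Index data.

```python
def expected_retest_score(observed: float, population_mean: float,
                          reliability: float) -> float:
    """Expected score on a second occasion, given the first observed score.

    Assumes equal means and variances on both occasions and a correlation
    between occasions equal to `reliability` (an illustrative assumption).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    # The deviation from the mean shrinks by the factor `reliability`
    return population_mean + reliability * (observed - population_mean)


if __name__ == "__main__":
    # Hypothetical group average of 800 in a population averaging 700
    print(expected_retest_score(800.0, 700.0, 0.5))
```

With these made-up numbers, a group that scored 100 points above the mean would be expected to score only 50 points above it next time, even if its underlying ability did not change at all.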

9.
How does schooling affect the development of intelligence in children? How should the amount of schooling be considered when developing norms for turning intelligence test performance into IQ scores?

10.
11.
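Against the backdrop of exam-oriented education, the role of test scores has been exaggerated without limit. The emphasis on test scores has narrowed the scope of evaluation, oversimplified curriculum goals, and in turn distorted basic education. Distorted education then pushes the inflated value of test scores to a further extreme, ultimately forming a vicious circle in education. Drawing on case analyses of the improper use of test scores, this paper proposes several pairs of relationships that must be correctly understood and handled in evaluation, in the hope of offering some guidance for breaking out of this vicious circle.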

12.
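Every test taker reviews before an exam, and how that review is carried out may determine the score obtained. Test takers who have access to systematic, standardized pre-exam review training should use it wherever possible, as it helps considerably in achieving good scores. This paper mainly discusses the importance of pre-exam review for improving test scores.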

13.
What has happened in recent decades to the test scores of American students? What can be said about the causes of trends in scores? Do these trends provide an indictment of the effectiveness of American schools? How should test scores be used to inform the debate about education policy?

14.
15.
A sample of 22,923 students who had taken the SAT and the GRE General Test was classified by the four general undergraduate fields of study and by sex. The authors performed several analyses to determine the degree of differential impact that sex and field of study might have on GRE-Verbal, GRE-Quantitative, and GRE-Analytical scores after controlling on SAT-Verbal and SAT-Mathematical scores. They found, first, that the correlations of SAT-Verbal with GRE-Verbal scores and SAT-Mathematical with GRE-Quantitative scores were extremely high, .86 in the total sample and ranging from the low to middle .80s in the eight subgroups. The impact of curriculum and sex, after controlling on SAT scores, was found to be low on GRE-Verbal scores but relatively high on GRE-Quantitative scores, with students in heavily quantitative fields enjoying an advantage over their peers in less quantitative fields of study. The impact was moderate on GRE-Analytical scores. Further studies designed to "purify" the fields of study and include only clearly verbal fields and clearly mathematical fields showed small additional impact. An additional study indicated a generally slight effect of the institution attended on GRE-Quantitative scores after controlling for sex, major field of study, and initial ability.

16.
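Examinations are a very important part of teaching-quality monitoring and an important means of achieving a school's teaching evaluation and educational goals. An examination not only measures students' learning outcomes but also checks teachers' teaching quality and monitors school management. Statistical analysis of students' examination scores can help teachers promptly identify problems and weak points in their teaching, adjust teaching content, and improve teaching methods; it can also serve to test, monitor, and correct the teaching management system and the way teaching is run.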

17.
Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees had to guess on the final six questions of the analytical section of the Graduate Record Examination if they were to finish before time expired. At the higher-ability levels, even more guessing was required because the questions administered to higher-ability examinees were typically more time consuming. Because the scoring model is not designed to cope with extended strings of guesses, substantial errors in ability estimates can be introduced when CATs have strict time limits. Furthermore, examinees who are administered tests with a disproportionate number of time-consuming items appear to get lower scores than examinees of comparable ability who are administered tests containing items that can be answered more quickly, though the issue is very complex because of the relationship of time and difficulty, and the multidimensionality of the test.

18.
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common‐item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common‐item equating methodology to standard setting ratings to account for systematic differences between standard setting panels) has received almost no attention in the literature. Identity equating was also examined to provide context. Data from a standard setting form of a large national certification test (N examinees = 4,397; N panelists = 13) were split into content‐equivalent subforms with common items, and resampling methodology was used to investigate the error introduced by each approach. Common‐item equating (circle‐arc and nominal weights mean) was evaluated at samples of size 10, 25, 50, and 100. The standard setting approaches (resetting and rescaling the standard) were evaluated by resampling (N = 8) and by simulating panelists (N = 8, 13, and 20). Results were inconclusive regarding the relative effectiveness of resetting and rescaling the standard. Small‐sample equating, however, consistently produced new form cut scores that were less biased and less prone to random error than new form cut scores based on resetting or rescaling the standard.

19.
Grades and Test Scores: Accounting for Observed Differences   Cited by: 1 (self-citations: 0, citations by others: 1)
Why do grades and test scores often differ? A framework of possible differences is proposed in this article. An approximation of the framework was tested with data on 8,454 high school seniors from the National Education Longitudinal Study. Individual and group differences in grade versus test performance were substantially reduced by focusing the two measures on similar academic subjects, correcting for grading variations and unreliability, and adding teacher ratings and other information about students. Concurrent prediction of high school average was thus increased from 0.62 to 0.90; differential prediction in eight subgroups was reduced to 0.02 letter‐grades. Grading variation was a major source of discrepancy between grades and test scores. Other major sources were teacher ratings and Scholastic Engagement, a promising organizing principle for understanding student achievement. Engagement was defined by three types of observable behavior: employing school skills, demonstrating initiative, and avoiding competing activities. While groups varied in average achievement, group performance was generally similar on grades and tests. Major factors in achievement were similarly constituted and similarly related from group to group. Differences between grades and tests give these measures complementary strengths in high‐stakes assessment. If artifactual differences between the two measures are not corrected, common statistical estimates of validity and fairness are unduly conservative.

20.
《Applied Measurement in Education》2013,26(2):103-118
Assessment instruments of the future will probably be composed of a combination of different types of questions. Even though different kinds of questions require different scoring procedures, there may be a need to have those different scores combined as a composite. In this article, we describe how mixtures of such scores may be efficaciously combined. Also, if no post hoc adjustment is desired, we provide two characterizations of measurement effectiveness to aid in making unadjusted score combinations efficient. In addition, we explore the implications for test construction of some typical findings.
