Imagine a cash-strapped organisation working in the education sector, exploring ways to evaluate the impact of its intervention. Since the ultimate aim of any intervention is to improve student learning outcomes, it would seem only natural for policy makers and educational organisations to use state-administered school-leaving exams for impact evaluation. Universities and employers, too, list state exam scores as one of the primary criteria for admitting candidates. Because data on student achievement already exists, evaluators do not have to incur the cost of designing and conducting tests especially for the evaluation, keeping evaluation costs relatively low. The yearly scheduling of public examinations also makes it easier to compare results across years, substantially reducing the time and effort invested in conducting large-scale study-administered tests over multiple years.
So it wasn’t surprising that we found ourselves looking into the possibility of analysing the vast pool of data available as state exam scores. After an in-depth review of the literature and several conversations with education experts, we concluded that even though state exam scores may appear to be the obvious choice, they may not be the wisest, especially in the context of low-income countries.
Formats for storing data
In developing countries, high-stakes school-leaving examinations are conducted and evaluated by state or central education boards. However, the boards may not always have a centralised mechanism to store and disseminate these scores. To overcome this problem, evaluators often approach multiple officials at different levels of the system hoping to get access to exam scores. Despite the effort to aggregate scores from various sources, they might not succeed in procuring scores for all schools under study, because officials may be unwilling to share data with non-profits, especially if schools are performing poorly. Organisations then need to invest in government advocacy and in building relationships with state boards. In Uganda, it was relatively easy to get access to school-level data, but it was usually in hard copy or PDF files, adding to data entry costs. In addition, the granularity of data varied from official to official: while some districts maintained it at the student level, others kept it at the school and district level, making it difficult to compare schools and track their progress over time. Moreover, while analysing PLE test scores, our research team encountered many arithmetic errors in official reports; the total number of students who took the exam did not match the sum of students across divisions. When official data is this unreliable, analysing erroneous data does more harm than not analysing it at all.
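Inconsistencies of the kind described above can be flagged automatically before any analysis begins. The sketch below, in Python, assumes a hypothetical record layout for a cleaned results table (per-school candidate counts in each division plus the officially reported total); the field names are illustrative, not the actual format of PLE data.

```python
# Sketch of a consistency check on official exam reports.
# Assumes a hypothetical layout: each record holds a school name,
# the officially reported total, and candidate counts per division.

def find_inconsistent_schools(records):
    """Return (school, reported_total, division_sum) for every school
    whose reported total does not equal the sum across divisions."""
    flagged = []
    for rec in records:
        division_sum = sum(rec["divisions"].values())
        if division_sum != rec["reported_total"]:
            flagged.append((rec["school"], rec["reported_total"], division_sum))
    return flagged

# Illustrative data, not real PLE figures.
sample = [
    {"school": "School A", "reported_total": 120,
     "divisions": {"I": 30, "II": 50, "III": 25, "IV": 10, "U": 5}},
    {"school": "School B", "reported_total": 95,
     "divisions": {"I": 20, "II": 40, "III": 20, "IV": 5, "U": 5}},
]

print(find_inconsistent_schools(sample))
# → [('School B', 95, 90)]  (divisions sum to 90, not the reported 95)
```

Running a check like this over every district's report gives a quick sense of how widespread the errors are, and which records can be trusted enough to analyse.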
In India, on the other hand, the data we obtained required extensive cleaning and formatting before it could be analysed. The absence of standardised, centralised data-storage systems meant that, more often than not, data was available in different formats and was often unreliable, making it difficult to undertake comparative analysis of any kind.
Guidelines for designing and scoring the exams
A cursory glance at the marking scheme would lead one to credit the boards for delineating the structure of the exam and the skills it purports to measure. However, a closer look reveals that at least 50% of the paper focuses on rote learning and only 35% measures conceptual understanding and higher-order thinking skills. This raises the question: what do we mean by student learning, and what are its components?
Even if the examination boards have a clear definition of student learning and guidelines to measure it, they may not have the capacity to check the psychometric properties, that is, the reliability and validity, of these exam scores. In the absence of standardised benchmarks for evaluating test quality, it is common to see year-on-year variation in difficulty levels. This in turn makes it difficult to compare yearly results, and makes longitudinal analysis at any level, whether student, school or district, unreliable.
Although examination boards do have prescribed guidelines for scoring, the actual procedure is quite opaque. Answers are evaluated by comparing them to ‘ideal’ answers, but there are no documents explaining how the ideal answers are chosen or formulated. As a result, higher marks are awarded to students whose language and content are closer to the ‘ideal’, thus promoting rote learning. A report compiled by Geeta Kingdon (2017) clearly illustrates the failure of public examinations to reflect student learning outcomes. The ‘moderation policy’ followed by state boards has become synonymous with lenient checking and inflated scores. To preserve their electoral mandate, incumbent governments are often reluctant to curb falsification and misreporting of marks, because a fall in the pass rate would reflect poorly on them. This, however, compromises the quality and reliability of exam scores as a measure of student learning outcomes.
Link between exam scores and student learning
While the above-mentioned issues require serious consideration, there is a larger concern with using state exam scores to measure programme impact. Public exams test for learning gains acquired during the academic year. However, in most cases the majority of students have learning levels below the grade in which they study, making it impossible for such exams to register progress below that grade. J-PAL conducted an evaluation using both study-administered tests and school exams to measure the impact of Mindspark, a technology-aided tutoring platform for students and teachers in India (Muralidharan, Singh, and Ganimian 2017). It revealed improvement in learning outcomes on study-administered tests, but no such results were found when analysing public exam scores. This is because, while learning did take place, it remained below grade level and so was not reflected in exams that capture learning achievements only for that particular grade. Our research team, too, was unable to find significant differences in the percentage of students failing in STIR schools compared with non-STIR schools using PLE data. This shows that the link between exam scores and learning outcomes may not be as strong as we expected, so capturing learning outcomes through state exam scores will be a challenge.
It is true that we would like public exam scores to be an accurate reflection of student learning outcomes, but it is also true that institutions may not have the capacity to collect, process and store data in standardised formats. Additionally, state exams are designed to capture student learning pertaining to a particular grade, which limits their usefulness for assessing below-grade learning outcomes. In such cases, study-administered tests, which are tailored specifically to a particular intervention, would be a better alternative to high-stakes school-leaving exams for evaluating impact. However, study-administered tests raise the cost of conducting evaluations. Where financial and logistical constraints rule them out, organisations will have to weigh the lower cost of using exam scores against the loss in quality and reliability that accompanies their use. In suitable situations, exam scores may be used in conjunction with other data sources, such as classroom observations and teacher needs assessments that measure teachers’ pedagogical and content knowledge. This allows evaluators to triangulate the results obtained from analysing student exam scores. Even so, evaluators should always keep the quality of the tests in mind and be cognizant that the links between the intervention and test scores may be quite weak, making it difficult to estimate impact.
Insights and Impact Team