When an end-line evaluation shows lower scores than a baseline study

Carita Cruz, Sanna Ryökkynen & Päivi Pynnönen

25.03.2025

Quite puzzling. You conducted a baseline study, ran a series of webinars, instructed and supported teachers’ learning through learning assignments, and collected feedback. The mission was completed. The feedback is great. The participants felt they learned a lot and were happy with the process. BUT: in the end-line study, the participants estimated their competencies to be lower than before the training. What went wrong? Did you make them dumber, or is this candid camera?

The problematic nature of baseline studies constantly torments researchers. In the project world, a baseline study provides an information base. After the project, you repeat it as an end-line study so that you can assess the change. In many cases, we cannot use methods such as direct observation and have to rely on the respondents’ self-evaluation instead. Here, we discuss the challenges of baseline studies in vocational education teacher training projects in emerging economies.

The validity and reliability of baseline studies

A “baseline” is defined as a measurement of key conditions made before a project begins, giving a line from which change and progress can be assessed. A baseline study records the pre-operation condition of the indicators expressed in the project’s logical framework. Change can then be measured and evidenced by comparing the baseline with the conditions after the project. Without a baseline, evaluating progress or change is difficult. Generally, the baseline is carried out after the project has been designed and before the participants have been selected. The baseline is tied to the project’s indicators. Furthermore, baseline studies should be designed in a participatory way (CoPraxis, 2011, 2; Freudenthal & Narrowe, 1993, 12-13; IFRC 2013, 2; Intrac 2017, 1; UNWFP, n.d., 10).

For example, if you cannot travel to a partner country, the baseline can be conducted as a self-assessment using the Webropol tool. The online self-evaluation gives us our first challenge. Who will answer the study? Do we know whether the respondents are the people they should be? In anonymous questionnaires, anyone with access to the link can answer the baseline study. If we aim to enhance the competencies of particular teachers, we obviously should have them as respondents. Because of limited skills in English (the common language used in many projects), unclear instructions, time constraints, or other misunderstandings, the forms may end up being filled in by people other than the targeted participants. Moreover, the creator of the questionnaire might not be a native English speaker either, meaning that the losses in translation are multiplied. The respondents might understand the concepts differently or use them to describe different things. In some native languages, the concepts we try to capture might be missing, although the lack of a concept does not mean that the corresponding practices are not in use.

The people who answer the baseline questionnaire might not be the same people who answer the end-line questionnaire. We have encountered cases where, as a rule, the principal or rector of the institution responds to all questionnaires on behalf of the staff. Sometimes, in very fragile contexts where many international aid institutions operate, the respondents are bombarded with dozens and dozens of questionnaires, and they might not give much attention to each. In less-literate cultures, impersonal inquiries in written form may be disdained, while direct personal contacts are preferred. Some respondents might not be able to use the online questionnaire because of poor internet connections, and the questionnaires might be filled in on paper and then submitted by another person.

Our second piece of the puzzle is whether the contents are relevant. We can research and benchmark other baseline studies on the competencies of TVET (technical and vocational education and training) teachers and on international practices, and then ask for feedback on the pre-questionnaire from the selected group. We have learned in practice that the comments you receive might be too scarce to give sufficient guidance.

Many times, the results are very good in the beginning. The respondents seem to be very competent in all the fields touched on in the baseline study. For example, in a recent project on TVET teachers’ skills in inclusion, the topics covered leadership in inclusion, strategic planning and management of inclusion, teachers’ pedagogical capacity in inclusion, accessibility of the TVET centers, and linkages with working life, the community, and other organizations. In this example project, the baseline showed an average as high as 4.24/5 for competence in inclusion. You would end up wondering whether the training is needed at all if the competencies are already so brilliantly in place. The end-line self-assessment then showed very little difference: the averages were more or less the same, or the competencies had even declined. Furthermore, the results vary depending on whether we use all the data or only the data from those institutions that gave a research permit. Sometimes, we cannot make the comparison at all due to the low number of respondents in the end-line study.
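To make the comparison described above concrete, here is a minimal Python sketch of how the change between baseline and end-line averages can shift depending on which data are included. The institutions, permit flags, and scores are invented purely for illustration and are not data from our projects; the function names are our own assumptions, not part of Webropol or any other tool.

```python
from statistics import mean

# Hypothetical illustration data, NOT project results.
# Each record: (institution, has_research_permit, phase, self-assessed score on a 1-5 scale)
responses = [
    ("A", True, "baseline", 4.5),
    ("A", True, "endline", 4.0),
    ("B", True, "baseline", 4.0),
    ("B", True, "endline", 4.5),
    ("C", False, "baseline", 4.5),
    ("C", False, "endline", 3.5),
]


def phase_mean(records, phase, permit_only=False):
    """Average score for one phase, optionally restricted to permit-holding institutions."""
    scores = [
        score
        for _, permit, record_phase, score in records
        if record_phase == phase and (permit or not permit_only)
    ]
    # With no respondents, no average can be computed.
    return mean(scores) if scores else None


def change(records, permit_only=False):
    """End-line average minus baseline average, or None if either phase has no data."""
    base = phase_mean(records, "baseline", permit_only)
    end = phase_mean(records, "endline", permit_only)
    if base is None or end is None:
        return None
    return end - base


print("Change, all data:", change(responses))
print("Change, permit institutions only:", change(responses, permit_only=True))
```

In this made-up example, the average declines when all responses are used but stays unchanged when only the institutions with a research permit are included, which is the kind of discrepancy we describe. Note that comparing averages does not tell us whether the same individuals answered both rounds.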

The high scores can, of course, reflect the real competence of the respondents, which means that we have the wrong target group to train. The training should be addressed to those teachers whose skills in inclusion would benefit from further training, not to those who are already competent. On the other hand, one reason for high baseline scores could be the tendency in some contexts to use only the higher end of the Likert scale, while in other contexts the whole range of the scale is used more evenly. Respondents may also answer according to what they assume they should respond and what they think is expected of them. At the time of the baseline study, there might not yet be any personal links between us and the participants, so mutual trust has not yet been built. Afterwards, when we discuss the results with the respondents, they are more open to admitting that many of the questions might have been misunderstood, starting from the definitions of inclusion, accessibility, and other concepts. Many participants admit that they did not know what they were commenting on.

When the structured end-line studies show very little change for the better (or even some for the worse), the open feedback from the participants can still reflect real change and improvement in skills. By then, trust has been built during the process, and more honest and concise answers are given in the end-line study. As a saying often attributed to Aristotle goes, “The more you know, the more you realize you don’t know.” This might apply to competencies in inclusion and to competencies in conducting baseline studies as well.

Lessons learned: What would we do differently now?

Firstly, we would consider whether the baseline study is genuinely needed. Can we get the same information from other sources, such as assessments, situational analyses, or feasibility studies made by other organisations? Would a needs assessment be enough? If the project design requires a baseline study, we would consider which method is most appropriate given the respondents’ access to the internet in fragile contexts and the researchers’ access to the site for face-to-face interviews.

Secondly, we would allow more time for the co-creation phase of the baseline study contents to keep the topics relevant for the respondents. Although this changes the overall idea of the baseline, as the respondents’ understanding would already be altered in the process, the results could be much more adequate.

Furthermore, the issues that arise from the respondents’ own reality and context would come up instead of the generic topics used globally. Of course, comparability with global practices would be lost, but on the other hand, the training contents planned on the basis of the baseline study would be more relevant. In some contexts, co-creation is not so widely used, and the participants may not be familiar with it. They might be used to relying on what the facilitator has planned, and raising their willingness to take a participatory approach is a challenge in itself.

Thirdly, although the participants were required to be proficient in English, we would translate the questions into the respondents’ mother tongues. Many concepts are abstract and best understood in one’s own language.

Fourthly, although we would also miss many important topics and the “whole picture” of the phenomenon, we would narrow the topics to those closest to the respondents’ everyday teaching practice and their strengths and challenges in inclusion.

Finally, we would use a combination of methods, not only the online survey. Even if we couldn’t travel to meet respondents for direct observations, visual or tactile methods, or direct interviews, we could pay more attention to varying the inquiry methods.

Every baseline study process experience is unique. This article was inspired by our experiences in projects in Somalia. We are grateful for all the opportunities we have had to learn more and more about the topic.

Authors

Carita Cruz, Senior Adviser, International RDI

Sanna Ryökkynen, Principal Research Scientist

Päivi Pynnönen, Senior Lecturer

References

International Federation of Red Cross and Red Crescent Societies (IFRC). (2013). Baseline Basics. https://www.ifrc.org/document/baseline-basics

Intrac for civil society. (2017). Baselines. https://www.intrac.org/app/uploads/2024/11/Baselines.pdf

Freudenthal, S. & Narrowe, J. (1993). Baseline Study Handbook. Focus on the Field. SIDA. https://www.ircwash.org/resources/baseline-study-handbook-focus-field

United Nations World Food Programme (UNWFP). (n.d.). How to Plan a Baseline Study. Monitoring and Evaluation Guidelines. https://focusintl.com/data/documents/RBM015-mekb_module_7.pdf

CoPraxis. (2011). Recognizing Good Practices in the Development of Baseline Studies. A development practice bulletin by the Just Governance Group. https://justgovernancegroup.org/wp-content/uploads/2019/04/3-1.pdf
