Teaching English Language

Test Fairness in Test Context Framework: Context Matters!

Document Type: Original Article

Authors
1 Department of English, Humanities Faculty, Bu-Ali Sina University
2 English Department, Humanities Faculty, Bu-Ali Sina University
10.22132/tel.2026.550960.1992
Abstract
High-stakes standardized English tests such as TOEFL and IELTS play a key role in life-changing decisions about people’s immigration, university admission, and employment opportunities. Given the profound consequences of these tests, ensuring their fairness is critically important. However, despite the multifaceted nature of the test fairness construct, the existing literature has predominantly addressed its micro-level aspects, often overlooking the broader social and cultural implications of test administration and the influence of contextual factors. Against this backdrop, the current qualitative study aimed to develop a context-specific test context framework for the Iranian EFL setting. Building upon Kunnan's (2008b) Test Context Framework (TCF), the researchers interviewed 30 test takers of standardized high-stakes general English proficiency tests and TEFL educationalists. Following template analysis procedures, content analysis of the interview data led to the expansion of the TCF into a Revised Test Context Framework (RTCF). The RTCF identifies 14 distinct context types along with their corresponding constituents. The proposed RTCF offers implications at multiple levels for all stakeholders in high-stakes language testing: it provides policy makers with a framework for evaluating test fairness across a variety of contexts, guides researchers in probing underrepresented contextual variables, such as the sociocultural dimensions of language use, and equips test developers and specialists with the knowledge to design context-sensitive, fair testing practices.
Keywords

Ahmadi Safa, M., & Ansari, F. (2026). Test Fairness in Online Assessment: Insights from Iranian EFL Teachers Perspective. Teaching English as a Second Language Quarterly, 45(1), 81–108. https://doi.org/10.22099/tesl.2025.52797.3395
Akhavan Masoumi, G., & Sadeghi, K. (2020). Impact of test format on vocabulary test performance of EFL learners: The role of gender. Language Testing in Asia, 10(1), 2. https://doi.org/10.1186/s40468-020-00099-x
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Bachman, L. F. (2006). Generalizability: A journey into the nature of empirical research in applied linguistics. In M. Chalhoub-Deville, C. Chapelle, & P. Duff (Eds.), Inference and generalizability in applied linguistics: Multiple perspectives (pp. 165–207). John Benjamins.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford University Press.
Bachman, L. F., & Purpura, J. E. (2008). Language assessments: Gate-keepers or door openers? In B. Spolsky & F. M. Hult (Eds.), Handbook of Educational Linguistics (pp. 456–468). Blackwell Publishers. https://doi.org/10.1002/9780470694138.ch32
Beheshti, Sh., & Ahmadi Safa, M. (2023). Reconceptualization of Test Fairness Model: A Grounded Theory Approach. Iranian Journal of Language Teaching Research, 11(2), 119–146. https://doi.org/10.30466/ijltr.2023.121333
Berliner, D., & Biddle, B. (1995). The manufactured crisis: Myths, fraud, and the attack on America’s public schools. Perseus Books.
Bond, H. (1924). Intelligence tests and propaganda. Crisis, 28(2), 61–64.
Brigham, C. C. (1975). A study of American intelligence. Kraus Reprint Co.
Brooks, J., McCluskey, S., Turley, E., & King, N. (2015). The utility of template analysis in qualitative psychology research. Qualitative Research in Psychology, 12(2), 202–222. https://doi.org/10.1080/14780887.2014.955224
Chik, A., & Besser, S. (2011). International language test taking among young learners: A Hong Kong case study. Language Assessment Quarterly, 8(1), 73–91. https://doi.org/10.1080/15434303.2010.537417
Cook, V. (2008). Second language learning and language teaching. Hodder Education.
Crabtree, B. F., & Miller, W. L. (1999). Doing Qualitative Research (2nd ed.). Sage.
Davari, H., & Aghagolzadeh, F. (2015). To teach or not to teach? Still an open question for the Iranian education system. In C. Kennedy (Ed.), English language teaching in the Islamic Republic of Iran: Innovations, trends and challenges (pp. 10–19). British Council.
Davies, A. (2008). Ethics, professionalism, rights and codes. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of Language and Education (2nd ed., Vol. 7, pp. 429–443). Springer. https://doi.org/10.1007/978-0-387-30424-3_1911.
Dörnyei, Z. (2007). Research Methods in Applied Linguistics: Quantitative, Qualitative and Mixed Methodologies. Oxford University Press.
Eades, D. (2005). Applied Linguistics and Language Analysis in Asylum Seeker Cases. Applied Linguistics, 26(4), 503–526. https://doi.org/10.1093/applin/ami021
Eades, D., Fraser, H., Siegel, J., McNamara, T., & Baker, B. (2003). Linguistic identification in the determination of nationality: A preliminary report. Language Policy, 2(2), 179–199. https://doi.org/10.1023/A:1024640612273
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. Routledge.
García-Peñalvo, F. J., Corell, A., Abella-García, V., & Grande-de-Prado, M. (2021). Recommendations for mandatory online assessment in higher education during the COVID-19 pandemic. In D. Burgos, A. Tlili, & A. Tabacco (Eds.), Radical Solutions for Education in a Crisis Context (pp. 85–98). Springer. https://doi.org/10.1007/978-981-15-7869-4_6
Garrison, M. J. (2020). Standardized testing, innovation, and social reproduction. In Encyclopedia of Educational Innovation (pp. 1–7). Springer. https://doi.org/10.1007/978-981-13-2262-4_118-21
Geranpayeh, A. (2014). Detecting plagiarism and cheating. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 980–993). Wiley.
Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23(1), 17–27. https://doi.org/10.1111/j.1745-3992.2004.tb00149.x
Hamid, M. O., Hardy, I., & Reyes, V. (2019). Test-takers’ perspectives on a global test of English: Questions of fairness, justice and validity. Language Testing in Asia, 9(1), 1–20. https://doi.org/10.1186/s40468-019-0092-9
House, E. R. (1990). Ethics of evaluation studies. In H. J. Walberg & G. C. Haertel (Eds.), The International Encyclopedia of Educational Evaluation (pp. 91–94). Pergamon Press.
International Language Testing Association. (2007). ILTA guidelines for practice in English. ILTA.
Karami, H. (2013). The quest for fairness in language testing. Educational Research and Evaluation: An International Journal on Theory and Practice, 19(2–3), 158–169. https://doi.org/10.1080/13803611.2013.767621
Khan, A., Hassan, N., & Cheng, L. (2025). Investigating the contextual factors mediating washback effects of a learning-oriented English language assessment in Malaysia. Language Testing in Asia, 15(20). https://doi.org/10.1186/s40468-025-00359-8
Kiany, G. R., Mirhosseini, S. A., & Navidinia, H. (2010). Foreign language education policies in Iran: Pivotal macro considerations. Journal of English Language Teaching and Learning, 2(2), 49–70.
Kunnan, A. J. (2000). Fairness and justice for all. In A. J. Kunnan (Ed.), Fairness and validation in language assessment (pp. 1–13). Cambridge University Press.
Kunnan, A. J. (2008a). Large scale language assessments. In N. H. Hornberger (Ed.), Encyclopedia of Language and Education. Springer. https://doi.org/10.1007/978-0-387-30424-3_173
Kunnan, A. J. (2008b). Towards a model of test evaluation: Using the Test Fairness and Wider Context frameworks. In L. Taylor & C. Weir (Eds.), Multilingualism and assessment: Achieving transparency, assuring quality, sustaining diversity (pp. 229–251). Cambridge University Press.
Kunnan, A. J. (2009). Testing for citizenship: The U.S. Naturalization Test. Language Assessment Quarterly, 6, 89–97.
Kunnan, A. J. (2010). Test fairness and Toulmin’s argument structure. Language Testing, 27(2), 183–189. https://doi.org/10.1177/0265532209349468
Kunnan, A. J. (2016). Large-scale language assessment. In R. M. Paige & D. L. Lange (Eds.), Handbook of research in second language teaching and learning (pp. 34–48). Routledge. https://doi.org/10.4324/9781315716893-34
Kunnan, A. J. (2018). Evaluating language assessment. Routledge.
Kunnan, A. J. (2020). A case for an ethics-based approach to evaluate language assessments. In G. J. Ockey & B. A. Green (Eds.), Another generation of fundamental considerations in language assessment: A Festschrift in honor of Lyle F. Bachman (pp. 77–93). Springer Nature Singapore Pte Ltd. https://doi.org/10.1007/978-981-15-8952-2_6
McCallin, R. C. (2006). Test administration. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 625–652). Lawrence Erlbaum Associates.
McNamara, T. (2001). Language Testing. Oxford University Press.
McNamara, T. (2007). Language Testing: A Question of Context. In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. E. Turner, & C. Doe (Eds.), Language Testing Reconsidered (pp. 131–138). University of Ottawa Press. https://doi.org/10.2307/j.ctt1ckpccf.13
McNamara, T., & Roever, C. (2006). Language Testing: The Social Dimension. Blackwell Publishing.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Macmillan Publishing Co, Inc; American Council on Education.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Sage Publications, Inc.
Murdoch, S. (2007). IQ: A smart history of a failed idea. John Wiley & Sons.
Pennycook, A. (2017). The cultural politics of English as an international language (1st ed.). Routledge. https://doi.org/10.4324/9781315225593
Puspawati, I. (2014). Fairness issues in a standardized English test for nonnative speakers of English. TESOL Journal, 5(3), 555–572. https://doi.org/10.1002/tesj.157
Reardon, S. F., Kalogrides, D., Fahle, E. M., Podolsky, A., & Zárate, R. C. (2018). The relationship between test item format and gender achievement gaps on math and ELA tests in fourth and eighth grades. Educational Researcher, 47(5), 284–294. https://doi.org/10.3102/0013189X18762105
Roe, J., Perkins, M., Chonua, G. K., & Bhatia, A. (2023). Student perceptions of peer cheating behaviour during COVID-19 induced online teaching and assessment. Higher Education Research & Development, 43(4), 1–15. https://doi.org/10.1080/07294360.2023.2258820
Rothstein, R. (2004). Class and schools: Using social, economic and educational reform to close the Black-White achievement gap. Economic Policy Institute.
Salaberry, M. R., Weideman, A., & Hsu, W.-L. (2023). Ethics and context in second language testing: Rethinking validity in theory and practice. Routledge. https://doi.org/10.4324/9781003384922
Sanday, P. (1999). On the causes of IQ differences between groups and implications for social policy. In A. Montagu (Ed.), Race and IQ (expanded ed., pp. 276–307). Oxford University Press.
Shohamy, E. (2001). The power of tests: Critical language testing. Routledge. https://doi.org/10.4324/9781315837970
Shohamy, E. (2007). Language tests as language policy tools. Assessment in Education: Principles, Policy & Practice, 14(1), 117–130. https://doi.org/10.1080/09695940701272948
Shohamy, E. (2010). The Power of Tests: A Critical Perspective on the Uses of Language Tests. Routledge.
Shohamy, E., & Pennycook, A. (2019). Extending Fairness and Justice in Language Tests. In C. Roever & G. Wigglesworth (Eds.), Social Perspectives on Language Testing: Papers in Honour of Tim McNamara (pp. 29–45). Peter Lang AG.
Tajeddin, Z., & Chamani, F. (2020). Foreign language education policy (FLEP) in Iran: Unpacking state mandates in major national policy documents. Journal of Teaching Language Skills (JTLS), 39(3.1), 185–215. https://doi.org/10.22099/jtls.2021.38870.2904
Taylor, C. S., & Lee, Y. (2012). Gender DIF in reading and mathematics tests with mixed item formats. Applied Measurement in Education, 25(3), 246–280. https://doi.org/10.1080/08957347.2012.68765
Van der Heijden, J. (2013). Testing skilled migrants’ English: Ridiculous and insulting. Independent Australia.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave Macmillan.
Wollack, J. A., & Case, S. M. (2016). Maintaining fairness through test administration. In N. J. Dorans & L. Cook (Eds.), Fairness in educational assessment and measurement (pp. 33–53). Routledge.
Zhaleh, K., Estaji, M., & Chory, R. M. (2025). Justice and Fairness are not the Same Construct: Evidence from Revalidating the Teacher Classroom Justice Scale on University EFL Students in Iran. Teaching English Language, 19(1), 41–80. https://doi.org/10.22132/tel.2025.473990.1675

Articles in Press, Accepted Manuscript
Available Online from 29 June 2026

  • Receive Date 03 October 2025
  • Revise Date 09 February 2026
  • Accept Date 13 February 2026