The Impact of Rater Bias on the Language Performance Assessment Scores of Iranian Foreign Language Teacher Candidates


English Department, Faculty of Paramedical Sciences, Shiraz University of Medical Sciences


Using the scores obtained from a teacher entrance test administered at an English language institute as a means of selection, the researchers chose 100 of 121 female teacher candidates to participate in this study. A reading, writing, and listening test was then administered to exclude candidates with very low or very high proficiency. On the basis of these results, the participant pool was reduced to 30 candidates, who were invited to oral interviews; each was interviewed twice by two different groups of male and female raters. The results, analyzed through correlational analysis and descriptive statistics, indicated that the candidates' interview scores as assessed by the first group of female raters () correlated highly with their in-class performance (p ). By contrast, no significant correlation was found either between the interview scores assigned to the female teacher candidates by the male raters () and their in-class performance (p ), or within the () - () pair. It can therefore be concluded that rater bias may have affected the Iranian female teacher candidates' test scores. The participants were also divided into attractive and unattractive groups and further assessed by a fourth and a fifth group of female and male raters, respectively, to determine whether female sex appeal affected the test scores. The results showed that the mean differences between the AF-AM (attractive female-attractive male) pairs were significant, whereas those between the NAF-NAM (non-attractive female-non-attractive male) pairs (p = .05, t = 1.131) were not.
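The analyses described above, Pearson correlation between interview scores and in-class performance, and an independent-samples t-test between the attractive and non-attractive groups, can be sketched as follows. This is a minimal illustration using stdlib Python only; the score lists are hypothetical placeholders, not the study's data.

```python
import math

def pearson_r(x, y):
    # Pearson product-moment correlation between two equal-length score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def independent_t(x, y):
    # Student's t statistic for two independent groups (equal variances assumed)
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((a - mx) ** 2 for a in x) / (nx - 1)
    vy = sum((b - my) ** 2 for b in y) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Hypothetical interview scores vs. in-class performance ratings
interview = [14, 16, 13, 18, 15, 17, 12, 16]
in_class  = [15, 17, 12, 19, 14, 18, 13, 16]
r = pearson_r(interview, in_class)

# Hypothetical mean-score comparison between the two groups of candidates
attractive     = [16, 18, 17, 15, 19]
non_attractive = [14, 15, 13, 16, 14]
t = independent_t(attractive, non_attractive)
```

In practice the significance of `r` and `t` would be checked against their sampling distributions (e.g. with `scipy.stats.pearsonr` and `scipy.stats.ttest_ind`), which is what the reported p-values correspond to.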


Alaei, M. M., Ahmadi, M., & Zadeh, N. S. (2014). The impact of rater's personality traits on holistic and analytic scores: Does genre make any difference too? Procedia Social and Behavioral Sciences, 98, 1240–1248.##
Ang-Aw, H., & Meng Goh, C. (2011). Understanding discrepancies in rater judgment on national-level oral examination tasks. RELC Journal, 42(1), 31-51.##
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.##
Berk, R. A. (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.##
Brown, A. & Hill, K. (1996). Interviewer style and candidate performance in the IELTS oral interview. IELTS Australia Reports Round 1.##
Buckingham, A. (1997). Oral language testing: do the age, status and gender of the interlocutor make a difference? Unpublished MA dissertation, University of Reading.##
Caban, H. L. (2003). Rater group bias in the speaking assessment of four L1 Japanese ESL students. Second Language Studies, 21, 1-44.##
Congdon, P. (2006). Bayesian Statistical Modeling. West Sussex, UK: John Wiley & Sons, Ltd.##
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197-221.##
Engelhard, G. Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. M. Haladyna (Eds.), Large scale assessments for all students: Validity, technical adequacy, and implementation (pp. 261-288). Mahwah, N. J.: Lawrence Erlbaum Associates.##
Farhady, H. (1979). Test bias in language placement examinations (Unpublished doctoral dissertation). University of California, Los Angeles.##
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.##
Johnson, J., & Lim, G. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26, 485-505.##
Karami, H. (2011). Detecting gender bias in a language proficiency test. International Journal of Language Studies, 5(2), 167-178.##
Khoshsima, H., & Roostami Abusaeidi, A.A. (2015). English and non-English major teachers' assessment of oral proficiency: a case of Iranian maritime English learners. Iranian Journal of English for Academic Purposes, 1(4), 26-36.##
Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge: Cambridge University Press.##
Lazaraton, A. (1996). Interlocutor support in oral proficiency interviews: The case of CASE. Language Testing, 13(2), 151-172.##
Locke, C. (1984). The influence of the interviewer on student performance in tests of foreign language oral/aural skills. Unpublished MA project, University of Reading.##
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180.##
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71.##
Lunz, M. E., Stahl, J. A., & Wright, B. D. (1991). The invariance of judge severity calibrations. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.##
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.##
McNamara, T. F., & Lumley, T. (1997). The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings. Language Testing, 14, 142–51.##
McNamara, T. F., & Adams, R. J. (1991). Exploring rater behavior with Rasch techniques. Paper presented at the 13th annual Language Testing Research Colloquium, Princeton, NJ.##
Morton, J., Wigglesworth, G., & Williams, D. (1997). Approaches to the evaluation of interviewer behaviour in oral tests. In G. Brindley & G. Wigglesworth (Eds.), Access: Issues in language test design and delivery (pp. 175-196). Sydney: National Centre for English Language Teaching and Research.##
Moon, T. R., & Hughes, K. R. (2005). Training and scoring issues involved in large-scale writing assessments. Educational Measurement: Issues and Practice, 21(2), 15-19.##
Orr, M. (2005). The FCE Speaking test: Using rater reports to help interpret test scores. System, 143-154.##
O'Sullivan, B. (2000). Exploring gender and oral proficiency interview performance. System, 28, 373-386.##
O'Neill, T. R., & Lunz, M. E. (1996). Examining the invariance of rater and project calibrations using a Multi-facet Rasch Model. Paper presented at the Annual Meeting of the American Educational Research Association, New York.##
Porter, D. (1991a). Affective factors in language testing. In Alderson, C.J. and North, B. (Eds.), Language Testing in the 1990s (pp. 32-40). London: Modern English Publications.##
Porter, D. (1991b). Affective factors in the assessment of oral interaction: gender and status. In Arnivan, S. (Ed.), Current developments in language testing. Anthology series 25 (pp. 92-102). Singapore: SEAMEO Regional Language Centre.##
Porter, D. & Shen Shu-Hung (1991). Sex, status and style in the interview. The Dolphin, 21, 117–28.##
Reynolds, C. R. & R.T. Brown (1984). Perspectives on bias in mental testing. New York: Plenum Press.##
Schellenberg, S. J. (2004, February). Test bias or cultural bias: Have we really learned anything? Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.##
Shawcross, P. (2007, March 12). What do we mean by the washback effect of testing? Retrieved September 2017 from:
Son, B. (2010). Examining rater bias: An evaluation of possible factors influencing elicited imitation ratings (MA thesis).##
Sunderland, J. (1995). Gender and language testing. Language Testing Update, 17, 24–35.##
Wang, B. (2010). On rater agreement and rater training. English Language Teaching, 3, 108-112.##
Young, R., & Milanovic, M. (1992). Discourse validation in oral proficiency interviews. Studies in Second Language Acquisition, 14, 403-24.##
Zhang, Y., & Elder, C. (2011). Judgement of oral proficiency by non-native and native English speaking teacher raters: Competing or complementary constructs? Language Testing, 28(1), 31-50.##
Zhao, K. (2017, March). Investigating the effects of rater's second language learning background and familiarity with test-taker's first language on speaking test scores. Retrieved from