Alaei, M. M., Ahmadi, M., & Zadeh, N. S. (2014). The impact of rater's personality traits on holistic and analytic scores: Does genre make any difference too? Procedia Social and Behavioral Sciences, 98, 1240–1248.
Ang-Aw, H., & Meng Goh, C. (2011). Understanding discrepancies in rater judgment on national-level oral examination tasks. RECL Journal, 42(1), 31-51.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford (England): New York: Oxford University Press.
Berk, R. A. (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.
Brown, A. & Hill, K. (1996). Interviewer style and candidate performance in the IELTS oral interview. IELTS Australia Reports Round 1.
Buckingham, A. (1997). Oral language testing: do the age, status and gender of the interlocutor make a difference? Unpublished MA dissertation, University of Reading.
Caban, H. L. (2003). Rater group bias in the speaking assessment of four L1 Japanese ESL students. Second Language Studies, 21, 1-44.
Congdon, P. (2006). Bayesian Statistical Modeling. West Sussex, UK: John Wiley & Sons, Ltd.
Eckes, T. (2005). Examining rater effects in test of writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment, 2, 197-221.
Engelhard, G. Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. M. Haladyna (Eds.), Large scale assessments for all students: Validity, technical adequacy, and implementation (pp. 261-288). Mahwah, N. J.: Lawrence Erlbaum Associates.
Farhady, H. (1979). Test bias in language placement examinations. University of California, Los Angeles.
Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.
Johnson, J., & Lim, G. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26, 485-505.
Karami, H. (2011). Detecting gender bias in a language proficiency test. International Journal of Language Studies, 5(2), 167-178.
Khoshsima, H., & Roostami Abusaeidi, A.A. (2015). English and non-English major teachers' assessment of oral proficiency: a case of Iranian maritime English learners. Iranian Journal of English for Academic Purposes, 1(4), 26-36.
Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge: Cambridge University Press.
Lazaraton, A. (1996). Interlocutor support in oral proficiency interviews: The case of CASE. Language Testing, 13, 15-72.
Locke, C. (1984). The influence of the interviewer on student performance in tests of foreign language oral/aural skills. Unpublished MA project, University of Reading.
Lynch, B. K & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of ESL speaking skills of immigrants. Sage publications.
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71.
Lunz, M. E., Stahl, J. A., & Wright, B. D. (1991). The invariance of judge severity calibrations. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.
McNamara, T. F., & Lumley, T. (1997). The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings. Language Testing, 14, 142–51.
McNamara, T. F., & Adams, R. J. (1991). Exploring rater behavior with Rasch techniques. Paper presented at the 13th annual Language Testing Research Colloquium, Princeton, NJ.
Morton, J., Wigglesworth, G. and Williams, D. (1997). Approaches to the evaluation of interviewer behaviour in oral tests. In Brindley, G. and Wigglesworth, G., editors, Access: Issues in Language Test Design and Delivery. Sydney: National Centre for English Language Teaching and Research, 175-96.
Moon, T. R., & Hughes, K. R. (2005). Training and scoring issues involved in large-scale writing assessments. Educational Measurement: Issues and Practice, 21(2), 15-19.
orr, m. (2005). the fce speaking Test: Using rater reports to help interpret test scores. System, 143-154.
O'Sullivan, B. (2000). Exploring gender and oral proficiency interview performance. System,28, 373–86.
O'Neill, T. R., & Lunz, M. E. (1996). Examining the invariance of rater and project calibrations using a Multi-facet Rasch Model. Paper presented at the Annual Meeting of the American Educational Research Association, New York.
Porter, D. (1991a). Affective factors in language testing. In Alderson, C.J. and North, B. (Eds.), Language Testing in the 1990s (pp. 32-40). London: Modern English Publications.
Porter, D. (1991b). Affective factors in the assessment of oral interaction: gender and status. In Arnivan, S. (Ed.), Current developments in language testing. Anthology series 25 (pp. 92-102). Singapore: SEAMEO Regional Language Centre.
Porter, D. & Shen Shu-Hung (1991). Sex, status and style in the interview. The Dolphin, 21, 117–28.
Reynolds, C. R. & R.T. Brown (1984). Perspectives on bias in mental testing. New York: Plenum Press.
Schellenberg, S. J. (2004, February). Test bias or cultural bias: Have we really learned anything. Annual meeting of the national council for measurement in education. San Diego, California.
Shawcross, P. (2007, March 12). What do we mean by the washback effect of testing? Retrieved September 2017 from: http://icao.int/icao/en/anb/meetings/ials2/Docs/15.Shawcross.pdf
Son, B. (2010). Examining rater bias: An evaluation of possible factors influencing elicited imitation ratings. (MA Thesis).
Sunderland, J. (1995). Gender and language testing. Language Testing Update, 17, 24–35.
Wang, B., (2010). On rater agreement and rater training. English Language Teaching, 3, 108-112.
Young, R., & Milanovic, M. (1992). Discourse validation in oral proficiency interviews. Studies in Second Language Acquisition, 14, 403-24.
Zhang, Y., & Elder (2011). Judgement of oral proficiency by non-native and native English speaking teacher raters: Competing or complementary constructs? Language Testing, 28(1), 31-50.
Zhao, K. (2017, March). Investigating the effects of rater's second language learning background and familiarity with test-taker's first language on speaking test scores. Retrieved from https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7256&context=etd