Alaei, M. M., Ahmadi, M., & Zadeh, N. S. (2014). The impact of rater's personality traits on holistic and analytic scores: Does genre make any difference too? Procedia - Social and Behavioral Sciences, 98, 1240–1248.
Ang-Aw, H., & Meng Goh, C. (2011). Understanding discrepancies in rater judgment on national-level oral examination tasks. RELC Journal, 42(1), 31-51.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, England: Oxford University Press.
Berk, R. A. (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.
Brown, A., & Hill, K. (1996). Interviewer style and candidate performance in the IELTS oral interview. IELTS Australia Reports Round 1.
Buckingham, A. (1997). Oral language testing: Do the age, status and gender of the interlocutor make a difference? Unpublished MA dissertation, University of Reading.
Caban, H. L. (2003). Rater group bias in the speaking assessment of four L1 Japanese ESL students. Second Language Studies, 21, 1-44.
Congdon, P. (2006). Bayesian statistical modeling. West Sussex, UK: John Wiley & Sons, Ltd.
Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2, 197-221.
Engelhard, G., Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. M. Haladyna (Eds.), Large scale assessments for all students: Validity, technical adequacy, and implementation (pp. 261-288). Mahwah, NJ: Lawrence Erlbaum Associates.
Farhady, H. (1979). Test bias in language placement examinations. University of California, Los Angeles.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Johnson, J., & Lim, G. (2009). The influence of rater language background on writing performance assessment. Language Testing, 26, 485-505.
Karami, H. (2011). Detecting gender bias in a language proficiency test. International Journal of Language Studies, 5(2), 167-178.
Khoshsima, H., & Roostami Abusaeidi, A. A. (2015). English and non-English major teachers' assessment of oral proficiency: A case of Iranian maritime English learners. Iranian Journal of English for Academic Purposes, 1(4), 26-36.
Lynch, B. K. (1996). Language program evaluation: Theory and practice. Cambridge: Cambridge University Press.
Lazaraton, A. (1996). Interlocutor support in oral proficiency interviews: The case of CASE. Language Testing, 13, 151-172.
Locke, C. (1984). The influence of the interviewer on student performance in tests of foreign language oral/aural skills. Unpublished MA project, University of Reading.
Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of ESL speaking skills of immigrants. Sage Publications.
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54-71.
Lunz, M. E., Stahl, J. A., & Wright, B. D. (1991). The invariance of judge severity calibrations. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
McNamara, T. F. (1996). Measuring second language performance. New York: Longman.
McNamara, T. F., & Lumley, T. (1997). The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings. Language Testing, 14, 142–51.
McNamara, T. F., & Adams, R. J. (1991). Exploring rater behavior with Rasch techniques. Paper presented at the 13th annual Language Testing Research Colloquium, Princeton, NJ.
Morton, J., Wigglesworth, G., & Williams, D. (1997). Approaches to the evaluation of interviewer behaviour in oral tests. In G. Brindley & G. Wigglesworth (Eds.), Access: Issues in language test design and delivery (pp. 175-96). Sydney: National Centre for English Language Teaching and Research.
Moon, T. R., & Hughes, K. R. (2005). Training and scoring issues involved in large-scale writing assessments. Educational Measurement: Issues and Practice, 21(2), 15-19.
Orr, M. (2005). The FCE Speaking test: Using rater reports to help interpret test scores. System, 143-154.
O'Sullivan, B. (2000). Exploring gender and oral proficiency interview performance. System, 28, 373–86.
O'Neill, T. R., & Lunz, M. E. (1996). Examining the invariance of rater and project calibrations using a multi-facet Rasch model. Paper presented at the annual meeting of the American Educational Research Association, New York.
Porter, D. (1991a). Affective factors in language testing. In J. C. Alderson & B. North (Eds.), Language testing in the 1990s (pp. 32-40). London: Modern English Publications.
Porter, D. (1991b). Affective factors in the assessment of oral interaction: Gender and status. In S. Anivan (Ed.), Current developments in language testing. Anthology series 25 (pp. 92-102). Singapore: SEAMEO Regional Language Centre.
Porter, D., & Shen Shu-Hung (1991). Sex, status and style in the interview. The Dolphin, 21, 117–28.
Reynolds, C. R., & Brown, R. T. (1984). Perspectives on bias in mental testing. New York: Plenum Press.
Schellenberg, S. J. (2004, February). Test bias or cultural bias: Have we really learned anything? Annual meeting of the National Council on Measurement in Education, San Diego, California.
Shawcross, P. (2007, March 12). What do we mean by the washback effect of testing? Retrieved September 2017 from http://icao.int/icao/en/anb/meetings/ials2/Docs/15.Shawcross.pdf
Son, B. (2010). Examining rater bias: An evaluation of possible factors influencing elicited imitation ratings (MA thesis).
Sunderland, J. (1995). Gender and language testing. Language Testing Update, 17, 24–35.
Wang, B. (2010). On rater agreement and rater training. English Language Teaching, 3, 108-112.
Young, R., & Milanovic, M. (1992). Discourse validation in oral proficiency interviews. Studies in Second Language Acquisition, 14, 403-24.
Zhang, Y., & Elder, C. (2011). Judgement of oral proficiency by non-native and native English speaking teacher raters: Competing or complementary constructs? Language Testing, 28(1), 31-50.
Zhao, K. (2017, March). Investigating the effects of rater's second language learning background and familiarity with test-taker's first language on speaking test scores. Retrieved from https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=7256&context=etd