Teaching English Language

Eye-Tracking-Assisted Rater Training: A Case Study of a Novice Rater

Article type: Research article

10.22132/tel.2025.472698.1668
Abstract
This study examined rater training assisted by eye-tracking systems. A novice rater took part in a training program designed around tracking the rater's eye movements. In each session, immediately after the rater scored a sample essay, eye-tracking feedback was presented as a heat map generated from the rater's eye movements. The heat map was then discussed to help the rater understand their own behavior while rating and to identify which rubric descriptors and which parts of the essay had received the most attention. The findings showed that in the early sessions the rater was subject to a primacy effect, focusing mainly on the first two criteria (content and organization). In addition, the rater initially had difficulty deciding on a score band and attended heavily to the score ranges rather than to the descriptors. After several training sessions, however, the rater adjusted this behavior and tried to attend to all of the criteria and their corresponding descriptors. These findings can help rater trainers organize assessment programs more effectively by using eye-tracking systems to examine rater behavior.

Volume 19, Issue 2
Tir 1404 SH
Pages 511-539

  • Received: 20 Mordad 1403 SH
  • Revised: 28 Mordad 1404 SH
  • Accepted: 31 Mordad 1404 SH