Improving inter-rater reliability in culturally diverse research teams

How does a culturally diverse group of raters achieve inter-rater reliability (IRR)? How do our individual value systems that might stem from our cultural backgrounds impact the research process and establishment of IRR?

3 min read · Apr 15, 2019

Message board at the 2019 AERA Annual Meeting in Toronto, Canada representing thousands of participants from all corners of the world

These were the questions that drove an autoethnographic study of the culturally diverse College Education Quality (CEQ) research team, led by Dr. Corbin Campbell. The findings of this study, authored by Dr. Campbell and me, were presented at the 2019 American Educational Research Association (AERA) Annual Meeting in Toronto, Canada. In this conference report, I shed light on some of the assumptions underlying the study and on key findings that contribute to discussions on establishing IRR.

We set out to improve inter-rater reliability among the members of our research team. In doing so, we wanted to understand (1) whether our group of raters represented different value orientations; (2) if so, how raters with different value orientations perceived the value-laden items of the observational rubric; and (3) how a better understanding of the raters’ individual value systems and their perceptions of the rubric could help us improve IRR during and after the frame-of-reference (FOR) training provided to the research team before the observational site visit.

According to the Theory of Basic Values developed by Shalom H. Schwartz, there are ten types of basic values that are recognized across most cultures as distinct value orientations. With this in mind, we tested whether our raters held distinctly different individual value systems. To do so, we administered the Schwartz Value Survey (SVS) online to group members and found value orientations ranging from personally focused to socially focused individual value systems.
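To make the personal-versus-social distinction concrete, here is a minimal sketch of how one rater's survey scores might be summarized. It assumes the common grouping of Schwartz's ten values into a personal focus and a social focus. The function name, the dictionary keys, and the scores are all hypothetical, and this is not the scoring procedure used in our study.

```python
from statistics import mean

# Assumed grouping of the ten basic values by focus (personal vs. social).
PERSONAL_FOCUS = ("self_direction", "stimulation", "hedonism", "achievement", "power")
SOCIAL_FOCUS = ("universalism", "benevolence", "tradition", "conformity", "security")


def focus_profile(value_scores):
    """Summarize one rater's scores as mean personal and mean social focus.

    value_scores maps each of the ten value names to a numeric importance score.
    """
    personal = mean(value_scores[v] for v in PERSONAL_FOCUS)
    social = mean(value_scores[v] for v in SOCIAL_FOCUS)
    return {
        "personal": personal,
        "social": social,
        # Positive means the rater leans toward socially focused values.
        "social_minus_personal": social - personal,
    }


# Hypothetical scores for one rater (illustration only, not study data).
rater_a = {
    "self_direction": 5, "stimulation": 4, "hedonism": 3,
    "achievement": 4, "power": 4,
    "universalism": 6, "benevolence": 6, "tradition": 4,
    "conformity": 5, "security": 4,
}

profile_a = focus_profile(rater_a)
print(profile_a)
```

Comparing the `social_minus_personal` value across raters gives a rough sense of where each person sits between the two orientations.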

Group photo: CEQ Research Team during a site visit at a medium-size liberal arts college in the South

After establishing that our raters do indeed differ in how they relate to the world through values, we interviewed them to learn how they perceive the value-laden rubric items within three constructs: classroom climate, culturally relevant teaching, and prior knowledge. The interview data informed our understanding of how raters perceive these constructs and how those perceptions differ with their value orientations.

Based on the SVS results, we also created an individual values profile for each rater in the group. We used these profiles to examine whether raters with similar value orientations rate similarly and, conversely, whether raters with opposing value orientations rate dissimilarly. Our analysis of rater discrepancy shows that IRR improved across three time points after the FOR training, possibly because of a better understanding of the conceptual constructs. The findings of this mixed-methods study contribute to our understanding of common response bias that stems from systematic differences among raters. For more information about our study, please refer to the slides that Dr. Campbell and I presented at the 2019 AERA Annual Meeting, below.
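For readers who want a feel for what tracking agreement across time points can look like, here is a small sketch. It uses mean pairwise exact agreement as a simple stand-in and is not necessarily the statistic reported in our slides. The function name, rater labels, and scores are hypothetical and are not data from the study.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Mean share of rubric items on which each pair of raters gave the same score.

    ratings maps a rater name to a list of scores, one per rubric item.
    All raters are assumed to have scored the same items in the same order.
    """
    per_pair = [
        sum(a == b for a, b in zip(x, y)) / len(x)
        for x, y in combinations(ratings.values(), 2)
    ]
    return sum(per_pair) / len(per_pair)


# Hypothetical scores on four rubric items at three time points.
time_points = {
    "t1": {"rater_a": [1, 2, 3, 4], "rater_b": [2, 2, 4, 4], "rater_c": [1, 3, 3, 3]},
    "t2": {"rater_a": [1, 2, 3, 4], "rater_b": [1, 2, 4, 4], "rater_c": [1, 2, 3, 3]},
    "t3": {"rater_a": [1, 2, 3, 4], "rater_b": [1, 2, 3, 4], "rater_c": [1, 2, 3, 3]},
}

agreement = {t: pairwise_agreement(r) for t, r in time_points.items()}
for t, value in agreement.items():
    print(t, round(value, 2))
```

Exact agreement does not correct for chance, so a chance-corrected coefficient would be the more rigorous choice in a real analysis.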

The slides from our presentation at AERA 2019 on April 7, 2019


Written by Abbas Abbasov, PhD
