Visualizing Inter-rater Reliability

Background on Reporting Inter-rater Reliability

Qualitative studies often report inter-rater reliability (IRR) scores as a measure of the trustworthiness of coding, or as an assurance to readers that they could follow the researcher's codebook and expect to find similar results. How these scores get reported varies widely. Often I see only a range of scores reported, hopefully with Cohen's kappa calculated in addition to the more straightforward percent agreement. The kappa matters because it corrects for the agreement expected by chance, which depends on how frequently each code occurs.
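
To make the difference concrete, here is a minimal sketch of both measures for two raters coding the same items; the example labels and data are hypothetical, and the kappa computation follows the standard two-rater formula (observed agreement minus chance agreement, scaled):

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Fraction of items the two raters labeled identically."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement.
    Chance agreement is estimated from each rater's code frequencies."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    f1, f2 = Counter(r1), Counter(r2)
    # Expected chance agreement: sum over codes of the product of
    # each rater's marginal proportion for that code.
    p_e = sum((f1[c] / n) * (f2[c] / n) for c in f1.keys() | f2.keys())
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of eight excerpts by two raters
rater1 = ["x", "x", "y", "y", "x", "y", "x", "x"]
rater2 = ["x", "y", "y", "y", "x", "y", "x", "x"]

print(percent_agreement(rater1, rater2))  # 0.875
print(cohens_kappa(rater1, rater2))       # 0.75
```

Note that kappa (0.75) is lower than raw agreement (0.875): some of the observed agreement is what the raters' code frequencies would produce by chance alone, and kappa discounts it.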