Comparison of variability in breast density assessment by BI-RADS category according to the level of experience



Breast imaging experience corresponds to greater agreement with automated density measurement

Visual assessment of breast density often depends on which radiologist performs the reading, and has been shown to vary widely. Such variability is undesirable: accurate assessment of a woman's density is important for determining whether mammography alone is likely to be sufficient for her, as well as for estimating her future risk of breast cancer. While automated objective methods exist, the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS) remains the most widely used method for estimating density. Hye-Joung Eom and colleagues from the Asan Medical Center in Seoul, South Korea, sought to find out how different levels of breast imaging experience affect readers' agreement with an objective density score output by Volpara.

Their study compared density measurements from 1000 screening mammograms acquired at their center. Density was assigned by six readers according to the 5th edition of BI-RADS. A key aspect was that the six readers varied widely in their level of breast imaging experience. Two were breast imaging experts, each with over five years' experience in reading mammograms. Two were general radiologists with less than five years of mammography experience. The last two were medical students with no experience in breast imaging; they were taught to perform BI-RADS readings on a set of 80 mammograms (with equal distribution of the four density categories in the training set). Each reader performed the assessment twice, with a two-month interval between the two readings, to determine whether their readings were internally consistent. Density was also measured using Volpara as a reference standard. Volpara measures volumetric density on a continuous scale and classifies it as one of four Volpara Density Grades (VDG; a-d), designed to correlate with ACR BI-RADS scores.

The readers agreed closely with their own earlier readings (kappa ranging from 0.74 to 0.95), meaning each reader was generally consistent. However, agreement between readers depended heavily on experience level. The breast imaging experts and general radiologists had substantial agreement with each other (kappa 0.67). In contrast, the students' agreement with the other two groups was practically non-existent (kappa of 0.02 or less), which suggests that their readings are likely to be inaccurate. When each group's density assessment was compared with Volpara, the breast imaging experts had substantial agreement with the software, bordering on almost perfect, with a kappa of 0.77. The agreement of general radiologists with Volpara was also substantial, albeit slightly lower (kappa 0.71), and the agreement of the students with Volpara was very low. This is not surprising, since the students did not agree with either of the groups with professional experience in breast imaging.
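For readers unfamiliar with kappa, the following is a minimal sketch of how such an agreement score can be computed between one reader and the software. The article does not say which kappa variant the study used; this sketch assumes plain unweighted Cohen's kappa, and the verbal labels follow the commonly cited Landis and Koch scale, which matches the "substantial" and "almost perfect" wording above. The function names and the eight example readings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases where both raters chose the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of each rater's marginal share.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Verbal label for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical example: eight mammograms graded a-d by a reader and by software.
reader_grades = list("aabbccdd")
software_grades = list("abbbccdc")
example_kappa = cohen_kappa(reader_grades, software_grades)
print(f"kappa = {example_kappa:.2f} ({landis_koch_label(example_kappa)})")
```

Because density grades are ordered, a weighted kappa that penalizes a-versus-d disagreements more than a-versus-b ones is another common choice; the unweighted form is used here only because it is the simplest to follow.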

When considering how readers and Volpara assigned density on the four-category scale, there were significant differences between Volpara and all three reader groups (Table 1). However, these differences were smallest for the breast imaging experts: they had perfect agreement with Volpara on "a" and "b", but assigned 8.5% more women to the "c" category. The general radiologists assigned 9% more women to category "d" than did Volpara. Meanwhile, the students assigned considerably more women to "a" than did Volpara (a difference of almost 15%). However, the students' tendency to assign such low density is suspect, as multiple previous studies indicate that East Asian women tend to have high breast density, with very few cases of VDG/BI-RADS a.
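The category-by-category comparison described above can be sketched as follows: compute the share of cases each rater assigns to every density category, then subtract the software's share from the reader's. This is only an illustration under stated assumptions. The function names and the ten example readings are hypothetical, and the article does not specify whether its reported differences are absolute percentage points or relative percentages; the sketch assumes percentage points.

```python
from collections import Counter

CATEGORIES = ("a", "b", "c", "d")


def category_shares(labels):
    """Percentage of cases assigned to each density category."""
    if len(labels) == 0:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}


def share_differences(reader_labels, software_labels):
    """Reader share minus software share, per category, in percentage points."""
    reader = category_shares(reader_labels)
    software = category_shares(software_labels)
    return {c: reader[c] - software[c] for c in CATEGORIES}


# Hypothetical example: ten mammograms graded by a reader and by software.
reader_labels = ["a"] * 2 + ["b"] * 3 + ["c"] * 4 + ["d"] * 1
software_labels = ["a"] * 1 + ["b"] * 3 + ["c"] * 3 + ["d"] * 3
differences = share_differences(reader_labels, software_labels)
for category in CATEGORIES:
    print(f"{category}: {differences[category]:+.1f} points")
```

A positive value means the reader placed more cases in that category than the software did. Note that matching shares do not imply case-by-case agreement, which is what kappa measures.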

Table 1. Distribution of density assignment by readers and software.

[Table contents not recovered. Surviving headers: Density category; Reader type; Breast-imaging expert; General radiologist.]

This study illustrates that assignment of density categories appears to depend on the experience level of the reader, and that readers with very low levels of experience may produce inaccurate estimates. Furthermore, the agreement in density assignment between readers and software increases with reader experience, with breast imaging professionals exhibiting substantial agreement with Volpara. This suggests that Volpara can be used as a substitute for a breast imaging expert for assessing density in practice, and may provide more accurate density readings if a practice does not have a highly experienced radiologist available.

