Observer Variability for Classification of Pulmonary Nodules on Low-Dose CT Images and Its Effect on Nodule Management
S.J. van Riel, C.I. Sanchez, A.A. Bankier, D.P. Naidich, J. Verschakelen, E.T. Scholten, P.A. de Jong, C. Jacobs, E. van Rikxoort, L. Peters-Bax, M. Snoeren, M. Prokop, B. van Ginneken and C. Schaefer-Prokop
Purpose To examine the factors that affect inter- and intraobserver agreement for pulmonary nodule type classification on low-radiation-dose computed tomographic (CT) images, and their potential effect on patient management.
Materials and Methods Nodules (n = 160) were randomly selected from the Dutch-Belgian Lung Cancer Screening Trial cohort, with equal numbers of nodule types and similar sizes. Nodules were scored by eight radiologists by using morphologic categories proposed by the Fleischner Society guidelines for management of pulmonary nodules as solid, part solid with a solid component smaller than 5 mm, part solid with a solid component 5 mm or larger, or pure ground glass. Inter- and intraobserver agreement was analyzed by using Cohen κ statistics. Multivariate analysis of variance was performed to assess the effect of nodule characteristics and image quality on observer disagreement. Effect on nodule management was estimated by differentiating CT follow-up for ground-glass nodules, solid nodules 8 mm or smaller, and part-solid nodules smaller than 5 mm from immediate diagnostic work-up for solid nodules larger than 8 mm and part-solid nodules 5 mm or greater.
Results Pair-wise inter- and intraobserver agreement was moderate (mean κ, 0.51 [95% confidence interval, 0.30, 0.68] and 0.57 [95% confidence interval, 0.47, 0.71]). Categorization as part-solid nodules and location in the upper lobe significantly reduced observer agreement (P = .012 and P < .001, respectively). By considering all possible reading pairs (28 possible combinations of observer pairs × 160 nodules = 4480 possible agreements or disagreements), a discordant nodule classification was found in 36.4% (1630 of 4480), related to presence or size of a solid component in 88.7% (1446 of 1630). Two-thirds of these discrepant readings (1061 of 1630) would have potentially resulted in different nodule management.
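The agreement statistic, the reading-pair arithmetic, and the management split described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names (`cohen_kappa`, `management`), the category labels, and the reading of the 5-mm threshold as applying to the solid component of part-solid nodules are assumptions made for illustration. The κ shown is the unweighted form; the abstract does not state which variant was used.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Unweighted Cohen kappa for two equal-length sequences of labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed proportion of nodules on which the two readers agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each reader's category frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def management(category, diameter_mm):
    """Map a classification to the two management arms in the abstract.

    Category labels are hypothetical; diameter_mm is used for solid nodules only.
    """
    if category in ("ground_glass", "part_solid_lt5"):
        return "ct_follow_up"
    if category == "part_solid_ge5":
        return "work_up"
    if category == "solid":
        return "work_up" if diameter_mm > 8 else "ct_follow_up"
    raise ValueError(f"unknown category: {category}")


# Reading-pair arithmetic reported in the Results.
n_readers, n_nodules = 8, 160
n_pairs = len(list(combinations(range(n_readers), 2)))  # observer pairs
n_comparisons = n_pairs * n_nodules                     # pair-by-nodule readings
discordant_pct = 100 * 1630 / n_comparisons
solid_component_pct = 100 * 1446 / 1630
management_pct = 100 * 1061 / 1630
```

With eight readers this gives 28 observer pairs and 4480 pair-by-nodule comparisons, matching the figures in the Results. The share of discordant readings that would change management works out to roughly 65%, which the abstract rounds to two-thirds.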
Conclusion There is moderate inter- and intraobserver agreement for nodule classification by using current recommendations for low-radiation-dose CT examinations of the chest. Discrepancies in nodule categorization were mainly caused by disagreement on the size and presence of a solid component, which may lead to different management in the majority of cases with such discrepancies. © RSNA, 2015.