To compare the PanCan model, Lung-RADS and the 1.2016 National Comprehensive Cancer Network (NCCN) guidelines for discriminating malignant from benign pulmonary nodules on baseline screening CT scans, and to assess the impact of diameter measurement methods on their performance.

From the Danish Lung Cancer Screening Trial database, 64 CTs with malignant nodules and 549 baseline CTs with benign nodules were included. Performance of the systems was first evaluated using each system's original diameter definition: D(longest-C) (PanCan) and D(meanAxial) (NCCN), both obtained from axial sections, and D(mean3D) (Lung-RADS). Subsequently, each diameter definition was applied uniformly to all systems. Areas under the ROC curves (AUC) were used to evaluate risk discrimination.

Using the original diameter specifications, PanCan outperformed Lung-RADS and NCCN (AUC 0.874 vs. 0.813, p = 0.003; 0.874 vs. 0.836, p = 0.010). When D(longest-C), D(mean3D) and D(meanAxial) were each applied uniformly, PanCan remained superior to Lung-RADS (p < 0.001 to p = 0.001) and NCCN (p < 0.001 to p = 0.016). Diameter definition significantly influenced NCCN's performance, with D(longest-C) performing worst (D(longest-C) vs. D(mean3D), p = 0.005; D(longest-C) vs. D(meanAxial), p = 0.016).

Without follow-up information, the PanCan model discriminates benign from malignant nodules significantly better than Lung-RADS and the 1.2016 NCCN guidelines. The NCCN guidelines are most sensitive to nodule size definition.

- The PanCan model outperforms Lung-RADS and the 1.2016 NCCN guidelines in identifying malignant pulmonary nodules.
- Nodule size definition had no significant impact on Lung-RADS or the PanCan model.
- The 1.2016 NCCN guidelines performed significantly better with mean diameter than with longest diameter.
- Longest diameter achieved the lowest performance for all models.
- Mean diameter performed equivalently when derived from axial sections and from volumetry.
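
The AUC used above as the measure of risk discrimination can be read as the probability that a randomly chosen malignant nodule receives a higher risk score than a randomly chosen benign one. The following is a minimal sketch of that pairwise (Mann-Whitney) computation, not the study's analysis code: the function name `auc_mann_whitney` and the scores are hypothetical and chosen only for illustration, and the statistical test behind the reported p-values is not specified here and is not reproduced.

```python
from typing import Sequence


def auc_mann_whitney(malignant: Sequence[float], benign: Sequence[float]) -> float:
    """Return the AUC as the fraction of (malignant, benign) pairs ranked correctly.

    A pair counts 1 if the malignant score is higher, 0.5 if the scores tie,
    and 0 otherwise. This equals the area under the empirical ROC curve.
    """
    if not malignant or not benign:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


# Hypothetical risk scores (not study data): higher means more suspicious.
malignant_scores = [0.9, 0.6, 0.4]
benign_scores = [0.1, 0.3, 0.4, 0.7]

auc = auc_mann_whitney(malignant_scores, benign_scores)
print(f"AUC = {auc:.3f}")
```

Comparing two systems, or one system under two diameter definitions, then amounts to computing this value for each set of scores on the same nodules and testing whether the difference exceeds what chance would produce.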