Ultrasound (US) risk-stratification systems for the investigation of thyroid nodules may not be as useful as anticipated.
We aimed to assess the performance and costs of the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS).
We examined the data set on which ACR-TIRADS was developed and evaluated three strategies: TR1 or TR2 as a rule-out test, TR5 as a rule-in test, and ACR-TIRADS applied across all nodule categories. As a hypothetical clinical comparator, we assumed that 1 in 10 nodules are randomly selected for fine needle aspiration (FNA) and that the pretest probability of clinically important thyroid cancer is 5%.
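The expected performance of the random-selection comparator follows directly from its two stated assumptions. The sketch below is an illustration, not the authors' analysis; the function name `random_selection_performance` is hypothetical. Because selection ignores disease status, sensitivity equals the fraction sent for FNA and specificity equals one minus that fraction, while accuracy is the prevalence-weighted average of the two.

```python
def random_selection_performance(pretest_prob: float, fna_fraction: float) -> dict[str, float]:
    """Expected performance when nodules are sent for FNA at random.

    Hypothetical helper: assumes selection is independent of whether a
    nodule is malignant, as in the random comparator described above.
    """
    # Every nodule, malignant or benign, has the same chance of selection.
    sensitivity = fna_fraction          # P(selected | cancer)
    specificity = 1.0 - fna_fraction    # P(not selected | no cancer)
    # Accuracy = P(correct) = prevalence-weighted average of sens and spec.
    accuracy = pretest_prob * sensitivity + (1.0 - pretest_prob) * specificity
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


perf = random_selection_performance(pretest_prob=0.05, fna_fraction=0.10)
for name, value in perf.items():
    print(f"{name}: {value:.3f}")
```

Under these idealized assumptions the comparator has 10% sensitivity, 90% specificity and about 86% accuracy; the exact figure reported for the comparator may differ slightly depending on rounding or the precise calculation used.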
The gender bias (92% female) and cancer prevalence (10%) of the data set suggest that it may not accurately reflect the intended test population. Applying ACR-TIRADS across all nodule categories performed poorly, with sensitivity and specificity between 60% and 80% and overall accuracy worse than that of random selection (65% vs 85%). In the TR3 and TR4 categories, accuracy was below 60%. Using TR5 as a rule-in test performed similarly to random selection (specificity 89% vs 90%). Using TR1 and TR2 as a rule-out test gave excellent sensitivity (97%), but each additional person correctly reassured by ACR-TIRADS requires more than 100 ultrasound scans and results in 6 unnecessary operations, at significant financial cost.
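Why can a test with sensitivity and specificity in the 60% to 80% range score below random selection on accuracy? Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so it always lies between the two, whatever the prevalence. The minimal sketch below (an illustration only; the grid values are arbitrary and not study data) confirms that no combination in that band exceeds 80% accuracy, which is below the comparator's roughly 86%, whose advantage comes almost entirely from its 90% specificity at low prevalence.

```python
def accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy = P(correct) = prev * sens + (1 - prev) * spec."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Illustrative grid: sensitivity and specificity from 0.60 to 0.80 in steps of 0.05,
# combined with a range of prevalences (values chosen for illustration only).
band = [x / 100 for x in range(60, 81, 5)]
prevalences = [0.01, 0.05, 0.10, 0.25, 0.50]

best = max(accuracy(se, sp, p) for se in band for sp in band for p in prevalences)
comparator = accuracy(0.10, 0.90, 0.05)  # random 1-in-10 selection at 5% prevalence

print(f"highest accuracy in the 60-80% band: {best:.2f}")
print(f"random-selection comparator:        {comparator:.2f}")
```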
Perhaps surprisingly, the performance of ACR-TIRADS may often be no better than that of random selection. The management guidelines may be difficult to justify from a cost/benefit perspective. A prospective validation study that determines the true performance of TIRADS in the real world is needed.