While artificial intelligence (AI) offers possibilities of advanced clinical prediction and decision-making in healthcare, models trained on relatively homogeneous datasets and on populations that poorly represent underlying diversity limit generalisability and risk biased AI-based decisions. Here, we describe the landscape of AI in clinical medicine to delineate population and data-source disparities.
We performed a scoping review of clinical papers published in PubMed in 2019 that used AI techniques. We assessed differences in dataset country of origin, clinical specialty, and author nationality, sex, and expertise. A manually tagged subsample of PubMed articles was used to train a model, leveraging transfer-learning techniques (building upon an existing BioBERT model), to predict eligibility for inclusion (original, human, clinical AI literature). For all eligible articles, dataset country of origin and clinical specialty were manually labelled. A BioBERT-based model predicted first/last author expertise. Author nationality was determined from the affiliated institution of the corresponding author, retrieved via Entrez Direct, and first/last author sex was inferred using the Gendarize.io API.
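To make the eligibility-screening step concrete, the sketch below shows how a public BioBERT checkpoint could be fine-tuned as a binary classifier over title/abstract text with the Hugging Face transformers library. This is a minimal illustration rather than the study's actual pipeline: the model identifier (dmis-lab/biobert-base-cased-v1.1), the dataset wrapper, and the training hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch (not the authors' code): fine-tune a BioBERT checkpoint to
# predict article eligibility (original, human, clinical AI literature) from
# title + abstract text. Model name and hyperparameters are assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed public BioBERT checkpoint


class ArticleDataset(Dataset):
    """Wraps manually tagged (title + abstract, eligible/ineligible) pairs."""

    def __init__(self, texts, labels, tokenizer, max_len=512):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])  # 1 = eligible, 0 = not
        return item


def train_eligibility_classifier(train_texts, train_labels):
    """Fine-tune BioBERT on the manually tagged subsample (illustrative settings)."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                               num_labels=2)
    dataset = ArticleDataset(train_texts, train_labels, tokenizer)
    args = TrainingArguments(output_dir="eligibility_model",
                             num_train_epochs=3,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return tokenizer, model
```

The trained classifier would then be applied to the remaining, untagged articles to flag those eligible for manual labelling of dataset origin and clinical specialty.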
Our search yielded 30,576 articles, of which 7,314 (23.9%) were eligible for further analysis. Most datasets came from the US (40.8%) and China (13.7%). Radiology was the most represented clinical specialty (40.4%), followed by pathology (9.1%). Authors were primarily from either China (24.0%) or the US (18.4%). First and last authors were predominantly data experts (i.e., statisticians) (59.6% and 53.9%, respectively) rather than clinicians, and the majority of first/last authors were male (74.1%).
US and Chinese datasets and authors were disproportionately represented in clinical AI, and almost all of the top 10 dataset sources and author nationalities were from high-income countries (HICs). AI techniques were most commonly employed in image-rich specialties, and authors were predominantly male, with non-clinical backgrounds. Development of technological infrastructure in data-poor regions, together with diligence in external validation and model recalibration prior to clinical implementation in the short term, is crucial to ensuring that clinical AI is meaningful for broader populations and to avoiding the perpetuation of global health inequity.
Artificial intelligence (AI) creates opportunities for accurate, objective, and immediate decision support in healthcare with little expert input, which is especially valuable in resource-poor settings where there is a shortage of specialist care. Given that AI generalises poorly to cohorts outside those whose data were used to train and validate the algorithms, populations in data-rich regions stand to benefit substantially more than those in data-poor regions, entrenching existing healthcare disparities. Here, we show that more than half of the datasets used for clinical AI originate from either the US or China, and that these two countries together contribute over 40% of the publications' authors. While the models may perform on par with, or better than, clinician decision-making in the well-represented regions, benefits elsewhere are not guaranteed. Further, we show discrepancies in gender and specialty representation: notably, almost three-quarters of the coveted first/senior authorship positions were held by men, and radiology accounted for 40% of all clinical AI manuscripts. We emphasize that building equitable sociodemographic representation in data repositories, in author nationality, gender, and expertise, and in clinical specialties is crucial to ameliorating health inequities.