We’ve all heard the headlines: Google Creates AI that Detects Lung Cancer Better than Doctors [1], DeepMind Predicts Kidney Disease In Advance [2], AI Trained to Spot Brain Tumours Faster than Humans [3], and, most eerily, How do Doctors Feel About AI Taking Their Jobs? [4]. It is easy to be wowed by a headline in the moment and later dismiss it as a mere proof of concept rather than an implementable clinical tool. Yet as of July 2019, as many as 26 different artificial intelligence (AI) tools had been approved by the FDA for clinical use [5]. Many applications are aimed at augmenting the workflow of radiologists, such as QuantX, which uses AI to identify breast abnormalities, and Aidoc, which can spot brain bleeds on head imaging [5]. Even the Apple Watch Series 4 has an FDA-approved algorithm that can detect abnormal heart rhythms [5]. So it is no longer a question of when AI will become a mainstay in medicine; it already has. The question now is how we, as future physicians, will respond.
Let’s first address the elephant in the room. No, AI will not replace physicians, not even the radiologists and pathologists whom these AI tools seem to target. Even the companies that make healthcare-focused AI solutions, such as PeriGen, do not believe doctors and nurses will be out of jobs anytime soon [6]. Rather, AI will be another tool at the physician’s disposal, facilitating a more efficient workflow, improved diagnostic capabilities, and ultimately better patient care. AI applications that can identify abnormalities and predict disease will function as the ‘consultant’s consultant.’
I envision the radiologist or pathologist being notified that a scan needs to be examined, recording their own diagnosis, and then running the AI program on the same scan and comparing the two reads. If the AI and the physician agree, they move on. If not, upon further examination, the physician may find that the AI caught something they missed or that the AI made a mistake. Either way, it’s a win-win for the patient and the provider: a correct diagnosis is more likely, and the additional cost of that second “consultation” is practically negligible.
Such a dynamic is even more encouraging when we learn that in the Northern Hemisphere, there are 25 magnetic resonance imaging (MRI) scanners per million people, while in a region like sub-Saharan Africa, there are 25 million people per scanner [7]. Leveraging AI solutions could extend care to historically underserved and under-resourced populations, both in the U.S. and abroad. After all, when certain populations have virtually zero access to healthcare, whether due to geographic isolation or a lack of trained providers, a widely distributed AI algorithm could help bridge that gap.
Still, AI is far from perfect. Besides lacking two important qualities of physicians, empathy and the ability to make complex decisions, intelligent medical software can be deceptively problematic. AI systems are developed by fine-tuning decision-making processes that “learn” patterns linking medical data to the correct diagnoses. For example, patient data, such as images of the brain, are provided to the AI along with “ground truth” classifications made by real physicians, effectively telling the AI the correct location of a brain tumor on each scan. Thousands of these data points are provided, and by “seeing” certain relationships over and over again, the AI learns on its own what a brain tumor looks like on an image or what sequence of heart rate measurements constitutes an abnormal rhythm.
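To make that training process concrete, here is a minimal, purely hypothetical sketch in Python; the features and labels below are simulated, not real patient data, but the pattern of “scans plus physician-provided ground truth in, learned classifier out” is the same one the clinical tools follow at a vastly larger scale.

```python
# A minimal, simulated sketch of supervised learning: the model only
# "knows" whatever patterns exist in the labeled examples it is shown.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training data: each row stands in for a scan reduced to
# a few numeric features (e.g., lesion size, contrast, texture score).
n_cases = 1000
features = rng.normal(size=(n_cases, 3))

# "Ground truth" labels a physician would supply: 1 = tumor, 0 = no tumor.
# Here they are simulated from the features plus noise.
labels = (features[:, 0] + 0.5 * features[:, 1]
          + rng.normal(scale=0.5, size=n_cases) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print(f"Accuracy on held-out cases: {model.score(X_test, y_test):.2f}")
```

Everything the model ends up “knowing” is distilled from those labeled examples and nothing else.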
As a result, any AI is only as good as the data it is provided. Biases and prejudices can easily make their way into AI systems because developers can unknowingly inject their own biases into the training data. In the legal realm, ProPublica reported that a criminal justice algorithm incorrectly labeled black defendants as future offenders at twice the rate of white defendants [8]. This most likely occurred because, while the algorithm never explicitly had data about defendants’ race, it was trained using questions like, “Was one of your parents ever sent to jail or prison?” and “How many of your friends/acquaintances are taking drugs illegally?” [8]. Since African-Americans have been systematically discriminated against in the criminal justice system, such questions inadvertently teach the AI a dangerous association between race and the predicted likelihood of committing a future crime. Even natural language processors, which leverage AI algorithms to process human language, have been guilty of perpetuating gender stereotypes [9].
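As a toy illustration of how this happens (this is not the actual COMPAS model, whose inner workings are proprietary; every variable below is simulated), the sketch trains a classifier that is never given race, yet still scores the two groups differently, because one of its inputs is correlated with group membership and the historical labels themselves are skewed.

```python
# Toy illustration of proxy bias: the model never receives "group" as
# an input, but a correlated feature carries that information anyway.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical protected attribute; never passed to the model.
group = rng.integers(0, 2, size=n)

# Underlying behavior is identical across the two groups.
behavior = rng.normal(size=n)

# Recorded outcome (e.g., re-arrest) is biased: the same behavior is
# more likely to be detected and recorded for group 1.
detection_rate = np.where(group == 1, 0.9, 0.5)
label = ((behavior > 0.5) & (rng.random(n) < detection_rate)).astype(int)

# A survey-style proxy feature correlated with group membership.
proxy = group + rng.normal(scale=0.8, size=n)

X = np.column_stack([proxy, behavior])
scores = LogisticRegression().fit(X, label).predict_proba(X)[:, 1]

print(f"Mean predicted risk, group 0: {scores[group == 0].mean():.3f}")
print(f"Mean predicted risk, group 1: {scores[group == 1].mean():.3f}")
```

The model has technically never “seen” race, but it has effectively learned it.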
Unfortunately, these prejudices make their way into the healthcare system as well. It is well documented that implicit biases on the part of providers negatively impact patient care. For example, implicit racial bias led oncologists to spend less time with black cancer patients, which in turn made those patients less likely to take their medications [10]. Such biases have also plagued AI systems in the clinical setting. A 2019 UC Berkeley study found that a UnitedHealth AI system used to manage 70 million patients prioritized the treatment of white patients over black patients for conditions such as kidney problems and diabetes [11]. Again, race was never explicitly factored in; the algorithm used predicted healthcare costs as a proxy for medical need and prioritized the patients expected to cost the most. However, because of longstanding disparities in access to care, less money has historically been spent on black patients than on equally sick white patients, so the algorithm judged them to be healthier and deemed them lower priority [11].
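That mechanism, too, can be sketched with simulated numbers (again, a hypothetical toy, not the actual UnitedHealth algorithm): if historical spending on one group understates how sick its members are, then even an accurate cost predictor, repurposed as a “need” score, will rank equally ill patients from that group lower.

```python
# Toy illustration of label-choice bias: predicting cost instead of
# health need penalizes a group on whom less money has been spent.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 10_000

group = rng.integers(0, 2, size=n)      # hypothetical; 1 = underserved group
illness = rng.gamma(shape=2.0, size=n)  # true need, same distribution in both groups

# For the same illness, historical spending on group 1 is lower because
# of barriers to access, so recorded costs understate their need.
spend_factor = np.where(group == 1, 0.6, 1.0)
prior_cost = illness * spend_factor + rng.normal(scale=0.2, size=n)
future_cost = illness * spend_factor + rng.normal(scale=0.2, size=n)

# An accurate cost model whose predictions are then used as "need" scores.
need_score = LinearRegression().fit(
    prior_cost.reshape(-1, 1), future_cost).predict(prior_cost.reshape(-1, 1))

# Among the sickest quartile of patients, group 1 is scored lower.
sickest = illness > np.percentile(illness, 75)
print(f"Mean need score, sickest quartile, group 0: {need_score[sickest & (group == 0)].mean():.2f}")
print(f"Mean need score, sickest quartile, group 1: {need_score[sickest & (group == 1)].mean():.2f}")
```

Nothing about the cost prediction itself is inaccurate; the problem is that cost was the wrong thing to predict in the first place.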
When such biases in AI go undetected, patients from minority communities, who already suffer injustices in the healthcare system, are further discriminated against by so-called advancements in healthcare technology.
We have talked a lot about racial and gender biases in healthcare data, but we also have to acknowledge human error in this data. In my own research, we have found that the “correctness” of publicly available data has a huge impact on the quality of an AI algorithm that segments the prostate on MRI scans [12]. When a few physicians or scientists have to annotate thousands of medical images, mistakes are bound to be made. Maybe the prostate is missed on a few images, or multiple visible brain tumors are not correctly annotated. Maybe the data is corrupted in transit.
Compounding the problem, computer scientists, who often lack the clinical expertise to notice such errors, tend to treat the training data as infallible. So, if they create substandard AI algorithms and physicians are unimpressed, these computer scientists will be left scratching their heads, unaware of what went wrong. Or worse, if physicians also treat these classifiers as perfect, an important diagnosis may be missed or a crucial treatment denied. This raises the question: when computer scientists claim that a certain intelligent system is more accurate than physicians, how do they define “accurate”? Is the gold standard defined by third-party physicians or by another AI? As my research colleague Nick puts it, “How do we know if that gold standard is, in fact, gold,” and not bronze or copper?
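Nick’s question can even be put into numbers with one more hypothetical sketch: if a model agrees with reality 90% of the time but the “gold standard” it is graded against is itself wrong 10% of the time, the accuracy we report is not the accuracy the patient experiences.

```python
# Toy illustration: accuracy measured against an imperfect "gold
# standard" is not the model's true accuracy.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

true_label = rng.integers(0, 2, size=n)

# A model that agrees with reality 90% of the time.
model_pred = np.where(rng.random(n) < 0.9, true_label, 1 - true_label)

# A "gold standard" in which 10% of the annotations are wrong.
gold_standard = np.where(rng.random(n) < 0.9, true_label, 1 - true_label)

print(f"True accuracy:                    {(model_pred == true_label).mean():.3f}")
print(f"Accuracy vs. noisy gold standard: {(model_pred == gold_standard).mean():.3f}")
```

Here the annotation errors are independent of the model’s errors, so the noisy benchmark undersells the model; when a model is trained and evaluated on the same flawed labels, the benchmark can just as easily oversell it.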
So, while there have been numerous debates about the ethics of self-driving cars and whether the world would be better off if we took human drivers out of the equation, we should be having the same discussions about AI in medicine. Even if an AI is objectively more accurate at reading MRIs or detecting arrhythmias, I personally have strong reservations about a machine being the be-all and end-all for the care of my family or myself. At the same time, I encourage the integration of AI into the clinic as a safety net, if not more.
As future physicians, we must be painfully aware of, and vigilant about, the many shortcomings of AI in the clinical setting. As AI becomes more prevalent in our lives as medical students, residents, and attendings, it is up to us to be open-minded about its potential but wary of the possible harms it poses to patient care, especially for minority populations. In a time when racial injustices have been catapulted to the forefront of the American psyche, it is more important than ever for us to become part of the conversation. We must bridge the gap between the technical and the clinical realms by constantly questioning these tools, which are supposed to improve patient outcomes. After all, AI is here to stay. We might as well make the best of it.
1. Carfagno, J., 2019. Google Creates AI That Detects Lung Cancer Better Than Doctors – Docwire News. [online] Docwire News. Available at: https://www.docwirenews.com/docwire-pick/google-makes-ai-that-outperforms-doctors-in-detecting-lung-cancer/ [Accessed 23 June 2020].
2. Sagar, R., 2019. Here’s A Look At All The Recent Advancements In Medical AI. [online] Analytics India Magazine. Available at: https://analyticsindiamag.com/all-the-recent-advancements-in-medical-ai/ [Accessed 23 June 2020].
3. Lipscombe-Southwell, A., 2020. AI Trained To Spot Brain Tumours Faster Than Humans. [online] BBC Science Focus Magazine. Available at: https://www.sciencefocus.com/news/ai-trained-to-spot-brain-tumours-faster-than-humans/ [Accessed 23 June 2020].
4. Carfagno, J., 2019. How Do Doctors Feel About AI Taking Their Jobs? – Docwire News. [online] Docwire News. Available at: https://www.docwirenews.com/docwire-pick/future-of-medicine-picks/how-do-doctors-feel-about-ai-potentially-taking-jobs [Accessed 23 June 2020].
5. Carfagno, J., 2019. 5 FDA Approved Uses Of AI In Healthcare. [online] Docwire News. Available at: https://www.docwirenews.com/docwire-pick/future-of-medicine-picks/fda-approved-uses-of-ai-in-healthcare/ [Accessed 23 June 2020].
6. Lagasse, J., 2018. Why Artificial Intelligence Won’t Replace Doctors. [online] Healthcare Finance News. Available at: https://www.healthcarefinancenews.com/news/why-artificial-intelligence-wont-replace-doctors [Accessed 23 June 2020].
7. Dechambenoit, G., 2016. Access to health care in sub-Saharan Africa. Surgical Neurology International, [online] 7(1), p.108. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223397/ [Accessed 23 June 2020].
8. Angwin, J., Larson, J., Mattu, S. and Kirchner, L., 2016. Machine Bias. [online] ProPublica. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [Accessed 23 June 2020].
9. Packer, B., Halpern, Y., Guajardo-Céspedes, M. and Mitchell, M., 2018. Text Embedding Models Contain Bias. Here’s Why That Matters. [online] Google Developers Blog. Available at: https://developers.googleblog.com/2018/04/text-embedding-models-contain-bias.html [Accessed 23 June 2020].
10. Bendix, J., 2020. How Implicit Bias Harms Patient Care. [online] Medical Economics. Available at: https://www.medicaleconomics.com/view/how-implicit-bias-harms-patient-care [Accessed 23 June 2020].
11. Jee, C., 2019. A Biased Medical Algorithm Favored White People For Health-Care Programs. [online] MIT Technology Review. Available at: https://www.technologyreview.com/2019/10/25/132184/a-biased-medical-algorithm-favored-white-people-for-healthcare-programs/ [Accessed 23 June 2020].
12. Ahluwalia, V.S., Prabhudesai, S.B., Wang, N.C., Denton, B.T., Curci, N., Salami, S.S., Rao, A., 2020. The Effects of Medical Imaging Data Quality on Deep Learning Classifiers for Prostate Segmentation.
Vinny Ahluwalia is an MS1 at the Perelman School of Medicine.