By MICHAEL MILLENSON
If you ask ChatGPT how many procedures a particular surgeon performs or the infection rate of a particular hospital, the OpenAI and Microsoft chatbot inevitably responds with some variation of “I don’t.”
But depending on how you ask, Google’s Bard gives very different answers, even recommending “consultations” with specific doctors.
Bard told me how many knee replacement surgeries Chicago’s major hospitals performed in 2021, their infection rates, and the national average. It even told me which surgeon in Chicago does the most knee surgeries and his infection rate. When I asked about heart bypass surgery, Bard provided both local hospital death rates and the national average for comparison. While Bard sometimes identified itself as the source of the information, beginning its response with “to the best of my knowledge,” at other times it cited well-known and respected organizations.
There was only one problem. As Google itself warns, “Bard is experimental… so double-check the information in Bard’s responses.” When I followed that advice, the truth began to mix inexplicably with “truthiness,” comedian Stephen Colbert’s memorable term for information that is believed to be true not because of supporting facts, but because it “feels true.”
Take, for example, knee replacement surgery, also known as knee arthroplasty. It is one of the most common surgical procedures, with an estimated 1.4 million performed in 2022. When I asked Bard which surgeon performs the most knee replacements in Chicago, the answer was Dr. Richard A. Berger. Berger, who is affiliated with both Rush University Medical Center and Midwest Orthopedics, has performed more than 10,000 knee replacements, Bard informed me. In response to a follow-up question, Bard added that Berger’s infection rate is 0.5 percent, well below the national average of 1.2 percent. That low rate was attributed to factors such as “Dr. Berger’s expertise, use of minimally invasive techniques and meticulous attention to detail.”
With chatbots, every word in a query counts. When I changed the question a little and asked, “Which surgeon performs the most knee replacements in the Chicago area?” Bard no longer provided a single name. Instead, it listed seven of the “most renowned surgeons,” including Berger, who are “all highly skilled and experienced,” “have a long history of success” and are “renowned for their compassionate care.”
As with ChatGPT, Bard’s responses to any medical-related questions include over-the-top cautions such as “no surgery is without risk.” Even so, Bard made its advice clear: “If you are considering knee replacement surgery, I would recommend scheduling a consultation with one of these [seven] surgeons.”
ChatGPT avoids words like “recommend,” but it confidently assured me that the list it provided of four “top knee replacement surgeons” was based on “their expertise and patient outcomes.”
These endorsements, while a drastic departure from the search engine listings we’re used to, make more sense when you consider how “generative artificial intelligence” chatbots like ChatGPT and Bard are trained.
Bard and ChatGPT both rely on information from the Internet, where individual orthopedic surgeons are often highly rated. Features about Berger’s practice, for example, can be found on his website and in numerous media profiles, including a Chicago Tribune story that tells how athletes and celebrities from all over the country come to him for care. Unfortunately, it’s impossible to know how much chatbots reflect what surgeons say about themselves, as opposed to data from objective sources.
Courtney Kelly, Berger’s director of business development, confirmed the volume of “over 10,000” surgeries, while noting that the practice posted that number on its website several years ago. Kelly added that the practice has only released an overall complication rate of less than one percent, but confirmed that about half of that figure represents infections.
While Berger’s infection data may be accurate, the source Bard cited for it, The Joint Commission, is not. A spokesman for the Joint Commission, which examines hospitals for overall quality, said it does not collect infection rates for individual surgeons. Similarly, one of Berger’s colleagues at Midwest Orthopedics was also said to have a 0.5 percent infection rate, a number Bard attributed to the Centers for Medicare & Medicaid Services (CMS). Not only was I unable to find CMS data on individual clinicians’ infection rates or volumes, but the CMS Hospital Compare website provides hospital infection rates only for knee and hip surgeries combined.
In response to another question, Bard gave me breast cancer death rates at some of Chicago’s largest hospitals, though it was careful to note that the numbers were only averages for the condition. But once again its attribution, this time to the American Hospital Association, didn’t stand up. The trade group said it does not collect that type of data.
Delving deeper into life-and-death procedures, I asked Bard about the death rate for heart valve surgery at local hospitals. The quick response was impressively sophisticated. Bard provided hospital risk-adjusted mortality rates for isolated aortic valve replacement and mitral valve replacement, along with the national average for each (2.9 percent and 3.3 percent, respectively). The figures were attributed to the Society of Thoracic Surgeons (STS), whose data is seen as the “gold standard” for this type of information.
For comparison purposes, I asked ChatGPT about the same national death rates. Like Bard, ChatGPT cited STS, but its mortality rate for an isolated aortic valve replacement procedure was much lower (1.6 percent), while the mitral valve mortality rate was about the same (2.7 percent).
Before dismissing Bard’s descriptions of the quality of care at individual hospitals and physicians as hopelessly flawed, consider the alternatives. Advertisements in which hospitals tout their clinical prowess may not quite qualify as “truthiness,” but they certainly choose carefully which truths to tell. At the same time, I know of no publicly available hospital or doctor data that providers don’t complain is unreliable, whether it comes from US News and World Report, the Leapfrog Group (which Bard and ChatGPT also cite) or the federal Medicare program.
(STS data is an exception with an asterisk, because its performance information on individual clinics or groups is available only if the affected physicians choose to release it.)
What Bard and ChatGPT provide is a powerful conversation starter that paves the way for doctors and patients to have a frank discussion about the safety and quality of care, and inevitably that discussion will expand to society at large. Chatbots provide information that, as it improves, may finally drive public demand for consistent medical excellence, as I wrote 25 years ago in a book exploring the burgeoning information age.
I asked John Morrow, a veteran (human) data analyst and founder of Franklin Trust Ratings, how he would advise providers to respond.
“It’s time for the industry to standardize and disclose,” Morrow said. “Otherwise, things like ChatGPT and Bard are going to create panic and reduce trust.”
As an author, activist, consultant, and former Pulitzer-nominated journalist, Michael Millenson focuses professionally on making health care safer, better, and patient-centered.