Google and DeepMind have developed an AI-powered chatbot tool called Med-PaLM designed to generate “safe and helpful responses” to questions posed by healthcare professionals and patients.
The tool is an example of a large language model, or LLM, which is designed to understand queries and generate plain language text responses from large and complex data sets—in this case, medical research.
LLMs made headlines last year with the launch of OpenAI's ChatGPT, a conversational AI trained on data pulled from the internet that aims to provide near-human interactions, impressing with its ability to answer questions on a wide range of topics and generate on-demand text content such as poems and essays.
It quickly surpassed 1 million users, though the numbers have likely been inflated by those trying to goad the chatbot into making defamatory, inappropriate or taboo statements.
While ChatGPT is a showcase technology that operates at the consumer end of the LLM scale, Med-PaLM is designed to operate within narrower parameters and has been trained on seven question-answering data sets spanning professional medical exams, medical research, and consumer queries about medical matters.
The researchers have published a paper on the LLM, suggesting that with refinement it could play a role in clinical applications.
Excited to share Med-PaLM, a medical domain-aligned language model for generating safe and helpful answers.
Our work advances SOTA on 7 medical question answering tasks, including achieving 67% on the MedQA USMLE, improving on previous work by more than 17%. pic.twitter.com/B0rvtUEysV
— Shekoofeh Azizi (@AziziShekoofeh) December 27, 2022
Six of those data sets are already established (MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and MMLU), but the teams at Google and DeepMind have developed their own, called HealthSearchQA, which was curated using questions about medical conditions and their associated symptoms posted online.
The researchers behind the project point to a number of potential applications, including knowledge retrieval, clinical decision support, summarizing key findings in studies, and triaging patients' primary care concerns, but acknowledge that for now the model performs encouragingly yet remains inferior to clinicians.
For example, incorrect retrieval of information was observed in 16.9% of Med-PaLM responses, compared to less than 4% for human physicians, according to the paper. There were similar disparities in incorrect reasoning (about 10% vs. 2%) and inappropriate or incorrect response content (18.7% vs. 1.4%).
More important than the results to date, according to the team, are the techniques that can be used to improve LLM performance, such as instruction prompt tuning, which uses examples of interactions to produce responses that are more useful to users.
Instruction prompt tuning allowed Med-PaLM to outperform another LLM called Flan-PaLM, with a panel of physicians judging 62% of Flan-PaLM's long-form responses to be accurate, compared to 93% of Med-PaLM's.
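To give a flavor of the idea: the paper's instruction prompt tuning actually learns soft prompt embeddings rather than text, but the underlying concept of conditioning a model on an instruction plus curated exemplar question-answer pairs can be sketched in plain-text form. Everything below is illustrative; the instruction wording, exemplar content, and function names are hypothetical and not taken from the Med-PaLM paper.

```python
# Illustrative sketch of prompting with an instruction and exemplars.
# Note: Med-PaLM's instruction prompt tuning learns soft (embedding-level)
# prompts; this plain-text version only conveys the general idea.

INSTRUCTION = (
    "You are a medical assistant. Answer safely and helpfully, "
    "and advise consulting a clinician where appropriate."
)

# Hypothetical curated exemplar Q&A pairs (not from the actual data sets).
EXEMPLARS = [
    (
        "What causes iron-deficiency anemia?",
        "Common causes include blood loss, low dietary iron, and poor "
        "absorption. A clinician should be consulted for diagnosis.",
    ),
]


def build_prompt(question: str) -> str:
    """Prepend the instruction and exemplar Q&A pairs to a new question."""
    parts = [INSTRUCTION]
    for q, a in EXEMPLARS:
        parts.append(f"Q: {q}\nA: {a}")
    # The trailing "A:" cues the model to continue with an answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


prompt = build_prompt("What are the symptoms of dehydration?")
print(prompt)
```

In the soft-prompt variant the paper describes, the instruction and exemplars are replaced by a small set of learned vectors prepended to the input, which lets the base model stay frozen while the prompt adapts to the medical domain.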
“Our research provides insight into the opportunities and challenges of applying these technologies to medicine,” the researchers write.
“We hope this study sparks further conversations and collaborations among patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers, and other stakeholders to responsibly translate these early research findings to improve healthcare.”