With the release of large language models such as GPT-3 and PaLM, big tech companies have been experimenting with them for quite some time. Google recently joined the party with a response to OpenAI’s ChatGPT called Med-PaLM, built specifically to answer medical questions.
Introducing MultiMedQA
While ChatGPT seems to be all over the place with no real use cases, Google Research and DeepMind recently introduced MultiMedQA, an open benchmark for medical question answering. It combines HealthSearchQA, a new free-response dataset of medical questions searched online, with six existing open datasets covering professional medical examinations, research, and consumer queries.
The benchmark also incorporates a framework for human evaluation of model responses along several axes, including factuality, precision, potential harm, and bias.
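To make these axes concrete, here is a minimal sketch of how one rater’s judgment might be recorded; the field names and boolean scales are illustrative assumptions, not the paper’s actual rubric.

```python
from dataclasses import dataclass

@dataclass
class HumanRating:
    """One rater's judgment of one model answer. Field names and
    boolean scales are illustrative, not the paper's exact rubric."""
    question_id: str
    model_name: str             # e.g. "Flan-PaLM" or "Med-PaLM"
    scientific_consensus: bool  # aligned with scientific consensus?
    factually_correct: bool     # factuality axis
    precise: bool               # precision / completeness axis
    possible_harm: bool         # could acting on the answer cause harm?
    shows_bias: bool            # evidence of demographic or other bias

def consensus_rate(ratings):
    """Fraction of rated answers judged in line with scientific consensus."""
    return sum(r.scientific_consensus for r in ratings) / len(ratings)

# Toy usage
demo = [HumanRating("q1", "Med-PaLM", True, True, True, False, False)]
print(consensus_rate(demo))  # -> 1.0
```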
MultiMedQA provides datasets for multiple-choice questions as well as for longer-form responses to questions posed by medical professionals and non-professionals. These comprise MedQA, MedMCQA, PubMedQA, LiveQA, MedicationQA, and the MMLU clinical topics. In addition, a new dataset of curated, commonly searched health queries, called HealthSearchQA, was added to round out MultiMedQA.
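As a quick reference, the mixture can be summarized as follows; the grouping below simply restates the description above in code form, and the format labels are illustrative:

```python
# Illustrative summary of the MultiMedQA mixture by answer format.
MULTIMEDQA = {
    # Multiple-choice components
    "MedQA (USMLE)":        "multiple-choice",
    "MedMCQA":              "multiple-choice",
    "PubMedQA":             "multiple-choice",
    "MMLU clinical topics": "multiple-choice",
    # Long-form (free-response) components
    "LiveQA":               "long-form",
    "MedicationQA":         "long-form",
    "HealthSearchQA":       "long-form",  # the new consumer-health dataset
}
```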
The HealthSearchQA dataset, consisting of 3,375 frequently asked consumer questions, was curated using seed medical conditions and their associated symptoms. The seed data was used to retrieve publicly available, commonly searched questions generated by a search engine, which were displayed to every user who entered the seed terms.
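The curation step can be pictured as a filter over publicly available search questions keyed to the seed conditions; the snippet below is a hypothetical sketch of that idea with toy data, not Google’s actual pipeline.

```python
def curate_health_questions(candidate_questions, seed_terms):
    """Keep search questions that mention a seed medical condition or
    symptom (hypothetical sketch of the curation idea)."""
    seeds = {term.lower() for term in seed_terms}
    return [q for q in candidate_questions
            if any(term in q.lower() for term in seeds)]

# Toy usage
questions = [
    "How serious is atrial fibrillation?",
    "What is the best way to learn guitar?",
    "Can migraine cause blurred vision?",
]
print(curate_health_questions(questions, ["atrial fibrillation", "migraine"]))
```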
PaLM to the Rescue
To assess LLMs using MultiMedQA, the researchers built on PaLM, a 540-billion-parameter LLM, and its instruction-tuned variant, Flan-PaLM.
Flan-PaLM achieves SOTA performance on MedQA, MedMCQA, PubMedQA, and the MMLU clinical topics by combining self-consistency, chain-of-thought (CoT), and few-shot prompting techniques, frequently outperforming strong LLM baselines by a wide margin. On the MedQA dataset of USMLE-style questions, Flan-PaLM surpasses the previous SOTA by over 17%. However, human evaluation identifies significant gaps in Flan-PaLM’s responses.
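As a rough illustration of how these prompting techniques fit together, the sketch below combines a few-shot CoT prompt with self-consistency, sampling several reasoning paths and majority-voting the final answers; `sample_completion` is a hypothetical stand-in for a real model call, not an actual PaLM API.

```python
import random
from collections import Counter

FEW_SHOT_COT_PROMPT = """\
Question: <worked example with step-by-step reasoning> Answer: (B)

Question: {question}
Let's think step by step:"""

def sample_completion(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for a sampled LLM call (e.g. Flan-PaLM).
    Returns a canned chain-of-thought completion so the sketch runs."""
    return random.choice([
        " The symptoms point to option C. Answer: (C)",
        " Ruling out A and B leaves C. Answer: (C)",
        " This presentation is classic for D. Answer: (D)",
    ])

def extract_answer(completion: str) -> str:
    """Pull the final answer choice, e.g. '(C)', out of a CoT completion."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency_answer(question: str, num_samples: int = 11) -> str:
    """Few-shot CoT prompting with self-consistency: sample several
    reasoning paths at nonzero temperature, then majority-vote answers."""
    prompt = FEW_SHOT_COT_PROMPT.format(question=question)
    votes = Counter(extract_answer(sample_completion(prompt))
                    for _ in range(num_samples))
    return votes.most_common(1)[0][0]

print(self_consistency_answer("Which drug is first-line for condition X?"))
```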
The resulting model that addresses this issue is Med-PaLM, produced via instruction prompt tuning, which performs encouragingly compared to Flan-PaLM but still falls short of the judgment of human medical experts.
For example, a panel of physicians found that 92.6% of Med-PaLM’s long-form responses were in line with scientific consensus, on par with physician-generated responses (92.9%), while only 61.9% of Flan-PaLM’s long-form responses met the same bar. Similarly, 5.8% of Med-PaLM’s responses were rated as potentially leading to harmful outcomes, comparable to physician-generated responses (6.5%), whereas 29.7% of Flan-PaLM’s responses were.
Read the complete paper here.
Google’s Healthcare Game
At the Google for India 2022 event, Google announced a collaboration with Apollo Hospitals in India to expand the use of deep learning models for X-rays and other diagnostic purposes. Google’s other health partnerships include Aravind Eye Care System, Ascension, Mayo Clinic, Rajavithi Hospital, Northwestern Medicine, Sankara Nethralaya, and Stanford Medicine, among others.
Google isn’t the first tech giant to venture into AI-powered healthcare solutions. Microsoft is also working closely with the OpenAI team to use GPT-3 to facilitate collaboration between staff and physicians and improve the efficiency of healthcare teams.
In November 2022, Meta AI also introduced Galactica, an AI program that claimed to support academic researchers by generating comprehensive literature reviews and wiki entries on any topic; however, it was pulled after producing unreliable results.
Around the same time, Meta AI released CICERO, fusing natural language processing and strategic reasoning. It is the first AI agent to perform at a human level in the complex natural-language strategy game Diplomacy. Playing against humans on webDiplomacy.net, the agent showcased SOTA performance, more than doubling the average score of the human players, and ranked in the top 10% of participants who played more than one game.