We’d like to introduce MUSE, a large language model (LLM) augmented with the ability to check facts in real time and generate responses to claims made online. It was released earlier this year by Xinyi Zhou, in collaboration with Ashish Sharma, Amy X. Zhang, and Tim Althoff. Xinyi is a postdoc at the Paul G. Allen School of Computer Science & Engineering and a member of our ARTT team.
How MUSE works: A user inputs a piece of content to be checked, whether potential misinformation or a factual claim. The input can be text or an image – MUSE can speak meme! The system then runs a real-time Internet search to investigate the claim and generates a written response that explains why the information is or is not accurate, with citations to online sources that back up its assessment.
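To make that flow a bit more concrete, here is a minimal Python sketch of a MUSE-style pipeline. It is not MUSE’s actual implementation: the describe_image, search_web, and generate_response helpers below are hypothetical stand-ins for an image-understanding model, a real-time web search API, and an LLM call.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """A retrieved web source used as refutation or supporting context."""
    url: str
    snippet: str


def describe_image(image_path: str) -> str:
    # Hypothetical stand-in for a vision model that describes or extracts text from an image.
    return "placeholder description of the image"


def search_web(query: str, max_results: int = 5) -> list[Evidence]:
    # Hypothetical stand-in for a real-time Internet search API.
    return [Evidence(url="https://example.org/source", snippet="placeholder evidence snippet")]


def generate_response(claim: str, evidence: list[Evidence]) -> str:
    # Hypothetical stand-in for an LLM call that explains what is (in)accurate, with citations.
    citations = "\n".join(f"- {e.snippet} ({e.url})" for e in evidence)
    return f"Assessment of the claim, grounded in the retrieved evidence:\n{citations}"


def check_content(text: str | None = None, image_path: str | None = None) -> str:
    """End-to-end flow: build the claim, retrieve evidence, write a cited response."""
    claim = text or ""
    if image_path:
        # Multimodal input: fold a description of the image into the claim being checked.
        claim = f"{claim}\n[image] {describe_image(image_path)}".strip()
    evidence = search_web(claim)
    return generate_response(claim, evidence)


if __name__ == "__main__":
    print(check_content(text="Drinking hot water cures the flu."))
```

In the real system each of these steps is far more involved, but the overall shape (input, real-time retrieval, then a cited written response) mirrors the description above.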
Why it’s important: MUSE will be a key artificial intelligence component of the next version of the ARTT software tool.
So, we wanted to share a little more about what MUSE is and how it works, with excerpts from Xinyi’s paper, Correcting misinformation on social media with a large language model, written with Ashish Sharma, Amy X. Zhang, and Tim Althoff.
Here are some highlights from the paper, in Xinyi’s words:
“Real-world misinformation, often multimodal, can be partially correct and factual but misleading by cherry-picking, conflating correlation with causation, and using other tactics. High-quality and timely correction of misinformation that identifies and explains its inaccuracies and accuracies has been shown to effectively reduce false beliefs. Despite the wide acceptance of manual correction, it is difficult to be timely and scalable, a concern as technologies like large language models (LLMs) make misinformation easier to produce. LLMs also have versatile capabilities that could accelerate misinformation correction—however, they struggle due to a lack of recent information, a tendency to produce false content, and limitations in addressing multimodal information.”
“By retrieving evidence as refutation or supporting context, MUSE identifies and explains (in)accuracies in a piece of content—not presupposed to be misinformation—with references. It conducts multimodal retrieval and interprets visual content to verify and correct multimodal content.
Given the absence of a systematic and comprehensive evaluation approach, we propose and define 13 dimensions of misinformation correction quality, ranging from the accuracy of identifications and factuality of explanations to the relevance and credibility of references. Then, fact-checking experts correspondingly evaluate responses to social media content that are not presupposed to be (non-)misinformation but broadly include incorrect, partially correct, and correct posts that may or may not be misleading.”
“The results demonstrate Muse’s ability to write high-quality responses to potential misinformation—across modalities, tactics, domains, political leanings, and for information that has not previously been fact-checked online—within minutes of its appearance on social media.
Overall, Muse outperforms GPT-4 by 37% and even high-quality responses from laypeople by 29%. Our work provides a general methodological and evaluative framework to correct misinformation at scale. It reveals LLMs’ potential to help combat real-world misinformation effectively and efficiently.”
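As a rough illustration of how a rubric like the 13-dimension evaluation described above might be tallied, here is a short Python sketch. The dimension names and the 1-to-5 scale are assumptions for illustration only; the excerpt names just a few of the dimensions, and the paper’s actual scoring procedure is not reproduced here.

```python
from statistics import mean

# Illustrative expert ratings for one response, on an assumed 1-5 scale.
# Only a few dimension names from the excerpt appear here; the paper defines 13 in total.
ratings = {
    "accuracy_of_identifications": 5,
    "factuality_of_explanations": 4,
    "relevance_of_references": 5,
    "credibility_of_references": 4,
}


def overall_quality(scores: dict[str, int]) -> float:
    # Simple unweighted average across dimensions (an assumption, not the paper's method).
    return mean(scores.values())


print(f"Overall quality: {overall_quality(ratings):.2f} out of 5")
```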
The challenge for us at ARTT now is incorporating MUSE into the ARTT tool.
Our goal is to have this work finished early next year – let us know if you’d like to be an early tester!