As millions adopt Grok to fact-check, misinformation abounds

On June 9, soon after United States President Donald Trump dispatched US National Guard troops to Los Angeles to quell protests over immigration raids, California Governor Gavin Newsom posted two photographs on X. The images showed dozens of uniformed National Guard troops sleeping on the floor of a cramped space, with a caption that decried Trump for disrespecting the troops.

X users immediately turned to Grok, Elon Musk’s AI chatbot, which is integrated directly into X, to check the veracity of the images. To do so, they tagged @grok in a reply to the post in question, triggering an automatic response from the AI.

“You’re sharing fake photos,” one user posted, citing a screenshot of Grok’s response that claimed a reverse image search could not find the exact source. In another instance, Grok said the images were recycled from 2021, when former US President Joe Biden, a Democrat, withdrew troops from Afghanistan. Melissa O’Connor, a conspiracy-minded influencer, cited a ChatGPT analysis that also said the images were from the Afghanistan evacuation.

However, non-partisan fact-checking organisation PolitiFact found that both AI citations were incorrect. The images shared by Newsom were real, and had been published in the San Francisco Chronicle.

The erroneous, bot-sourced fact checks fuelled hours of cacophonous debate on X before Grok corrected itself.

Unlike OpenAI’s standalone app ChatGPT, Grok’s integration into X gives users immediate access to real-time AI answers without leaving the app, a feature that has been reshaping user behaviour since its March launch. But the chatbot, increasingly the first stop for fact checks during breaking news and on other general posts, often provides convincing but inaccurate answers.

“I think in some ways, it helps, and in some ways, it doesn’t,” said Theodora Skeadas, an AI policy expert formerly at Twitter. “People have more access to tools that can serve a fact-checking function, which is a good thing. However, it is harder to know when the information isn’t accurate.”

There’s no denying that chatbots could help users become more informed and gain context on events unfolding in real time. But currently, their tendency to make things up outstrips their usefulness.

Chatbots, including ChatGPT and Google’s Gemini, are large language models (LLMs) that learn to predict the next word in a sequence by analysing enormous troves of data from the internet. A chatbot’s outputs reflect the patterns and biases in the data it was trained on, which makes it prone to factual errors and fabrications known as “hallucinations”.
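To see why, it helps to look at the training objective itself. The toy model below is a deliberately minimal, illustrative sketch, with a word-count bigram table standing in for a neural network and a made-up three-sentence corpus: like an LLM, it can only echo patterns present in its training data, and it has no notion of truth beyond them.

```python
from collections import Counter, defaultdict

# Toy stand-in for an LLM's training objective: predict the next word
# from the words seen so far. Here, a bigram count table replaces the
# neural network, and the "training data" is three made-up sentences.
corpus = (
    "the troops slept on the floor . "
    "the troops were deployed to the city . "
    "the photos were shared on the platform ."
).split()

# Count how often each word follows each preceding word.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` in the training data."""
    followers = transitions.get(word)
    if not followers:
        return "<no idea>"  # nothing like this appeared in training
    return followers.most_common(1)[0][0]

print(predict_next("troops"))  # echoes a pattern from the corpus
print(predict_next("guard"))   # unseen in training: the model has no answer
```

A real LLM smooths over such gaps instead of admitting them, producing fluent text either way, which is why its errors read as confidently as its facts.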

For Grok, these inherent challenges are further complicated by Musk’s instructions that the chatbot should not adhere to political correctness and should be suspicious of mainstream sources. Where other AI models have guidelines around politically sensitive queries, Grok doesn’t. The lack of guardrails has resulted in Grok praising Hitler and consistently parroting anti-Semitic views, sometimes in response to unrelated user questions.

In addition, Grok relies on public posts on X, which are not always accurate, as a source for some of its real-time fact checks, which compounds its misinformation problem.

‘Locked into a misinformation echo chamber’

Al Jazeera analysed two of the most highly discussed posts on X from June to investigate how often Grok tags in replies were used for fact-checking. The posts analysed were Gavin Newsom’s on the LA protests, and Elon Musk’s allegation that Trump’s name appears in the unreleased documents held by US federal authorities on the convicted sex offender Jeffrey Epstein. Musk’s posts have since been deleted.

Our analysis of the 434 replies that tagged Grok under Newsom’s post found that the majority of requests, nearly 68 percent, asked Grok either to confirm whether the images Newsom posted were authentic or to provide context about the National Guard deployment.

Beyond straightforward verification, there was an eclectic mix of requests: some wanted Grok to make funny AI images based on the post, while others asked Grok to narrate the LA protests in pirate-speak. Notably, a few users lashed out because Grok had issued the correction and would not endorse their mistaken belief.

“These photos are from Afghanistan. This was debunked a couple day[s] go. Good try tho @grok is full of it,” one user wrote, two days after Grok corrected itself.

The analysis of the top 3,000 posts that mentioned @grok under Musk’s post revealed that half of all user queries directed at Grok asked it to “explain” the context, seeking background information on the Epstein files.

Another 20 percent of queries demanded “fact checks” aimed at confirming or denying Musk’s assertions, while 10 percent of users shared their “opinion”, questioning Musk’s motives and credibility, and wanted Grok’s judgement or speculation on possible outcomes of the Musk-Trump fallout.

“I will say that I do worry about this phenomenon becoming ingrained,” said Alexios Mantzarlis, director of the Security, Trust, and Safety Initiative at Cornell Tech, about the instant fact checks. “Even if it’s better than just believing a tweet straight-up or hurling abuse at the poster, it doesn’t do a ton for our collective critical thinking abilities to expect an instant fact check without taking the time to reflect about the content we’re seeing.”

Grok was called on 2.3 million times in just one week, between June 5 and June 12, to answer posts on X, data accessed by Al Jazeera through X’s API shows, underscoring how deeply this behaviour has taken root.
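For context on how such a figure can be gathered: X exposes tweet counts through its public API. The snippet below is a hypothetical sketch using the v2 full-archive tweet-counts endpoint; the endpoint choice, query, and access tier are assumptions for illustration, not Al Jazeera’s disclosed methodology.

```python
import os
import requests

# Hypothetical sketch: counting @grok mentions over one week via X's
# v2 full-archive tweet-counts endpoint. Assumes a bearer token whose
# access tier permits this endpoint; not Al Jazeera's actual pipeline.
BEARER_TOKEN = os.environ["X_BEARER_TOKEN"]

resp = requests.get(
    "https://api.twitter.com/2/tweets/counts/all",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={
        "query": "@grok",                      # posts mentioning @grok
        "start_time": "2025-06-05T00:00:00Z",
        "end_time": "2025-06-12T00:00:00Z",
        "granularity": "day",                  # one bucket per day
    },
    timeout=30,
)
resp.raise_for_status()

# Sum the per-day buckets to get the weekly total.
total = sum(bucket["tweet_count"] for bucket in resp.json()["data"])
print(f"@grok mentions, June 5-12: {total:,}")
```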

“X is keeping people locked into a misinformation echo chamber, in which they’re asking a tool known for hallucinating, that has promoted racist conspiracy theories, to fact-check for them,” Alex Mahadevan, a media literacy educator at the Poynter Institute, told Al Jazeera.

Mahadevan has spent years teaching people how to “read laterally”, which means when you encounter information on social media, you leave the page or post, and go search for reliable sources to check something out. But he now sees the opposite happening with Grok. “I didn’t think X could get any worse for the online information ecosystem, and every day I am proved wrong.”

Grok’s inconsistencies in fact-checking are already reshaping opinions in some corners of the internet. Digital Forensic Research Lab (DFRLab), which studies disinformation, analysed 130,000 posts related to the Israel-Iran war to understand the wartime verification efficacy of Grok. “The investigation found that Grok was inconsistent in its fact-checking, struggling to authenticate AI-generated media or determine whether X accounts belong to an official Iranian government source,” the authors noted.

Grok has also incorrectly blamed a trans pilot for a helicopter crash in Washington, DC; claimed the assassination attempt on Trump was partially staged; conjured up a criminal history for an Idaho shooting suspect; echoed anti-Semitic stereotypes of Hollywood; and misidentified an Indian journalist as an opposition spy during the recent India-Pakistan conflict.

Despite this growing shift towards instant fact checks, it is worth noting that the 2025 Digital News Report by the Reuters Institute found that online populations in several countries still preferred going to news sources or fact checkers over AI chatbots by a large margin.

“Even if that’s not how all of them behave, we should acknowledge that some of the ‘@grok-ing’ that we’re seeing is also a bit of a meme, with some folks using it to express disagreement or hoping to trigger a dunking response to the original tweet,” Mantzarlis said.

Mantzarlis’s assessment is echoed in our findings. Al Jazeera’s analysis of the Musk-Trump feud showed that about 20 percent of posts used Grok for everything from trolling or dunking directed at either Musk or Grok itself, to requests for AI meme-images such as Trump with kids on Epstein island, to non-English requests, including translations. (We used GPT-4.1 to assist in identifying the various categories the 3,000 posts belonged to, and manually checked the categorisations.)
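That hybrid approach, machine labelling followed by human review, is simple to reproduce. The sketch below shows one hypothetical way to do it with the OpenAI API; the prompt and the category list are illustrative assumptions, not Al Jazeera’s actual pipeline.

```python
from openai import OpenAI

# Hypothetical LLM-assisted labelling, loosely mirroring the methodology
# note above. The categories and prompt are assumptions for illustration.
# Requires OPENAI_API_KEY in the environment.
client = OpenAI()

CATEGORIES = ["explain", "fact check", "opinion", "troll/dunk",
              "image request", "translation", "other"]

def categorise(post_text: str) -> str:
    """Ask the model to assign exactly one category to an X post."""
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system",
             "content": "Classify the X post into exactly one of: "
                        + ", ".join(CATEGORIES)
                        + ". Reply with the category name only."},
            {"role": "user", "content": post_text},
        ],
        temperature=0,  # stable labels make the manual check easier
    )
    return response.choices[0].message.content.strip()

# Every machine label would still be reviewed by hand, as in the article.
print(categorise("@grok is Musk right about the Epstein files? explain"))
```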

Beyond real-time fact-checking, “I worry about the image-generation abuse most of all because we have seen Grok fail at setting the right guardrails on synthetic non-consensual intimate imagery, which we know to be the #1 vector of abuse from deepfakes to date,” Mantzarlis said.

For years, social media users benefited from context on the information they encountered online, through interventions such as labelling state media or attaching fact-checking warnings.

But after buying X in 2022, Musk ended those initiatives and loosened speech restrictions. He also used the platform as a megaphone to amplify misinformation about widespread election fraud, and to boost conservative theories on race and immigration. Earlier this year, xAI acquired X in an all-stock deal valued at $80bn. Musk also replaced human fact-checking with Community Notes, a voluntary crowdsourced programme, to police misleading content on X.

Instead of a centralised professional fact-checking authority, a contextual “note” with corrections is added to misleading posts, based on the ratings the note receives from users with diverse perspectives. Meta soon followed X and abandoned its third-party fact-checking programme for Community Notes.
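The ranking model behind this is public. X’s open-source Community Notes scorer fits a matrix factorisation in which each rating is explained by viewpoint agreement between user and note plus intercept terms, and a note is surfaced only if its own intercept, the helpfulness left over once viewpoint agreement is factored out, is high. The sketch below is a minimal illustrative reimplementation of that idea on synthetic data, not X’s production code.

```python
import numpy as np

# Minimal sketch of bridging-based ranking: fit each rating as
#   rating ~ mu + user_intercept + note_intercept + f_user * g_note
# and score notes by their intercept, i.e. the helpfulness that remains
# after viewpoint agreement (f * g) is explained away. Illustrative only.
rng = np.random.default_rng(0)
n_users, n_notes = 40, 6

# Synthetic data: users sit at -1 or +1 on a viewpoint axis. Partisan
# notes are rated helpful by one side only; "bridging" notes by both.
user_view = rng.choice([-1.0, 1.0], size=n_users)
note_view = np.array([-1.0, 1.0, -1.0, 0.0, 0.0, 1.0])  # 0.0 = bridging
ratings = (user_view[:, None] * note_view[None, :] >= 0).astype(float)

mu = 0.0
u, n = np.zeros(n_users), np.zeros(n_notes)                       # intercepts
f, g = rng.normal(0, 0.1, n_users), rng.normal(0, 0.1, n_notes)   # factors
lr, reg = 0.05, 0.03
for _ in range(2000):
    pred = mu + u[:, None] + n[None, :] + f[:, None] * g[None, :]
    err = pred - ratings
    mu -= lr * err.mean()
    u -= lr * (err.mean(axis=1) + reg * u)
    n -= lr * (err.mean(axis=0) + reg * n)
    f -= lr * ((err * g[None, :]).mean(axis=1) + reg * f)
    g -= lr * ((err * f[:, None]).mean(axis=0) + reg * g)

# The bridging notes (indices 3 and 4) should end up with the highest
# intercepts, because both sides rated them helpful.
print(np.round(n, 2))
```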

Research shows that Community Notes is indeed viewed as more trustworthy and has proven to be faster than traditional centralised fact-checking. The median time to attach a note to a misleading post dropped to under 14 hours in February, from 30 hours in 2023, a Bloomberg analysis found.

But the programme has also been flailing, with diminished volunteer contributions, less visibility for posts that are corrected, and a higher chance that notes on contentious topics are removed.

Grok, however, is faster than Community Notes. “You can think of the Grok mentions today as what an automated AI fact checker would look like — it’s super fast but nowhere near as reliable as Community Notes because no humans were involved,” Soham De, a Community Notes researcher and PhD student at the University of Washington, told Al Jazeera. “There’s a delicate balance between speed and reliability.”

X is trying to bridge this gap by supercharging the pace of creation of contextual notes. On July 1, X piloted the “AI Note Writer,” enabling developers to create AI bots to write community notes alongside human contributors on misleading posts.

According to researchers involved in the project, LLM-written notes can be produced faster and with high-quality context, speeding up note generation for fact checks.

But these AI contributors must still go through the human rating process that makes Community Notes trustworthy and reliable today, De said. This human-AI system works better than what human contributors can manage alone, De and other co-authors said in a preprint of the research paper published alongside the official X announcement.

Still, the researchers themselves highlighted the limitations, noting that using AI to write notes risks producing persuasive but inaccurate responses.

Grok vs Musk

On Wednesday, xAI launched its latest flagship model, Grok 4. On stage, Musk boasted that the model leads Humanity’s Last Exam, a collection of advanced reasoning problems that helps measure AI progress.

Such confidence belied Grok’s recent struggles. In February, xAI patched an issue after Grok suggested that Trump and Musk deserve the death penalty. In May, Grok ranted about a discredited conspiracy theory concerning the persecution of white people in South Africa in response to unrelated queries about health and sports; xAI attributed this to an unauthorised modification by a rogue employee. A few days later, Grok gave inaccurate figures for the death toll of the Holocaust, which it blamed on a programming error.

Grok has also butted heads with Musk. In June, while answering a user question on whether political violence is higher on the left or the right, Grok cited data from government sources and Reuters, to draw the conclusion that, “right-wing political violence has been more frequent and deadly, with incidents like the January 6 Capitol riot and mass shootings.”

“Major fail, as this is objectively false. Grok is parroting legacy media,” Musk said, adding that there was “far too much garbage in any foundation model trained on uncorrected data.”

Musk has also chided Grok for not sharing his distrust of mainstream news outlets such as Rolling Stone and Media Matters. Subsequently, Musk said he would “rewrite the entire corpus of human knowledge” by adding missing information and deleting errors in Grok’s training data, calling on his followers to share “divisive facts” which are “politically incorrect but nonetheless factually true” for retraining the forthcoming version of the model.

That’s the thorny truth about LLMs. Just as they are likely to make things up, they can also offer answers grounded in truth — even at the peril of their creators. Though Grok gets things wrong, Mahadevan of the Poynter Institute said, it does get facts right while citing credible news outlets, fact-checking sites, and government data in its replies.

On July 6, xAI updated the chatbot’s public system prompt to direct its responses to be “politically incorrect” and to “assume subjective viewpoints sourced from the media are biased”.

Two days later, the chatbot shocked everyone by praising Adolf Hitler as the best person to handle “anti-white hate”. X deleted the inflammatory posts later that day, and xAI removed the guidelines to not adhere to political correctness from its code base.

Grok 4 was launched against this backdrop, and in the less than two days it has been available, researchers have already begun noticing some strange behaviour.

When asked for its opinion on politically sensitive questions, such as whom it supports in the ongoing Israel-Palestine conflict, Grok 4 sometimes runs a search to find out Musk’s stance on the subject before returning an answer, according to at least five AI researchers who independently reproduced the results.

“It first searches Twitter for what Elon thinks. Then it searches the web for Elon’s views. Finally, it adds some non-Elon bits at the end,” Jeremy Howard, a prominent Australian data scientist, wrote in a post on X, pointing out that “54 of 64 citations are about Elon.”

Researchers also expressed surprise at the reintroduction of the directive for Grok 4 to be “politically incorrect”, despite that instruction having been removed from its predecessor, Grok 3.

Experts said such political manipulation could erode institutional trust and might not be good for Grok’s business.

“There’s about to be a structural clash as Musk tries to get the xAI people to stop it from being woke, to stop saying things that are against his idea of objective fact,” said Alexander Howard, an open government and transparency advocate based in Washington, DC. “In which case, it won’t be commercially viable to businesses which, at the end of the day, need accurate facts to make decisions.”
