
As generative AI platforms ingest ever-greater oceans of data and are connected to more and more corporate databases, researchers are sounding an alarm: the tools are highly inaccurate, and they are becoming more inscrutable.
Large language models (LLMs), the algorithmic platforms on which generative AI (genAI) tools like ChatGPT are built, are highly inaccurate when connected to corporate databases, and they are becoming less transparent, according to two studies. LLMs (also known as foundation models) such as GPT, LLaMA, and DALL-E emerged over the past year and have transformed artificial intelligence (AI), giving many of the companies experimenting with them a boost in productivity and efficiency. But those benefits come with a heavy dollop of uncertainty.
To assess transparency, Stanford brought together a team that included researchers from MIT and Princeton to design a scoring system called the Foundation Model Transparency Index (FMTI). It evaluates 100 different aspects or indicators of transparency, including how a company builds a foundation model, how it works, and how it is used downstream.
The Stanford study evaluated 10 LLMs and found the mean transparency score was just 37%. LLaMA scored highest, with a transparency rating of 52%; it was followed by GPT-4 and PaLM 2, which scored 48% and 47%, respectively.
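To make the scoring concrete, here is a minimal sketch of how an FMTI-style percentage could be computed. The indicator names and the simple binary pass/fail scoring below are illustrative assumptions, not the index's actual methodology; the real FMTI evaluates 100 indicators spanning how a model is built, how it works, and how it is used downstream.

```python
def transparency_score(indicators: dict[str, bool]) -> float:
    """Return the percentage of transparency indicators a model satisfies.

    Assumes each indicator is a simple pass/fail check, which is a
    simplification of the FMTI's 100-indicator rubric.
    """
    if not indicators:
        raise ValueError("at least one indicator is required")
    return 100 * sum(indicators.values()) / len(indicators)


# Hypothetical indicators for a hypothetical model -- these names are
# invented for illustration and do not come from the actual index.
example_indicators = {
    "training_data_disclosed": True,
    "compute_usage_disclosed": True,
    "downstream_use_policy_published": False,
    "data_labor_practices_disclosed": False,
}

print(f"Transparency score: {transparency_score(example_indicators):.0f}%")
# Transparency score: 50%
```

Averaging such per-model scores across the 10 evaluated models is what yields a mean like the 37% the Stanford team reported.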
“If you don’t have transparency, regulators can’t even pose the right questions, let alone take action in these areas,” said Rishi Bommasani, a researcher at Stanford’s Center for Research on Foundation Models.
Meanwhile, almost all senior executives (95%) believe genAI tools are regularly used by employees, and more than half (53%) say genAI is now driving certain business departments, according to a separate survey by cybersecurity and antivirus provider Kaspersky Lab. That study also found that 59% of executives now express deep concern about genAI-related security risks that could jeopardize sensitive company information and lead to a loss of control over core business functions.
“Much like BYOD, genAI offers massive productivity benefits to businesses, but while our findings reveal that boardroom executives are clearly acknowledging its presence in their organizations, the extent of its use and purpose are shrouded in mystery,” David Emm, Kaspersky’s principal security researcher, said in a statement.
…return accurate responses to most basic business queries just 22% of the time. And for intermediate and expert-level queries, accuracy plummeted to 0%.
https://www.computerworld.com/article/3711343/genai-is-highly-inaccurate-for-business-use-and-getting-more-opaque.html