Catoblepas@piefed.blahaj.zone to Technology@beehaw.org · English · 2 days ago

LLMs’ “simulated reasoning” abilities are a “brittle mirage,” researchers find

arstechnica.com

Chain-of-thought AI “degrades significantly” when asked to generalize beyond training.

Using supervised fine-tuning (SFT) to introduce even a small amount of relevant data to the training set can often lead to strong improvements in this kind of “out of domain” model performance. But the researchers say that this kind of “patch” for various logical tasks “should not be mistaken for achieving true generalization. … Relying on SFT to fix every [out of domain] failure is an unsustainable and reactive strategy that fails to address the core issue: the model’s lack of abstract reasoning capability.”

Rather than showing the capability for generalized logical inference, these chain-of-thought models are “a sophisticated form of structured pattern matching” that “degrades significantly” when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate “fluent nonsense” creates “a false aura of dependability” that does not stand up to a careful audit.

As such, the researchers warn heavily against “equating [chain-of-thought]-style output with human thinking” especially in “high-stakes domains like medicine, finance, or legal analysis.” Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond “surface-level pattern recognition to exhibit deeper inferential competence,” they write.

Chat

Mac@mander.xyz · 2 days ago
It is most definitely satire, but that doesn’t mean your comments aren’t worth reading.
- massive_bereavement@fedia.io · 2 days ago
  And you should see the therapeutic effects of brick throwing and its very promising health applications.

  You would be amazed at what you can achieve with a well-thrown brick.

