• Onno (VK6FLAB)@lemmy.radio · 25 points · 6 months ago

    The underlying issue with an LLM is that there is no “learning”. The model itself doesn’t dynamically change whilst it’s being used.

    This article sets out a process that gives the ability to alter the model, by “dialling up” (or down) concepts. In other words, it’s changing the balance of the weight of concepts across the whole model.

    Altering one concept is hardly “learning”, especially since it’s being done externally by researchers, but it’s a start.
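    In code terms, the “dialling up” the article describes amounts to shifting a layer’s activations along a direction associated with a concept. Here is a minimal toy sketch in Python; all names and vectors are illustrative (the actual research finds these directions with a sparse autoencoder inside the model, not with random vectors like these):

```python
# Toy sketch of "dialling a concept up or down" by steering activations.
# Everything here is hypothetical; real concept directions come from
# interpretability tooling, not random vectors.
import numpy as np

def steer(activation: np.ndarray, concept_dir: np.ndarray, strength: float) -> np.ndarray:
    """Shift an activation along a unit-normalised concept direction."""
    unit = concept_dir / np.linalg.norm(concept_dir)
    return activation + strength * unit

rng = np.random.default_rng(0)
act = rng.normal(size=8)            # one layer's activation for one token
concept = rng.normal(size=8)        # stand-in "concept" direction
boosted = steer(act, concept, 5.0)  # dial the concept up
damped = steer(act, concept, -5.0)  # dial it down
```

    The point of the sketch is that the model’s weights stay fixed; only the activations are nudged at inference time, which is why it is a stretch to call it “learning”.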

    A much larger problem is that the energy consumption is several orders of magnitude larger than that of our brain. I’m not convinced that we have enough energy to make a standalone “AI”.

    What machine learning actually gave us is the ability to automatically improve a digital model of things like weather prediction: a forecast that once took hours on a supercomputer to cover a week can now be produced on a laptop in minutes, with a much longer range and better accuracy. Machine learning made that possible.

    An LLM is attempting the same thing with human language. It’s tantalising, but ultimately I think applying the idea to language to create “AI” is doomed.

    • Paragone@beehaw.org · 2 points · 5 months ago

      To the best of my knowledge, back-propagation IS learning, whether it’s happening in a neural net on a chip, or whether we’re doing it, through feedback, & altering our understanding ( so both hard-logic & our wetware use the method for learning, though we use a rather sloppy implementation of it. )

      & altering the relative-significances of concepts IS learning.

      ( I’m not commenting on whether the new-relation-between-those-concepts is wrong or right, only on the mechanism )
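      That feedback mechanism fits in a few lines. A toy sketch (a single linear “neuron” rather than a real network; the function and variable names are illustrative): repeated error feedback nudges a weight until the prediction matches the data.

```python
# Toy gradient-descent "learning": one weight fits y = 3x from feedback.
# Illustrative only; real back-propagation chains this rule through layers.
def train(xs, ys, lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            pred = w * x
            grad = 2 * (pred - y) * x  # derivative of squared error w.r.t. w
            w -= lr * grad             # feedback alters the "understanding"
    return w

xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w = train(xs, ys)  # converges toward 3.0
```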

      so, I can’t understand your position.

      Please don’t deem my comment worthy of answering: I’m only putting this here for the record, is all.

      Everybody can downvote my comment into oblivion, & everything in the world’ll still be fine.

    • dsemy@lemm.ee · 8 points · 6 months ago

      A much larger problem is that the energy consumption is several orders of magnitude larger than that of our brain. I’m not convinced that we have enough energy to make a standalone “AI”.

      This is a major issue I have with basically anyone who talks about current “AI” systems: they’re clearly not even close to AI, as they require an extreme amount of energy and data to perform tasks that would be trivial for an actual brain. They seem to lack any ability to comprehend their input, only mimicking it through brute force, which is only feasible because computers have become fast enough and we can currently keep up with the energy demands.

      • GenderNeutralBro@lemmy.sdf.org · 4 points · 6 months ago

        AI does not mean artificial brain or anything similar. It’s a very broad term that’s been in use for about 70 years now.

        Pac-Man has AI.

        • dsemy@lemm.ee · 6 points · 6 months ago

          Obviously I’m not referring to that, but to what large tech companies call AI. And they are in fact trying to convince people these AI systems they are developing will soon be clever enough to be considered general AI.

  • astronaut_sloth@mander.xyz · 9 points · 6 months ago

    The original paper itself, for those who are interested.

    Overall, this is really interesting research and a really good “first step.” I will be interested to see if this can be replicated on other models. One thing that really stood out, though, was that certain details are obfuscated because Sonnet is proprietary. Hopefully follow-on work is done on one of the open-source models to confirm the method.

    One of the notable limitations is quantifying how an activation correlates with text meaning, which will make any sort of control difficult. Sure, you can just massively increase or decrease a weight, and for some things that will be fine, but real manual fine-tuning will prove difficult.

    I suspect this method is likely generalizable (maybe with some tweaks?), and I’d really be interested to see how this type of analysis could be done on other neural networks.

  • Ilandar@aussie.zone · 5 points · 6 months ago

    This sounds promising but I do wonder how undermined any progress they make will be by:

    • the speed of advancements in AI
    • the fact that this research doesn’t necessarily apply to other LLMs
    • the fact that LLMs are being released/leaked to the public, so anyone with access to them can potentially jailbreak the AI and circumvent any safety precautions researchers implement as a result of this work