• Technus@lemmy.zip
    2 days ago

    Not sure what this internal state you are referring to is. Are you talking about all the values that come out of each step of the computations?

    It would need to be able to form memories like real brains do, by creating new connections between neurons and adjusting their weights in real time in response to stimuli, and having those connections persist. I think that’s a prerequisite to models that are capable of higher-level reasoning and understanding. But then you would need to store those changes to the model for each user, which would be tens or hundreds of gigabytes.

    These current once-through LLMs don’t have time to properly digest what they’re looking at, because they essentially forget everything once they output a token. I don’t think you can make up for that by spitting some tokens out to a file and reading them back in, because it still has to be human-readable and coherent. That transformation is inherently lossy.

    This is basically what I’m talking about: https://www.comicagile.net/comic/context-switching/

    But for every single token the LLM outputs. The fact that it’s allowed to take notes is a mitigation for this context loss, not a silver bullet.
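The once-through behavior described above can be sketched as a plain greedy decoding loop. This is a toy illustration, not any real model's code: `next_token` is a hypothetical stand-in for a forward pass. The point is that the only state carried from one step to the next is the token sequence itself; nothing the model "figured out" while producing a token survives outside of the tokens it emitted.

```python
# Toy sketch of once-through autoregressive decoding.
# `next_token` is a hypothetical stand-in for a model forward pass;
# here it just emits last-token-plus-one so the loop is runnable.

def next_token(tokens):
    # In a real LLM this re-processes the WHOLE sequence and returns
    # the next token; all intermediate activations are then discarded.
    return tokens[-1] + 1

def generate(prompt_tokens, n_steps):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        # Each iteration starts from scratch with only the visible
        # tokens -- the model's "memory" is exactly this list.
        tokens.append(next_token(tokens))
    return tokens

print(generate([1, 2, 3], 4))  # [1, 2, 3, 4, 5, 6, 7]
```

Anything the model wants to "remember" has to be squeezed into the emitted tokens, which is the lossy transformation mentioned above.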

    • Modern_medicine_isnt@lemmy.world
      2 days ago

      Yeah, I get what you are saying. I’m just not convinced that it needs to be able to update its model in real time to be capable of high-level reasoning. And while human-readable files are inherently lossy, they still represent tracking an internal state.

      They also have vector DBs. My understanding is that those are closer to what you are talking about as far as internal state goes. But they still don’t let the AI update the vector DB in real time, mainly out of worry that live updates would go wrong in the same way people are easily manipulated into believing BS. So they are more careful about what they feed it as updates. I do wonder how they generate those vector DBs, and whether that is something users could utilize locally.
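      For what it's worth, the vector-DB idea boils down to something pretty small: store (embedding, text) pairs and look up the stored text nearest to a query embedding. Here's a runnable toy sketch of that workflow; it's an assumed illustration, not any particular product's API, and `embed` is a stand-in letter-frequency embedding where real systems use a learned embedding model.

      ```python
      import math

      # Toy sketch of a vector DB: store (vector, text) pairs, retrieve
      # by cosine similarity. `embed` is a hypothetical stand-in -- a
      # letter-frequency vector -- in place of a learned embedding model.

      def embed(text):
          vec = [0.0] * 26
          for ch in text.lower():
              if ch.isalpha():
                  vec[ord(ch) - ord('a')] += 1.0
          return vec

      def cosine(a, b):
          dot = sum(x * y for x, y in zip(a, b))
          na = math.sqrt(sum(x * x for x in a))
          nb = math.sqrt(sum(y * y for y in b))
          return dot / (na * nb) if na and nb else 0.0

      class VectorDB:
          def __init__(self):
              self.entries = []  # list of (vector, text) pairs

          def add(self, text):
              # "Updating" the DB is just appending -- no model weights
              # change, which is why it's safer than live training.
              self.entries.append((embed(text), text))

          def nearest(self, query):
              qv = embed(query)
              return max(self.entries, key=lambda e: cosine(e[0], qv))[1]

      db = VectorDB()
      db.add("the cat sat on the mat")
      db.add("stock prices fell sharply")
      print(db.nearest("a cat on a mat"))  # the cat sat on the mat
      ```

      That append-only shape is also why it's arguably not the internal state you mean: retrieval hands text back into the context window, so it still goes through the same lossy token bottleneck.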
      They also have vector dbs. My understanding is that they are closer to what you are talking about as far as internal state. But they still don’t allow th AI to update the vectordb in real time. Mainly they worry about what happens with live updates being similar to how people are easily manipulated into believeing BS. So they are more careful about what they feed it to update. I do wonder how they generate those vector dbs, and if that is something users could utilize locally.