• Clocks [They/Them]@lemmy.ml · 5 days ago

    So instead of recognizing characters…

    1. Compress page / text into a handful of pixels.
    2. Feed pixels into a generative AI.
    3. Hope for the best.

    I'd rather just use existing OCRs, which can be easily backtracked in how they processed the text.

    • FauxLiving@lemmy.world · 5 days ago

      They were able to efficiently encode visual information for use by further networks. In this case, the further network was a language model trained on an OCR task.

      The news is the technique; the OCR software is a demonstration of it. Encoding visual information efficiently is also key for robotics, which uses trained networks in feedback control loops. Being able to process ten times as much visual data on the same hardware is a very significant increase in capability.
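      The encode-then-compress idea can be sketched with toy numbers. This is a minimal illustration, not the actual architecture: the page size, patch size, embedding dimension, and the 4×4 average pooling below are my own assumptions, standing in for whatever learned compression the real encoder uses.

```python
import numpy as np

# Toy sketch: patchify a rendered page into ViT-style tokens,
# then pool neighbouring patches into far fewer "compressed" tokens.
# All sizes here are illustrative assumptions, not the paper's.

H = W = 1024          # rendered page, pixels (assumed)
P = 16                # patch size (assumed)
D = 64                # toy embedding dimension (assumed)

img = np.random.rand(H, W)  # stand-in for a rendered text page

# 1. patchify: cut the page into (H/P) x (W/P) patches, flatten each
patches = (img.reshape(H // P, P, W // P, P)
              .transpose(0, 2, 1, 3)
              .reshape(-1, P * P))
print(patches.shape)         # 4096 raw patch tokens of 256 pixels each

# 2. toy linear embedding of each patch
rng = np.random.default_rng(0)
tokens = patches @ rng.standard_normal((P * P, D))

# 3. compress: average-pool every 4x4 block of the 64x64 token grid
g = H // P                   # 64 tokens per side
pooled = (tokens.reshape(g, g, D)
                .reshape(g // 4, 4, g // 4, 4, D)
                .mean(axis=(1, 3)))
compressed = pooled.reshape(-1, D)
print(compressed.shape)      # 256 tokens for the whole page
```

      The point is the shape arithmetic in step 3: the downstream language model sees 256 vision tokens instead of 4096 raw patches (or thousands of text tokens), which is where the efficiency claim comes from.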

    • Moidialectica [he/him, comrade/them]@hexbear.net · 4 days ago

      it doesn’t actually process text, which is why it’s more efficient: it can essentially take in ten times the text through images without suffering the penalties associated with having that many tokens
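      The back-of-envelope arithmetic behind a "ten times" figure looks like this. The per-page word count, tokenizer ratio, and vision-token budget below are illustrative assumptions of my own, not figures from the article:

```python
# Rough token-count comparison: same page as text tokens vs. as a
# fixed budget of compressed vision tokens. Numbers are assumptions.

words_per_page = 600
words_per_token = 0.75   # common rule of thumb for BPE text tokenizers
text_tokens = words_per_page / words_per_token   # the page as text

vision_tokens = 80       # hypothetical compressed vision-token budget

compression = text_tokens / vision_tokens
print(f"text tokens:   {text_tokens:.0f}")
print(f"vision tokens: {vision_tokens}")
print(f"compression:   ~{compression:.0f}x")
```

      Whether the real ratio is 8x or 12x depends on the tokenizer and the encoder's token budget, but this is the trade being described: pay a fixed number of vision tokens per page instead of a token per few characters.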