Research Findings:

  • reCAPTCHA v2 is not effective in preventing bots and fraud, despite its intended purpose
  • reCAPTCHA v2 can be defeated by bots 70-100% of the time
  • reCAPTCHA v3, the latest version, is also vulnerable to attacks and has been beaten 97% of the time
  • reCAPTCHA interactions impose a significant cost on users, with an estimated 819 million hours of human time spent on reCAPTCHA over 13 years, which corresponds to at least $6.1 billion USD in wages
  • Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per sale of their total labeled data set
  • Google should bear the cost of detecting bots, rather than shifting it to users
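
The hours and wage figures in the findings above imply an hourly rate. A quick Python check using only those two numbers (the paper's own wage assumption is not given here):

```python
# Figures quoted in the findings above
hours_spent = 819e6      # human hours spent on reCAPTCHA over 13 years
wage_cost_usd = 6.1e9    # lower-bound wage cost in USD

# Hourly rate implied by dividing the wage estimate by the hours
implied_hourly_rate = wage_cost_usd / hours_spent
print(f"${implied_hourly_rate:.2f} per hour")  # $7.45 per hour
```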

“The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service,” the paper declares.

In a statement provided to The Register after this story was filed, a Google spokesperson said: “reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling.”
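
For readers unfamiliar with the "invisible scoring" Google refers to: with reCAPTCHA v3, the site's backend posts the token from the page to Google's `siteverify` endpoint and gets back JSON that includes `success`, a `score` between 0.0 and 1.0, and the `action` name; what to do with a given score is up to the site. A minimal Python sketch, assuming a 0.5 threshold and a hypothetical `login` action. The example verdicts are made up and no network call is made when running them:

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verdict(secret: str, token: str) -> dict:
    """POST the client token to Google's siteverify endpoint and parse the JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def is_probably_human(verdict: dict, expected_action: str, threshold: float = 0.5) -> bool:
    """Accept only a successful verdict for the expected action with a high enough score."""
    return (
        verdict.get("success") is True
        and verdict.get("action") == expected_action
        and verdict.get("score", 0.0) >= threshold
    )


# Made-up verdicts for illustration (no request is sent)
print(is_probably_human({"success": True, "score": 0.9, "action": "login"}, "login"))  # True
print(is_probably_human({"success": True, "score": 0.1, "action": "login"}, "login"))  # False
```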

  • cley_faye@lemmy.world
    2 months ago

    reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling

    That’s funny, because when I’m faced with this, I keep randomly adding or removing one of the images and it keeps accepting them as OK.

  • repungnant_canary@lemmy.world
    2 months ago

    It is undoubtedly a new piece of research, but the cause is always the same: corporations exploit people because they have effectively escaped government and democratic control everywhere.

    Some corporations employ more people and have bigger budgets than some countries, and they often influence people’s lives more than the government does. Yet they’re effectively electoral monarchies where the electors and monarchs are just a bunch of rich assholes who answer to nobody.

    Only when we change that system will those headlines stop.

  • FierySpectre@lemmy.world
    2 months ago

    I mean, duh? With proof of work captchas existing, there’s no reason to have those image selection captchas… Ever…

    They work by having the server generate a puzzle. On the server side the puzzle is cheap to generate and to check, while solving it on the client side is “hard”. The server chooses the difficulty of the puzzle and can even adjust it dynamically. This means that when your website is under light load, the captcha can be really easy and fast to solve. If your website is under attack, however, the captcha can be set to take seconds to solve.
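
The proof-of-work idea described above can be sketched hashcash-style: the server issues a random challenge, the client searches for a counter whose SHA-256 hash has a required number of leading zero bits, and the server verifies with a single hash. This is a minimal Python sketch, not any particular captcha product; the difficulty thresholds in the hypothetical `pick_difficulty` policy are made up:

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (cheap)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of a hash digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, counter: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{counter}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a counter until the hash has enough leading zero bits (expensive)."""
    counter = 0
    while leading_zero_bits(_digest(challenge, counter)) < difficulty:
        counter += 1
    return counter


def verify(challenge: str, counter: int, difficulty: int) -> bool:
    """Server side: a single hash checks the submitted answer (cheap)."""
    return leading_zero_bits(_digest(challenge, counter)) >= difficulty


def pick_difficulty(requests_per_second: float) -> int:
    """Hypothetical policy: raise difficulty with load; each extra bit doubles the expected work."""
    if requests_per_second < 100:
        return 8    # ~256 hashes on average
    if requests_per_second < 1000:
        return 16   # ~65,000 hashes on average
    return 22       # ~4 million hashes on average


challenge = make_challenge()
difficulty = pick_difficulty(requests_per_second=50)
answer = solve(challenge, difficulty)
print(verify(challenge, answer, difficulty))  # True
```

Each extra bit of difficulty doubles the client's expected work while verification stays at one hash, which is what lets the server turn the dial up under attack. A real deployment would also need to expire challenges and reject replayed answers, which this sketch omits.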

    • brbposting@sh.itjust.works
      2 months ago

      Finally heard a clear audio CAPTCHA for the first time in my life this past month. It was glorious. There was slight garbling before and after the characters were read, but that’s it.

      Besides that singular experience, all audio CAPTCHAs have been utterly 100% impossible to interpret. Blaring white noise followed by a small squeak of “threeve” or “eleventeen”.

      • IronKrill@lemmy.ca
        2 months ago

        I’ve usually found them to be pretty clear. I just ignore the half-formed words at the start and end. Either way, even on Firefox with uBlock and all the rest, audio captchas have always passed me on the first try, even when I think I got it wrong. I don’t like posting about it in case they tighten it up after it gets more users.

  • serenissi@lemmy.world
    2 months ago

    The objective of reCAPTCHA (or any captcha) isn’t to detect bots. It is more about stopping automated requests and rate limiting. The captcha is ‘defeated’ if the time it takes to solve it, whether by a human or a bot, is less than expected. Humans are very slow, so they can’t beat it anyway.

    • tb_@lemmy.world
      2 months ago

      I thought captchas worked by providing some known good examples, some known bad examples, and a few examples which aren’t certain yet. The model is then trained on whether the user selects the uncertain examples.

      Also, it’s very evident what’s being trained. First it was obscured words for OCR, then Google Maps screenshots for detecting things, and now you see clearly machine-generated images.
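
The scheme tb_ describes (known tiles graded, uncertain tiles only counted) can be sketched as follows. This illustrates that hypothesis, not how reCAPTCHA works; Google's statement above says v2 challenge images are all pre-labeled. The function names and the `min_votes` parameter are hypothetical:

```python
from collections import Counter, defaultdict


def grade_and_collect(answers, controls, votes):
    """
    answers:  dict tile_id -> bool, whether the user selected each tile
    controls: dict tile_id -> bool, tiles whose correct label is already known
    votes:    dict tile_id -> Counter of True/False votes on unknown tiles (updated in place)
    Returns True if the user passes, i.e. matches every known tile.
    """
    for tile_id, truth in controls.items():
        if answers.get(tile_id) != truth:
            return False  # failed a known tile: reject and record nothing
    for tile_id, selected in answers.items():
        if tile_id not in controls:
            votes[tile_id][selected] += 1  # unknown tile: not graded, only counted
    return True


def consensus(votes, min_votes=3):
    """Assign a majority label to an unknown tile once enough passing users have voted on it."""
    return {
        tile_id: counter[True] > counter[False]
        for tile_id, counter in votes.items()
        if sum(counter.values()) >= min_votes
    }


votes = defaultdict(Counter)
controls = {"a": True, "b": False}
submissions = [
    {"a": True, "b": False, "x": True},
    {"a": True, "b": False, "x": True},
    {"a": True, "b": False, "x": False},
    {"a": False, "b": False, "x": False},  # fails tile "a", so its vote on "x" is ignored
]
for answers in submissions:
    grade_and_collect(answers, controls, votes)
print(consensus(votes))  # {'x': True}
```

Under such a scheme, toggling an ungraded tile would not change whether a user passes, which would fit the experience described at the top of the thread.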

    • smb@lemmy.ml
      2 months ago

      […] reCAPTCHA […] isn’t to detect bots. It is more of stopping automated requests […]

      which is bots. bots make automated requests, and anything that makes automated requests can also be called a bot (e.g. web crawlers are called bots too and, if they are kind, respect robots.txt, which has “bots” in its name for this very reason; “bot” is short for “robot”). using different words does not change the reality behind it, but it may hint that one side is trying something on the other.
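
robots.txt is a voluntary convention: a polite crawler reads it and skips disallowed paths, while an abusive one can simply ignore it. A small sketch with Python's standard `urllib.robotparser`; the rules, URLs, and the `MyCrawler`/`BadBot` user-agent names are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """A well-behaved crawler checks robots.txt before requesting a URL."""
    return parser.can_fetch(user_agent, url)


print(polite_fetch_allowed("MyCrawler", "https://example.com/public/page"))   # True
print(polite_fetch_allowed("MyCrawler", "https://example.com/private/data"))  # False
print(polite_fetch_allowed("BadBot", "https://example.com/public/page"))      # False
```

Nothing in this check is enforced by the server; it only works for crawlers that choose to run it.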

      • serenissi@lemmy.world
        2 months ago

        There isn’t a good way to tell human users apart from scripts without adding too much friction to normal use. Also, bots are sometimes welcome and useful; it’s a problem when someone tries to mine data in large volumes or effectively DoS the server.

        Forget bots: there are centers in India and other countries where you can employ humans to do ‘automated things’ (YouTube likes or watch hours, for example) at the same expense of bots. There are similar CAPTCHA-solving services too. Good luck with those :)

        Only rate limiting is the effective option.
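
Rate limiting as serenissi describes it is commonly implemented as a token bucket: each client may burst up to a capacity, and tokens refill at a fixed rate. A minimal Python sketch, not tied to any particular service; the rate and capacity values are hypothetical, and a fake clock is injected so the example is deterministic:

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 2.0                          # two seconds later: two tokens refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```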

        • smb@lemmy.ml
          1 month ago

          Only rate limiting is the effective option.

          i doubt that. you could maybe rate limit per IP, and the abusers will change their IP whenever needed. if you rate limit the whole service across all users in the world, then your service sinks into uselessness exactly as fast as your rate limiter is effective. if you rate limit the actions of logged-in users, then your rate limiting is limited by your ability to identify fake or duplicate accounts, where captchas are not helpful at all.

          at the same expense of bots

          they might be cheap, but i doubt that anyway; bots don’t need sleep.

          i was replying to that wording (that captchas were “not” about bots but about “stopping automated requests”), pointing out that automated requests “are” bots.

          call centers are neither bots nor automated requests (the opposite IS their advantage) and thus have no relation to what i was specifically saying in reply to that post that suggested automated requests and bots would be different things in this context.

          i wasn’t talking about the effectiveness of captchas either, or whether bots should be banned, only about bots being automated requests (and vice versa) from the perspective of the platform stopping bots, and that using different words for things (claiming things like “X isn’t X, it is really U!”* or that automated requests aren’t bots) does not change the reality of the thing itself.

          *) unrelated to any (a-)social media platform
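
smb's objection to per-IP limits can be demonstrated with a simple fixed-window limiter keyed by IP address: the same traffic spread across rotating addresses gets a multiplied budget. A minimal Python sketch; the limit, window, and addresses (taken from the documentation ranges) are hypothetical, and old windows are never evicted:

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (here, an IP address)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (key, window index) -> requests seen

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = FixedWindowLimiter(limit=10, window=60.0)

# One client on a single IP: only 10 of 100 requests get through in the first minute
single_ip = sum(limiter.allow("203.0.113.7", now=1.0) for _ in range(100))

# The same 100 requests spread over 10 rotating addresses: all of them get through
rotated = sum(limiter.allow(f"198.51.100.{i % 10}", now=1.0) for i in range(100))

print(single_ip, rotated)  # 10 100
```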

          • serenissi@lemmy.world
            1 month ago

            stopping automated requests

            yeah my bad. I meant too many automated requests. Both humans and bots generate spam, and the issue is a high influx of it. Legitimate users also use bots, and that is by no means harmful. That way you don’t encounter a captcha every time you visit a Google page, nor do a couple of scraping scripts run into problems. reCAPTCHA (or hCaptcha, say) triggers when there is a high volume of requests coming from the same IP. Instead of blocking everyone to protect their servers, they might allow slower requests so legitimate users face minimal hindrance.

            Most google services nowadays require accounts with stronger (like cell phone) verification so automated spam isn’t a big deal.
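
The behaviour serenissi describes, challenging rather than blocking when one IP sends too much, could be sketched with a sliding-window log per IP. This is an assumption about the general approach, not a description of how reCAPTCHA or hCaptcha actually decide when to trigger; the threshold and window are hypothetical:

```python
from collections import defaultdict, deque


class ChallengeGate:
    """Serve normally under a per-IP threshold; above it, ask for a captcha instead of blocking."""

    def __init__(self, threshold: int, window: float):
        self.threshold = threshold
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def decide(self, ip: str, now: float) -> str:
        recent = self.history[ip]
        # Drop timestamps that have slid out of the window
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        recent.append(now)  # every request is recorded, including challenged ones
        return "serve" if len(recent) <= self.threshold else "challenge"


gate = ChallengeGate(threshold=5, window=10.0)
print([gate.decide("192.0.2.1", now=float(t)) for t in range(7)])
# ['serve', 'serve', 'serve', 'serve', 'serve', 'challenge', 'challenge']
print(gate.decide("192.0.2.1", now=30.0))  # serve: the earlier burst has aged out
```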

            • smb@lemmy.ml
              1 month ago

              since bots are better at solving captchas and human-powered services exist that solve them, the only ones negatively affected by captchas are regular legitimate users. the bad guys use bots or services and are done. regular users have to endure it while no security is added. as for the influx, i guess it is much like having a better lock on your front door: if your lock is a bit better than your neighbour’s, theirs is more likely to be forced open than yours. it might help you, but it’s not real security, only a relative and very subjective feeling of “security”.

              being slower than the wolves also isn’t so bad as long as you are not the slowest in your group (some people say)… so doing a bit more than others is always a good choice (just don’t set that bar too low, like using crowdsnakeoil for anything)

              • serenissi@lemmy.world
                1 month ago

                the bad guys use bots or services and are done. regular users have to endure while no security is added

                put in other words, common users can’t easily become a ‘bad guy’, i.e. the cost of an attack is higher, hence fewer script kiddies and automated attacks. You want to reduce the number. These protections are nothing for botnet owners or other high-profile bad actors.

                ps: recaptcha (or captcha in general) isn’t a security feature. At most it can be a safety feature.

                • smb@lemmy.ml
                  29 days ago

                  isn’t a security feature. At most it can be a safety feature.

                  o,O

  • HiramFromTheChi@lemmy.world
    2 months ago

    There’s nothing that can express my disdain for Google’s reCaptcha.

    😒 We’re training its AI models 😒 It’s free labor for Google 😒 Sometimes it wants the corner of an object, sometimes it doesn’t 😒 Wildly inconsistent 😒 Always blurry and hard to see 😒 Seemingly endless 😒 It’s the robot asking us humans if we’re the robots

    • Appoxo@lemmy.dbzer0.com
      2 months ago

      In case you didn’t know: this is already a thing, with pictures slowly fading in when selecting stuff like traffic cones or buses.

  • polonius-rex@kbin.run
    2 months ago

    Google should bear the cost of detecting bots, rather than shifting it to users

    how?

      • siph@lemmy.world
        2 months ago

        Considering the article states that reCAPTCHA v2 and v3 can be broken/bypassed by bots 70-100% of the time, they are obviously not the solution.

        • polonius-rex@kbin.run
          2 months ago

          how do you get the metric of 70-100% of the time?

          the best bots doing it 70-100% of the time is very different to the kind of bot your average spammer will have access to

          • siph@lemmy.world
            2 months ago

            Did you read the article or the TL;DR in the post body?

            The paper, released in November 2023, notes that even back in 2016 researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time. The reCAPTCHA v2 checkbox challenge is even more vulnerable – the researchers claim it can be defeated 100 percent of the time.

            reCAPTCHA v3 has fared no better. In 2019, researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time.

            So yeah, while these are research numbers, it wouldn’t be surprising if many larger bots have access to ways around that, especially since those numbers are from 2016 and 2019 respectively. Surely it is even easier nowadays.

            • polonius-rex@kbin.run
              2 months ago

              researchers were able to defeat reCAPTCHA v2 image challenges 70 percent of the time

              that doesn’t answer the question?

              researchers devised a reinforcement learning attack that breaks reCAPTCHAv3’s behavior-based challenges 97 percent of the time

              i’d argue “bespoke system, deployed in a very limited context, built by researchers at the top of their field” is kind of out of reach for most people? and any bot network scaled up automatically becomes easier to detect the further you scale it

               

              the cost of just paying humans to break these is already at or below pennies per challenge

          • siph@lemmy.world
            2 months ago

            Maybe a billion dollar company has the budget to come up with something?

            Looking at the numbers in this post, reCAPTCHA exists to make Google money, not to keep bots out.

            I’d rather have no reCAPTCHA than the current state.

            • OsrsNeedsF2P@lemmy.ml
              2 months ago

              Hi it’s me. I work for a billion dollar company with a budget. We have no ethical ideas on how to stop bots. Thanks for coming to my tech talk.

              • siph@lemmy.world
                2 months ago

                Yeah, that’s about the way I’d expect it to go.

                “Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”

                There might be a tiny chance they’re not interested in changing things.

        • conciselyverbose@sh.itjust.works
          2 months ago

          At what cost?

          100% success rate isn’t even moderately useful if it costs $5 per pass. The discussion is completely pointless without a concrete, documented analysis of the actual hardware and energy costs involved.
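
conciselyverbose's point can be made concrete: if each solve attempt costs c and succeeds with probability p, and failures are simply retried, the expected cost per successful pass is c / p. A Python sketch with purely hypothetical prices, not measured ones:

```python
def cost_per_pass(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful pass when failed attempts are retried (geometric trials)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers for illustration only
print(f"{cost_per_pass(5.00, 1.00):.4f}")  # 5.0000: a "perfect" but expensive solver
print(f"{cost_per_pass(0.01, 0.70):.4f}")  # 0.0143: a cheap, less reliable solver
```

A solver that passes every time can still be the worse deal for an attacker if each attempt is expensive; the success rate alone says little without the cost.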

        • radivojevic@discuss.online
          2 months ago

          “Google should bear the cost”

          Google should shut it down and make sites roll their own verification. Give everyone a month to implement a new solution on millions of websites.

            • radivojevic@discuss.online
              2 months ago

              I’m actually 100% for rolling your own… almost everything.

              20 years ago I made an e-commerce website for a client. Looking at the code now, I’m embarrassed by how insecure it is. However, because it was totally custom, no one ever found the bugs and it has never been cracked (knock on wood). That’s the benefit of not using a prebuilt solution: a custom one isn’t a target for mass exploits.

  • Petter1@lemm.ee
    2 months ago

    Why is this not news to me? How did so many people not know? Should I have spread the word more, even if everyone I told was like “yeah, yeah, of course, but what can I do? 🤷🏻‍♀️”?

  • umbraroze@lemmy.world
    2 months ago

    reCAPTCHA is exploiting users for profit

    Well duh.

    reCAPTCHA started out as a clever way to improve the quality of OCRing books for Distributed Proofreaders / Project Gutenberg. You know, giving to the community, improving access to public-domain texts. Then Google acquired them. Text CAPTCHAs got phased out. No more of that stuff, just computer vision rubbish to improve Google’s own AI models and services.

    If they had continued to depend on tasks that directly help the community, Google would at least have had to constantly make sure the community’s concerns were met. But if they only have to answer to themselves for the quality of the data and nobody else even gets to see it, well, of course it turned into yet another mildly neglected Google project.

    • dan@upvote.au
      2 months ago

      Then Google acquired them. Text CAPTCHAs got phased out

      Google kept the text version for five years after the acquisition though. They used it to digitize books on Google Books, to allow full-text search of their book archive.