• recapitated@lemmy.world
    English · 16 upvotes · 2 months ago

    I work for a different sort of company that hosts some publicly available user-generated content. Honestly, the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.

    I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)

    • cordlesslamp@lemmy.today
      English · 4 upvotes · 2 months ago

      Can you use something like a DDoS filter to block automated AI scraping (too many requests per second)?

      I’m not a tech person, so I probably don’t even know what I’m talking about.
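For anyone wondering what a "too many requests per second" filter looks like in practice, here is a minimal sketch of a per-IP sliding-window rate limiter. The class name, the limits, and the example address are made up for illustration; real DDoS filters are considerably more involved.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_ip: str, now: float) -> bool:
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject, e.g. with HTTP 429
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per second per IP.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
# Five requests from one address, 0.1 s apart.
burst = [limiter.allow("203.0.113.7", now=0.1 * i) for i in range(5)]
print(burst)
```

The limiter only ever sees an IP address and a timestamp, so it works against a client that hammers the site from a single address and does nothing against one that spreads its requests out.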

      • GenosseFlosse@feddit.org
        English · 2 upvotes · 2 months ago

        Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.

        • JovialMicrobial@lemm.ee
          English · 1 upvote · 2 months ago

          So you’re saying reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?

          So the actual number of people using reddit vs bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.

          • GenosseFlosse@feddit.org
            English · 3 upvotes · 2 months ago

            Always has been. Technically the server sees no difference between what a browser does and what a bot does: downloading files and submitting requests.
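To illustrate the point, here is a rough sketch of a naive server-side check based on the `User-Agent` header. The function name, the marker list, and the example header values are assumptions for illustration only. Since the header is supplied by the client, a crawler that copies a browser's value looks identical to the browser at this level.

```python
# Hypothetical substrings a naive filter might look for in the User-Agent.
BOT_MARKERS = ("bot", "crawler", "spider", "python-requests", "curl")


def looks_like_bot(headers: dict) -> bool:
    """Flag a request whose User-Agent is missing or contains a known marker."""
    user_agent = headers.get("User-Agent", "").lower()
    return user_agent == "" or any(marker in user_agent for marker in BOT_MARKERS)


# A crawler that identifies itself honestly (made-up name).
honest_crawler = {"User-Agent": "ExampleBot/1.0"}
# A typical desktop browser header (illustrative value).
browser = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) "
                  "Gecko/20100101 Firefox/128.0"
}
# The same crawler, now sending the browser's header verbatim.
disguised_crawler = dict(browser)

for name, headers in [("honest crawler", honest_crawler),
                      ("browser", browser),
                      ("disguised crawler", disguised_crawler)]:
    print(name, "->", "flagged" if looks_like_bot(headers) else "passes")
```

Only the crawler that announces itself gets flagged. The disguised one sends exactly the same bytes as the browser, so this check cannot tell them apart.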

      • generaldenmark@programming.dev
        English · 5 upvotes · 2 months ago

        I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and it was not that many unique IPs, but it did allow their crawlers to run unhindered…

        They didn’t do IP banning themselves for the same reason, but they did notice that one of their competitors did not change its IP when scraping them. If they had had malicious intent, they could have changed the data for that IP only, e.g. increasing or decreasing the prices so the competitor ended up with bad data…

        I’d imagine companies like OpenAI have many times as many IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well… which would be unfortunate.
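A small simulation of why per-IP banning struggles against rotation. Everything here is assumed for illustration: the threshold, the pool size, the addresses, and the choice to rotate on every request (the setup described above switched IPs per line of requests, and the actual service is unknown).

```python
from collections import Counter
from itertools import cycle


def banned_ips(request_ips, max_per_ip):
    """Return the set of IPs that sent more than `max_per_ip` requests."""
    counts = Counter(request_ips)
    return {ip for ip, n in counts.items() if n > max_per_ip}


MAX_PER_IP = 100  # hypothetical ban threshold per IP
TOTAL_REQUESTS = 1000

# Crawler A: every request comes from the same address.
single_ip_log = ["198.51.100.10"] * TOTAL_REQUESTS

# Crawler B: the same number of requests, spread over a pool of 20 addresses.
pool = [f"203.0.113.{i}" for i in range(1, 21)]
rotation = cycle(pool)
rotating_log = [next(rotation) for _ in range(TOTAL_REQUESTS)]

print("single IP banned:", banned_ips(single_ip_log, MAX_PER_IP))
print("rotating banned:", banned_ips(rotating_log, MAX_PER_IP))
```

With 20 addresses each one sends only 50 requests, so none of them crosses the threshold. Lowering the threshold far enough to catch them would also start catching ordinary heavy users, which is the trade-off described above.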