• recapitated@lemmy.world
    link
    fedilink
    English
    arrow-up
    16
    ·
    1 month ago

    I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.

    I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)

    • cordlesslamp@lemmy.today
      link
      fedilink
      English
      arrow-up
      4
      ·
      edit-2
      1 month ago

      Can you use something like a DDoS filter to block automated AI scraping (too many requests per second)?

      I’m not a tech person so probably don’t even know what I’m talking about.
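
      A minimal sketch of the "too many requests per second" idea, assuming a simple per-IP sliding window. The class name, limits, and example IP are hypothetical, and a real DDoS filter does far more than this:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; a server would typically answer 429
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per second per IP.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
print(decisions)
```

      The fourth request inside the same second is refused, and the client is allowed again once the oldest request ages out of the window. A crawler that spreads its requests over many IPs stays under a limit like this.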

      • GenosseFlosse@feddit.org
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        1 month ago

        Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.

        • JovialMicrobial@lemm.ee
          link
          fedilink
          English
          arrow-up
          1
          ·
          1 month ago

          So you’re saying Reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?

          So the actual number of people using Reddit vs bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and there’s no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.

          • GenosseFlosse@feddit.org
            link
            fedilink
            English
            arrow-up
            3
            ·
            1 month ago

            Always has been. Technically the server sees no difference in what a browser does vs what a bot does: Downloading files and submitting requests.

      • generaldenmark@programming.dev
        link
        fedilink
        English
        arrow-up
        5
        ·
        edit-2
        1 month ago

        I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and it was not that many unique IPs, but it did allow their crawlers to live unhindered…

        They didn’t do IP banning for the same reason, but they did notice that one of their competitors did not change its IP when scraping them. If they had malicious intent, they could have changed the data for that IP only, e.g. increasing or decreasing the prices so the competitor had bad data…

        I’d imagine companies like OpenAI have many times the IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well… which would be unfortunate.

  • JeeBaiChow@lemmy.world
    link
    fedilink
    English
    arrow-up
    5
    arrow-down
    1
    ·
    1 month ago

    I’m seldom on reddit after the exodus, but when I am, I noscript the duck out of it.

  • Mnemnosyne@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    4
    ·
    1 month ago

    I’m kind of curious to understand how they’re blocking other search engines. I was under the impression that search engines just viewed the same pages we do to search through, and the only way to ‘hide’ things from them was to not have them publicly available. Is this something that other search engines could choose to circumvent if they decided to?

    • Madis@lemm.ee
      link
      fedilink
      English
      arrow-up
      11
      ·
      1 month ago

      Search engine crawlers identify themselves (via their user agents), so they can be kept out both by an honor-based system (robots.txt) and by active blocking (responding with error 403 or similar) when they try anyway.
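
      A rough sketch of both mechanisms, using Python's standard `urllib.robotparser`. The robots.txt policy, the token lists, and the helper names are made up for illustration and are not Reddit's actual configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one crawler is welcome, everyone else is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(user_agent, url):
    """Honor system: what a well-behaved crawler concludes from robots.txt."""
    return parser.can_fetch(user_agent, url)


# Assumed token lists for the server-side check (illustrative only).
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot")
ALLOWED_CRAWLER_TOKENS = ("googlebot",)


def status_for(user_agent_header):
    """Active blocking: refuse self-identified crawlers that are not allowed."""
    ua = user_agent_header.lower()
    for token in KNOWN_CRAWLER_TOKENS:
        if token in ua and token not in ALLOWED_CRAWLER_TOKENS:
            return 403
    return 200


print(polite_crawler_may_fetch("Googlebot", "https://example.com/r/python"))
print(polite_crawler_may_fetch("bingbot", "https://example.com/r/python"))
print(status_for("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
```

      Both checks depend on the crawler being honest about its user agent. A request that sends a browser-like User-Agent header passes `status_for` untouched.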

      • Mnemnosyne@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        2
        ·
        1 month ago

        Thank you, I understand better now. So in theory, if one of the other search engines chose to not have their crawler identify itself, it would be more difficult for them to be blocked.

        • tb_@lemmy.world
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          1
          ·
          1 month ago

          This is where you get into the whole webscraping debate you also have with LLM “datasets”.

          If you, as a website host, are detecting a ton of requests coming from a single IP, you can block said address. There are ways around that by making the requests from different IP addresses, but there are other ways to detect that too!
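
          As one possible illustration (not a description of what any particular site does), request counts can be tallied per address and also per network block, so a crawler hopping between neighbouring addresses still stands out. The function name, threshold, /24 grouping, and sample addresses are assumptions, and the sketch handles IPv4 only:

```python
import ipaddress
from collections import Counter


def busy_sources(request_ips, threshold, prefix_len=24):
    """Return (ips, networks) whose request count reaches `threshold`."""
    per_ip = Counter(request_ips)
    # Group each address into its enclosing network, e.g. 203.0.113.5 -> 203.0.113.0/24.
    per_net = Counter(
        str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        for ip in request_ips
    )
    flagged_ips = {ip for ip, count in per_ip.items() if count >= threshold}
    flagged_nets = {net for net, count in per_net.items() if count >= threshold}
    return flagged_ips, flagged_nets


# Made-up access log: one noisy IP, one crawler rotating inside a /24, one normal visitor.
log = ["198.51.100.9"] * 5 + [f"203.0.113.{i}" for i in range(1, 7)] + ["192.0.2.44"]
ips, nets = busy_sources(log, threshold=5)
print(ips)
print(nets)
```

          The rotating crawler never trips the per-IP count but does trip the per-network one. Blocking a whole network also risks hitting real users who share that range.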

          I’m not sure if Reddit would try to sue Microsoft or DDG if they started serving results anyway through such methods. I don’t believe it is explicitly disallowed.
          But if you were hoping to deal in any way with Reddit in the future I doubt a move like this would get you in their good graces.

          All that is to say: I won’t visit Reddit at all anymore now that their results won’t even show up when I search for something. This is a terrible move and will likely fracture the internet even more as other websites may look to replicate this additional source of revenue.

    • kratoz29@lemm.ee
      link
      fedilink
      English
      arrow-up
      2
      ·
      1 month ago

      One can only hope, but until people learn that they can use another browser and another search engine, it’s not likely (I am talking about the Google side ofc; Reddit might be affected by this in the long run).

    • tal@lemmy.today
      link
      fedilink
      English
      arrow-up
      63
      ·
      1 month ago

      Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.

      kagis

      Yeah.

      https://gs.statcounter.com/search-engine-market-share

      According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.

      • scarabic@lemmy.world
        link
        fedilink
        English
        arrow-up
        3
        ·
        1 month ago

        Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.

        Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.

      • eronth@lemmy.world
        link
        fedilink
        English
        arrow-up
        30
        ·
        1 month ago

        It’s also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.

          • whatwhatwhatwhat@lemmy.world
            link
            fedilink
            English
            arrow-up
            1
            ·
            14 days ago

            Seconding this. I work in IT, and the number of tech-illiterate people using DuckDuckGo as their default search engine is astounding. It’s got to be about 10% of our users (none of whom are in tech roles).

  • Babalugats@lemmy.world
    link
    fedilink
    English
    arrow-up
    4
    arrow-down
    1
    ·
    1 month ago

    They’re also blocking posts by users who haven’t been banned or even received a warning. It appears to the user as though it’s been posted, but it hasn’t.

      • Babalugats@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 month ago

        I didn’t know there was a name for it. I don’t have any more info on it, but I can show examples of it happening.

      • WolfLink@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        5
        arrow-down
        1
        ·
        edit-2
        1 month ago

        They’ve done this for a long time. It’s supposed to be used only on bots, but it definitely isn’t in practice.

    • eee@lemm.ee
      link
      fedilink
      English
      arrow-up
      14
      ·
      1 month ago

      Shadowbanning is a totally different issue that’s existed for a long time though.

  • MehBlah@lemmy.world
    link
    fedilink
    English
    arrow-up
    16
    ·
    1 month ago

    Bing it is then. I hate Microsoft with the intensity of a thousand suns, but Bing is now my jam as long as this lasts.

    • buttfarts@lemy.lol
      link
      fedilink
      English
      arrow-up
      13
      ·
      1 month ago

      I’ve started a Kagi subscription for my new search engine. Basically $6 USD per month but because it’s a user-pay model they have a really good privacy policy and don’t sell/analyze your data.

      It’s currently better than Google (which I still use for searching reviews in Maps).

        • CileTheSane@lemmy.ca
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          5
          ·
          1 month ago

          At best this is as intelligent as saying Google Maps is YouTube by another name because they’re both on Google servers. Even that would be smarter to say actually, because Google Maps and YouTube are owned by the same company.

          • MehBlah@lemmy.world
            link
            fedilink
            English
            arrow-up
            5
            arrow-down
            4
            ·
            edit-2
            1 month ago

            When Bing goes down, so does DuckDuckGo, but somehow your apples-to-oranges argument seems comparable to you.

            • CileTheSane@lemmy.ca
              link
              fedilink
              English
              arrow-up
              4
              arrow-down
              5
              ·
              1 month ago

              They share hosting servers, that doesn’t make them the same service. When the power goes out do you think you and your neighbors live in the same house?

              • MehBlah@lemmy.world
                link
                fedilink
                English
                arrow-up
                6
                arrow-down
                1
                ·
                1 month ago

                Just keep sucking down the hype. They don’t share the same hosting for the frontend, but they both use the same backend. The backend is of course owned by Microsoft. DuckDuckGo uses Bing’s backend, and somehow you have convinced yourself, beyond all evidence to the contrary, that it isn’t Bing with a different wrapper.

                • CileTheSane@lemmy.ca
                  link
                  fedilink
                  English
                  arrow-up
                  2
                  arrow-down
                  9
                  ·
                  1 month ago

                  When you can’t play Stardew Valley (because Steam is down) you also can’t play Elden Ring. They must use the same backend and Elden Ring is just Stardew Valley by another name.

                  You’re going to need a better source than “they go down at the same time”.

        • CileTheSane@lemmy.ca
          link
          fedilink
          English
          arrow-up
          2
          arrow-down
          8
          ·
          1 month ago

          Yes, duckduckgo uses other search engines to provide its results. Your point?

          I don’t care where duckduckgo gets the links from, I care how relevant the top links are and that they aren’t being crowded out by ads.

          • Emmie@lemm.ee
            link
            fedilink
            English
            arrow-up
            4
            ·
            edit-2
            1 month ago

            No need to be defensive. DDG uses Bing, which means it is part of the big five under the hood. That will always have certain ramifications in the long run.

            I also use it, but I am looking for decentralised alternatives in the meantime, not because DDG is bad but because sooner or later it will get worse.

            Also why are you so aggressive anyway, it’s super weird and doesn’t fit Lemmy

  • Burn_The_Right@lemmy.world
    link
    fedilink
    English
    arrow-up
    25
    arrow-down
    1
    ·
    1 month ago

    Google just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.

    I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.

    • ChronosTriggerWarning@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      ·
      1 month ago

      I saw Reddit results in a search last night using DDG. It just said something like “It’s here on Reddit, but we’re not allowed to show you.” I wasn’t planning on using Reddit (never again), but that just irritated me.

  • AlphaOmega@lemmy.world
    link
    fedilink
    English
    arrow-up
    4
    ·
    1 month ago

    Couldn’t a search engine just aggregate the results from Google, filter for the Reddit responses, and then add those to its own organic results?

  • KroninJ@lemmy.world
    link
    fedilink
    English
    arrow-up
    3
    ·
    1 month ago

    It’s still possible to search with “site:reddit.com …”

    Has it been implemented yet or are they blocking non-flagged searches? Which seems odd.

    • tb_@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      1
      ·
      1 month ago

      You shouldn’t be getting any new results if you do that, but older posts will/may remain indexed.