• kautau@lemmy.world · 3 days ago

    Free software

    users have the freedom to run, copy, distribute, study, change and improve the software

    https://www.gnu.org/philosophy/free-sw.en.html

    Open source

    https://en.wikipedia.org/wiki/The_Open_Source_Definition

    6. No discrimination against fields of endeavor, like commercial use

    You are dropping the words “software” and “source” from those terms. The code is freely available, and to be open source it must be usable for whatever purpose.

    As an aside, it’s frequently used by smaller sites to prevent overwhelming scraping that could take the site down, which has become far more rampant recently due to AI bots.

    • daniskarma@lemmy.dbzer0.com · 3 days ago

      I’m not saying it’s not open source or free. I’m saying that it doesn’t contribute to making the web free and open. It really only contributes to making everyone waste more energy surfing the web.

      The web is already too heavy; we do NOT need PoW added on top of that.

      I don’t think even a Raspberry Pi 2 would go down over a web scrape. And Anubis cannot protect against a proper DDoS, so…

      • kautau@lemmy.world · 3 days ago

        I don’t think even a Raspberry Pi 2 would go down over a web scrape

        That absolutely depends on what software the server is running and whether there’s proper caching involved. If solving some PoW is required to scrape one page, it shouldn’t be much of an issue, as opposed to a bot blindly following and ingesting every link.
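        To give a sense of the asymmetry: an Anubis-style challenge is proof of work, i.e. the client has to find a nonce whose hash clears a difficulty target before it gets the page. A minimal sketch of the general idea in Go (plain SHA-256 PoW; not Anubis’s exact protocol):

        ```go
        package main

        import (
            "crypto/sha256"
            "encoding/binary"
            "fmt"
            "strings"
        )

        // solve finds a nonce such that SHA-256(challenge || nonce) starts with
        // `difficulty` hex zeros. Each extra zero makes the search ~16x harder:
        // negligible for one human page view, expensive across millions of pages.
        func solve(challenge string, difficulty int) uint64 {
            target := strings.Repeat("0", difficulty)
            buf := make([]byte, 8)
            for nonce := uint64(0); ; nonce++ {
                binary.LittleEndian.PutUint64(buf, nonce)
                sum := sha256.Sum256(append([]byte(challenge), buf...))
                if strings.HasPrefix(fmt.Sprintf("%x", sum), target) {
                    return nonce
                }
            }
        }

        func main() {
            // 4 hex zeros: a fraction of a second on a normal machine.
            fmt.Println(solve("example-challenge", 4))
        }
        ```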

        Additionally, you can allowlist “good bots” like the Internet Archive, and the project is currently working on a curated list of “good bots”:

        https://github.com/TecharoHQ/anubis/blob/main/docs/docs/admin/policies.mdx
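        For illustration, an allow rule in the bot policy file looks roughly like this (shape based on the linked docs; the exact user-agent regex for archive.org’s crawler is an assumption, so check the real policy list):

        ```json
        {
          "bots": [
            {
              "name": "internet-archive",
              "user_agent_regex": "archive\\.org_bot",
              "action": "ALLOW"
            }
          ]
        }
        ```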

        AI companies ingesting data nonstop to train their models doesn’t make for an open and free internet, and will likely lead to the opposite, where users no longer even browse the web but trust AI responses that may be hallucinated.

        • daniskarma@lemmy.dbzer0.com · 3 days ago

          There are a small number of AI companies training full LLM models, and they usually do a few training runs per year. What most people see as “AI bots” is not actually that.

          The influence of AI over the net is another topic. But Anubis isn’t doing anything about that either: it just makes the AI bots waste more energy getting the data, or at most keeps data under “Anubis protection” out of the training dataset. The AI will still be there.

          Am I on the list of “good bots”? Sometimes I scrape websites for price tracking or change tracking. If I see a website running malware on my end, I would most likely just block that site: one legitimate user fewer.

          • squaresinger@lemmy.world · 2 days ago

            That’s outdated info. Yes, training itself doesn’t require that much scraping. But LLMs are now often coupled with web search to improve results.

            So, for example, if you ask ChatGPT to find a specific product for you, the result doesn’t come from the model. Instead it does a web search, loads the results, summarizes them, and returns the summary plus the links. This is a time-critical operation, since the user is waiting for the results. It’s also often a bad deal for the site being scraped (mostly when looking for info rather than products), since the user may be satisfied with the summary and never click through to the source.

            So if you can delay scraping like that by a few seconds, that’s quite significant.
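            To make the timing point concrete, here’s a sketch of the per-page deadline such a pipeline imposes (the 3-second budget and URL are hypothetical, just to illustrate):

            ```go
            package main

            import (
                "context"
                "fmt"
                "net/http"
                "time"
            )

            // fetchWithDeadline models one page load inside a search-then-summarize
            // pipeline: a user is waiting, so each source fetch gets a hard budget.
            // A scraper that must spend seconds solving a PoW challenge either
            // blows this budget or skips the page and summarizes without it.
            func fetchWithDeadline(url string, budget time.Duration) error {
                ctx, cancel := context.WithTimeout(context.Background(), budget)
                defer cancel()
                req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
                if err != nil {
                    return err
                }
                resp, err := http.DefaultClient.Do(req)
                if err != nil {
                    return err
                }
                resp.Body.Close()
                return nil
            }

            func main() {
                if err := fetchWithDeadline("https://example.com", 3*time.Second); err != nil {
                    fmt.Println("source skipped:", err)
                }
            }
            ```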

          • somerandomperson@lemmy.dbzer0.com · 3 days ago

            I (and A LOT of lemmings) have already had enough of AI. We DON’T need AI-everything. So we block it, or make it harder for AI to train on our content. We never said “hey, please train your LLM on our data” anyway.

            • daniskarma@lemmy.dbzer0.com · 3 days ago

              That’s legitimate.

              But it’s not “open”, nor “free”.

              It’s also a bit of a placebo. For instance, Lemmy is not an Anubis use case, since Lemmy can be legitimately scraped by any agent through the federation system. And I don’t really know how Anubis would even work with the openness of the Lemmy API.
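              That openness is easy to demonstrate: any client can page through public posts via Lemmy’s HTTP API, no challenge involved. A sketch (the instance and query parameters are just examples):

              ```go
              package main

              import (
                  "fmt"
                  "io"
                  "net/http"
              )

              // Reads public posts straight from a Lemmy instance's documented
              // v3 API; the instance here is only an example.
              func main() {
                  resp, err := http.Get("https://lemmy.world/api/v3/post/list?limit=5&sort=New")
                  if err != nil {
                      panic(err)
                  }
                  defer resp.Body.Close()
                  body, _ := io.ReadAll(resp.Body)
                  fmt.Printf("fetched %d bytes of JSON\n", len(body))
              }
              ```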