I found a link aggregator that someone made as a personal project, and they had an exciting idea for a sorting algorithm. Its basic principle is the following:

  1. Upvotes show you more links from other people who have upvoted that content
  2. Downvotes show you fewer links from other people who have upvoted that content

I thought the idea was interesting and wondered if something similar could be implemented in the fediverse.

They currently have no plans to open-source their work, which is fine, but I think it shouldn’t be too hard to replicate something similar here, right?

They have a guest mode where you can try it out without signing in, and it already seems to be giving me relevant content after only 3 upvotes.

There is more information on their website if you guys are interested.

Edit: Changed title to something more informative.

  • Skull giver@popplesburger.hilciferous.nl · 22 points · 9 months ago

    Based on these criteria:

    1. Upvotes show you more links from other people who have upvoted that content
    2. Downvotes show you fewer links from other people who have upvoted that content

    That sounds a lot like you would need to keep track of the vote weights of every user for every other user.

    With 100 users, that would require tracking and updating 100² = 10,000 values. That seems quite manageable!

    There are about 39,863 active Lemmy users, according to Fediverse.observer. That means keeping track of (up to) 39,863² = 1,589,058,769 weights; assuming f16 representations, that’s over 3 GB of data, touched by every single upvote and downvote.

    Would this be possible to implement? Yeah, for sure! Would it be practical? Not really, no! This algorithm is O(n²) in data storage, and that simply won’t do.
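To make the bookkeeping concrete, here is a toy sketch of the naive dense version (the names and the exact update rule are my own guesses at what such an algorithm might do, not the site’s actual implementation):

```python
import numpy as np

n = 100                                   # toy user count
uid = {"alice": 0, "bob": 1, "carol": 2}  # user -> matrix index

# One f16 weight per ordered pair of users: O(n^2) storage.
W = np.zeros((n, n), dtype=np.float16)

def record_vote(voter, post_upvoters, delta):
    """delta = +1 for an upvote, -1 for a downvote: the voter's weight
    toward everyone who already upvoted the post moves by delta."""
    for other in post_upvoters:
        if other != voter:
            W[uid[voter], uid[other]] += delta

record_vote("alice", ["bob", "carol"], +1)
record_vote("alice", ["bob"], -1)
# alice's feed would now favour links upvoted by carol over bob's:
print(W[uid["alice"], uid["carol"]], W[uid["alice"], uid["bob"]])  # 1.0 0.0
```

Every vote touches one row of that matrix, and the matrix itself is what blows up quadratically.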

    Imagine if Lemmy were to gain popularity and all of its inactive accounts came back; we’d be up to 2 million users. For busy servers (.ml, .world) that means tracking 4 trillion weights for every single interaction (8 TB of data at f16).

    Did I say 2 million? Threads.net has 140 million users, Foursquare has 55 million, and the “classic” fediverse has a mere 14 million (active + inactive). Combined that’s about 209 million users, i.e. 209,000,000² ≈ 4.3681×10¹⁶ weights to update for each upvote.

    Now, perhaps there are ways to do this more efficiently. For example, users’ devices could track these numbers: exposing the raw upvote data would let each user’s device track “only” a couple million data points locally, then calculate scores for batches of a few hundred posts collected by some other heuristic. Sorting would take a while and drain your phone’s battery, but it wouldn’t kill the server. With “just” a couple hundred megabytes of data transfer every time the browser refreshes, and the device’s GPU/AI accelerator running at full blast for a while, you could use this algorithm.
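For what it’s worth, the client-side scoring step itself is cheap once the weights are local; it’s the transfer and storage that hurt. A toy sketch with made-up sizes (nothing here reflects any real implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_posts = 1_000_000, 200

# Weights this client holds toward every other user (~2 MB at f16).
my_weights = rng.standard_normal(n_users).astype(np.float16)

# upvotes[p] = indices of users who upvoted post p (fetched from server).
upvotes = [rng.integers(0, n_users, size=50) for _ in range(n_posts)]

# Score each post as the sum of my weights for its upvoters, then sort.
scores = np.array([my_weights[idx].sum() for idx in upvotes])
ranked = np.argsort(scores)[::-1]  # best-scoring posts first
```

Scoring 200 posts is trivial; shipping and maintaining the `my_weights` vector for millions of users is the expensive part.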

    The idea is enticing, but it doesn’t scale well.

    You’d also turn Lemmy into the strongest echo chamber you could possibly create. I’m not sure if that’s what we want to do. If that’s your goal, you should consider moving to Facebook or Threads, maybe?

    • statue_smudge@lemmy.world · 7 points · 9 months ago

      Storing it as a sparse graph should reduce the storage requirements drastically, since most edges wouldn’t exist.
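A sketch of that idea (hypothetical names), where an edge only gets materialized once two users actually interact:

```python
from collections import defaultdict

# weights[viewer][voter] -> float; absent edges are implicitly 0.
weights = defaultdict(dict)

def bump(viewer, voter, delta):
    weights[viewer][voter] = weights[viewer].get(voter, 0.0) + delta

def score(viewer, post_upvoters):
    w = weights[viewer]
    return sum(w.get(u, 0.0) for u in post_upvoters)

bump("alice", "bob", 1.0)
# One stored edge instead of n^2 matrix cells:
print(sum(len(v) for v in weights.values()))  # -> 1
print(score("alice", ["bob", "carol"]))       # -> 1.0
```

Storage then grows with the number of actual interactions rather than with the square of the user count.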

    • Danterious@lemmy.dbzer0.com (OP) · +5/−2 · edited · 9 months ago

      you should consider moving to Facebook or Threads, maybe?

      Not an option

      As for the rest, yeah, those do seem like genuine obstacles. Part of the reason I liked the algorithm, I think, is that it reminded me of the Web of Trust that things like Scuttlebutt use to get relevant information to users, but with a lower barrier to entry.

      Also, as I’ve said elsewhere, it doesn’t have to be this exact thing. But since this is a new platform, we have the chance to make algorithms that work for us and are transparent, so I wanted to share examples I thought were worthwhile.

      Edit:

      You’d also turn Lemmy into the strongest echo chamber you could possibly create.

      I don’t think that’s true, though. Big tech companies with more advanced algorithms would probably be much better at creating echo chambers.

    • criitz@reddthat.com · 1 point · 9 months ago

      Instead of comparing every individual user’s votes with every other user’s, you create clusters using data-science techniques and bucket all users into those clusters, recalculated on a nightly or weekly basis. By controlling the cluster size you can keep the number of comparisons manageable, and still achieve OP’s vision.
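A rough sketch of that bucketing idea (cluster assignments are hard-coded here; in practice they would come from e.g. a nightly k-means run over users’ vote vectors):

```python
from collections import defaultdict

# Output of the hypothetical nightly batch job: user -> cluster id.
cluster_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

# Weights live between a user and a *cluster*: O(n * k), not O(n^2).
weight = defaultdict(float)  # key: (viewer, cluster_id)

def record_vote(viewer, post_upvoters, delta):
    for u in post_upvoters:
        if u != viewer:
            weight[(viewer, cluster_of[u])] += delta

def score(viewer, post_upvoters):
    return sum(weight[(viewer, cluster_of[u])] for u in post_upvoters)

record_vote("alice", ["carol"], +1)
print(score("alice", ["dave"]))  # -> 1.0, dave shares carol's cluster
```

The trade-off is resolution: an upvote for one member of a cluster boosts everyone in it, which is exactly what keeps the comparison count bounded.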