For example, if I (on kbin.run, which is Mbin, but for the purposes of this let’s just assume it’s Kbin) go to a random magazine on kbin.social, I will often see a prompt that the magazine may be incomplete and that I should visit the original instance for all the content.
Why doesn’t the request to that magazine automatically trigger a “pull” from that instance for that magazine, or at least cause it to check if the number of threads is the same (and conditionally pull on that)? I would think that by pulling the changes at that point, magazines would never be out of date.
I get that it would be a lot heavier of a load on the servers, but in combination with good caching techniques (maybe setting a time of 1 day or something until the next pull occurs, idk) I feel like that could be mitigated.
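Roughly, the idea above could look like this. This is only a sketch of the proposal, not how Kbin or Mbin actually work: `fetch_remote_count` and `pull_magazine` are hypothetical hooks (ActivityPub doesn't define a "give me your thread count" call), and comparing counts would miss edits and deletions that don't change the total.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

ONE_DAY = 24 * 60 * 60  # the "1 day or something" refresh interval, in seconds


@dataclass
class MagazineCache:
    # Hypothetical hooks: ask the origin instance for its thread count,
    # and backfill a magazine (returning the new local thread count).
    fetch_remote_count: Callable[[str], int]
    pull_magazine: Callable[[str], int]
    ttl: float = ONE_DAY
    now: Callable[[], float] = time.time
    last_checked: Dict[str, float] = field(default_factory=dict)
    local_count: Dict[str, int] = field(default_factory=dict)

    def on_view(self, magazine: str) -> bool:
        """Called when a local user opens a remote magazine.
        Returns True if a pull was triggered."""
        t = self.now()
        last = self.last_checked.get(magazine)
        if last is not None and t - last < self.ttl:
            return False  # checked recently: serve the cached copy
        self.last_checked[magazine] = t
        remote = self.fetch_remote_count(magazine)
        if remote == self.local_count.get(magazine, 0):
            return False  # counts match: assume we're in sync
        # Counts differ: pull from the origin and remember the new count.
        self.local_count[magazine] = self.pull_magazine(magazine)
        return True
```

With the TTL in place, a popular magazine costs at most one count check per day per instance, no matter how many local users open it.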
Is this maybe an implementation detail of ActivityPub?
Thank you!
It’s in production right now and it’s working just fine. Push-based systems are efficient and effective as long as retransmission on failure is taken into account. Some services don’t do that, or at least not enough, or suffer from thundering herd issues when they recover from downtime, but that’s all because of the individual implementations.
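For illustration, "retransmission on failure" without a thundering herd usually means exponential backoff with random jitter. Here's a minimal sketch, assuming a hypothetical `deliver()` callable that POSTs one activity to a remote inbox and returns whether it succeeded; the delay values are made up, not taken from any particular server.

```python
import random
import time
from typing import Callable, Optional


def deliver_with_retry(
    deliver: Callable[[], bool],      # hypothetical: one delivery attempt
    max_attempts: int = 8,
    base_delay: float = 60.0,         # seconds before the first retry (assumed)
    max_delay: float = 6 * 60 * 60,   # never wait longer than 6 hours (assumed)
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Retry a push delivery with capped exponential backoff and full jitter."""
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        if deliver():
            return True
        if attempt == max_attempts - 1:
            break  # out of attempts, don't sleep pointlessly
        # Exponential backoff: 1x, 2x, 4x, ... the base delay, capped.
        cap = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: wait a random time in [0, cap] so that every sender
        # retrying against the same recovering server doesn't fire at once.
        sleep(rng.uniform(0, cap))
    return False
```

The jitter is what addresses the herd: without it, every instance that queued activities during an outage would retry on the same schedule and hammer the server the moment it comes back.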
This problem barely exists in centralised systems such as Twitter or Reddit (it does crop up sometimes), but they don’t tell you that your local cluster may be out of sync. They just pretend you’re up to date and that the comments you didn’t see five minutes ago have always been there.
Of course, you’re free to use a different protocol for your own fediverse server if you feel ActivityPub isn’t up to the task. There are various federating protocols available, and services sometimes speak multiple protocols. Matrix and OStatus also federate, for instance.
OStatus has been mostly abandoned, but it worked similarly to ActivityPub, with subscriptions and publications using standard protocols. GNU Social and, I believe, Friendica still use it.
If you value consistency above all else, you could use Matrix, for example, which does active syncing to ensure data is up to date. However, you’ll need a beefy server if you’re going to keep magazine equivalents up to date for thousands of people. My Synapse server (8GB of RAM, 4 Epyc cores) takes a couple of minutes to join rooms like #matrix on the main server with a few hundred active participants. Granted, Synapse is written in Python, but I doubt any of the alternatives will ever join a room as quickly as a magazine loads.
Even protocols like IRC suffer from “net splits” that break up federation between servers. NNTP is partially pull-based, so perhaps that’ll suit your needs, but interactions aren’t nearly as fast as ActivityPub in my experience. SMTP is also a federated protocol, but it doesn’t provide pull mechanisms either. You could use a DHT and BitTorrent-like unified data storage, but that’s slow as hell in comparison. Perhaps a system of linked IPFS addresses could be used to pull in distributed posts, but IPFS isn’t exactly fast either, and it’s not really meant for this stuff.
There are fast, distributed systems, like databases, but they only work because all servers in a cluster can be trusted. They’re also severely restricted in the directions data can flow and in how applications react to data insertions. A database model would be trivial to disrupt or DDoS if you used it for the “anyone can federate with anyone else” style of federation that ActivityPub is designed for. Blockchains may work for that, but they’re slow, inefficient, and, as a consequence, usually expensive.
I’m afraid you’ll have to build a protocol from the ground up if you want to enable pulling in magazines, kbin style. You could accept ActivityPub submissions and serve them on your own server using your own protocol, of course, but I’m not sure if you’ll find anyone who will implement your protocol if you don’t find a solution to the performance problems that distributed systems need to cope with.