Well, the big issue here is that we sort of don’t have the power you think we do.
What I mean is, say you have 10 servers. 7 are Lemmy, 3 are kbin. Great, each admin has control over those servers. Then you have Meta. They’ll run 1 huge server. When the 10 other servers enable Federation, Meta now has 10 servers of content that isn’t even on their own platform that they can sell. Your data will literally exist on the Meta server because your data is not contained within your instance/platform once it’s Federated. Meta can then harvest the entire Fediverse for data like this. It’s like an absolute wet dream for them. They don’t even have to coax people to use their own platform!
Meta must be defederated the second they so much as dip a toe into the Fediverse or everything you’ve ever done, or do, on any ActivityPub platform will be scooped up and sold.
Edit: And it’s even worse because all it takes is 1 server to Federate with Meta. If server A is Federated with your server B, Meta can still pull your data from server A, which they Federated with, even if your local server B has Defederated from Meta. This is a huge problem.
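To make the server A / server B scenario above concrete, here is a toy model of it. It assumes that every server passes its copy of a post on to all of its federated peers, which is a simplification: whether real ActivityPub software relays content this way depends on the implementation. The server names and the `servers_reached` helper are made up for illustration.

```python
from collections import deque


def servers_reached(origin, federation):
    """Return every server a post from `origin` can reach.

    Assumption (not taken from any real implementation): each server
    hands its copy on to all of the peers it federates with.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in federation.get(current, set()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Hypothetical links: server B has defederated from Meta, server A has not.
federation = {
    "server_b": {"server_a"},
    "server_a": {"server_b", "meta"},
    "meta": {"server_a"},
}

reached = servers_reached("server_b", federation)
print(sorted(reached))
```

Under that assumption, "meta" ends up in the reached set even though server B has no direct link to it, which is the concern described in the edit above.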
I’m confused about what kind of data you want to protect. If you mean your posts and comments, they are already publicly available on the Internet. Meta doesn’t need to make an ActivityPub app that gets federated with Lemmy (or kbin) to aggregate and sell this data.
Is there another kind of data that is visible only to server administrators?
I guess the fear is that they’ll monetize others’ content without giving anything back. Like imagine if there was Reddit2 that just took all the content from Reddit but didn’t add their OC back to Reddit. Basically just leeching off it, and your average user would be incentivized to join “Reddit2” since it had all the content that Reddit has and more. They’d slowly drain users from Reddit to Reddit2 and THEN monetize, turning everything to shit (you can use your imagination how that’d look).
Well, they could do that regardless of whether they’re running an ActivityPub service. Nothing’s stopping them from a technical viewpoint.
Nothing stopping them, except, you know, the law… They can certainly display content that was not marked for public display. They will then proceed to get sued out of existence… If they do this automatically I’ll just privately post a music file with copyright protected music. Which is perfectly fine to do if it is indeed hidden from everyone. If they then publicly post it that’s on them and now I get to see the Music Industry fight the Zuck :D
Edit: Been corrected, the following is NOT how it works! Original Text follows
Someone correct me if I’m getting details wrong, but from reading this post it appears as if fediverse admins are provided both the username and email accounts registered by those users that have visited their instances.
If that’s true, one problematic scenario I can imagine is when someone has registered on the fediverse with a pseudonym, but has an e-mail address they also use on their real-life Facebook profile. Visiting a Facebook-run ActivityPub instance while logged in would give Facebook enough data to link both the pseudonymous account (with past and future post history), and the real-life Facebook profile.
So, even if you’re not signed up for Facebook’s version of ActivityPub, engaging with it could still be giving Facebook a source of ongoing data for building personal profiles and targeted advertisement that people would not provide on their own.
It’s more about ease and precision. I can see on your post on the website that you posted “5 hours ago”, but in the request, I’d likely be able to see an exact posting time for yours and significantly more posts (hundreds?) simultaneously.
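As a rough illustration of the precision point: ActivityStreams objects carry a `published` property with a full timestamp, so a federated server sees more than the "5 hours ago" shown on the website. The sample object below is invented for the sketch and is not taken from any real instance.

```python
import json
from datetime import datetime

# Invented sample of a federated post; only the `published` field matters here.
sample_note = """
{
  "type": "Note",
  "id": "https://example.social/notes/1",
  "published": "2023-06-20T14:03:27Z",
  "content": "hello"
}
"""


def exact_post_time(raw_json):
    """Parse the `published` timestamp of an ActivityStreams-style object."""
    obj = json.loads(raw_json)
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(obj["published"].replace("Z", "+00:00"))


posted_at = exact_post_time(sample_note)
print(posted_at.isoformat())
```

The same parsing works on a whole batch of objects at once, which is the "hundreds simultaneously" part of the concern.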
Right… But…
ActivityPub is not a protected encrypted protocol. Everything anyone says on any service using ActivityPub can already be intercepted and harvested by anyone, even blocked instances. The defederating is software based. But for example if someone wanted they could simply do https://mastodon.social/tags/fediverse.rss and there you go, instant access to data from the Fediverse. You can query any Mastodon server for any hashtag you like. That’s just one of many endpoints that will spit out Fediverse content.
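For anyone curious how little effort the RSS route takes, here is a minimal sketch using only the Python standard library. The URL is the one mentioned above; the function names and the sample feed are made up for illustration, and the sketch assumes the endpoint returns ordinary RSS 2.0 with `item` elements.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://mastodon.social/tags/fediverse.rss"


def fetch_feed(url=FEED_URL):
    """Download the raw feed text (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_items(rss_text):
    """Pull link, date and body out of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


# Invented sample so the parser can be tried without touching the network.
SAMPLE = """<rss version="2.0"><channel><title>#fediverse</title>
<item><link>https://example.social/@alice/1</link>
<pubDate>Tue, 20 Jun 2023 14:03:27 +0000</pubDate>
<description>Hello</description></item>
</channel></rss>"""

items = parse_items(SAMPLE)
print(items)
```

Calling `parse_items(fetch_feed())` would run the same parser against the live feed, with no account and no federation involved.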
What I’m taking issue with is essentially the same thing that is getting Reddit into hot water. Spez is acting like all the content on Reddit is exclusively his. And legally, it probably is, since it exists on his servers. Now if you extrapolate that out to Meta on ActivityPub, any instance that federates with them immediately puts all of your content directly onto Meta’s servers. Once it’s in their possession, it’s legally theirs to do with as they please. If they want to pull a Facebook or Reddit, using your data, they can with no way for you to opt-out. Sure, nothing is stopping people from doing it already, but Meta does not have your best interest in mind. Ever. They’ve shown it again and again. So I think people are preemptively wanting to cut off this spigot of user data to Meta because their abuse of it is a matter of when, not if. Any other company might deserve the benefit of the doubt, but Meta? We know who they are already.
Also, as I said elsewhere, Meta could already use a bot to scrape Lemmy instances, but you can’t sell a bot to investors. But you can sell a platform. Meta will build a slick platform to sell to investors and sit back while federation fills up their instance with data which they’ll turn around and sell the same way they do on Facebook. To me this is very different than setting up an RSS feed.
I completely agree with the overall point you’re making, but would like to correct the legal aspects. I am not a lawyer, but I do have a pretty good understanding of US copyright law which is the most relevant in this case.
Having possession of data isn’t sufficient to legally establish the rights to do as a company pleases. In general, an individual author immediately has copyright on a creative work as soon as it’s recorded in any medium. The main exception to this is “work for hire” — a legal agreement that employers hold copyrights since they’re paying for the work. It’s usually part of the paperwork an established company has you sign when you start a job.
Because of this, and because we users aren’t employees of Reddit, they need a license to duplicate and display our copyrighted posts. The terms of service for any online service almost always stipulate a “worldwide, non-exclusive, perpetual license”. In other words: you still own the copyright to your post and can still share it elsewhere, but by sending it to Reddit, they get to put it anywhere they want and you can’t ever take that right away from them.
If Meta begins slurping up data from the Fediverse, things get tricky. They’re probably violating copyright law if they do that, just as ChatGPT, Google Bard, etc. likely have. However, legal enforcement of our rights would be near-impossible. Everyone who has ever had an account with any of Meta’s properties has most likely agreed to a binding arbitration provision. (These are utterly immoral: they force you, as a precondition of doing business, to preemptively waive your legal rights before anything occurs that would cause you to need them.) These provisions also prohibit any sort of class action, so each individual person would have to initiate their own case against Meta. And then you’d have to somehow prove to an arbitrator from an organization selected by and paid by Meta that Meta violated your copyright. And Meta’s high-priced lawyers will have all kinds of ways of referencing prior cases to argue why what they did is fine.
So yeah. But again, I completely agree with your main point. Meta will (if they haven’t already) collect all the data they please from the Fediverse and use it to further their business interests. And those business interests are not aligned with our best interests.
Thank you for your clarification! I don’t know any of the legal specifics of this stuff and I very much appreciate you taking the time to help educate me and anyone else who needs it. I can only give a conceptual argument based on the history I’ve seen with these companies, but not any sort of specific knowledge of law.
The gist of what you’re saying, and what we’ve actually seen play out recently, is technically they shouldn’t be able to do this, but they’re going to lawyer it in such a way that they’ll get away with it unless/until someone actually sues them which is prohibitively expensive. We have recently seen class action suits against Meta, but realistically the damage has already been done, the money has already been made, and they go on with finding the next cash cow. Even a multimillion dollar settlement is a drop in the bucket, simply the cost of doing business for these people.
Exactly so! 🙂😭
You bring up an interesting point, because of how the fediverse works, every server (that has an active subscription) essentially has a mirror of the original data. So if Facebook have data from people who never consented to that, then they would surely be breaking GDPR rules? GDPR rules say that they can only PROCESS the data (or mine it - if you want to use a more realistic term) if a user has explicitly agreed to that, implicit agreement doesn’t count. So this is going to be interesting to see how they manage this - providing that they don’t process the data and simply present it, as is - they don’t break GDPR, but the second that they start processing it, they breach GDPR. Now - they can process data that belongs to their users, but they would have to write code that ensures they don’t ingest posts from any user that is not a Meta user - for the purposes of harvesting it.
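The kind of code that would be needed could look something like the sketch below. It is purely hypothetical: the domain `meta.example`, the `user@domain` handle format and the function names are assumptions for illustration, and nothing here reflects how Meta actually handles federated posts.

```python
# Hypothetical set of domains whose users have agreed to data processing.
LOCAL_DOMAINS = {"meta.example"}


def is_local(handle):
    """True if a `user@domain` handle belongs to one of the local domains."""
    _, _, domain = handle.rpartition("@")
    return domain.lower() in LOCAL_DOMAINS


def posts_allowed_for_processing(posts):
    """Keep only posts written by local users; federated posts are left out."""
    return [post for post in posts if is_local(post["author"])]


posts = [
    {"author": "alice@meta.example", "text": "posted on the local instance"},
    {"author": "bob@lemmy.example", "text": "arrived via federation"},
]

allowed = posts_allowed_for_processing(posts)
print([post["author"] for post in allowed])
```

Federated posts could still be displayed as they arrive; the filter only decides what goes into the processing pipeline.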
Yes, this is exactly the sticky issue we get into. And I’m wondering if lawyers would be able to make a case that using ActivityPub alone automatically gives your consent to have your data exist on an instance outside your own. Once they have data you’ve consented to give they can do with it as they please, essentially arguing you’ve become a consenting party when you consented to federation. I don’t know the GDPR well enough to have any answers, but you can bet Meta lawyers do.
I don’t think Facebook would be having high level NDA-protected talks with Mastodon people if they weren’t trying to work all this out. And by work out, I mean how to monetize/data mine. I’ve been talking about this with people all day, many of whom didn’t see a problem with this, but eventually all of them have had the lightbulb turn on when they realize the potential abuse Meta could do with/to ActivityPub.
If, by some miracle, Meta wants to be the good guy for a change, let them prove it. I would love to see defederation by default, and let Meta prove they’re trustworthy to federate to. And even then, have a really itchy defederate trigger finger if they even hint at pulling another Cambridge Analytica fiasco. But getting everyone on-board with that is probably impossible, especially if Meta starts throwing money around.
Meta can have the data, that part yes you consent to by using ActivityPub software, though there is a whole other argument to get into later about whether “normal” users really understand that. But no, Meta absolutely cannot process that data, for creating shadow profiles or anything like that - unless the user explicitly opts in. GDPR is quite clear that you cannot infer that a user agrees based on some other influence (in this case the user using ActivityPub) - the user MUST have been presented with a dialog explaining what Meta would do with the data and giving the user the option to say they agree or disagree with it.
Thank you for the clarification there. I hope you don’t mind having this conversation with me, I’m learning a lot by interacting with people on this topic. I don’t want you to feel like I’m arguing with you though. So the GDPR seems fairly bullet proof, but it only applies within the EU. So how about a scenario like this:
Your instance is hosted in the EU and has the full protection of the GDPR. My instance is hosted in the US where the GDPR does not apply. Your instance federates with mine. I federate with Meta. Meta now has your data but they didn’t get it from a GDPR protected source. You consented to give it to me, and I consented to give it to them. They have no obligation to uphold the GDPR because they’ve had no interaction with your instance whatsoever, they’ve simply accepted what I gave them and that transaction occurred within the jurisdiction of the US.
Maybe the GDPR still works here, I don’t know. But I guess my point is that if I can come up with endless scenarios like this, lawyers can too, and they know infinitely more about the law than I do. Hell, they can even come up with their own interpretations of law and act on them for years, only changing their practices when they’re forced to by someone actually suing them. Which by then they’ve already collected and sold millions worth of data.
I feel like outside the federated system, Meta would rely on geographic metadata (e.g. IP address) to identify if a user was within the scope of the GDPR or not. But they aren’t going to have access to any of this information when they receive the data from another server in the fediverse. There will be zero way for them to identify if a user from any server in the fediverse is covered by the GDPR or not, because any user from any country can basically sign up anywhere. It will be difficult for them to argue against that - since it’s highly publicised that when Mastodon was struggling under the strain of the massive influx of new users, people were being advised to find an instance that aligned to their interests rather than just their geographical location. Indeed I am on a Scottish server - where I arrived in 2019, but I have recently started another account on a US server (allthingstech.social) so I would indeed be a user protected by GDPR on a US server. Because Meta have no way of knowing where a user comes from, the only thing they can definitely legally do - is process data from their own known users - but they are crossing into dangerous territory the second they start trying to process data from users outside their own instance. In my opinion anyway.
And no I don’t mind debating at all. There needs to be a lot more debate, and a lot less death threats and screaming matches online - in order for us to start resolving anything.
Edit:
The GDPR applies to data on people. So in your example - it doesn’t matter how Meta got the data, the point is that they have data on citizens that are protected by the GDPR. The fact that the data arrived indirectly via a US server doesn’t remove the protection afforded to the EU citizen.