I’m trying to set up a personal Lemmy instance, and I’ve got it running, but it doesn’t seem to sync very well with posts and comments made before the instance was created. I ran [lemmony](https://github.com/jheidecker/lemmony) to get the /all to work correctly and to start syncing communities, but now when I go to some communities and I look at the posts created before I subscribed to the community, they either don’t show up or don’t have the correct number of upvotes/comments. Also, when I search for communities, next to the community name is only the number of users from my instance subscribed, not the actual number of subscribers to the community. Is there a way to fix this?
I ran [lemmony](https://github.com/jheidecker/lemmony) to get the /all to work correctly and to start syncing communities
You shouldn’t run this script, at least not in its default config. It’s more work, but a much better approach is hitting lemmyverse.net and manually subscribing to a bunch of communities you’re interested in.
Unless you’re very careful managing defederation and blocks, it subscribes you to thousands of communities you’ll never read (including the deep/dark parts of the lemmyverse posting porn/loli, piracy, and hate-speech) that will be cached on your server, re-served to the public internet from your instance, and may have legal repercussions in your jurisdiction.
It also increases the federation load your server generates by 50x or more compared to a “normal” single-user instance that subs to 100 communities or so… which doesn’t make you a very good fediverse citizen at a time when federation is being flaky and overloaded throughout the lemmyverse.
… when I go to some communities and I look at the posts created before I subscribed to the community, they either don’t show up or don’t have the correct number of upvotes/comments…
This is expected. To a first approximation, subscribing to a community asks the sending server to forward a copy of every post/comment/vote from now on. There’s no significant historical backfill (although there are a few ways to get your instance to download a particular old post, or a handful of them).
But when you first subscribe, you’d expect to be missing old posts, and then to have posts right on the border have some comments missing depending on when they were made. New posts and comments should generally show up in an orderly fashion, except for the global issues with federated replication that cause many servers to struggle to stay exactly in sync.
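For the "download a particular old post" case, one route that I believe works is asking your own instance to resolve the remote post's URL, which is roughly what happens when you paste that URL into your instance's search box. A minimal sketch of building that request follows. `lemmy.example` and the post URL are made-up placeholders, and the `/api/v3/resolve_object` path with its `q` parameter is my understanding of the v3 API, so check it against the version you run. The request also has to be authenticated as a local user, and how the token is passed differs between Lemmy versions, so that part is left out.

```python
from urllib.parse import urlencode


def resolve_object_url(my_instance: str, remote_url: str) -> str:
    """Build the URL that asks my_instance to fetch one remote object.

    Assumption: the v3 API exposes GET /api/v3/resolve_object?q=<url>.
    """
    query = urlencode({"q": remote_url})
    return f"https://{my_instance}/api/v3/resolve_object?{query}"


# Placeholder instance name and post URL, for illustration only.
print(resolve_object_url("lemmy.example", "https://lemmy.ml/post/123"))
```

This pulls one object per call, so it is only practical for a handful of posts, not for backfilling a whole community.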
… when I search for communities, next to the community name is only the number of users from my instance subscribed, not the actual number of subscribers to the community.
This is expected: subscriber counts are not federated. You need to visit the community’s instance to see the global sub count.
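If you want that number without opening a browser, you can ask the community's home instance directly. The sketch below only builds the URL and reads the count out of a response body. The `/api/v3/community?name=...` path and the `community_view.counts.subscribers` field are what I understand the v3 API to return, so treat both as assumptions and verify against your version. The sample JSON is invented for the example.

```python
import json


def home_instance_community_url(community: str) -> str:
    """Turn 'name@host' (or '!name@host') into an API URL on the home instance."""
    name, host = community.lstrip("!").split("@", 1)
    return f"https://{host}/api/v3/community?name={name}"


def subscriber_count(response_body: str) -> int:
    """Read the subscriber count from a community response body.

    Assumption: the count lives at community_view.counts.subscribers.
    """
    data = json.loads(response_body)
    return data["community_view"]["counts"]["subscribers"]


# Invented response body, trimmed to the one field used here.
sample = '{"community_view": {"counts": {"subscribers": 12345}}}'
print(home_instance_community_url("selfhosted@lemmy.world"))
print(subscriber_count(sample))
```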
… viewing posts by All gives a very limited view of what’s out there, and isn’t comparable to viewing All posts on R*ddit. The tool mentioned, while it could definitely use some adjusting, alleviates that issue a lot more.
I understand the goal of the tool, but the defaults are a really bad approach to achieving it, and the docs are really bad at identifying the pitfalls. A tool that subscribes to a list of communities provided in a text file would be great. Subscribing to the entire lemmyverse is a solution that creates problems that are worse than the discovery problem.
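To make the text-file idea concrete, here is a rough sketch of just the input side: one `name@instance` entry per line, with `#` comments and blank lines ignored. The file format and the function name are my own invention, not anything lemmony or Lemmy defines. Actually subscribing would still mean resolving each entry on your instance and following it through the API, which isn't shown.

```python
import re

# Accepts 'name@host' or '!name@host'; deliberately strict to catch typos.
_COMMUNITY = re.compile(r"^!?([A-Za-z0-9_]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$")


def parse_community_list(text: str) -> list[str]:
    """Return unique, lowercased 'name@host' entries in file order."""
    seen = set()
    result = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _COMMUNITY.match(line)
        if match is None:
            raise ValueError(f"not a name@instance entry: {line!r}")
        key = f"{match.group(1)}@{match.group(2)}".lower()
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


example = """
# communities I actually read
selfhosted@lemmy.world
!Linux@lemmy.ml   # leading '!' is fine
selfhosted@lemmy.world
"""
print(parse_community_list(example))
```

Keeping the list in a file you edit by hand means every subscription is something you chose, which is the whole point compared with "subscribe all".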
Definitely amusing to see you gauge piracy on the same level as hate-speech or porn/loli. Not that I have any opinions about the matter, but amusing regardless.
All of this content seems fairly clearly to me to fall into the category “content that can cause legal liability for the hoster depending on their jurisdiction”. Is that a controversial point of debate?
I run my own tool, written by myself, that subs to about 800 communities above a defined activity threshold (more than roughly 50 users/month). My metrics indicate disk usage of about 2 GiB/day, 20% of a single CPU core, and about 8 to 10 GiB of traffic a day. Is this workable for a tiny instance on a Pi? Probably not, but it is what it is. While I think my fediverse activity is not agreeable, I try to alleviate that by manually unsubscribing from the communities that I have absolutely no interest in.
This all sounds eminently reasonable. 800 subs is a lot, but it’s much more reasonable than the 7k subs this tool leaves you with in its default config, and if you further curate it manually and that’s what it takes for your feed to feel lively… then go for it.
Maybe consider releasing it? I totally agree that community discovery is rough all over, and more so on tiny instances. A tool that helped folks bootstrap 50-200 communities, did a good job documenting the tradeoffs of oversubscription, and helped folks identify/avoid legal risk would be a huge step up from the “subscribe all” approach.
I definitely agree with the part about jurisdiction, but content serving is still done from the original instance…
Content is NOT served from the original instance. https://lemmy.world/post/1191149 shows a post that was made to a community on lemmy.ml. Because there are subscribers on lemmy.world, that post is replicated there. Any unauthenticated user on the internet can view that post, the content was pulled out of the db on lemmy.world and sent out from lemmy.world’s ip and over its internet connection. By every legal definition I’ve encountered lemmy.world is serving that post and subject to any legal complications that entails. The only exception I’m aware of are full-size images, which don’t replicate. Thumbnails do though, so that provides no protection. You host the image content, just at reduced quality.
… and while I’m not a lawyer, I think the most severe legal threat might be just a takedown.
This is also not true. In the US, you have to register a copyright agent to receive the kinds of protection typically associated with commercial hosts. If you fail to do so, I believe that you run the risk of just getting sued out of the gate for copyright issues. There are also almost certainly jurisdictions where hosting gay porn or certain political speech is a “straight to jail” kind of maneuver.
Of course, I have no evidence that OP is in a particularly dangerous jurisdiction. But my broader point is that new users of single-user instances often don’t consider that they may be signing up to host legally risky content that they themselves didn’t create, view, or want. If one curates their list of subs, they can gauge for themselves what communities they consider to be risky. If they “subscribe all”, they WILL be serving to the unauthenticated public internet the worst of the lemmyverse without realizing it… which is an entirely avoidable situation.
I believe the author of Lemmony has already patched the code to only subscribe to the top instances, which shouldn’t leave anyone with 7k subscriptions.
They offered an option to limit the sub count, but the default is still unlimited. They seem aggressively against more sensible defaults in other posts.
It’s also worth noting that there’s an upper limit on the number of communities you choose to federate with, while there doesn’t seem to be an upper limit on the blocked communities
That does make sense, I’ll probably go through and do that. I just wish there was a better way of sorting by all and historical syncs
Yeah, it’s confusing if you’re not real steeped in how ActivityPub works… and admittedly not real intuitive.
If you choose to wipe the existing subs you made with the tool, you might want to delete the user you used to subscribe to them. I don’t think wiping your db or server will tell the remote servers to unsub you. But I think deleting the user will clear all their subscriptions (not 100% sure though).
The normal way to do this would be to unsub every community, but I don’t think this tool handles rolling back like that.
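If you’d rather script the unsub than click through it, a rough sketch of the request bodies is below. It assumes the v3 `community/follow` endpoint accepts a local `community_id` plus a `follow` flag, which is my reading of the API rather than something confirmed here. `lemmy.example` and the ids are placeholders. The code only builds the requests and does not send them, since sending needs your login token, passed however your Lemmy version expects.

```python
def build_unfollow_requests(
    my_instance: str, community_ids: list[int]
) -> list[tuple[str, dict]]:
    """Pair the follow endpoint URL with one unfollow body per community.

    Assumption: POST /api/v3/community/follow takes
    {"community_id": <local id>, "follow": false}.
    """
    url = f"https://{my_instance}/api/v3/community/follow"
    return [(url, {"community_id": cid, "follow": False}) for cid in community_ids]


# Placeholder ids; real ones are the local ids your instance assigned.
for url, body in build_unfollow_requests("lemmy.example", [3, 7]):
    print(url, body)
```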
But when you first subscribe, you’d expect to be missing old posts
OP didn’t expect to be missing old posts, hence his question. I had the same surprising discovery. Not sure how the UX could be improved to convey to the user what is actually happening.
with posts and comments made before the instance was created.
Lemmy only syncs content forward in time; there is no backfill.
don’t have the correct number of upvotes/comments. Also, when I search for communities, next to the community name is only the number of users from my instance subscribed
These are all normal behaviors of Lemmy.
But, theoretically, from now on posts should sync like normal and have the correct amount of comments/upvotes, correct? Also, is there a way to change the community member behavior or no?