Because of Lemmy's default robots.txt and meta tags, search engines will index even non-local communities. This leads to undesirable results, such as unrelated content being associated with your instance.

As of today, lemmy-ui does not allow hiding non-local (or any) communities from Google and other search engines. If you, like me, do not want your instance to be associated with other content, you can add a custom robots.txt and response headers to avoid indexing.

In nginx, simply add this:

# Disallow all search engines
location / {
    ...
    add_header X-Robots-Tag noindex;
}

location = /robots.txt {
    # default_type sets the Content-Type; add_header Content-Type would send the header twice
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}
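
If you want to sanity-check the robots.txt rules before deploying, here is a minimal Python sketch using the standard library's urllib.robotparser. It only parses the rule string returned by the nginx block above; it does not contact a server or check the X-Robots-Tag header. The names ROBOTS_BODY and is_blocked, and the example.com URL, are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Same body that the nginx "return 200" directive above sends (assumed copy).
ROBOTS_BODY = "User-agent: *\nDisallow: /\n"


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical instance URL, used only as an example.
    print(is_blocked(ROBOTS_BODY, "https://example.com/c/some_community"))
```

Keep in mind that robots.txt is only a request to crawlers; well-behaved ones honor it, but nothing enforces it.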

Here’s a commit in my fork of the lemmy-ansible playbook. And here’s a corresponding issue I opened in lemmy-ui.

I hope this helps someone :-)

  • binwiederhier@discuss.ntfy.sh (OP) · 1 year ago

    There are plenty of instances that copy the original content. As an instance owner who runs only a single project-specific community, I should be able to decide what content is available on my domain, and what isn’t. Don’t you think?

    Aside from the questionable content, there are also legal issues around it that I’d rather not deal with.

    • NXL@lemmy.ml · 1 year ago

      Yes, it's your choice. I would prefer that this be done rarely, though, to increase the likelihood of information being indexed and easily found in Google searches.