SourceHut continues to face disruptions due to aggressive LLM crawlers. We continue to work on mitigations, and the ones deployed so far are keeping the problem contained for now. However, some of these mitigations may impact end-users.
In particular, we have deployed Nepenthes to certain routes which are associated with large volumes of LLM-related traffic. You may encounter certain pages which are not usable as a result, especially if you are not logged in.
Mitigations only affect the web frontend of SourceHut: SSH access, git operations, API access, and so on, should behave normally.
We understand that some of our mitigations are user-impacting. We apologize for the inconvenience. These measures are temporary, but we do not have an estimate for when they will no longer be required. To be honest, we are running out of ideas for how to deal with these LLM bots. Your patience is appreciated.
If you are having problems using the SourceHut web UI:
First, log into your SourceHut account. Logged-in users bypass most of our mitigations. If that does not work, please contact support on IRC or via email.
If your cloud server is unable to reach SourceHut:
We have unilaterally blocked several cloud providers, including GCP and Azure, for the high volumes of bot traffic originating from their networks. If your cloud server is experiencing problems using SourceHut, and you have a legitimate reason to do so, you must email support to request an exception. Please explain your use-case and include a list of affected IPs and/or subnets.
We kindly ask the administrators of SourceHut integrations to program their software with responsible usage patterns. If possible, we request that you prefer webhooks over polling for updates. If your integration performs git operations, please prefer to use git fetch to update a persistent repository, or use a shallow git clone, rather than performing a fresh clone each time your automation runs. We also request that you set a User-Agent string for your traffic which identifies your software and includes an email address that we can contact with questions and feedback, as well as clearly identifying your traffic as non-malicious so we do not mistakenly apply mitigations to you.
If you are using git(1) for git operations, you can set a User-Agent by setting the GIT_HTTP_USER_AGENT environment variable accordingly.
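As a rough illustration of the requests above, the following shell sketch sets an identifying User-Agent and reuses a persistent repository instead of cloning from scratch on every run. The integration name, version, contact address, repository URL, directory, and the sync_repo helper are all hypothetical placeholders, and the exact User-Agent format is only an assumption; any string that names your software and gives a contact address should serve the same purpose.

```shell
#!/bin/sh
# Hypothetical identity for the integration; replace with your own values.
INTEGRATION_NAME="example-integration"
INTEGRATION_VERSION="1.0"
CONTACT_EMAIL="ops@example.org"

# git(1) sends this string as the User-Agent on HTTP(S) operations.
GIT_HTTP_USER_AGENT="${INTEGRATION_NAME}/${INTEGRATION_VERSION} (+mailto:${CONTACT_EMAIL})"
export GIT_HTTP_USER_AGENT

# sync_repo URL DIR (hypothetical helper)
# Fetch into an existing repository if one is present; otherwise fall back
# to a shallow clone rather than a full one.
sync_repo() {
    url="$1"
    dir="$2"
    if [ -d "$dir/.git" ]; then
        # Updates remote-tracking refs only; check out or merge as needed.
        git -C "$dir" fetch --prune origin
    else
        git clone --depth 1 "$url" "$dir"
    fi
}

# Example invocation (placeholder URL and path):
#   sync_repo "https://git.sr.ht/~example/project" "/var/cache/example/project"
```

The first run creates a shallow clone; later runs only transfer new objects, which is considerably cheaper for both sides than a fresh clone each time.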
If you would like advice on making your integration more efficient, or setting up webhooks, please contact support for assistance.
(08:30 UTC — Mar 17)