
Bug#1033632: [External] Debian Bug #1033632 - SourceForge RSS feed rate limit



On Wed, 19 Apr 2023 17:12:09 -0400 Federico Grau wrote:
> Copying sf reply to Debian bug #1033632 , as requested by pabs, to enable
> Debian members to analyze.
> 
> On Tue, Apr 18, 2023 at 08:35:03AM -0600, SourceForge.net Support & Ops wrote:
...
> > We've checked our logs for the past week and see 209.87.16.61 with
> > user-agent "Python-httplib2/$Rev$" has hit a couple of RSS feeds, but has
> > not received any 429 status from our rate limits.

This is Planet Debian; I guess some blogs are hosted on SourceForge.

> > There is an IPv6 address 2607:f8f0:614:1::1274:73 (which is also
> > qa.debian.org it seems) that is sending a lot of traffic. Nearly all of it
> > is with a user-agent of "Mozilla/5.0 (X11; U; Linux i386; en-us)
> > AppleWebKit/531.2+ (KHTML, like Gecko)", and hitting non-RSS feeds, it is
> > hitting /projects/dispcalgui/files/... URLs over and over.

This is caused by fakeupstream.cgi, which has its own SourceForge
redirector that recursively scrapes SourceForge files pages instead
of using the RSS feed. It likely predates the RSS feed.
Only 3 packages use it, and none of them is dispcalgui.

https://codesearch.debian.net/search?q=fakeupstream.cgi?upstream=sf/&literal=1

I temporarily disabled the web server IP address privacy in order
to find out where the requests are coming from and found Msnbot IP
addresses. Then I noticed the User-Agent is bingbot/2.0. I also
verified that the IP addresses are legitimate bingbot addresses.

https://en.wikipedia.org/wiki/Msnbot
http://www.bing.com/bingbot.htm
https://www.bing.com/webmasters/help/verify-bingbot-2195837f
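
For reference, this kind of check is usually a reverse DNS lookup
followed by a forward lookup that must return the original address.
A minimal Python sketch, assuming genuine bingbot hosts resolve to names
under search.msn.com; the function name and the injectable resolver
parameters are hypothetical and this is not the exact procedure used on
qa.debian.org:

```python
import socket


def is_verified_bingbot(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if ip passes a reverse-then-forward DNS check.

    Assumption: genuine bingbot addresses have a PTR record ending in
    .search.msn.com. The default forward resolver only handles IPv4.
    """
    try:
        # Reverse lookup: address -> primary hostname.
        hostname = reverse(ip)[0]
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(".search.msn.com"):
        return False
    try:
        # Forward lookup: hostname -> list of addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the address we started with,
    # otherwise the PTR record could have been forged.
    return ip in addresses
```

The resolvers are parameters only so the logic can be exercised without
network access; in normal use the defaults from the socket module apply.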

For now I have blocked bingbot from accessing fakeupstream.cgi
and then requested that it stop accessing fakeupstream.cgi:

https://salsa.debian.org/qa/qa/commit/37ada830d0c2c1ece51e7622910014b8ec047909
https://salsa.debian.org/qa/qa/commit/4893d7fce8537d6978ace6484889d3e5efe34af5

This has stopped the flood to SourceForge and hopefully will stop the
flood to fakeupstream.cgi, so this bug can likely be closed now, but...

There are some improvements that we could make to QA services:

 * pass on HTTP error codes from services fakeupstream.cgi accesses
 * switch fakeupstream.cgi SourceForge support to using the RSS feed
 * switch fakeupstream.cgi/sf.php User-Agents to legitimate ones

If anyone would like to work on these, please submit a merge request.
If no-one does these fixes, then I may get to them eventually.
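
To illustrate the first and third items, here is a rough Python sketch of
a CGI-style fetch that sends an identifying User-Agent and passes the
upstream HTTP status through to the client. It is not based on the actual
fakeupstream.cgi code; the function names and the User-Agent string are
placeholders made up for this example:

```python
import urllib.error
import urllib.request
from http import HTTPStatus

# Placeholder value; a real deployment would pick its own identifier.
USER_AGENT = "fakeupstream-sketch/0.1 (+https://qa.debian.org/)"


def fetch_upstream(url, opener=urllib.request.urlopen):
    """Fetch url and return (status, body), keeping upstream error codes."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request, timeout=30) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # Keep the upstream code (e.g. 429) instead of hiding it.
        return err.code, b""


def cgi_status_header(status):
    """Build the CGI Status header that relays the upstream code."""
    try:
        phrase = HTTPStatus(status).phrase
    except ValueError:
        phrase = "Unknown"
    return f"Status: {status} {phrase}"
```

With something like this, a 429 from SourceForge would reach the client
as "Status: 429 Too Many Requests" rather than as an empty or misleading
successful page.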

> > A different pattern from that address does hit RSS feeds and has no
> > user agent.

That is likely to be the regular SourceForge redirector.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
