
Bug#569191: marked as done (crawler not allowed to perform ?action=raw)



Your message dated Tue, 11 May 2010 08:26:51 +0200
with message-id <1273559211.3503.222.camel@solid.paris.klabs.be>
and subject line [Debian Wiki] crawler not allowed to perform ?action=raw
has caused the Debian Bug report #569191,
regarding crawler not allowed to perform ?action=raw
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
569191: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=569191
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: libwww-perl
Version: 5.834-1
Severity: important

Hi, 

We use GET to download a wiki page and further process the data to
prepare the manual of Debian Edu. The command:
	GET "http://wiki.debian.org/DebianEdu/Documentation/Lenny/AllInOne?action=raw"
works fine in Lenny, but stopped working in squeeze, where "You are not
allowed to access this!" is returned. If you remove "?action=raw" from
the URL, everything is fine. Is this intended, and do we have to provide
a header?

Regards, 
	 Andi
 



-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-nouveau.git (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl           3.64-1     collection of modules that parse H
ii  libhtml-tagset-perl           3.20-2     Data tables pertaining to HTML
ii  libhtml-tree-perl             3.23-1     represent and create HTML syntax t
ii  liburi-perl                   1.52-1     module to manipulate and access UR
ii  netbase                       4.40       Basic TCP/IP networking system
ii  perl                          5.10.1-9   Larry Wall's Practical Extraction 

Versions of packages libwww-perl recommends:
ii  libhtml-format-perl           2.04-2     format HTML syntax trees into text
ii  libio-compress-perl           2.022-1    IO::Compress modules
ii  libmailtools-perl             2.05-1     Manipulate email in perl programs
ii  perl [libio-compress-perl]    5.10.1-9   Larry Wall's Practical Extraction 

Versions of packages libwww-perl suggests:
ii  libcrypt-ssleay-perl          0.57-2     Support for https protocol in LWP
ii  libio-socket-ssl-perl         1.31-1     Perl module implementing object or

-- debconf-show failed



--- End Message ---
--- Begin Message ---
retitle 569191 crawler not allowed to perform ?action=raw
thanks

Andreas B. Mundt wrote:
> We use GET to download a wiki page and further process the data to
> prepare the manual of Debian Edu. The command:
> 	GET "http://wiki.debian.org/DebianEdu/Documentation/Lenny/AllInOne?action=raw"
> works fine in Lenny, but stopped working in squeeze, where "You are not
> allowed to access this!" is returned. If you remove "?action=raw" from
> the URL, everything is fine. Is this intended, and do we have to provide
> a header?

Damyan Ivanov wrote:
> On Lenny (works)
> ================
> User-Agent: lwp-request/0.810
> 
> On Sid (breaks)
> ===============
> User-Agent: lwp-request/5.834 libwww-perl/5.834

Yes, this is MoinMoin's standard behavior.
The wiki engine has surge-protection mechanisms to prevent web
crawlers (and users) from DoS'ing the wiki.
Well-known web crawlers (including libwww-perl/*) are only allowed to
fetch HTML-rendered pages.

As mentioned, you should change your crawler's User-Agent string
(use something meaningful, so the admin can get in touch with you
rather than just blacklisting the "offending" IPs).

Thanks,

Franklin



--- End Message ---
