Etiquette for automated accesses to pages on virtualbox.org
It's a waste of time, but the web admins for virtualbox.org have to spend an increasing amount of time fending off server overload caused by rogue crawlers and scripts that access certain pages over and over again.
- Apply common sense.
- Make sure that there is enough time between accesses.
- Remember you are not the only person who is doing automated accesses.
- Any frequent access can cause overload of the servers due to resource limitations.
- Honor robots.txt.
- Be especially careful with crawling every link. Some links are rather expensive and exist in many variants (sort order and the like), which will not give you more information but will multiply the load.
- It makes no sense to get every revision of every file in the browser of the vbox repository. It just uselessly multiplies the load and traffic by a factor of approximately 100000.
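The points in the list can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not an official client: the user agent string, the five-second delay, and the names `canonical`, `load_robots`, `Throttle` and `plan_crawl` are all made up for this example, and dropping the whole query string is a deliberately blunt way to collapse sort-order variants that may be too aggressive for some pages.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values: pick your own identifier and a delay that suits your task.
USER_AGENT = "example-bot/0.1 (contact: you@example.org)"
MIN_DELAY = 5.0  # seconds between requests; an assumption, not a site policy


def canonical(url):
    """Drop query string and fragment so sort-order variants collapse to one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def load_robots(robots_lines):
    """Build a parser from the lines of an already downloaded robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_delay=MIN_DELAY, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_delay - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


def plan_crawl(urls, robots):
    """Return the URLs worth fetching: allowed by robots.txt, one per canonical form."""
    seen = set()
    plan = []
    for url in urls:
        key = canonical(url)
        if key in seen or not robots.can_fetch(USER_AGENT, key):
            continue
        seen.add(key)
        plan.append(key)
    return plan
```

A script would call `Throttle.wait()` before every request and fetch only what `plan_crawl` returns. Separating the planning from the fetching also makes it easy to check how many requests a run will generate before sending any of them.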
This list will be updated as needed (even though item 1 covers it all and all the rest isn't rocket science).
Any violations will result in blocking by User Agent, IP range or whatever else we think is appropriate. Currently we use HTTP status code 410 Gone for this purpose, but this may change.
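A well-behaved script should notice such a block and stop rather than retry. The following is a minimal sketch of that idea using Python's standard library; the `Blocked` exception, the `fetch` helper and its `opener` parameter are hypothetical names introduced for this example, and since the status code used for blocking may change, treating only 410 as the signal is an assumption.

```python
import urllib.error
import urllib.request


class Blocked(Exception):
    """Raised when the server answers with 410 Gone, taken here as a block."""


def fetch(url, user_agent, opener=urllib.request.urlopen):
    """Fetch one URL and return its body; raise Blocked on HTTP 410.

    `opener` is replaceable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 410:
            # Do not retry: stop the whole run and find out why you were blocked.
            raise Blocked(f"{url} returned 410 Gone") from err
        raise
```

The caller is expected to let `Blocked` end the run instead of catching it and continuing with the next URL, since further requests only add load and make the block more likely to be widened.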
If you think you're being blocked for no good reason (possibly because someone else in the same IP range had to be blocked) you can reach a human at trac@virtualbox.org. Please explain what you intend to do with the automated accesses and how often it has to be done to be useful.