VirtualBox

Etiquette for automated accesses to pages on virtualbox.org

It's a waste of time, but the web admins for virtualbox.org have to spend an increasing amount of it fending off server overload caused by rogue crawlers and scripts which access certain pages over and over again.

  1. Apply common sense.
  2. Make sure that there is enough time between accesses.
  3. Remember you are not the only person who is doing automated accesses.
  4. Any frequent access can cause overload of the servers due to resource limitations.
  5. Honor robots.txt.
  6. Be especially careful about crawling every link. Some links are rather expensive and exist in many variants (sort order and the like) which will not give you more information but will multiply the load.
  7. It makes no sense to get every revision of every file in the browser of the vbox repository. It just uselessly multiplies the load and traffic by a factor of approximately 100000.
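
As a rough illustration of items 2, 5 and 6, here is a minimal Python sketch of a crawler helper that checks robots.txt, enforces a pause between accesses, and collapses URL variants. The names `USER_AGENT`, `MIN_DELAY_SECONDS`, `canonical_url` and `PoliteScheduler` are hypothetical, and the 10 second delay is an assumption: this page does not state a required interval. Dropping the whole query string is also an assumption about which variants are redundant, so adjust it to what you actually need.

```python
import time
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical values: pick an honest User-Agent with contact details,
# and a delay that is generous for your use case.
USER_AGENT = "example-bot/0.1 (contact: you@example.org)"
MIN_DELAY_SECONDS = 10.0  # assumption, not a number given on this page


def canonical_url(url):
    """Drop query string and fragment so sort-order variants collapse to one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


class PoliteScheduler:
    """Decides whether a URL may be fetched and how long to wait first."""

    def __init__(self, robots, min_delay=MIN_DELAY_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self.robots = robots          # a RobotFileParser that is already loaded
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.seen = set()             # canonical URLs already scheduled
        self.last_access = None

    def should_fetch(self, url):
        """Return True only for new, robots.txt-permitted canonical URLs."""
        key = canonical_url(url)
        if key in self.seen:
            return False              # a variant of a page we already have
        if not self.robots.can_fetch(USER_AGENT, key):
            return False              # honor robots.txt
        self.seen.add(key)
        return True

    def wait_turn(self):
        """Sleep until at least min_delay has passed since the last access."""
        now = self.clock()
        if self.last_access is not None:
            remaining = self.min_delay - (now - self.last_access)
            if remaining > 0:
                self.sleep(remaining)
        self.last_access = self.clock()
```

In real use you would load the rules once with `RobotFileParser.set_url(...)` followed by `read()`, then call `should_fetch(url)` and `wait_turn()` before every request, fetching `canonical_url(url)` rather than the raw link. None of this replaces item 1.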

This list will be updated as needed (even though item 1 covers it all and all the rest isn't rocket science).

Any violations will result in blocking by User Agent, IP range or whatever else we think is appropriate. Currently we use HTTP status code 410 Gone for this purpose but this may change.
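
If your script does get blocked, retrying only adds to the load. The following Python sketch treats a 410 Gone response as a signal to stop completely. `BlockedError` and `fetch_once` are hypothetical names, the 30 second timeout is an arbitrary choice, and since the blocking mechanism may change, the check on status 410 is only an assumption based on the current behavior described above.

```python
import urllib.error
import urllib.request


class BlockedError(RuntimeError):
    """Raised when the server answers 410 Gone, which currently signals a block."""


def fetch_once(url, user_agent, opener=urllib.request.urlopen):
    """Fetch a URL a single time; never retry, and stop for good on 410."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 410:
            # Do not retry: stop the whole run and contact a human instead.
            raise BlockedError(
                "410 Gone for %s: stop all accesses and review your script" % url
            ) from err
        raise  # other HTTP errors are left to the caller
```

The caller should let `BlockedError` end the run instead of catching it and continuing with the next URL.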

If you think you're being blocked for no good reason (possibly because someone else in the same IP range had to be blocked) you can reach a human at trac@virtualbox.org. Please explain what you intend to do with the automated accesses and how often it has to be done to be useful.

Last modified on Sep 12, 2024 11:12:27 AM

© 2024 Oracle