In an attempt to give my readers a little insight as to "what goes on behind the scenes," I have posted the following news update.
Time to start the latest Quick Rant.
This is the longest news update I have had in a while. The reason? I am banging my head up against the monitor.
I, once again, fired up the automatic banning of IP addresses last night. This is due to my desire to stop "bad" robots from sucking too much bandwidth. More information on this practice is located in my Abuse Rant. Basically, I implemented a "hidden link" which all "good" robots, including all major search engines, would ignore.
A reader contacted me in distress, saying they did "nothing wrong" and that they are using a plain version of IE6. The catch is that some proxy servers, pre-fetchers and firewalls choose to ignore the robots.txt standard.
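For readers curious how such a "hidden link" trap can work, here is a minimal sketch in Python. It is not the code running on my server. The trap path, the robots.txt contents and the function names are all assumptions made up for illustration. The idea is simply that the trap URL is disallowed in robots.txt, so a compliant robot never requests it, and any IP address that does request it gets flagged.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap path, chosen for illustration only.
TRAP_PATH = "/trap/do-not-follow.htm"

# Assumed robots.txt content: every robot is told to stay away from the trap.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PATH}
"""


def robot_may_fetch(path: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-compliant client may request `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def check_request(ip: str, path: str, flagged: set) -> bool:
    """Add `ip` to `flagged` if it requested the trap path.

    Returns True when the request hit the trap. Whether a flagged IP is
    banned or merely reported is left to the caller.
    """
    if path == TRAP_PATH:
        flagged.add(ip)
        return True
    return False
```

A well-behaved robot would call something like robot_may_fetch() first and skip the trap. A client that follows every link in the page regardless, such as an aggressive pre-fetcher, ends up in the flagged set.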
After reviewing the log files, I am working with this person to figure out the exact "reason" this particular version of IE6 is attempting to "pre-fetch" links that are not valid and, as a result, causing my server to flag the IP for abuse. Whether or not this person is using any of the previously mentioned products is unknown at this time.
Once again, I have temporarily removed that particular function from the server. However, even though IP addresses are not automatically banned, I am notified immediately of the spidering attempt and the logs will remain until I can narrow down the cause of this problem.
A cut and paste from the web server log file, along with my explanation of the issue, follows (the actual IP address is removed for obvious reasons):
x.x.x.x - - [22/Dec/2003:12:45:26 -0800] "GET /WinXP/servicecfg.htm HTTP/1.1" 200 9027 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
The above log file line (even though it may display in your browser as "several lines," for the sake of argument, it actually is only one line) denotes the page where this person entered the domain. Having no "referer" (sic) information logged (the "-" after the "200 9027") is how I came to that conclusion.
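If you want to pull the same fields out of a log line yourself, the following Python sketch shows one way. It assumes the line follows the common "combined" access-log layout, which the sample above appears to match. The pattern and the function name are my own illustration, not part of any server software.

```python
import re
from datetime import datetime

# Assumed layout: ip ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    """Split one access-log line into named fields.

    A referer logged as "-" is returned as None, meaning the browser sent
    no referer header with the request.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log layout")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    if entry["referer"] == "-":
        entry["referer"] = None
    return entry


sample = (
    'x.x.x.x - - [22/Dec/2003:12:45:26 -0800] '
    '"GET /WinXP/servicecfg.htm HTTP/1.1" 200 9027 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"'
)
entry = parse_line(sample)
```

For the sample line, entry["referer"] comes back as None, which is the same "no referer" observation I made by eye.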
The next several lines are the "normal" traffic. Each includes the "referer" (sic) header, which is valid and tells me that "the browser requested the information because of accessing the above page." One such entry is shown below:
x.x.x.x - - [22/Dec/2003:12:45:26 -0800] "GET /css/20031222basic.css HTTP/1.1" 200 266 "http://www.blackviper.com/WinXP/servicecfg.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
This shows the requesting page as "servicecfg.htm" and that it also requires a download of "20031222basic.css." This is normal traffic. However, the following request should not be there, and it appears directly "after" the normal logging of traffic patterns:
x.x.x.x - - [22/Dec/2003:12:45:26 -0800] "GET / HTTP/1.1" 200 4581 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
The above line tells me that, within one second of the first request, the root "index" page ("GET /") was requested by the browser, but it has no "referer" (sic) header attached like the "normal" requests do. Three seconds later, the invalid link was spidered and the IP address was automatically banned. This particular "hidden" link is also the "first link" appearing in my XHTML code. However, it gets better.
The next two lines are what frighten me the most:
x.x.x.x - - [22/Dec/2003:12:45:40 -0800] "GET /AskBV/XP25.htm HTTP/1.1" 200 211 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
x.x.x.x - - [22/Dec/2003:12:45:49 -0800] "GET /AskBV/XP25.htm HTTP/1.1" 200 211 "http://www.blackviper.com/WinXP/servicecfg.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
On the original page, the reference to XP25.htm is the "next" link in the code. However, the first request had no referer (sic) header information (as noted in the log file by "-" after the "200 211"). The second line "does" have the referer (sic) information logged only 9 seconds later, just as if the person actually "clicked" the link and attempted to go to that page.
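The pattern in those two lines (the same page requested first without a referer, then again with one a few seconds later) can be searched for mechanically. The Python sketch below is one rough way to do that. The Hit record, the function name and the 30 second window are assumptions of mine for illustration, and a match is only a hint of pre-fetching, not proof.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    time: datetime
    ip: str
    path: str
    referer: Optional[str]  # None when the log shows "-"


def find_prefetch_pairs(hits, window=timedelta(seconds=30)):
    """Return (first, second) pairs where one IP requested the same path
    without a referer and then again with a referer inside `window`."""
    pending = {}  # (ip, path) -> most recent hit that had no referer
    pairs = []
    for hit in sorted(hits, key=lambda h: h.time):
        key = (hit.ip, hit.path)
        if hit.referer is None:
            pending[key] = hit
        elif key in pending and hit.time - pending[key].time <= window:
            pairs.append((pending.pop(key), hit))
    return pairs


# The requests quoted in this update, reduced to the fields that matter.
page = "http://www.blackviper.com/WinXP/servicecfg.htm"
hits = [
    Hit(datetime(2003, 12, 22, 12, 45, 26), "x.x.x.x", "/WinXP/servicecfg.htm", None),
    Hit(datetime(2003, 12, 22, 12, 45, 26), "x.x.x.x", "/css/20031222basic.css", page),
    Hit(datetime(2003, 12, 22, 12, 45, 26), "x.x.x.x", "/", None),
    Hit(datetime(2003, 12, 22, 12, 45, 40), "x.x.x.x", "/AskBV/XP25.htm", None),
    Hit(datetime(2003, 12, 22, 12, 45, 49), "x.x.x.x", "/AskBV/XP25.htm", page),
]
pairs = find_prefetch_pairs(hits)
```

Run against the entries above, this reports the two XP25.htm requests as a single suspicious pair, 9 seconds apart.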
The burning question I have is "what on Earth is causing IE to pre-fetch links?"
When that question is answered, I will rest better at night.
I am sure this issue has blocked other legitimate readers and I apologize. My intentions are only good in attempting to protect my server "from the bad folk."
Other people have written to me saying, in a manner of speaking, "If you do not want people to visit your site, take it down!" That is not the issue. I am not blocking legitimate traffic (well, except for the unknown cause outlined above). What I am attempting to do is stop the complete download of my domain for no reason other than "because it is there."