Trouble with URLs in Google Sitemap

August 7, 2008

My favorite client is having trouble with a few URLs in their XML sitemap. The sitemap has been submitted to Google, and Google gobbled it up and, according to Webmaster Tools, deemed it palatable enough.

But there is something really weird going on with a handful of URLs that end in index.html.

Here is an example:
This URL is in the sitemap:
http://www.xxxxx.com/campaigns/usa/unifiedcommunications/index.html 

Here is a snapshot of the error Google returned for the above URL:

[Screenshot: sitemap_error2.jpg]

This site uses both .htm and .html as file extensions, and both index.htm and index.html pages are listed in the sitemap. Is this discrepancy encouraging Google to access the pages at the root level of the directory? All of the URLs in the sitemap include the full path, so Google shouldn’t be forced to do any guesswork.
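One way to narrow this down is to pull every index.htm and index.html entry out of the sitemap and compare that list against the URLs Google is complaining about. Here is a minimal Python sketch of that idea. It assumes the sitemap uses the standard sitemaps.org 0.9 namespace; the function name `index_urls` and the example.com URLs are made up for illustration and are not the client's real data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol (assumed to be what the sitemap declares).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def index_urls(sitemap_xml):
    """Return the <loc> URLs whose file name is index.htm or index.html."""
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Take the last path segment and compare it case-insensitively.
        filename = urlsplit(url).path.rsplit("/", 1)[-1].lower()
        if filename in ("index.htm", "index.html"):
            found.append(url)
    return found


# Hypothetical sample sitemap, for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/a/index.html</loc></url>
  <url><loc>http://www.example.com/b/index.htm</loc></url>
  <url><loc>http://www.example.com/c/page.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in index_urls(SAMPLE):
        print(url)
```

If the URLs Google flags are exactly the ones this turns up, the index file names are the likely common factor. If not, the problem is probably somewhere else.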

Any ideas what is going on here?
