Over the summer I moved my blog from livejournal to wordpress. There were a lot of reasons to do this, and overall my experience has been very good with wordpress.
Once google indexed me, it came up with the following:
|Includes personal information, photographs, family, and friends.
dague.net/ – 34k – Cached – Similar pages – Note this
Which was odd. That description didn’t show up anywhere on my site. At first I chalked it up to google finding a new wordpress installation, but others didn’t seem to have it. Then I chalked it up to the xfn tag in the headers, but removing that didn’t seem to help either.
Then, today, I found something that officially makes this a mystery. I did a google search on the phrase “Includes personal information, photographs, family, and friends.”. Guess how many hits are found?
No… really… guess.
I don’t think you actually guessed yet…
Seriously, this is more fun if you play along.
|Web||Results 1 – 6 of 6 for “Includes personal information, photographs, family, and friends.”. (0.24 seconds)|
6… 6 ?!?!
And I am 5 of the 6 hits. Ok, what is going on here? Any theories would be appreciated. While it is amusing, I’d love to actually get real content indexed for dague.net again. Feedback appreciated as comments.
8 thoughts on “A google mystery”
Try the search without the quotes. It looks to me like you were assigned a subset of a larger set of attributes which happened to be unique to your page. Other entries in the google directory are very similar, but none of them are exactly the same. I don’t know if wordpress or google assigned you those attributes, but it was probably based on some algorithm looking for keywords, links and images that assigns these attributes.
Well searching for the exact string is interesting for the following reason: the only place the string exists is on google indexes of my content, and the only place it is found is in things which index the google hits for my site.
So, the google description of my website is a self-reinforcing piece of metadata that google invented, and has applied only to my website.
If the set of attributes was large enough, then it’s plausible that you were the only one assigned this particular subset of attributes in this particular order. Add to that the possibility that the algorithm was modified very recently, and until now was not producing that particular subset of attributes in that particular order.
I tried doing a search for “Includes personal information, friends, photographs, and snowboarding.” and I got only 3 results, all of which refer to a Chris Davy, who is on the same page as you are.
Same for “Includes personal information and poetry.” 4 results, all for Dan Dempsey.
If you tack enough words together the probability that google has indexed another page with the exact same combination of words decreases dramatically.
Furthermore, check this out:
“Includes personal information, photographs, family, and friends” -> 6 hits
“Includes personal information, photographs, family” -> 28 hits
“Includes personal information, photographs” -> 230 hits
“Includes personal information” -> 63,500 hits
“Includes personal” -> 597,000 hits
“Includes” -> 627,000,000 hits
Try graphing those data points and you’ll get something like an exponential decay curve, close to a straight line on a logarithmic scale, where x is the number of words and y is the number of hits.
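For anyone who wants to check that without a graphing tool, here is a minimal sketch in Python. It takes the hit counts quoted above, converts them to log10, and fits a straight line through them. The word counts are my own tally of each quoted phrase (counting "and" as a word), and the names `hits_by_word_count`, `log10_hits`, and `fit_line` are made up for this illustration. The counts themselves are a snapshot from the searches above and will not match what google returns today.

```python
import math

# Hit counts quoted in the comment above, keyed by the number of words
# in each phrase. The word counts are an assumption: my own tally,
# counting "and" as a word.
hits_by_word_count = {
    1: 627_000_000,  # "Includes"
    2: 597_000,      # "Includes personal"
    3: 63_500,       # "Includes personal information"
    4: 230,          # "... photographs"
    5: 28,           # "... photographs, family"
    7: 6,            # "... photographs, family, and friends"
}


def log10_hits(counts):
    """Return {word_count: log10(hits)} so the trend can be read on a log scale."""
    return {n: math.log10(h) for n, h in counts.items()}


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = list(points.keys())
    ys = list(points.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


logs = log10_hits(hits_by_word_count)
slope, intercept = fit_line(logs)

for n, value in sorted(logs.items()):
    print(f"{n} words: log10(hits) = {value:.2f}")
print(f"fitted slope: {slope:.2f} (change in log10(hits) per extra word)")
```

A negative slope means each extra word cuts the hit count by roughly a constant factor. With only six data points this is a rough eyeball check, not a real model of how phrase matching behaves.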
Yes, very true. But I think you are missing the point. 🙂
Google shows that specific string as my page summary in its index. It doesn’t show any content from my page there, just that summary string. The only place that text exists on the internet is on google’s summary of my page.
However, I may have just figured out the mystery. Google seems to be merging dmoz information into their index now. The dmoz description that I put in place 7 years ago seems to be trumping the content found on my homepage (which is somewhat bad behavior on google’s part I think). I think the change of my website was coincidental to this other issue.
I was only addressing the uniqueness issue. I had no idea where the summary string came from, but it did seem reminiscent of your old web page.
Also, it seems to me that Google is slowly moving from the “innovation” phase to the “assimilation” phase of their business. Next comes “domination”, “stagnation”, and “superannuation”, at least if they follow Microsoft’s lead.
Right, I think where we were talking past each other was that the point wasn’t “it’s amazing that long strings have a small number of google hits”, it was “it’s amazing that the summary google has for my web page (which is not in any of the content on my website) is a long string that exists nowhere else on the internet, so it can’t be chalked up to some bucketing of blogs or the like”.
Anyway, now I think the mystery is more or less solved.