Archive for Google Mini

Getting the Right Results

One of the challenges with the Mini has been getting it to return results without duplicates. We are using an ISAPI rewrite filter to make our URLs search engine friendly.

Rather than:
mysite.com?product_id=365&department_id=2
our URLs look more like
mysite.com/p/365,2_Big-Purple-Widget.html

The rewriter knows that the “365” is the key for the product and that the “2” is the key for a category, and the server can then render the page accordingly. The “Big Purple Widget” part is merely a way of introducing keywords into the URL to benefit our positioning in organic search engine results.
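A rough sketch of that mapping in Python, just to make the idea concrete. The real rewrite runs as an ISAPI filter on the server; the pattern and the function name `to_query_url` here are assumptions for illustration, not the actual rule.

```python
import re

# Hypothetical pattern for the friendly URL shape shown above:
#   /p/<product_id>,<department_id>_<keywords>.html
FRIENDLY = re.compile(
    r"^/p/(?P<product>\d+),(?P<department>\d+)_(?P<slug>[^/]+)\.html$"
)


def to_query_url(path):
    """Translate a friendly path back to its query-string form.

    Returns None when the path does not look like a friendly product URL.
    The keyword slug is ignored: it exists only for search engines.
    """
    match = FRIENDLY.match(path)
    if match is None:
        return None
    return "/?product_id={}&department_id={}".format(
        match.group("product"), match.group("department")
    )


print(to_query_url("/p/365,2_Big-Purple-Widget.html"))
```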

The challenge is that the same product can appear under many different categories and can have as many different URLs.

Consider the following URLs:
mysite.com/p/365,11_Big-Purple-Widget.html
mysite.com/p/365,900_Big-Purple-Widget.html
mysite.com/p/365,,_Big-Purple-Widget.html

Each URL displays the same data, with the second numeric sequence (11 or 900) affecting only the way the category navigation tree is rendered. The last URL, with no category id, is a direct path to the product.
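To see why these count as duplicates, here is a small Python sketch that collapses every variant to the direct, category-free form. The regular expression is an assumption based only on the three example URLs above, and `canonical_path` is a made-up helper name.

```python
import re

# Assumed shape, inferred from the examples above:
#   /p/<product_id>,<category_id>_<slug>.html   (with a category)
#   /p/<product_id>,,_<slug>.html               (direct path, no category)
PRODUCT_URL = re.compile(r"^/p/(?P<product>\d+),(?P<category>\d*),?_(?P<slug>.+)\.html$")


def canonical_path(path):
    """Return the direct (category-free) path for a product URL, or None."""
    match = PRODUCT_URL.match(path)
    if match is None:
        return None
    return "/p/{},,_{}.html".format(match.group("product"), match.group("slug"))


variants = [
    "/p/365,11_Big-Purple-Widget.html",
    "/p/365,900_Big-Purple-Widget.html",
    "/p/365,,_Big-Purple-Widget.html",
]
# All three variants collapse to a single canonical path.
print({canonical_path(v) for v in variants})
```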

The Mini sees each of the URLs above as a different page and will return all of them for any query for big purple widgets. Using the “filter=p” or “filter=1” option in the search URL does not achieve the desired results, because those filters only screen out duplicate snippets and duplicate results from the same directory. Using the googleoff/googleon comment tags didn’t work either.

The key to solving the issue was to leverage the last URL in the series, the direct path with no category information. First, I wrote a script to generate a page of links to the direct product pages for the Mini to crawl. Second, I instructed the Mini, by way of a regular expression, not to return results for any pages that had the second set of numerals (the category id).
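Both halves of that fix can be sketched in a few lines of Python. This is an illustration under assumptions, not the script or the pattern actually used: the regular expression is inferred from the example URLs, the exact pattern syntax the Mini's admin console expects is not shown here, and `should_exclude` and `links_page` are hypothetical names.

```python
import re
from html import escape

# Matches product URLs that carry a category id after the comma,
# e.g. /p/365,11_Big-Purple-Widget.html but not /p/365,,_Big-Purple-Widget.html
HAS_CATEGORY = re.compile(r"/p/\d+,\d+_[^/]*\.html$")


def should_exclude(url):
    """True if the URL is a category-specific variant of a product page."""
    return HAS_CATEGORY.search(url) is not None


def links_page(products):
    """Build a plain HTML page of direct product links for the crawler.

    `products` is an iterable of (product_id, name) pairs.
    """
    items = []
    for product_id, name in products:
        # Turn "Big Purple Widget" into "Big-Purple-Widget" for the URL.
        slug = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")
        items.append(
            '<li><a href="/p/{},,_{}.html">{}</a></li>'.format(
                product_id, slug, escape(name)
            )
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


print(links_page([(365, "Big Purple Widget")]))
```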

It works nicely. The only problem is that with over 30,000 unique SKUs and over 6,000 product display groups, the links page is quite large and takes too long to render if pulled from the database on every request. I solved this problem by using a scheduled task to create the pages on a nightly basis, splitting them alphabetically to keep their size down (the Mini won’t index an HTML document larger than 2.5 MB).
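The nightly split could look something like the Python sketch below. It is a guess at the general shape, not the scheduled task itself: the file naming, the `split_alphabetically` helper, and the byte value used for "2.5 MB" are all assumptions.

```python
from collections import defaultdict
from html import escape

# Assumption: treat the 2.5 MB limit as 2,500,000 bytes.
MAX_BYTES = 2_500_000


def split_alphabetically(products, max_bytes=MAX_BYTES):
    """Group (name, url) pairs by first letter and render one page per group.

    Returns a dict mapping a file name to its HTML. Raises ValueError if any
    single page would still exceed the size limit and needs splitting further.
    """
    groups = defaultdict(list)
    for name, url in sorted(products):
        first = name[:1].upper()
        key = first if first.isalpha() else "other"
        groups[key].append('<a href="{}">{}</a><br>'.format(escape(url), escape(name)))

    pages = {}
    for key, links in groups.items():
        html = "<html><body>\n" + "\n".join(links) + "\n</body></html>"
        if len(html.encode("utf-8")) > max_bytes:
            raise ValueError("page for '{}' exceeds the size limit".format(key))
        pages["links-{}.html".format(key.lower())] = html
    return pages


pages = split_alphabetically([
    ("Big Purple Widget", "/p/365,,_Big-Purple-Widget.html"),
    ("Blue Gadget", "/p/7,,_Blue-Gadget.html"),
    ("Apple Corer", "/p/9,,_Apple-Corer.html"),
])
print(sorted(pages))
```

Writing each entry of `pages` to disk from a scheduled job means the Mini only ever fetches static files, so the database is hit once a night rather than on every crawl.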

Extensible Stylesheet Language Transmogrifications

Now that the Mini is up and running, I’ve got to get jiggy with XML transformations using XSL. I’ve dabbled in this before and have even bought a book, but being as busy as I am I usually never get to the books I buy until I actually have to use the technology featured in said book.

XSL is really pretty simple. You have to be able to use XPath syntax, but that’s not that big a deal. W3Schools.com is your friend. Fortunately the Mini comes with default XSLT that you can edit. Trust me, this is a good thing, because it wasn’t until after I looked at the default stylesheet that I realized two things:

  1. I’m a real amateur at XSLT. What I don’t know about XSLT would fill a small country, larger than Luxembourg but smaller than Madagascar.
  2. There was quite a bit I hadn’t even considered having to process. Search within results pages, advanced search pages, etc.
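If you want to get a feel for XPath before touching the stylesheet, Python's standard library supports a small subset of it. The XML below is a made-up, simplified sample and not the Mini's actual result format; the element names and helper functions are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the Mini's real XML output uses its own schema.
SAMPLE = """
<results>
  <result rank="1">
    <url>/p/365,,_Big-Purple-Widget.html</url>
    <title>Big Purple Widget</title>
  </result>
  <result rank="2">
    <url>/p/366,,_Small-Green-Widget.html</url>
    <title>Small Green Widget</title>
  </result>
</results>
"""


def titles_and_urls(xml_text):
    """Select every <result> child of the root and pull out title and url."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("title"), r.findtext("url")) for r in root.findall("./result")]


def top_result_title(xml_text):
    """Use an attribute predicate to pick the result whose rank is 1."""
    root = ET.fromstring(xml_text)
    node = root.find("./result[@rank='1']/title")
    return None if node is None else node.text


print(titles_and_urls(SAMPLE))
print(top_result_title(SAMPLE))
```

The same path expressions (a child step like `result`, an attribute predicate like `[@rank='1']`) are what you write inside `select` and `match` attributes in an XSL stylesheet.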

The great thing is that the Google people do some things with XSLT that I hadn’t even considered possible (or necessary). They manage to provide a very modular approach to the stylesheet that makes modification and extension of the code very simple.

Them Google folks, they know their bidness.

Good Things Come In Small Packages

The Google Mini is here!!! It arrived on Tuesday and our network administrator got it installed and ready to go at about noon yesterday (Thursday). Huge kudos to him for getting it up and running so quickly, seeing as we had another minor little project this week.

I set it up to crawl the site straight away. It ran overnight…and hit the threshold of 50,000 documents rather quickly. Looking through the logs, it looks like the SEO-friendly URL ISAPI transmogrifier is causing some issues. So I’ve got to program the device to exclude non-friendly URLs and have the Mini recrawl the site.

Even with this minor issue, the potential of the Mini is really exciting. The device has a test page where you can run sample searches and it is already stellar. The improvement over our current site search will be exponential.