
Our corporate intranet is a heterogeneous environment with both Lotus Domino and IIS (.NET and classic ASP) applications and content. We have between 300,000 and 500,000 web pages and documents spread across more than 1,200 "sites" on some 30 unique domains. We previously used Inktomi's UltraSeek Server 3.0 as our intranet search engine, which was starting to show its age (purchased in 1998). The Inktomi product did not handle attachments well (DOC, PPT, PDF, etc.), would not crawl our secured sites, and was no longer supported by the vendor. We did a cursory evaluation of the search vendors and were immediately attracted to Google's 30-day trial offer for their Google Search Appliance (GSA). After we signed a simple agreement, they shipped us a brand new, shiny unit which we could test for 30 days before returning or purchasing it.

Product info

The GSA is a "black box" 1U standard rack-mountable server. By "black box" I mean that Google gives you a web interface to manage the appliance but does not want you to access the operating system (a heavily Google-customized version of Linux). In fact, the license agreement stipulates that you will not tamper with the hardware or OS of the device in any way. The device has no need for a keyboard, mouse or video; all you need for normal operation is a network cable and power.

The GSA comes in different flavors to fit different needs, varying by the size of the hardware and, correspondingly, the capacity of the device. (Licensing is based on the number of URLs crawled by the device.) There are 3 different hardware configurations: the GB-1001, GB-5005, and GB-8008. These break down as follows:

  • GB-1001 - 150K documents for $28K, 300K documents for $50K
  • GB-5005 - 1.5M documents for $230K
  • GB-8008 - 4M documents for $450K
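
To compare the tiers above, it can help to work out the price per licensed document and the smallest tier that covers a given document count. The following is a minimal sketch using only the list prices quoted above; the function names are hypothetical, and it assumes a tier fits whenever the document count is at or below its licensed limit.

```python
# Tiers as quoted above: (model, licensed documents, list price in USD).
TIERS = [
    ("GB-1001", 150_000, 28_000),
    ("GB-1001", 300_000, 50_000),
    ("GB-5005", 1_500_000, 230_000),
    ("GB-8008", 4_000_000, 450_000),
]


def cost_per_document(limit, price):
    """Return the list price divided by the licensed document count."""
    return price / limit


def smallest_tier_for(doc_count):
    """Return the first (smallest) tier whose limit covers doc_count, or None."""
    for model, limit, price in TIERS:
        if doc_count <= limit:
            return model, limit, price
    return None


if __name__ == "__main__":
    for model, limit, price in TIERS:
        print(f"{model} ({limit:,} docs): ${cost_per_document(limit, price):.3f} per document")
    # The two ends of the 300,000-500,000 page range mentioned earlier.
    for pages in (300_000, 500_000):
        print(f"{pages:,} documents -> {smallest_tier_for(pages)}")
```

Under these assumptions, a site at the low end of our page range fits the larger GB-1001 license, while one at the high end would fall into the GB-5005 tier.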

Why Google?

As advertised, the GSA met all of our requirements: being able to index the large variety of file types we have in our environment, crawling secured content, having a well-known API, etc. The Google brand name was another big selling factor. When we told our users that they were going to get a Google-based search engine, they knew their days of frustrated searching were over. Lastly, the 30-day trial experience we had with the GSA sealed the deal. The device is the easiest enterprise solution I've ever had to install, configure and maintain. We were literally up and running within an hour of first opening the shipping box.

Installation

The appliance has two network ports on the back panel: one for normal operation and the other used exclusively for network configuration. To set up the network settings we connected a laptop to the device via a special (some pin-outs are non-standard) orange Ethernet cable which is included. The installation process was about as easy as one can expect for a "black box."

First we plugged in the normal-operation network cable and then the power. The power cord on the device IS the power switch; plug it in to turn it on and unplug it to turn it off. After plugging it in, we waited about 5 minutes for the device to play a tune, which is the signal to continue. Next, we hooked up our laptop (already set to DHCP mode) to the device and powered it up. After logging in to our laptop and making sure we had the correct IP assigned by the appliance's internal DHCP server, we were ready to configure the network settings. Total elapsed time (excluding rack mounting): 10 minutes.

Configuration

Network configuration, like normal administration, is done entirely through a browser and is a simple 5-step process. The first screens ask you for basic network information: IP address, subnet mask, default gateway, and DNS. Subsequent screens collect the SMTP server, the "From" address for GSA system messages, time zone, NTP (time) servers and the admin account name/password. The final step is to test a few URLs which you will be crawling to make sure you've done the setup properly. After a final settings review screen, configuration is complete and you can then disconnect your laptop and get to the neat part: starting to crawl. Total elapsed time: 10 minutes.

Crawling the site(s)

Using the URL provided, all administration of the GSA is done remotely. After logging in with the ID/password we provided in the previous step, we were presented with the Administration console. We created a new collection to hold our index, put in the "Start crawl from" URL, copied that same URL into the "Follow and Crawl Only URLs with the Following Patterns" box and we were done. We saved our settings and then clicked the "Start crawling" button. We then went over to the "Crawl status" screen and watched the "Crawled URLs" counter increase. Google advertises that it can crawl about 4,000 URLs in about 15 minutes or so. We found the crawl time would increase noticeably if there are documents (Word, PDF, Excel, etc.) linked to from those URLs.
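
To put the advertised crawl rate in context, here is a rough back-of-the-envelope sketch of how long a first crawl of a site our size might take. It assumes the advertised rate of 4,000 URLs per 15 minutes holds constant and scales linearly, which is optimistic given the slowdown we saw with linked documents; the function name is hypothetical.

```python
# Advertised rate quoted above: about 4,000 URLs in about 15 minutes.
ADVERTISED_URLS = 4_000
ADVERTISED_MINUTES = 15


def estimated_crawl_hours(url_count):
    """Estimate crawl time in hours, assuming the advertised rate scales linearly."""
    minutes = url_count / ADVERTISED_URLS * ADVERTISED_MINUTES
    return minutes / 60


if __name__ == "__main__":
    # The two ends of the 300,000-500,000 page range mentioned earlier.
    for pages in (300_000, 500_000):
        print(f"{pages:,} URLs: about {estimated_crawl_hours(pages):.2f} hours")
```

By this estimate a full crawl lands somewhere between roughly 19 and 31 hours, so it is best treated as a lower bound rather than a prediction.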

After the crawl is done, the collection is automatically indexed and then checked against the Serving Prerequisites (any criteria you wish to use to determine whether to move an indexed collection to production). The collection will then either be moved to Production (and thus become searchable) or be moved to Staging. The Staging area lets you validate new crawls before letting users search against them.

Crawling configuration

After your initial crawl you may find the need to go back and tweak the crawl parameters. Google gives you a good amount of control over how sites are crawled, the frequency, how many threads are used, etc. For sites with security, the GSA supports Basic Authentication, and an additional security module is available which supports Forms Authentication. The most difficult configuration aspect for us was determining the right combination of URL patterns to exclude from the crawl. If you are a Domino shop and looking to use the GSA, you may need to spend some time getting the crawler configuration just right to support the sometimes complex Domino query string parameters.
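
The idea of combining "follow" patterns with "exclude" patterns can be sketched as a simple URL filter. This is only an illustration of the concept, not the GSA's own pattern syntax: the host name, the function name and the specific regular expressions are assumptions, chosen to skip Domino edit links and expanded/collapsed view variants that tend to produce duplicate pages.

```python
import re

# Hypothetical start URL prefix; only URLs under it are followed.
INCLUDE_PREFIXES = ["http://intranet.example.com/"]

# Illustrative exclusions for Domino-style query strings (not GSA syntax).
EXCLUDE_PATTERNS = [
    re.compile(r"\?EditDocument", re.IGNORECASE),                   # edit forms
    re.compile(r"[?&](ExpandView|CollapseView)\b", re.IGNORECASE),  # whole-view toggles
    re.compile(r"[?&](Expand|Collapse)=\d", re.IGNORECASE),         # per-category toggles
]


def should_crawl(url):
    """Return True if url is under an included prefix and matches no exclusion."""
    if not any(url.startswith(prefix) for prefix in INCLUDE_PREFIXES):
        return False
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    samples = [
        "http://intranet.example.com/hr.nsf/docs/123?OpenDocument",
        "http://intranet.example.com/hr.nsf/all?OpenView&ExpandView",
        "http://intranet.example.com/hr.nsf/all?OpenView&Expand=3",
        "http://other.example.com/index.html",
    ]
    for url in samples:
        print(should_crawl(url), url)
```

Testing a handful of representative URLs against the rules like this, before starting a long crawl, is a cheap way to catch a pattern that excludes too much or too little.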

After we got the crawl parameters tuned and the first good crawl done, we did some testing to see if the crawler had grabbed all the content. Browsing our site and testing with pages several levels deep in the hierarchy, we always found the GSA had crawled them correctly. We also did some testing with strings inside PDF documents, PowerPoint presentations and the like. When we did come across something that hadn't been crawled, a detailed investigation led us to discover that we needed to do some more tweaking of the crawl settings.

Other notable features

Google also gives you a KeyMatch tool that allows you to specify which indexed documents should appear at the top of the results page for a specified query. These present themselves almost identically to the Sponsored Links at the top of the results page of the Google we all use. A Synonym tool allows you to suggest alternate words or phrases for search queries. For example, if someone searches for WCM, you can suggest "Web Content Management" at the top of the results page.

An output format feature lets you control (via an XSLT) the look of the search results. You can use this for changing the fonts, colors, logo, header, etc. of the results page. We were able to easily remove the "Cached" link on the results page with some XSLT modifications.

The Reporting tool lets you run reports on search queries over different time ranges. It will show you the number of searches per day and per hour, and the top 100 keywords and top 100 queries for the time period specified.

Downsides

The GSA is not for organizations looking to index their shared network drives, as the device has no facility for crawling file systems. This is really too bad, as many companies struggle with the huge quantities of unorganized content stored on their networks. Of course, there is a plethora of other products out there for just this purpose.

Direct access to databases (e.g. SQL, Oracle, etc.) is another area which is off-limits for the GSA, as is any kind of integration with content or document management systems.

Conclusion

The Google Search Appliance (GSA) is an excellent search product for HTTP-accessible content. It gives great control over administrative features such as crawler configuration and results presentation, and adequate reporting capabilities as well. Those looking for a solution that integrates directly with a content/document management system or databases, or that indexes network drives, should look to another product. However, if you have an intranet or internet site with plenty of HTML-based content, the GSA may be just what you need.
