About Content Crawlers

Create a crawler to import content into your portal from external content repositories. To periodically search the external repository and import its content, you must run a job associated with the crawler. For information about jobs, see About Jobs.
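For a rough sense of what such a job does, here is a minimal sketch in plain Java: it runs a crawl task on a fixed schedule, much as a portal job periodically runs its associated crawler. The class name, interval, and crawl logic are placeholder assumptions, not the portal's actual job mechanism.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical job runner: searches an external repository on a fixed
    // schedule, as a portal job periodically runs its associated crawler.
    public class CrawlJob {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler =
                    Executors.newSingleThreadScheduledExecutor();
            Runnable crawl = () -> {
                // A real crawler would search the external repository here
                // and import any new or changed content into the portal.
                System.out.println("Crawling external repository...");
            };
            // Run immediately, then once every 24 hours.
            scheduler.scheduleAtFixedRate(crawl, 0, 24, TimeUnit.HOURS);
        }
    }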

Note: Crawlers depend on content sources. For information on content sources, see About Content Sources.

This topic discusses the following information:

  Web Crawlers
  Remote Crawlers
  Content Crawler Web Services
  Importing Document Security
  Troubleshooting the Results of a Crawl

To learn how to create or edit administrative objects (including crawlers), click here.

Web Crawlers

A Web crawler allows users to import content from the Web into the portal.
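To make the idea concrete, the following sketch performs the two basic steps any Web crawler repeats: fetch a page, then extract the links it contains for later visits. It is an illustrative fragment using the standard Java HTTP client (Java 11 or later), not the portal's Web crawler; the URL is a placeholder.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical fragment: fetch one Web page and list the links it
    // contains. A real crawler would queue each link for fetching and
    // import the page content into the portal.
    public class WebFetch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://example.com/")).build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            Matcher links = Pattern.compile("href=\"([^\"]+)\"")
                    .matcher(response.body());
            while (links.find()) {
                System.out.println(links.group(1));
            }
        }
    }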

To learn about the Web Crawler Editor, click one of the following editor pages:

Remote Crawlers

A remote crawler allows users to import content from an external content repository into the portal.

Some crawl providers are installed with the portal and are readily available to portal users, but others require you to manually install them and set them up. For example, Plumtree provides the following crawl providers:

Note: For information on obtaining crawl providers, contact Customer Support. For information on installing crawl providers, refer to the Installation Guide for Plumtree Corporate Portal or the documentation that comes with your crawl provider, or contact your portal administrator.
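Conceptually, a crawl provider gives the portal a uniform way to enumerate and fetch documents from a particular kind of repository. The interface below is a hypothetical sketch of that contract; the method names are invented for illustration and are not Plumtree's actual crawl provider API.

    import java.util.List;

    // Hypothetical sketch of the contract a crawl provider fulfills.
    // Names are illustrative only, not Plumtree's crawl provider API.
    public interface CrawlProvider {
        // List the documents available under a given repository location.
        List<String> enumerate(String location);

        // Fetch one document's content so the portal can import and index it.
        byte[] fetch(String documentId);
    }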

To create a remote crawler:

  1. Install the crawl provider on the portal computer or another computer.
  2. Create a remote server.
  3. Create a Content Crawler Web Service (discussed below).
  4. Create a remote content source.
  5. Create a remote crawler.

To learn about the Remote Crawler Editor, click one of the following editor pages:

The following crawl providers, if installed, add at least one extra page to the Remote Crawler Editor:

Content Crawler Web Services

Content Crawler Web Services allow you to specify general settings for your remote content repository, leaving the target and security settings to the associated remote content source and remote crawler. This allows you to crawl multiple locations in the same content repository without repeatedly specifying all the settings.
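The split might be pictured as follows: settings defined once on the web service are shared by every crawl of the repository, while each content source and crawler supplies only its own target and credentials. This is a hypothetical Java illustration of the idea (Java 16 or later, for records); all names and values are invented.

    // Hypothetical illustration: web-service settings are shared, while each
    // content source supplies only its own target location and credentials.
    public class CrawlerConfig {

        // Defined once, on the Content Crawler Web Service.
        record WebServiceSettings(String providerUrl, int timeoutSeconds) {}

        // Defined per remote content source and remote crawler.
        record SourceSettings(String targetFolder, String username, String password) {}

        public static void main(String[] args) {
            WebServiceSettings shared =
                    new WebServiceSettings("http://provider.example.com", 60);

            // Two crawls of the same repository reuse the shared settings
            // and differ only in their targets.
            SourceSettings marketing =
                    new SourceSettings("/docs/marketing", "svc-crawl", "secret");
            SourceSettings engineering =
                    new SourceSettings("/docs/engineering", "svc-crawl", "secret");

            System.out.println(shared + " crawls " + marketing.targetFolder());
            System.out.println(shared + " crawls " + engineering.targetFolder());
        }
    }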

Note: You create Content Crawler Web Services on which to base your remote content sources. For information on content sources, see About Content Sources.

To learn about the Content Crawler Web Service Editor, click one of the following editor pages:

Importing Document Security

Some remote crawlers can automatically grant users access to the content they import. The Global ACL Sync Map tells these crawlers how to import source document security.

For an example of how importing security works, click Importing Security Example.
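In outline, such a map translates the users and groups named in a source document's access control list (ACL) into the corresponding portal users and groups, so imported documents keep equivalent security. The sketch below uses invented group names and mappings; it is not the portal's Global ACL Sync Map implementation.

    import java.util.List;
    import java.util.Map;

    // Hypothetical ACL sync: map groups from a source document's ACL to
    // portal groups. All group names and mappings here are invented.
    public class AclSync {
        private static final Map<String, String> ACL_SYNC_MAP = Map.of(
                "DOMAIN\\Engineering", "Portal Engineering Group",
                "DOMAIN\\Everyone",    "Everyone");

        public static void main(String[] args) {
            List<String> sourceAcl =
                    List.of("DOMAIN\\Engineering", "DOMAIN\\Contractors");
            for (String sourceGroup : sourceAcl) {
                String portalGroup = ACL_SYNC_MAP.get(sourceGroup);
                if (portalGroup != null) {
                    System.out.println(sourceGroup + " -> " + portalGroup);
                } else {
                    System.out.println(sourceGroup
                            + " has no mapping; access is not granted");
                }
            }
        }
    }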

Troubleshooting the Results of a Crawl

If your crawler does not import the expected content, check the following: