To configure this crawler to avoid importing unwanted Web pages into your portal:
By default, this crawler follows the Web server's
recommendations about which pages might be of value to automated crawlers.
If you want to ignore these recommendations, clear the Obey
the target site's robot exclusion protocols check box.
In general, these recommendations help keep unwanted content from
being crawled into the portal. However, some sites publish very strict recommendations.
If your crawler is not importing any content from a site, try clearing
this check box.
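To see why a strict site can block all imports, it can help to test a site's recommendations outside the portal. The following is a minimal sketch, not this crawler's actual implementation: it assumes the recommendations are published as a standard robots.txt file and uses Python's urllib.robotparser module. The names should_fetch, obey_robots, and example-crawler, and the sample rules, are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules: the first blocks every page, the second
# blocks only one area of the site.
STRICT_RULES = "User-agent: *\nDisallow: /\n"
PARTIAL_RULES = "User-agent: *\nDisallow: /private/\n"


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_fetch(url: str, robots_txt: str, obey_robots: bool = True,
                 user_agent: str = "example-crawler") -> bool:
    """Mimic the check box: when obey_robots is False, the rules are ignored."""
    if not obey_robots:
        return True
    return allowed_by_robots(robots_txt, user_agent, url)


if __name__ == "__main__":
    page = "http://mycompany.com/index.html"
    print(should_fetch(page, STRICT_RULES))                     # rules obeyed
    print(should_fetch(page, STRICT_RULES, obey_robots=False))  # rules ignored
```

With the strict rules, no page on the site passes the check while the rules are obeyed, which matches the symptom of a crawler that imports nothing from a site.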
By default, this crawler saves the URLs of imported Web pages in the case used on the source Web site. To change the URLs to lowercase, select Convert all URLs to lower case.
To avoid importing content from an area of a Web site or to avoid importing particular pages:
To specify an area to avoid, click Add Exclusion;
then, in the text box, type the URL to the area of the Web site that you
want to avoid.
You can use wildcard notation (*) to make the exclusion more general.
For example, to avoid crawling sales information from a site, you might
type http://mycompany.com*sales.
As a result, this crawler would not import any pages from mycompany.com
that have "sales" anywhere in the URL.
Note: Wildcards are assumed on either side of your text.
For example, if you type sales, the crawler will not import any pages
from any site accessible from
the target URL that have "sales" anywhere in the URL.
Important: If you list both exclusions and
inclusions, the exclusions apply only to the included
pages. For example, if you exclude
sales and include http://mycompany.com,
your crawler imports all pages from http://mycompany.com except
for those pages that have "sales" anywhere in the URL.
To remove an exclusion, select the exclusion
and click the remove icon.
To select or clear all exclusion check boxes, select or clear the box to the left of Exclusions.
By default, this crawler does not crawl or import any pages specified in the exclusions. If the crawler must follow links on an excluded page to reach pages that are not excluded and that should be imported, select Crawl excluded pages, but do not import them.
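The exclusion rules above can be sketched in code. This is an illustration only, not the crawler's actual matching logic: it assumes that * matches any run of characters, that matching is case-sensitive, and that the implied wildcards on either side amount to a substring search. The function names pattern_matches, is_excluded, and plan_for are hypothetical.

```python
import re


def pattern_matches(pattern: str, url: str) -> bool:
    """Return True if pattern occurs in url, treating * as 'any characters'.

    Using re.search (rather than a full match) models the wildcards that
    are assumed on either side of the typed text.
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.search(".*".join(parts), url) is not None


def is_excluded(url: str, exclusions: list[str]) -> bool:
    """Return True if any exclusion pattern matches the URL."""
    return any(pattern_matches(p, url) for p in exclusions)


def plan_for(url: str, exclusions: list[str],
             crawl_excluded: bool = False) -> tuple[bool, bool]:
    """Return (follow_links, import_page) for a URL.

    crawl_excluded models the 'Crawl excluded pages, but do not import
    them' option: excluded pages are never imported, but their links
    may still be followed.
    """
    if is_excluded(url, exclusions):
        return (crawl_excluded, False)
    return (True, True)


if __name__ == "__main__":
    url = "http://mycompany.com/dept/sales/q1.html"
    print(pattern_matches("http://mycompany.com*sales", url))
    print(plan_for(url, ["sales"]))
    print(plan_for(url, ["sales"], crawl_excluded=True))
```

Under these assumptions, typing sales alone excludes matching pages on any site reachable from the target URL, while http://mycompany.com*sales limits the exclusion to that one site.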
To limit your crawl to an area of a Web site or a particular page:
To specify where this crawler may crawl, click
Add Inclusion;
then, in the text box, type the URL to the area of the Web site to which
you want to restrict your crawl. Because Web sites can contain links to
other sites, you might want to use inclusions to keep your crawler on
a particular site. To avoid crawling other sites, add the base URL of
the site you want to crawl to the inclusion list; for example, http://mycompany.com.
You can use wildcard notation (*) to make the inclusion more general.
For example, if you want to crawl only information on single sign-on (SSO),
you might type http://mycompany.com*sso.
As a result, this crawler would import only pages from mycompany.com that
have "sso" anywhere in the URL.
Note: Wildcards are assumed on either side of your text.
For example, if you type sso, the crawler will import any pages
from any site accessible from
the target URL that have "sso" anywhere in the URL.
Important: If you list both inclusions and
exclusions, the exclusions apply only to the included
pages. For example, if you include http://mycompany.com
and exclude sso, your crawler
imports all pages from http://mycompany.com except
for those pages that have "sso" anywhere in the URL.
To remove an inclusion, select the inclusion
and click the remove icon.
To select or clear all inclusion check boxes, select or clear the box to the left of Inclusions.
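The way inclusions and exclusions combine can be sketched as a single filter. This is an illustration under stated assumptions, not the crawler's actual logic: * is assumed to match any run of characters, matching is assumed to be case-sensitive, and an empty inclusion list is assumed to mean that every page is eligible. The function names matches and should_import are hypothetical.

```python
import re


def matches(pattern: str, url: str) -> bool:
    """Substring-style match in which * stands for any run of characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.search(".*".join(parts), url) is not None


def should_import(url: str, inclusions: list[str],
                  exclusions: list[str]) -> bool:
    """Apply inclusions first, then exclusions to the included pages."""
    # Assumption: with no inclusions listed, every URL is eligible.
    included = not inclusions or any(matches(p, url) for p in inclusions)
    excluded = any(matches(p, url) for p in exclusions)
    return included and not excluded


if __name__ == "__main__":
    inclusions = ["http://mycompany.com"]
    exclusions = ["sso"]
    for url in (
        "http://mycompany.com/products/index.html",
        "http://mycompany.com/sso/login.html",
        "http://partner.example/index.html",
    ):
        print(url, should_import(url, inclusions, exclusions))
```

In this sketch, the first URL is imported, the second is dropped by the exclusion, and the third is never considered because it does not match the inclusion.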
To display the page associated with this help topic: