Web Page Exclusions

To configure this crawler to avoid importing unwanted Web pages into your portal:

By default, this crawler follows the Web server's recommendations about which pages might be of value to automated crawlers. If you want to ignore these recommendations, clear the Obey the target site's robot exclusion protocols check box.

In general, these recommendations help keep unwanted content from being crawled into the portal. However, some sites publish very strict recommendations. If your crawler is not importing any content from a site, try clearing this check box.
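
As an illustration of what obeying these recommendations involves, the following Python sketch checks URLs against robot exclusion rules using the standard library's urllib.robotparser module. It assumes the site publishes its recommendations as a robots.txt file. The file contents, the user agent name ExampleCrawler, and the helper allowed_by_robots are hypothetical and do not describe this crawler's internals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for two sites.
TYPICAL_RULES = """\
User-agent: *
Disallow: /sales/
"""

# A very strict site: no automated crawler may fetch anything.
STRICT_RULES = """\
User-agent: *
Disallow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in (("typical", TYPICAL_RULES), ("strict", STRICT_RULES)):
        for url in ("http://mycompany.com/about.html",
                    "http://mycompany.com/sales/q1.html"):
            print(name, url, allowed_by_robots(rules, "ExampleCrawler", url))
```

Under the strict rules every URL is rejected, which is the situation in which a crawler that obeys the recommendations imports nothing from the site.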
By default, this crawler saves the URLs of imported Web pages in the case used on the source Web site. To change the URLs to lower case, select Convert all URLs to lower case.
To avoid importing content from an area of a Web site or to avoid importing particular pages:

To specify an area to avoid, click Add Exclusion; then, in the text box, type the URL to the area of the Web site that you want to avoid.

You can use wildcard notation (*) to make the exclusion more general. For example, to avoid crawling sales information from a site, you might type http://mycompany.com*sales. As a result, this crawler would not import any pages from mycompany.com that have "sales" anywhere in the URL.

Note: Wildcards are assumed on either side of your text. For example, if you type sales, the crawler will not import any pages from any site accessible from the target URL that have "sales" anywhere in the URL.

Important: If you list exclusions and inclusions, the exclusions apply only to the included pages. For example, if you excluded sales and included http://mycompany.com, your crawler would import all pages from http://mycompany.com except for those pages that had "sales" anywhere in the URL.
To remove an exclusion, select the exclusion and click the remove icon.
To select or clear all exclusion check boxes, select or clear the box to the left of Exclusions.

By default, this crawler neither crawls nor imports the pages specified in the exclusions. If the crawler must follow a link on an excluded page to reach a page that is not excluded and that should be imported, choose Crawl excluded pages, but do not import them.
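
The difference between the default behavior and Crawl excluded pages, but do not import them can be sketched in Python over a small in-memory link graph. This is an illustration only: the LINKS graph, the crawl function, and its crawl_excluded flag are hypothetical, and exclusions are simplified to wildcard-free text, which behaves like a "contains" test on the URL.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs that the page links to.
LINKS = {
    "http://mycompany.com/": [
        "http://mycompany.com/sales/index.html",
        "http://mycompany.com/about.html",
    ],
    "http://mycompany.com/sales/index.html": [
        "http://mycompany.com/products/widget.html",
    ],
    "http://mycompany.com/about.html": [],
    "http://mycompany.com/products/widget.html": [],
}


def is_excluded(url, exclusions):
    # Wildcards are implied on either side, so plain text matches anywhere.
    return any(text in url for text in exclusions)


def crawl(start, exclusions, crawl_excluded=False):
    """Breadth-first crawl from start; return the URLs that would be imported."""
    imported, seen, queue = [], {start}, deque([start])
    while queue:
        url = queue.popleft()
        excluded = is_excluded(url, exclusions)
        if excluded and not crawl_excluded:
            continue  # default: do not import the page or follow its links
        if not excluded:
            imported.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return imported


if __name__ == "__main__":
    print(crawl("http://mycompany.com/", ["sales"]))
    print(crawl("http://mycompany.com/", ["sales"], crawl_excluded=True))
```

In this graph the products page is linked only from the excluded sales page, so it is imported only when excluded pages are crawled.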
To limit your crawl to an area of a Web site or a particular page:

To specify where this crawler may crawl, click Add Inclusion; then, in the text box, type the URL to the area of the Web site to which you want to restrict your crawl. Because Web sites can contain links to other sites, you might want to use inclusions to keep your crawler on a particular site. To avoid crawling other sites, add the base URL of the site you want to crawl to the inclusion list; for example, http://mycompany.com.

You can use wildcard notation (*) to make the inclusion more general. For example, if you want to crawl only information on single sign-on (SSO), you might type http://mycompany.com*sso. As a result, this crawler would import only pages from mycompany.com that have "sso" anywhere in the URL.

Note: Wildcards are assumed on either side of your text. For example, if you type sso, the crawler will import any pages from any site accessible from the target URL that have "sso" anywhere in the URL.

Important: If you list inclusions and exclusions, the exclusions apply only to the included pages. For example, if you included http://mycompany.com and excluded sso, your crawler would import all pages from http://mycompany.com except for those pages that had "sso" anywhere in the URL.
To remove an inclusion, select the inclusion and click the remove icon.
To select or clear all inclusion check boxes, select or clear the box to the left of Inclusions.
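
The way wildcards, inclusions, and exclusions combine can be sketched in Python as follows. This is an illustration of the rules described above, not this crawler's implementation: the helpers matches and should_import are hypothetical, matching is assumed to be case-sensitive, and an empty inclusion list is assumed to mean that the crawl is unrestricted.

```python
import re


def matches(pattern: str, url: str) -> bool:
    """Return True if pattern occurs in url, treating * as 'any characters'."""
    # Only * is a wildcard; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.search supplies the implied wildcards on either side of the text.
    return re.search(regex, url) is not None


def should_import(url, inclusions, exclusions) -> bool:
    """Apply inclusions first, then exclusions to the included pages."""
    # Assumption: with no inclusions listed, every reachable page is a candidate.
    included = not inclusions or any(matches(p, url) for p in inclusions)
    excluded = any(matches(p, url) for p in exclusions)
    return included and not excluded


if __name__ == "__main__":
    inclusions = ["http://mycompany.com"]
    exclusions = ["sso"]
    for url in (
        "http://mycompany.com/products/index.html",
        "http://mycompany.com/docs/sso/setup.html",
        "http://partner.example/products.html",
    ):
        print(url, should_import(url, inclusions, exclusions))
```

With these lists, only the first URL is imported: the second is an included page that an exclusion removes, and the third is on a site that no inclusion covers.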