Advanced Settings (Crawler)

To specify the language of content, what to do with rejected documents, and a crawler tag:

  1. Under Content Language, in the drop-down list, choose the language in which most of the content you want to import is written.

  2. Under Rejected Documents, specify what to do with documents that do not successfully sort into a folder:

  3. If you are editing an existing crawler, you see the section Importing Documents. Under Importing Documents, specify whether to import only new documents. By default, this crawler attempts to import only new documents (those that have not been previously imported by this crawler or other crawlers that access this same content source). You can change the crawler setting to import multiple copies of each document, which might be useful while testing your crawlers.

    1. To import only new documents, select Import only new links. Additional options display. Otherwise, skip to Step 4.

    2. To specify what new links means:

    3. Note: The option you choose here affects your actions in substep 7 of Step 3 and in Step 4.

    4. To refresh the previously imported documents as specified on the Document Settings page, select refresh them. Generally, refreshing documents is the job of the Document Refresh Agent; refreshing documents slows the crawler down. However, if you changed the document settings for this crawler or changed the property mappings in the associated content types, refreshing documents updates these settings for the previously imported documents.

    5. If you created additional folders or applied different filters to destination folders, select try to sort them into additional folders to sort the previously imported documents to new Knowledge Directory folders.

      Another crawler might have imported documents from the same data source but into different folders than the destination folders specified for this crawler. Make sure you really want to re-sort those documents into the destination folders specified for this crawler.

    6. To re-import documents that were previously deleted (manually, due to expiration, or due to missing source documents), select regenerate deleted links. This might re-import documents that were at one time deemed inappropriate for your portal.

    7. If absolutely necessary, you can delete the record of documents that have been deleted from the portal. "History" is defined by what you specified as new documents in substep 2 of Step 3:

    8. If you are still sure that you must delete the record of documents deleted from the portal, click Clear Deletion History.

  4. If you are editing an existing crawler, you see additional options under Rejected Documents. Under Rejected Documents, specify what to do when this crawler finds a previously rejected document. Again, the definition of "previously rejected" depends on the option you chose in substep 2 of Step 3:

    1. To have this crawler try to import previously rejected documents, select Re-import.

    2. To delete the rejection history, click Clear Rejection History. Remember, if you chose "from this Data Source" in substep 2 of Step 3, you are essentially deleting the rejection history for all crawlers that import documents from this content source.

    Note: If a document does not sort into any folder but is placed into the Unclassified Documents folder, this does not count as being rejected. Rejected documents are documents that were not placed in any folder.

  5. To mark imported documents with a crawler tag, type a tag in the Mark imported documents with the following Crawler Tag box. This tag is used to differentiate documents imported by this crawler from those imported by another crawler.

  6. Under Runtime Configuration, set the following:

The allowable ranges for these fields are set in the portalconfig.xml file. The values set here are also limited by the maximum threads allowable in the automation service used for this crawler job.
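
As a rough illustration of how the two limits above combine, the following sketch clamps a requested thread count to a configured range and then caps it at the automation service maximum. The function name, the range type, and all numbers are hypothetical, and whether the portal silently clamps or instead rejects an out-of-range value is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowedRange:
    """Hypothetical stand-in for a range read from portalconfig.xml."""
    minimum: int
    maximum: int


def effective_threads(requested: int,
                      configured_range: AllowedRange,
                      automation_service_max: int) -> int:
    """Return the thread count that would take effect.

    Assumption: out-of-range values are clamped rather than rejected.
    """
    # First keep the value inside the range allowed by the configuration file.
    within_range = max(configured_range.minimum,
                       min(requested, configured_range.maximum))
    # Then apply the automation service ceiling, which always wins.
    return min(within_range, automation_service_max)


if __name__ == "__main__":
    limits = AllowedRange(minimum=1, maximum=10)  # illustrative values only
    for requested in (0, 3, 8, 25):
        print(requested, "->", effective_threads(requested, limits, 4))
```

The point of the sketch is only that the lower of the two ceilings is the one that matters: raising the value on this page has no effect once the automation service limit is reached.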

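The options in Steps 3 and 4 can be read as a small decision procedure applied to each document the crawler finds. The sketch below is one possible reading of those steps, not the portal's actual implementation: the type names, field names, and action strings are hypothetical, the document's prior state is assumed to have already been looked up in whichever history (this crawler's or the data source's) applies, and the rejected-document check is assumed to run regardless of the Import only new links setting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PriorState(Enum):
    """What the applicable history says about a document (hypothetical model)."""
    NEVER_SEEN = "never seen"
    IMPORTED = "imported"
    DELETED = "deleted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class CrawlerSettings:
    """Hypothetical mirror of the check boxes described in Steps 3 and 4."""
    import_only_new: bool = True       # Import only new links
    refresh_existing: bool = False     # refresh them
    resort_existing: bool = False      # try to sort them into additional folders
    regenerate_deleted: bool = False   # regenerate deleted links
    reimport_rejected: bool = False    # Re-import (Rejected Documents)


def plan_actions(state: PriorState, settings: CrawlerSettings) -> List[str]:
    """Return the actions this sketch would take for one document."""
    # Assumption: rejection handling (Step 4) is independent of Step 3.
    if state is PriorState.REJECTED:
        return ["import"] if settings.reimport_rejected else ["skip"]
    # With "import only new" off, every document is imported again,
    # which can produce multiple copies.
    if not settings.import_only_new:
        return ["import"]
    if state is PriorState.NEVER_SEEN:
        return ["import"]
    if state is PriorState.IMPORTED:
        actions = []
        if settings.refresh_existing:
            actions.append("refresh")
        if settings.resort_existing:
            actions.append("re-sort")
        return actions or ["skip"]
    # Remaining case: the document was deleted from the portal earlier.
    return ["import"] if settings.regenerate_deleted else ["skip"]


if __name__ == "__main__":
    defaults = CrawlerSettings()
    for state in PriorState:
        print(state.value, "->", plan_actions(state, defaults))
```

With the default settings in this sketch, only documents that have never been seen are imported. Clearing the deletion or rejection history would correspond to those documents appearing as never seen on the next run.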

  1. Click Administration.
  2. Open the Crawler Editor:
  3. On the left, under Edit Object Settings, click Advanced Settings.