Modify Streaming Discovery Jobs
A completed Enterprise Streaming Discovery Job may be modified to recalculate hash values for deduplication or to fix node-level and/or item-level exceptions. Enterprise Streaming Discovery Jobs that have been exported cannot be modified. Errors may be examined while the Streaming Discovery Job is processing.
Note: Node level exception errors may be requeued from the Enterprise Media Manager for Streaming Discovery Jobs submitted through Enterprise.
- In the Client Management Tree View, select a Client, expand the Streaming Discovery Jobs folder, and select the Streaming Discovery Job.
Click the button. This button is disabled until the Enterprise Streaming Discovery Job completes.
The Streaming Discovery Errors dialog box appears.
- Select Recalculate hash values for email deduplication if you want to recalculate the hash value for email messages.
Under Email Deduplication, select the email properties that will be used to generate the deduplication hash.
The method for gathering and creating the MD5Hash has changed for newly created Cases (Projects). Hashing of emails uses UTC time to ensure proper deduplication across time zones.
In most Cases (Projects), MD5 hash values are calculated on the file itself. For reliable deduplication of emails, however, the hash must be calculated on the information contained within the email and not on the file itself. There are many reasons for this; the simplest is that when an email is saved out of its container (PST, NSF, and so on), the file that is created contains information that changes the hash value each time the same email is saved out.
When an email is discovered within eCapture, it is assigned a hash value based on fields selected by the user. The values of these fields are concatenated and the text is hashed. Select from the following email fields to generate the hash value:
- Attachment Count
- Body: When this option is selected, the default setting is to include the body whitespace. Whitespace in the email body could cause slight differences between the same emails, which could result in different hashes being generated. If you do not want to include the whitespace, on the Body Whitespace drop-down menu choose Remove to remove all whitespace between lines of text in the email body before hashing.
- Email Date: The following message types use the specified date values: Outlook: Sent Date; IBM (formerly Lotus) Notes: Posted Date; RFC822: Date; and GroupWise: Delivered Date. On the Alternate Email Date drop-down menu, choose either Creation Date or Last Modification Date. The chosen value is then used when calculating the MD5 hash if the normal Email Date value is not present. This commonly occurs for Draft messages that have not been sent.
- Attachment Names
Note: Start Time is always used if it exists.
By default, Subject, From/Author, and Email Date are used for email hash generation, with the Alternate Email Date set to Creation Date.
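The field-based hashing described above can be illustrated with a minimal Python sketch. It is not eCapture's actual implementation: the function name `email_dedup_hash`, its parameters, the date format, and the field order are all assumptions made for illustration. It shows only the general idea of concatenating selected field values, normalizing the date to UTC, optionally removing body whitespace (simplified here to stripping all whitespace), and falling back to an alternate date when the Email Date is missing.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone


def email_dedup_hash(subject, sender, email_date=None, alternate_date=None,
                     body=None, remove_body_whitespace=False):
    """Hypothetical sketch: concatenate selected email fields and MD5 the text.

    Dates are assumed to be timezone-aware datetime objects.
    """
    # Fall back to the alternate date (for example, Creation Date) when the
    # normal Email Date is absent, as with unsent drafts.
    date = email_date if email_date is not None else alternate_date

    parts = [subject, sender]
    if date is not None:
        # Normalize to UTC so the same message hashes identically
        # regardless of the time zone it was collected in.
        parts.append(date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    if body is not None:
        if remove_body_whitespace:
            # Simplification: strip all whitespace from the body.
            body = re.sub(r"\s+", "", body)
        parts.append(body)

    text = "".join(parts)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The same instant expressed in two time zones yields the same hash.
utc_sent = datetime(2024, 3, 1, 15, 0, tzinfo=timezone.utc)
est_sent = datetime(2024, 3, 1, 10, 0, tzinfo=timezone(timedelta(hours=-5)))
same = (email_dedup_hash("Q1", "a@example.com", utc_sent)
        == email_dedup_hash("Q1", "a@example.com", est_sent))
print(same)  # True
```

Because the hash is computed from field values rather than file bytes, two copies of a message saved out of different containers would still match, provided the selected fields agree.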
- Click OK. The Job displays in the Job Queue pane and starts automatically.
After Streaming Discovery Jobs are run, Node-level exceptions and Item-level exceptions are displayed in the Streaming Discovery Errors dialog box and in the Status and Summary panel.
When an encrypted document is requeued because of a Detect Container discovery error, the configured case-level password list is used. This applies to both Item-level exceptions and Node-level exceptions. If a password is received after the Streaming Discovery Job was initially submitted, an attempt is made to extract from these files without having to generate a separate job.
Note: This does not apply to NSF and PST container types.
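As a rough illustration of the requeue behavior described above, the following Python sketch tries each password in a case-level list against an encrypted container and skips NSF and PST containers. The names `find_working_password`, `try_open`, and `SKIPPED_CONTAINER_TYPES` are hypothetical and do not correspond to any eCapture API; the sketch only models the retry logic.

```python
from typing import Callable, Iterable, Optional

# Container types the password retry does not apply to (per the note above).
SKIPPED_CONTAINER_TYPES = {"nsf", "pst"}


def find_working_password(container_type: str,
                          passwords: Iterable[str],
                          try_open: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical sketch: return the first password that opens the container.

    `try_open` is an assumed callback that attempts extraction with one
    password and returns True on success.
    """
    if container_type.lower() in SKIPPED_CONTAINER_TYPES:
        return None
    for password in passwords:
        if try_open(password):
            return password
    return None


# Example: a password added to the case-level list after the initial
# submission is picked up on requeue.
case_passwords = ["spring2023", "Acme!Legal"]
found = find_working_password("zip", case_passwords, lambda p: p == "Acme!Legal")
print(found)  # Acme!Legal
```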
After a Streaming Discovery Job is run, the errors that occurred during the execution of the Job are listed in the Status and Summary panel for the Job. To view the error details for the Streaming Discovery Job, in the Client Management Tree View, click the Streaming Discovery Job.
To view node-level exceptions, click the Node-level exceptions [n] tab.
- The Node-level exceptions [n] tab lists the number of node-level exceptions in brackets. Click the tab to view the exceptions. A node-level error means that a problem was encountered extracting the contents of a container (for example, an email store, a folder within the email store, or a loose file with attachments). Node-level errors indicate that items are missing from the production set. The ID, Type, Subtype, Requeue Attempts, Last Date Requeued, and Location are shown for each node-level exception. The Requeue Attempts column lists the number of times the node-level exception was requeued. The Date Last Requeued column lists the date on which the node-level exception was last requeued.
Double-click an error to open the Discovery Error Information dialog box. If more than one error is listed, click Next to advance to the next error’s details. When you have finished reading the error details, click OK to close the dialog box.
To view item-level exceptions, click the Item-level exceptions [n] tab.
- The Item-level exceptions [n] tab lists the number of item-level exceptions in brackets. Click the tab to view the exceptions. An item-level error means that an error was encountered on a specific item. If items in the production are password-protected, these items should be reflected in the Detailed Error Report, which lists errors and status messages encountered during Streaming Discovery. The Count, Document Type, and Exception Type are shown for each item-level exception.
To requeue items, select the check box next to the items. After you have selected all of the items you want to requeue, click OK. The Streaming Discovery Errors dialog box closes. The Job is placed in the Job Queue pane and starts automatically.
Note: If a node-level or item-level exception cannot be requeued, the system displays a message.
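The requeue bookkeeping described above can be modeled with a small Python sketch. The class `NodeLevelException`, the `requeueable` flag, and the function `requeue_selected` are hypothetical names used only for illustration; they mirror the columns shown in the dialog box and are not taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class NodeLevelException:
    """Hypothetical record mirroring the columns on the Node-level exceptions tab."""
    id: int
    type: str
    subtype: str
    location: str
    requeue_attempts: int = 0
    last_requeued: Optional[datetime] = None
    requeueable: bool = True  # assumed flag; not a documented field


def requeue_selected(exceptions, selected_ids, now=None):
    """Requeue the selected exceptions and report those that cannot be requeued."""
    now = now or datetime.now(timezone.utc)
    requeued, rejected = [], []
    for exc in exceptions:
        if exc.id not in selected_ids:
            continue
        if not exc.requeueable:
            rejected.append(exc)  # the caller would display a message for these
            continue
        exc.requeue_attempts += 1
        exc.last_requeued = now
        requeued.append(exc)
    return requeued, rejected


errors = [
    NodeLevelException(1, "Container", "PST", r"\\share\mail\a.pst"),
    NodeLevelException(2, "Container", "ZIP", r"\\share\docs\b.zip", requeueable=False),
]
done, failed = requeue_selected(errors, {1, 2})
print(len(done), len(failed))  # 1 1
```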