Modify Streaming Discovery Job

A completed ADD (Automated Digital Discovery) Streaming Discovery Job may be modified to change de-duplication settings and to fix node-level and/or item-level exceptions.

As of version 2016.3.3, node-level exception errors may be requeued from the ADD Media Manager for Streaming Discovery Jobs submitted through ADD.

ADD Copy count validation errors may be requeued directly from the ADD Media Manager. Errors may be examined while the Streaming Discovery Job is processing.

 

  1. Under the Client Management Tab, select a Client, click Discovery Jobs, and select the Discovery Job to be modified.
  2. Click The Streaming Discovery Errors dialog appears.
  3. Recalculate hash values for email de-duplication

    When selected, recalculates the hash value for e-mail messages. Select the e-mail properties (under Email De-duplication) that will be used to calculate the hash.

    E-mail De-duplication

    The method of gathering and creating the MD5 hash has changed for newly created Projects. Hashing of e-mails uses UTC time to ensure proper de-duplication across time zones.

    In most cases, MD5 hash values are calculated on the file itself. For more reliable de-duplication of emails, though, de-duplication must occur on the information contained within the email and not on the file itself. There are many reasons for this; the simplest is that when an email is saved out of its container (PST, NSF, etc.), the file that is created contains information that changes the hash value each time the same email is saved out.
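The effect described above can be illustrated with a minimal sketch. The message bytes and the `X-Saved-At` header are hypothetical stand-ins for whatever export-specific data a saved file contains; they are not taken from eCapture.

```python
import hashlib


def file_md5(data: bytes) -> str:
    """Return the MD5 hex digest of the raw file bytes."""
    return hashlib.md5(data).hexdigest()


# The same message content, saved out twice. The leading header is a
# hypothetical example of export-specific data that differs per save.
message = b"Subject: Q3 report\r\nFrom: a@example.com\r\n\r\nSee attached."
export_1 = b"X-Saved-At: 2024-01-01T10:00:00Z\r\n" + message
export_2 = b"X-Saved-At: 2024-01-01T10:05:00Z\r\n" + message

# The file-level hashes differ even though the email itself is identical.
print(file_md5(export_1) == file_md5(export_2))  # False
```

Because the two files differ by a few bytes, a file-level hash treats them as different documents, which is why email de-duplication hashes selected fields instead.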

    When an email is discovered within eCapture, it is assigned a hash value based on fields chosen by the user. The values of these fields are concatenated and the text is hashed. Select from the following email fields to generate the hash value:

    • Subject
    • From/Author
    • Attachment Count
    • Body - when this option is selected, the default setting is to include the body whitespace. Whitespace in the e-mail body could cause slight differences between the same e-mails, which could result in different hashes being generated. If you do not wish to include the whitespace, select remove from the Body Whitespace drop-down list to remove all whitespace between lines of text in the email body prior to hashing.
    • E-mail Date - the following message types use the specified date values: Outlook: Sent Date; IBM (formerly Lotus) Notes: Posted Date; RFC822: Date; and GroupWise: Delivered Date. From the Alternate Email Date drop-down list, select either Creation Date or Last Modification Date. The selected value is used when calculating the MD5 hash if the normal E-mail Date value is not present. This commonly occurs for Draft messages that have not been sent.
    • Attachment Names
    • Recipients
    • CC
    • BCC

    Start Time is always used if it exists.

    By default, Subject, From/Author, Email Date, and an Alternate Email Date of Creation Date are used for email hash generation.
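The field-based hashing described above can be sketched as follows. This is an illustration only, not eCapture's implementation: the function name `email_hash`, the field order, the plain concatenation, the ISO date format, and the whitespace rule (here, all whitespace is stripped) are assumptions, since the exact format is not documented on this page.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def email_hash(subject: str, author: str,
               email_date: Optional[datetime],
               alternate_date: datetime,
               body: Optional[str] = None,
               remove_body_whitespace: bool = False) -> str:
    """Hash the selected email fields (hypothetical field order and format).

    Dates must be timezone-aware; they are converted to UTC before hashing
    so the same message hashes identically across time zones.
    """
    # Fall back to the alternate date (e.g. Creation Date) for drafts
    # that have no normal E-mail Date.
    date = email_date if email_date is not None else alternate_date
    parts = [subject, author, date.astimezone(timezone.utc).isoformat()]
    if body is not None:
        if remove_body_whitespace:
            # Assumption: strip every whitespace character from the body.
            body = re.sub(r"\s+", "", body)
        parts.append(body)
    text = "".join(parts)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The same instant expressed in two time zones yields the same hash.
sent_utc = datetime(2024, 3, 1, 15, 0, tzinfo=timezone.utc)
sent_est = datetime(2024, 3, 1, 10, 0, tzinfo=timezone(timedelta(hours=-5)))
h1 = email_hash("Status", "a@example.com", sent_utc, sent_utc)
h2 = email_hash("Status", "a@example.com", sent_est, sent_est)
print(h1 == h2)  # True
```

Passing `email_date=None` models a draft that was never sent, in which case the alternate date is hashed instead.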

    Exceptions

    As of version 2018.5.2, the configured case-level password list is used when an encrypted document is requeued due to a ‘Detect Container’ discovery error. This applies to both item-level exceptions and node-level exceptions. If a password is received after the Streaming Discovery job was initially submitted, an attempt is made to extract from these files without the need to generate a separate job. Note: Container types .NSF and .PST are not directly supported by this change.
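Conceptually, retrying an encrypted container against a password list might look like the sketch below. The names `find_password` and `try_open` are hypothetical; `try_open` stands in for whatever routine attempts to open the container with a given password, and nothing here reflects eCapture's internal API.

```python
from typing import Callable, Iterable, Optional


def find_password(try_open: Callable[[str], bool],
                  passwords: Iterable[str]) -> Optional[str]:
    """Return the first password that opens the container, or None.

    try_open is a caller-supplied (hypothetical) function that returns
    True when the container opens with the given password.
    """
    for password in passwords:
        if try_open(password):
            return password
    return None


# Usage with a stand-in opener that accepts only one password.
case_passwords = ["spring2018", "Legal!Hold", "s3cret"]
opened_with = find_password(lambda p: p == "Legal!Hold", case_passwords)
print(opened_with)  # Legal!Hold
```

If no password in the list works, the function returns `None`, which corresponds to the exception remaining unresolved after the requeue.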

    The Node-level exceptions (n) tab lists the number of node-level exceptions in parentheses. Click the tab to view the exceptions. A node-level error means that a problem was encountered extracting the contents of a container (e.g., an email store or a folder within the email store, or a loose file with attachments). Node-level errors indicate that items are missing from the production set.

    The Requeue Attempts column lists the number of times the node-level exception was requeued. The Date Last Requeued column lists the date on which the node-level exception was last requeued.
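As a rough mental model of how these two columns relate, each requeue increments a counter and overwrites a timestamp. The `NodeException` class and its field names below are hypothetical and are not part of eCapture.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class NodeException:
    """Hypothetical record of one node-level exception."""
    path: str
    message: str
    requeue_attempts: int = 0
    date_last_requeued: Optional[datetime] = None

    def requeue(self, when: Optional[datetime] = None) -> None:
        """Record one requeue: bump the count and store the time."""
        self.requeue_attempts += 1
        self.date_last_requeued = when or datetime.now(timezone.utc)


error = NodeException("mail/archive.zip", "Detect Container")
error.requeue(datetime(2024, 3, 1, tzinfo=timezone.utc))
error.requeue(datetime(2024, 3, 2, tzinfo=timezone.utc))
print(error.requeue_attempts, error.date_last_requeued.date())  # 2 2024-03-02
```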

    To view the error details for the ADD Streaming Discovery Job:

    1. Locate and click the Ipro ADD Streaming Discovery Job in the Client Management tree view.
    2. Click the Node-level exceptions (n) tab located in the Client Management tab Status and Summary Panel.
    3. Double-click the error to open the Discovery Error Information dialog. If more than one error is listed, click Next to advance to the next error’s details.
    4. When finished reading the error details, click OK to close the dialog.

    The Item-level exceptions (n) tab lists the number of item-level exceptions in parentheses. An item-level error means that an error was encountered on a specific item. If items in the production are password protected, these should be reflected in the Detailed Error Report, which lists errors and status messages encountered during Streaming Discovery. The Count, Document Type, and Exception Type are shown for each item-level exception.

    Note: This information is also available under the Item-level exceptions (n) tab for the Streaming Discovery Job located in the Client Management tab Status and Summary Panel.

  4. From either tab, select the exception(s).

    Note: If a node-level or item-level exception cannot be requeued, the system displays a message saying so.

  5. Click OK. The Streaming Discovery Errors dialog closes. The Job is placed in the Job Queue pane and started automatically.

 

Related Pages:

Publish Error Handling