Reprocess Native Files

If you encounter errors while processing native files, discovery ][ Local gives you the option to reprocess those native files into your case once the underlying issues have been fixed. Common reasons for why files fail to process correctly are that the files are password protected (while the associated passwords have not been supplied prior to ingestion) and that the Use Filename as Begdoc option had been selected for archive file types (ZIPs, PSTs, etc.) during the initial ingestion. Review the sections below to learn how to prepare for reprocessing native files, as well as steps involved in completing the reprocessing procedure.

Prepare to Reprocess Native Files

Take the following steps before you begin the reprocessing procedure, based on the tasks you want to perform:

  1. Make sure native files are available to the computer on which the ingestion will be performed. For example, if they are in a network location, ensure that the computer has access to that location.

  2. If some of the native files require a password to be opened, create a simple text file containing the needed passwords, one password per line. Notes:

    • Files for which passwords are not included will not be ingested.

    • Files for which passwords are provided will be ingested, but will not be viewable in the Quick View tab. When opening a native file from Local (in its native application), the password will be required.

Reprocessing Procedure

After completing preparation (see above), perform the following steps to reprocess native files:

  1. On the Dashboard, click on the Processing module.

  2. In the left navigation panel, under the Ingestion tab, click Native.

  3. On the Ingest Natives page, select a Client ID and a Case Name.

  4. Once a client and case have been chosen, the Reprocessing tab in the left navigation panel becomes selectable. Click the Reprocessing tab.

  5. Click the Start button. The Reproccesing wizard displays.
  6. The Log File step opens and displays a list of log files from past native ingestions. Select the log file you need to reprocess. Then click Next.

    TIP: You can use the log file details that display in the table, such as the name, modified date, and size, to help you locate the log file with the documents you need to reprocess. However, if still uncertain, you can select a log file and review the items in the next step of the wizard.

  7. On the Items step of the wizard, you have the opportunity to review the full list of items that failed to process the first time around. Items that are capable of being reprocessed display in the first table on this step. The Items Capable of Reprocessing table shows the error type, Begdoc number, and filename associated with each item. To select items for reprocessing, do one of the following:

    • Select individual items, one by one, by clicking the checkbox beside their name.
    • Select all items in the list by clicking the Select All button beneath the table.
    • Clear all items in the list by clicking the Clear All button beneath the table.
    • Select all the password-protected items by clicking the Select Password Protected button beneath the table.
    • Select all the items that failed to process as the result of using filename as BEGDOC by clicking the Select errors caused by using Filename as BEGDOC button beneath the table.

    Note: The second table on this step displays the full list of items that failed to process the first time around, and which are incapable of being reprocessed. Review the log file in the case data directory for information on why these files failed to process. Correct any issues with the files in question (see Prepare to Ingest Native Files) and process them again once the issues have been amended.

  8. Once all items have been selected for reprocessing, click Next to proceed to the next step of the wizard.

  9. On the Options step of the wizard, define the various options for the documents being reprocessed:

    • Numbering: Define the needed image-key numbering scheme for the documents being reprocessed. Take one of the following actions:

      • If a Starting BEGDOC value is listed and is acceptable, skip to the next option.

        Processing “remembers” previous entries and increments the last number used by one (1) after an ingestion is completed. This makes it easy to maintain a numbering scheme if you ingest more than one set of files (such as from multiple CDs or other media).

      • To define a new format and/or numbering set, enter the needed starting BEGDOC value.

        Note: The Use Filename option is disabled in the reprocessing wizard. This option is only available when processing native documents for the first time, using the native ingestion wizard.

    • Native File Options:

      • Copy files to case directory or Keep original file location: Select where native files should be located after they are ingested. The native file path will be correctly included in the case.
    • Extracted File Options:

      • Use Case Data Directory or Use Specified Directory: Select where you want attached/embedded files to be located after the native files are ingested and these files are extracted. For example, if an email message has an attachment, the email message will go in the Native File location and the attachment file will be in the Extracted File location. If you choose Use Specified Directory, click the Browse button and navigate to the directory where you would like to store the extracted files.
  10. When finished with the first page of options, select Next to proceed to the second page of options.

  11. On the second Options page of the wizard, define additional options for the native documents being reprocessed:

    • File Extension Filters (optional): Specify the types of files to be ingested by either or both of the following options:

      • Include: Specify explicit file types to ingest. Leave this field blank to include all supported file types in the location specified for ingestion. If there are unsupported file types in that location, an error will be recorded in the ingestion log.

      • Exclude: Specify file types to be ignored during the ingestion process. If you know that unsupported files are included in the location specified for ingestion, you can exclude them with this option to avoid error messages.

      • TIP: Ensure all needed file extensions are entered. For example, if you want to include .DOC, .DOCM, and .DOCX files, all extensions must be entered.

    • Custodian (optional): To specify a custodian for all documents being ingested, take one of the following steps:

      Note: The name will be added to the CUSTODIAN field for all documents, except as noted for deduplication.

      • Select an existing custodian name (if a list exists).

      • Enter the name of an existing custodian (using the same capitalization).

      • Enter a new custodian name. In this case, you will be asked to verify the addition of a new value during the ingestion.

      Additionally, select or clear the Make Primary option:

      • Option selected: For any document being ingested that is a duplicate of one in the case, if a custodian is defined for the original document, the “ingestion custodian” will be added to the CUSTODIAN field (and will replace the existing custodian). The existing custodian will be added to the DUP_CUSTODIAN field.
      • Option not selected: If you do not select this option, for any document being ingested that is a duplicate of one in the case, if a custodian is defined for the original document, the ingestion custodian will be added to the document’s DUP_CUSTODIAN field.
    • Password Handling (optional): If some of the files being ingested are password-protected:

      1. Identify or create a plain text “password” file (such as .TXT or .CSV) that includes all needed passwords, one per line.

      2. In the Password List field, enter the complete path and filename for the password file, or click Browse and navigate to/select the file.

        NOTES: When problems exist with password-protected files (e.g. you do not have a password file, some passwords are missing or incorrect):

        • Although the files are not ingested, records are added for the password-protected files and the following fields are populated: BegDoc, Filename, Extracted Path, MD5_Hash, and SHA1_Hash. These are the default field names in the Native Ingestion template. The DocumentType field includes “EXTRACTION ERROR.”

        • The error log lists files that are not ingested because they are password protected.

    • Extract inline images (optional): Select this option to extract images or embedded objects in emails as separate documents. For example, if the body of an email includes an inserted image and two images in the footer, all three images will become separate documents. Documents created in this way are considered attachments to the original email.

    • Email Hash Options (optional):

      1. Select the details to be factored into the calculation of SHA1_HASH values for emails (see the following figure). For example, if only Subject is selected, then only the email Subject field will be used to calculate the SHA1_HASH value.

      2. Depending on the options selected above, choose from the following options.



        Alternate Email Date

        If the Email Date option is selected, select either Creation Date or Last Modified date to be used in the hash analysis for emails for which no sent date exists (such as draft messages).

        Email Body Whitespace

        If the Body option is selected, choose to Retain (include) or Remove white space between lines of text in the hash analysis.

        Use Start/End Times

        If the Email Date option is selected, select this option to use a calendar item’s start or end date in the hash analysis for calendar items for which no sent date exists (such as draft appointments).

        If this option is not selected, then the last modified/created date is used.

    • Deduplication (optional): To ensure that duplicate documents are not ingested into the case, complete the following steps. (A SHA1_HASH field is required for deduplication, and a DUP_CUSTODIAN field must exist if you want to define a primary custodian.)

      • Select Perform Deduplication.

      • If you selected a custodian, select the type of analysis to be performed:

        • Case Level: The files being ingested will be compared to all documents in the case.

        • Custodian Level: The files being ingested will be compared only to files for which the custodian is the same as the custodian specified earlier in this wizard.

  12. Once all options have been properly defined, click Start to begin the ingestion. Wait as the job is processed. Status displays on the Reprocess step of the wizard. If needed, you can cancel a job during ingestion by clicking the Cancel Ingestion button on the Native Ingestion dialog that displays.

    Note: The time it takes to ingest documents varies depending on the number and types of files chosen and the options selected.

  13. When the ingestion is complete, details regarding the ingestion are shown on the Confirmation step of the wizard. Details include:

    • The number of documents processed.

    • The number of documents updated.
    • The number of errors that occurred, if any.

    • The number of duplicate documents found, if deduplication was performed.

    • The amount of time it took to ingest the documents.
  14. To view details about the ingestion, click Log File. A separate log file is generated for each ingestion session; clicking View Log File opens the current log. Files are named with a date/time indicator and they are located in the “Native Ingestion Logs” folder in the case data directory.

  15. If deduplication was performed and duplicates were identified, you can view the deduplication log file by clicking the Dedupe Log File button. Files are named with the ingestion date and time, using the format YYYYMMDD_HHMMSS.csv (for example, 20190507_105145.csv).

  16. Check the ingested files in the Review document details tabs (Quick View, Extracted Text, etc.), or in Administration as follows:

    1. In the Administration module, expand Case Management by clicking on the green arrow.

    2. Click Case Management.

    3. Select the needed client.

    4. Select the needed case.

    5. Click the Database Records tab.

    6. Click List all Records and navigate to the records created for the newly ingested native files.

    7. Click a document (if the page count is zero), or click next to a document (page count >0) of interest and click a page in the document.

    8. Ensure that the Native File field includes the correct file name and size; the Extracted Text field may also contain details (depending on file type). The following figure shows an example.

    9. NOTES:

      • An Image File should be included for native files that are image formats (TIFF, JPG, PDF, etc.), but for other files, the page count will be zero (until/unless images are ingested/created using bulk TIFF.)

      • Also review field data. Click the Field Name or Field Value column heading to sort the column. Also, the column width(s) can be increased by dragging the heading boundary to the desired size.

    10. If the native file path is wrong, correct it as explained in Validate Paths and Fields.

    11. Repeat this procedure as needed to check other newly ingested native files.

  17. Inform your users that the new native files are available in the case and explain proper use of these files in their case review.


Related Topics

Overview: Processing Files