Ingest Native Files

Overview

About native ingestion

If you receive native files without a load file, you can easily ingest them directly into an existing case. All major native file types can be ingested, including Microsoft Outlook® .PST and .OST files.

In addition, native file metadata, such as file properties, can be mapped to a case’s fields as needed.

Deduplication

Deduplication is available in Local for native ingestion. “De-duping” eliminates redundant documents from the native file set (and thus from your case) at either the custodian level or the case level.

During ingestion, Local evaluates documents and assigns a SHA1_HASH value to each document, based on document analysis and the options you select. When several files share the same SHA1_HASH value, only one instance is ingested into the case.

When deduplication is employed, duplicate files are maintained in their original location(s) and Local provides reports identifying duplicate files.
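The idea of keeping one instance per hash value can be sketched as follows. This is an illustration only, not Local's implementation: it hashes the raw bytes of each file with Python's hashlib, whereas Local derives SHA1_HASH from document analysis and the options you select. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file's raw bytes (an assumption;
    Local's own hash input may differ)."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_duplicates(paths):
    """Keep the first file seen for each hash; report the rest as duplicates.

    No file is moved or deleted, which mirrors duplicates staying in their
    original locations and being listed in a report.
    """
    first_seen = {}   # hash -> first path with that hash
    duplicates = []   # (duplicate path, path it duplicates)
    for path in paths:
        key = sha1_of_file(path)
        if key in first_seen:
            duplicates.append((path, first_seen[key]))
        else:
            first_seen[key] = path
    return list(first_seen.values()), duplicates
```

The first list holds the files that would be ingested; the second pairs each duplicate with the file it matches.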

Basic ingestion process

When native files are ingested, the following actions occur:

  • New records, with image keys defined for ingestion, are added to the case.

  • Newly ingested native files are associated with the new case records.

  • Files contained in an archive file (such as a .ZIP or .PST file) are extracted and processed.

  • Email attachments are extracted and processed.

  • Inline/inserted files in emails (such as an inserted graphic or a logo in email footers) will be extracted and processed if the associated option is selected.

  • Files embedded in other files (for example, a Word file embedded in an Excel file) are extracted and processed.

  • Any native file in an image format (TIFF, JPG, PDF, etc.) is included as the image file for the document as well as its native file.

  • NOTE: Image files created in this way will not have word coordinates for search highlighting in the Image tab; search highlighting will appear on the Extracted Text tab (and Quick View tab for text-based native files). If search highlighting is required on the Image tab, OCR the files as explained in Create Multiple Document Images (Bulk TIFF).

  • If you choose to deduplicate files, then Processing will evaluate all native files and ingest only one file of each SHA1_HASH value.

  • The following fields will be populated:

    • BEGATTACH, ENDATTACH

    • BEGDOC, ENDDOC

    • CUSTODIAN, DUP_CUSTODIAN (these fields are populated only if the ingestion definition calls for them)

    • EXTRACTEDTEXT

    • NATIVE

    • MD5_HASH, SHA1_HASH

    • Other (depending on metadata in files and mapped fields)

  • The case is re-indexed (unless the Do not merge indexes after ingestion check box is checked).

Prepare to Ingest Native Files

Take the following steps before you begin the native ingestion procedure, based on the tasks you want to perform:

  1. Make sure native files are available to the computer on which the ingestion will be performed. For example, if they are in a network location, ensure that the computer has access to that location.

  2. Identify (or create) the case into which the files will be ingested, ensuring that needed fields exist. For new cases, the Native Ingestion Template can be used to help ensure all common fields (including those for email metadata) are included.

  3. If you want to specify a custodian for the files being ingested, ensure that a CUSTODIAN field exists. If you will be performing deduplication, ensure that a DUP_CUSTODIAN field exists.

  4. If you want to maintain original file name and path details for the ingested files, make sure that SINGLE_VALUE fields are defined for this purpose.

    The default and Native Ingestion templates in Eclipse SE 2016.3.2 and later include the “Filename” and “Extracted Path” fields for this purpose. Create the fields if they don't exist and/or edit field definitions as needed; see Step 3: Define Database Fields for details.

  5. To ingest files into a case created before Eclipse SE version 2016.3.2 and perform deduplication on the new files, ensure that the SHA1_HASH field is added to the case. See Validate Paths and Fields.

  6. If some of the native files require a password to be opened, create a simple text file containing the needed passwords, one password per line. Notes:

    • Files for which passwords are not included will not be ingested.

    • Files for which passwords are provided will be ingested, but will not be viewable in the Quick View tab. When opening a native file from Local (in its native application), the password will be required.
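If the passwords are already collected somewhere else, the password file can be generated rather than typed by hand. The following is a minimal sketch, assuming UTF-8 encoding is acceptable (this topic does not state an encoding); the function name and file name are hypothetical.

```python
from pathlib import Path


def write_password_file(passwords, destination):
    """Write one password per line, skipping empty entries and repeats."""
    seen = set()
    lines = []
    for password in passwords:
        if password and password not in seen:
            seen.add(password)
            lines.append(password)
    path = Path(destination)
    # Each password ends with a newline so the file has exactly one per line.
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return path
```

For example, `write_password_file(["alpha", "beta"], "passwords.txt")` creates a two-line file that can then be selected in the wizard's Password List field.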

Ingestion Procedure

After completing preparation (see above), perform the following steps to ingest native files without a load file:

  1. On the Dashboard, click on the Processing module.

  2. In the left navigation panel, under the Ingestion tab, click Native.

  3. On the Ingest Natives page, select a Client ID and a Case Name.

  4. Click the Start button. The Ingest Natives wizard opens.

  5. On the Directory step, select the directories that contain the native files you want to ingest.

    • If the file location is a mapped drive, select one or more drives.

    • Or, click to open the folder “tree” and select specific subdirectories or specific files.

    • If you add a new mapped drive or folder in Windows, click Refresh to display it.

    • If the file location is not listed, click Browse to navigate to and select a network or other location.

    • To search for native files in all folders in a specific location, select the Include Subdirectories option.

    TIP: If not all drives appear, check drive status in Windows Explorer. Windows Explorer may incorrectly show drives as disconnected even though they are connected and available. Once those drives are opened in Windows Explorer, they are displayed correctly in the Search Locations list.

  6. On the Options step of the wizard, define the various processing options for the native documents:

    • Numbering: Define the needed image-key numbering scheme for the documents being added to the case. Take one of the following actions:

      • If a Starting BEGDOC value is listed and is acceptable, skip to the next option.

        Processing “remembers” previous entries and increments the last number used by one (1) after an ingestion is completed. This makes it easy to maintain a numbering scheme if you ingest more than one set of files (such as from multiple CDs or other media).

      • To define a new format and/or numbering set, enter the needed starting BEGDOC value.

      • To use each native file’s name as the BEGDOC value, select the Use Filename option. Do not select this option if you are ingesting files from any type of archive file (such as a .ZIP or .PST file).
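The "increment the last number used by one" behavior can be pictured with a short sketch. It assumes a BEGDOC value made of an optional prefix followed by zero-padded digits (for example, ABC000123); that format and the function name are assumptions for illustration, not a description of how Processing stores its numbering.

```python
import re


def next_begdoc(last_value: str) -> str:
    """Increment the trailing digits of a key by one, preserving zero padding."""
    # Lazy prefix, then the final run of digits at the end of the value.
    match = re.fullmatch(r"(.*?)(\d+)", last_value)
    if match is None:
        raise ValueError(f"no trailing number in {last_value!r}")
    prefix, digits = match.groups()
    # Pad to the original digit count; the number widens if it overflows.
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"
```

With this sketch, a last-used value of ABC000123 yields ABC000124 as the next starting value.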

    • Native File Options:

      • Copy files to case directory or Keep original file location: Select where native files should be located after they are ingested. With either choice, the correct native file path is recorded in the case.

    • Indexing Options:

      • Do not merge indexes after ingestion: When this option is selected, natives are ingested but are not searchable. To make the ingested natives searchable, perform a full-text rebuild of indexes in Case Utilities. For more information, see Index Maintenance.

    • Extracted File Options:

      • Use Case Data Directory or Use Specified Directory: Select where you want attached/embedded files to be located after the native files are ingested and these files are extracted. For example, if an email message has an attachment, the email message will go in the Native File location and the attachment file will be in the Extracted File location. If you choose Use Specified Directory, select the Browse button and navigate to the directory where you would like to store the extracted files.

    • Optional Temp Folder:

      • If you want to set a new location for temp files, select the Browse button in the Optional Temp Folder box. Navigate to the directory and click Select Folder. The same location will be used when another case is opened in the same session.

  7. When finished with the first page of options, select Next to proceed to the second page of options.

  8. On the second Options page of the wizard, define additional processing options for the native documents:

    • File Extension Filters (optional): Specify the types of files to be ingested by either or both of the following options:

      • Include: Specify explicit file types to ingest. Leave this field blank to include all supported file types in the location specified for ingestion. If there are unsupported file types in that location, an error will be recorded in the ingestion log.

      • Exclude: Specify file types to be ignored during the ingestion process. If you know that unsupported files are included in the location specified for ingestion, you can exclude them with this option to avoid error messages.

      • TIP: Ensure all needed file extensions are entered. For example, if you want to include .DOC, .DOCM, and .DOCX files, all extensions must be entered.
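The interplay of the Include and Exclude filters can be sketched as below. Two details are assumptions made for the example and are not stated in this topic: matching is case-insensitive, and Exclude takes precedence when an extension appears in both lists. The function names are hypothetical.

```python
from pathlib import PurePath


def normalize(extensions):
    """Lower-case extensions and make sure each starts with a dot."""
    return {"." + ext.lower().lstrip(".") for ext in extensions}


def passes_filters(filename, include=(), exclude=()):
    """Apply an include list (empty means all types) and an exclude list."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in normalize(exclude):
        return False
    included = normalize(include)
    # An empty include list lets every extension through.
    return not included or suffix in included
```

The sketch also shows why the tip matters: with only .DOC in the Include list, `passes_filters("memo.docx", include=["doc"])` is False, because each extension is matched exactly.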

    • Custodian (optional): To specify a custodian for all documents being ingested, take one of the following steps:

      Note: The name will be added to the CUSTODIAN field for all documents, except as noted for deduplication.

      • Select an existing custodian name (if a list exists).

      • Enter the name of an existing custodian (using the same capitalization).

      • Enter a new custodian name. In this case, you will be asked to verify the addition of a new value during the ingestion.

      Additionally, select or clear the Make Primary option. This option applies to any document being ingested that is a duplicate of a document already in the case, when a custodian is defined for the original document:

        • Option selected: The ingestion custodian replaces the existing custodian in the CUSTODIAN field, and the existing custodian is added to the DUP_CUSTODIAN field.

        • Option not selected: The ingestion custodian is added to the document’s DUP_CUSTODIAN field.
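The effect of Make Primary on the two custodian fields can be modeled with a small sketch. The record layout (a dictionary with a single CUSTODIAN value and a list for DUP_CUSTODIAN) and the function name are assumptions for illustration; the case where the original document has no custodian is not described in this topic, so the sketch leaves such a record unchanged.

```python
def apply_duplicate_custodian(record, ingestion_custodian, make_primary):
    """Return an updated copy of an existing record that a new file duplicates."""
    updated = {
        "CUSTODIAN": record.get("CUSTODIAN"),
        "DUP_CUSTODIAN": list(record.get("DUP_CUSTODIAN", [])),
    }
    existing = updated["CUSTODIAN"]
    if not existing:
        # Not covered by the documented behavior; left unchanged here.
        return updated
    if make_primary:
        # The ingestion custodian becomes primary; the old one is demoted.
        updated["CUSTODIAN"] = ingestion_custodian
        updated["DUP_CUSTODIAN"].append(existing)
    else:
        # The existing custodian stays primary.
        updated["DUP_CUSTODIAN"].append(ingestion_custodian)
    return updated
```

For a record whose CUSTODIAN is Smith and an ingestion custodian of Jones, selecting Make Primary gives CUSTODIAN = Jones and DUP_CUSTODIAN = Smith; clearing it keeps CUSTODIAN = Smith and adds Jones to DUP_CUSTODIAN.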

    • Deduplication (optional): To ensure that duplicate documents are not ingested into the case, complete the following steps. (A SHA1_HASH field is required for deduplication, and a DUP_CUSTODIAN field must exist if you want to define a primary custodian.)

      • Select Perform Deduplication.

      • If you selected a custodian, select the type of analysis to be performed:

        • Case Level: The files being ingested will be compared to all documents in the case.

        • Custodian Level: The files being ingested will be compared only to files for which the custodian is the same as the custodian specified earlier in this wizard.
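The difference between the two analysis levels comes down to which existing records a new file is compared against. A minimal sketch, assuming existing records are available as (custodian, hash) pairs; the data shape, the level names, and the function name are hypothetical.

```python
def is_duplicate(sha1_hash, custodian, existing_records, level="case"):
    """Check a new file against existing (custodian, hash) pairs.

    level="case" compares against every record in the case;
    level="custodian" compares only against records with the same custodian.
    """
    if level not in ("case", "custodian"):
        raise ValueError(f"unknown level: {level!r}")
    for existing_custodian, existing_hash in existing_records:
        if existing_hash != sha1_hash:
            continue
        if level == "case" or existing_custodian == custodian:
            return True
    return False
```

A file that matches a document held by a different custodian is therefore a duplicate at Case Level but not at Custodian Level.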

    • Email Hash Options (optional):

      1. Select the details to be factored into the calculation of SHA1_HASH values for emails (see the following figure). For example, if only Subject is selected, then only the email Subject field will be used to calculate the SHA1_HASH value.

      2. Depending on the options selected above, choose from the following options.

        • Alternate Email Date: If the Email Date option is selected, select either Creation Date or Last Modified date to be used in the hash analysis for emails for which no sent date exists (such as draft messages).

        • Email Body Whitespace: If the Body option is selected, choose to Retain (include) or Remove white space between lines of text in the hash analysis.

        • Use Start/End Times: If the Email Date option is selected, select this option to use a calendar item’s start or end date in the hash analysis for calendar items for which no sent date exists (such as draft appointments). If this option is not selected, then the last modified/created date is used.
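How the selected details shape an email's hash can be illustrated with a sketch. Everything about the encoding here is an assumption: the dictionary keys (subject, body, date, created, modified), the way values are joined, and the treatment of Remove as stripping all whitespace from the body. Hashes produced this way will not match the SHA1_HASH values that Local calculates; the sketch only shows why two emails can be duplicates under one selection and distinct under another.

```python
import hashlib


def email_hash(email, fields, remove_body_whitespace=False, alternate_date="created"):
    """Hash only the selected email details, in a fixed field order."""
    parts = []
    for name in sorted(fields):
        value = email.get(name) or ""
        if name == "date" and not value:
            # No sent date (such as a draft): fall back to the chosen alternate.
            value = email.get(alternate_date) or ""
        if name == "body" and remove_body_whitespace:
            # Assumed reading of "Remove": drop all whitespace from the body.
            value = "".join(value.split())
        parts.append(f"{name}={value}")
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()
```

If only the subject is selected, two emails with the same subject and different bodies produce the same value; adding the body to the selection separates them unless the bodies differ only in whitespace and whitespace is removed.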

    • Extract inline images (optional): Select this option to extract images or embedded objects in emails as separate documents. For example, if the body of an email includes an inserted image and two images in the footer, all three images will become separate documents. Documents created in this way are considered attachments to the original email.

    • Password Handling (optional): If some of the files being ingested are password-protected:

      1. Identify or create a plain text “password” file (such as .TXT or .CSV) that includes all needed passwords, one per line.

      2. In the Password List field, enter the complete path and filename for the password file, or click Browse and navigate to/select the file.

        NOTES: When problems exist with password-protected files (for example, you do not have a password file, or some passwords are missing or incorrect):

        • Although the files are not ingested, records are added for the password-protected files and the following fields are populated: BegDoc, Filename, Extracted Path, MD5_Hash, and SHA1_Hash. These are the default field names in the Native Ingestion template. The DocumentType field includes “EXTRACTION ERROR.”

        • The error log lists files that are not ingested because they are password protected.

        • If there are any password-protected documents in your case that are not covered in your password list, you can use the “Reprocess” option to ingest them into the case. See Reprocess Native Files.

  9. Once all options have been properly defined, click Next to proceed to the next step in the wizard.

  10. The Field Mapping step opens in the wizard. Local identifies metadata found in the native files (for example, the Author and Title properties of a Microsoft Word document, or the To, From, and Subject details of emails) and matches it to fields in the case to the extent possible.

    The original file name and path details can also be mapped if your case includes appropriate fields (see Prepare to Ingest Native Files).

    Complete metadata mapping as follows:

    1. Evaluate the mapping that has been completed. If all mapping is correct, skip to the next step.

    2. For each field to be corrected, double-click the field in the Database Field list and take one of the following actions:

      • To change a mapped database field, click the correct field in the drop-down list.

      • To unmap a field, click <Not Assigned> at the top of the drop-down list.

      • To create a new field for the metadata, click <New Field> at the top of the drop-down list and complete the New Field dialog box.

        Note: After the ingestion is complete, revise the field definition if needed; see Change Field Definitions. Or, evaluate the native file metadata; if any database fields are missing, stop the native ingestion, add new fields in System Administration with needed flags/options, then return to the native ingestion process.

  11. Once fields have been properly mapped, click Start to begin the ingestion. Wait as the job is processed. Status displays on the Processing step of the wizard. If needed, you can cancel a job during ingestion by clicking the Cancel Ingestion button on the Native Ingestion dialog that displays.

    Note: The time it takes to ingest documents varies depending on the number and types of files chosen and the options selected.

  12. When the ingestion is complete, details regarding the ingestion are shown on the Confirmation step of the wizard. Details include:

    • The number of documents ingested.

    • The number of errors that occurred, if any.

    • The number of duplicate documents found, if deduplication was performed.

    • The amount of time it took to ingest the documents.

  13. To view details about the ingestion, click Log File. A separate log file is generated for each ingestion session; clicking View Log File opens the current log. Files are named with a date/time indicator and they are located in the “Native Ingestion Logs” folder in the case data directory.

  14. If deduplication was performed and duplicates were identified, you can view the deduplication log file by clicking the Dedupe Log File button. Files are named with the ingestion date and time, using the format YYYYMMDD_HHMMSS.csv (for example, 20190507_105145.csv).

  15. Check the ingested files in the Review document details tabs (Quick View, Extracted Text, etc.), or in Administration as follows:

    1. In the Administration module, expand Case Management by clicking on the green arrow.

    2. Click Case Management.

    3. Select the needed client.

    4. Select the needed case.

    5. Click the Database Records tab.

    6. Click List all Records and navigate to the records created for the newly ingested native files.

    7. Click a document (if the page count is zero), or click next to a document (page count >0) of interest and click a page in the document.

    8. Ensure that the Native File field includes the correct file name and size; the Extracted Text field may also contain details (depending on file type). The following figure shows an example.

      NOTES:

      • An Image File should be included for native files that are image formats (TIFF, JPG, PDF, etc.), but for other files, the page count will be zero (until/unless images are ingested or created using bulk TIFF).

      • Also review field data. Click the Field Name or Field Value column heading to sort the column. The column widths can be increased by dragging the heading boundary to the desired size.

    9. If the native file path is wrong, correct it as explained in Validate Paths and Fields.

    10. Repeat this procedure as needed to check other newly ingested native files.

  16. Inform your users that the new native files are available in the case and explain proper use of these files in their case review.
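When many ingestions have been run, the deduplication log names described in step 14 can be sorted or matched to a session by their timestamp. The following sketch builds and parses the YYYYMMDD_HHMMSS.csv pattern with Python's datetime module; the function names are hypothetical, and the sketch assumes the name contains nothing beyond the timestamp and the .csv extension.

```python
from datetime import datetime

LOG_NAME_FORMAT = "%Y%m%d_%H%M%S.csv"


def dedupe_log_name(ingested_at: datetime) -> str:
    """Format a timestamp as YYYYMMDD_HHMMSS.csv."""
    return ingested_at.strftime(LOG_NAME_FORMAT)


def parse_dedupe_log_name(filename: str) -> datetime:
    """Recover the ingestion date and time from a log file name."""
    return datetime.strptime(filename, LOG_NAME_FORMAT)
```

For example, `parse_dedupe_log_name("20190507_105145.csv")` returns the datetime for May 7, 2019 at 10:51:45.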

     

Related Topics

Overview: Processing Files

Reprocess Native Files