Case (Project) Options

When you create a case, you set the project options for the case. This topic details the options available when creating a case. It can be useful if you want to print out a copy of all of the Case (Project) Options for later use. If you want the complete procedure for creating a case, see Create a New Case (Project).

Tip: If you want to expand all of the collapsed sections in the topic, so that you can print the full list of options, click the expand button in the Ipro Help Center toolbar.

For more information on the Case (Project) Options, click on the link below.

Discovery Options

On the Discovery Options tab, there are three sub-tabs you can work with to define Discovery Job options at the Case (Project) level, as well as to define Password Handling options for the case.

Discovery: General Options

On the General tab, set Discovery options.

  1. Calculate Page Count - Select this check box to calculate an initial page count of the selected files, before processing. When you run reports, you can choose to include the page count.

    • If you choose not to process unknown files, those files will display on the summary report, but their page count will be zero.
    • You cannot use this setting to count the number of pages in emails.

    Note: This is a preliminary count. It does not reflect the number of pages that will be used by metadata, placeholders for unknown or exception files, or blank pages (if you choose to drop blank pages).

  2. Enhanced Password Detection - When this option is selected, specific file types are checked for password protection at Discovery time. A password-protected document is defined as a document in which a prompt asks for a password on attempting to open the document in its native application. Otherwise, if the document can be opened and viewed in its native application, the document is not considered password protected. To see the documents with password protection, run the Detailed Error Report. Any password protection errors that occurred during Discovery can be corrected before running Processing Jobs and/or Data Extract Jobs on the data set to save time during QC.

  3. Node Handling - PDF Portfolio files allow email boxes to be stored/converted within a folder structure. This folder structure information will be extracted and available for export in the existing ‘MailFolder’ metadata field.

    • Treat Archives as Directories: This option check box is selected by default. When the check box is selected, the files in the archived folder are treated as parent and child docs when running a Discovery Job. In addition, WINMAIL.DAT attachments are treated the same as archives and will be processed the same as ZIP files. The following are treated as archive files:

      • FI_ZIP = 1802
      • FI_ZIPEXE = 1803
      • FI_ARC = 1804
      • FI_TAR = 1807
      • FI_STUFFIT = 1812
      • FI_LZH = 1813
      • FI_LZH_SFX = 1814
      • FI_GZIP = 1815
      • Ipro_FI_RAR = 13000
      • FI_TNEF = 1197
    • Treat PDF Portfolios/Packages as containers: This option check box is selected by default.

      • When the check box is selected, the PDF Portfolio file is treated as a directory and its contents extracted and treated as loose files (except children of the contained PDFs). The PDF Portfolio is not treated as an item, only as a container in the Nodes table. Documents inside the PDF package are treated as parent files.
      • When the check box is cleared, the PDF Portfolio file is treated as a file parent and its contents extracted and treated as attachments in the Items table. The PDF Portfolio is treated as an item and can be processed, filtered, or exported.
  4. Mailstores - There are several IBM-specific settings that can be set for a Discovery job.

    • Use legacy Lotus Notes Handling - Legacy Lotus Notes handling uses the IBM (formerly Lotus) UI for Discovery and is considerably slower than current IBM (formerly Lotus) Mail discovery.

      Important: This option is required for hash compatibility to deduplicate across older jobs discovered with the legacy versions 5.0 and earlier.

    • Create working copy of Outlook mail stores - By default, this option check box is cleared for both new and existing Cases (Projects).
      • When this option check box is cleared, the discovery of PSTs is made directly from the PST; no copies are made.
      • When this option check box is selected, if any PSTs are encountered in a Discovery Job, a copy of the PST is made to a working directory located under the Discovery Job and Discovery is performed on that copy. Once the Job completes, all working copies of PSTs in the Job are deleted. If a node-level error on the PST is requeued after the Discovery Job is complete, the source PST is copied again. The working copy is made again in this instance only if the option is selected.
  5. Email Deduplication - The method of gathering and creating the MD5 hash value for newly created Projects. Hashing of emails uses the UTC time to ensure proper deduplication across time zones.

    In most cases, MD5 hash values are calculated on the file itself. For more reliable deduplication of emails, though, the hash must be calculated on the information contained within the email rather than on the file itself. There are many reasons for this; the simplest is that when an email is saved out of its container (PST, NSF, etc.), the file created contains information that would change the hash value of the same email each time the email was saved out.

    When an email is discovered within eCapture, it is assigned a hash value based on fields chosen by the user. The values of these fields are concatenated and the text is hashed. To generate a hash value, the user selects from the following email fields:

    • Subject

    • From/Author

    • Attachment Count

    • Body - When this option check box is selected, the default setting is to include the body whitespace. Whitespace in the email body could cause slight differences between the same emails, which could result in different hashes being generated. If you do not want to include the whitespace, select Remove from the Body Whitespace drop-down menu to remove all whitespace between lines of text in the email body before hashing.

    • E-mail Date: The following message types use the specified date values: Outlook: Sent Date, IBM (formerly Lotus) Notes: Posted Date, RFC822: Date, and GroupWise: Delivered Date. From the Alternate Email Date drop-down menu, select either Creation Date or Last Modification Date. The selected value will be used when calculating the MD5 hash if the normal E-mail Date value is not present. This commonly occurs for Draft messages that have not been sent.

    • Attachment Names

    • Recipients

    • CC

    • BCC

    Start Time is always used if it exists.

    By default, Subject, From/Author, Email Date, and an Alternate Email Date of Creation Date are used for email hash generation.
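As a rough illustration of the approach described above (not eCapture's actual implementation), the following Python sketch concatenates the selected field values and hashes the resulting text with MD5. The function name `email_hash`, the field keys, the separator, the date format, and the way body whitespace is stripped are all assumptions made for the example.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone


def email_hash(fields, selected, remove_body_whitespace=False):
    """Return an MD5 hex digest over the selected email fields (illustrative only)."""
    parts = []
    for name in selected:
        value = fields.get(name, "")
        if isinstance(value, datetime):
            # Normalize to UTC so the same email hashes identically across time zones.
            value = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        elif name == "body" and remove_body_whitespace:
            # Simplification: strip every run of whitespace before hashing.
            value = re.sub(r"\s+", "", value)
        parts.append(str(value))
    # The "|" separator is an assumption; the source only says values are concatenated.
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Two copies of the same message, stored with different time zones and spacing.
copy_a = {
    "subject": "Status",
    "from": "a@example.com",
    "date": datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
    "body": "line one\n\nline two",
}
copy_b = dict(
    copy_a,
    date=datetime(2020, 1, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5))),
    body="line one\nline two",
)

default_fields = ["subject", "from", "date"]
same_by_default = email_hash(copy_a, default_fields) == email_hash(copy_b, default_fields)
```

In this sketch the two copies hash identically on the default fields because both dates resolve to the same UTC instant. Adding `"body"` to the selection makes them differ unless `remove_body_whitespace` is set.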

  6. File Extraction - Treat email inline images as attachments

    • When this check box is selected, inline images in email messages (e.g., signature files) are extracted as attachments and treated as child documents. Apple Mail Message (EMLX) files are supported. The attachments for EMLX files are extracted from the emails, and inline images are recognized and handled. When EMLX files are processed or data extracted, they are treated as emails. The output resembles an email displayed in Outlook Express or Outlook.
    • When this check box is cleared, inline images are not extracted as children. The images are not treated as separate documents, and therefore will not be OCRed, language-identified, or indexed. The images are rendered inline as they would look in the native file.

    Note: Black Ice™ does not return text for any images that are printed, so extracted text for the (parent) document will not include text from the inline image. The images will be OCRed only if the image it is printed on does not have any text, and OCR Pages Missing Text is enabled under the Processing Job, General Options tab.

  7. Embedded File Extraction - eCapture can control which embedded object types are extracted from most Microsoft Office and Rich Text documents.

    Click here for more information about embedded files.

    An embedded file is an object that has been inserted into a document and, if extracted, can act as a standalone document. Multiple methods for embedding objects and files are available for Microsoft Office documents through the Microsoft Office Object dialog box.

    The following embedded file types each refer to a specific method of embedding documents in Microsoft Office file types. Clearing an embedded file type option prevents its extraction from supported document types.

    • Excel Documents - When selected, the system extracts OLE embedded objects associated with the Microsoft Excel application.
    • Word Documents - When selected, the system extracts OLE embedded objects associated with the Microsoft Word application.
    • PowerPoint Documents - When selected, the system extracts OLE embedded objects associated with the Microsoft PowerPoint application.
    • E-mail File Attachments (Outlook.FileAttach) - When selected, the system extracts Outlook message objects from other Microsoft Office document formats that were embedded through the Outlook.FileAttach method.
    • Visio Drawings - When selected, the system extracts OLE embedded objects associated with the Microsoft Visio application.
    • Package-Embedded Documents - When selected, the system extracts files that were added to a Word document or an Excel spreadsheet. The actual documents being extracted are those documents embedded through the packager. The packager is a Microsoft Windows OS utility that allows the packages to be created for future integration into the file.
    • Acrobat Documents - When selected, the system extracts objects embedded with the AcroExch object type.
    • E-mail Message Attachments (MailMsgAtt) - When selected, the system extracts Outlook message objects from other Microsoft Office document formats that were embedded through the MailMsgAtt method.
    • E-mail File Attachments (MailFileAtt) - When selected, the system extracts Outlook message objects from other Microsoft Office document formats that were embedded through the MailFileAtt method.
    • Images - This option was added to disable (check box cleared) or enable (check box selected) extraction of embedded image items for Microsoft Office embedded files (Excel, Word, PowerPoint, etc.).

      Note: To maintain backward compatibility of existing jobs, the Images option check box will be selected if the option is not found in the SETTINGS.INI file.

Discovery: Indexing Options

Click the Indexing Options tab to set the indexing options for Discovery Jobs.

  1. If you want to create an index during initial discovery, select the Create Search Index check box.

    IMPORTANT: THIS OPTION MUST REMAIN SELECTED FOR MULTI-LANGUAGE DOCUMENT DETECTION.

  2. Under Search Indexing, set the Search Indexing options. eCapture uses dtSearch to provide full text searching of files before processing. This feature provides advanced search functions including fuzzy searching, synonym searching, and more. Search options are available in the Flex Processor Rules Manager.

    To facilitate the searching that will take place during an electronic data discovery (EDD) session, establish the method for searching unsupported files and the treatment of hyphens during searches.

    • Index Numbers - Select this option if you want to be able to search for numbers.
    • Recognize Dates, e-mail address, and credit card numbers - Select this option to search for dates (in any format), email addresses (or parts of email addresses), and credit card numbers.
    • Auto Break CJK Words - Select this option when indexing documents containing CJK (Chinese, Japanese, Korean) languages. It breaks up the CJK words as if each character is a CJK word.
    • Use filtering to index corrupt or encrypted documents - When selected, this option applies the filtering algorithm to attempt to recover text from corrupt or encrypted documents. If this option is not selected, corrupt or encrypted documents will be considered indexing failures.
    • Index Discovery Path - When selected, the Discovery path will be searched. Otherwise, if not selected, searching the Discovery path would create false-positive hits.
    • Set the options that control how eCapture processes Binary files. For more information about dtSearch and the files it recognizes, click here.

      dtSearch recognizes and supports many types of files, including word processor, email, and PDF files (see http://support.dtsearch.com/faq/dts0103.htm for a list of file types that dtSearch recognizes and supports). Non-text files that are in formats that dtSearch does not support are indexed and searched as binary files. Examples of binary files are executables, fragments of documents that were recovered from an undelete process, or blocks of data recovered forensically. Because an individual file can include plain text, Unicode text, and fragments from, for example, DOC or XLS files, much of the content would be missed if the files were indexed and searched as if they were simple text files.

      • Filter Binary Unicode - Use a text selection algorithm to filter text from binary files. The algorithm scans for sequences of single-byte, UTF8, or Unicode in the file. This option is recommended for forensic searches, especially when files may contain text in languages other than English.
      • Filter Binary - Extract plain text items from the binary files.
      • Index Binary - Index all of the contents of binary files as single-byte text.
      • Skip Binary - Do not index binary files.
    • Set the options that control how hyphens are treated during an EDD search.

      • Hyphens as spaces - Treats hyphens found in the files as spaces. For example, a search for “first-class” will match incidences of “first class” in the files being searched.
      • Hyphens as searchable - Searches hyphens. For example, a search for “first-class” will match only incidences of “first-class” in the files being searched.
      • Ignore Hyphens - Ignores hyphens entered in the search criteria. For example, a search for “first-class” will match incidences of “firstclass” in the files being searched.
      • Index all three ways - Indexes terms containing hyphens using all three hyphen options (i.e., "First-class" will be indexed as "First-Class", "FirstClass", and "First Class").

      For more information on hyphens and how they are treated during an EDD search, see the dtSearch documentation here.
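The four hyphen options can be pictured with a small Python sketch. It is a simplified model for illustration, not dtSearch's implementation; the function name `hyphen_index_forms` and the mode names are hypothetical.

```python
def hyphen_index_forms(term, mode):
    """Return the forms under which a hyphenated term would be indexed (simplified model)."""
    as_spaces = term.replace("-", " ")   # Hyphens as spaces
    searchable = term                    # Hyphens as searchable
    ignored = term.replace("-", "")      # Ignore Hyphens
    if mode == "spaces":
        return {as_spaces}
    if mode == "searchable":
        return {searchable}
    if mode == "ignore":
        return {ignored}
    if mode == "all":
        # Index all three ways: the term is stored under every variant.
        return {as_spaces, searchable, ignored}
    raise ValueError(f"unknown hyphen mode: {mode}")


forms = hyphen_index_forms("first-class", "all")
```

Indexing all three ways makes a term reachable under every variant at the cost of a larger index, which is why it is offered as a separate choice rather than the only behavior.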

  3. Set the Parent/child text handling options. These options are used to specify how text of parent and child documents should be handled during indexing and are specific to emails (IBM [formerly Lotus] Notes and Outlook) and any edocs (non-emails) that contain embedded documents.

    • Index child text with parent text - Merges and indexes the text of a child document with that of its parent.
    • Separate child and parent text - Indexes the text of a child document separately from its parent. The following string is added as an include filter: *.MSG *.MSG>*.body *.EML *.EML>*.body. This occurs while indexing. Two documents are produced in the index for .EML and .MSG files: one is for the body and the other is for the email (headers...). Any attachments are not included in that index.
  4. Set the OCR settings. There are some important considerations about how OCR takes place. Click here for more information.

    Note: If you are setting Case (Project) Level options, OCR and Time Zone Handling options are defined on the Common Options tab because Discovery and Data Extract jobs use the same OCR and Time Zone Handling options. For more information about setting options at the Case (Project) level, see Create a New Case (Project).

    The OCR Settings available for Discovery Jobs are outlined in the following table.

    Option

    Description

    OCR images as necessary

    Images will be OCRed for indexing/language identification if necessary. The OCR text obtained from the image is then passed on to dtSearch for indexing. The OCR will be indexed and available to be searched on in the Flex Processor.

    OCR PDF documents

    PDFs with no embedded text: perform OCR before indexing or language identification. PDF pages with embedded text (text-behind) will have text extracted. Comments on a PDF file are also extracted.

    1. The OCR text is added to any extracted text from the PDF.

    2. The text obtained through OCR, along with the extracted text from the PDF, is passed to dtSearch for indexing.

    3. The OCR is then indexed and available to be searched in the Flex Processor.

     

    Note: If selected, this option increases the time required for the Discovery process. Text obtained through OCR is appended to the extracted text file and could duplicate words already present in it, which could inflate search hits.

    OCR PowerPoint Documents

    Perform OCR on Microsoft PowerPoint files during indexing to get text from embedded content in the slides. This results in slower indexing speeds for PowerPoint files, but more accurate search results.

    PDF page character threshold

    Optional: Select PDF page character threshold and indicate a value. The default value is 25 characters; the maximum value is 10000. If a page contains fewer characters than the threshold, eCapture sends the page to be OCRed; otherwise, the text is just indexed. If necessary, enter a different value.

    Minimum average OCR confidence [1-100]

    The level range settings are from 1 to 100. The default is 50. The confidence level is the average percentage of confidence for each document for all pages within a document on which OCR was performed. Success or failure of a document for indexing preparation is based on the average confidence level of the document. If the average confidence level is below the selected threshold, the page is considered as an indexing error and is available for re-queueing. The Discovery Job Status Information Panel displays OCR Applied[Errors], where Applied shows the number of documents that required OCR (not OCRed) and where [Errors] shows the number of those documents that did not meet the specified average confidence level.

    Note: For calculating average document confidence, pages in PDF docs with text behind them are considered 100%. OCR failures are considered 0%.
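The averaging rule described above can be sketched in Python as follows. This is an illustrative model only: the page representation, the function names, and the assumption that a document passes when its average is at or above the threshold are not taken from eCapture.

```python
def page_score(page):
    """Score one page: text-behind pages count as 100%, OCR failures as 0%."""
    kind = page["kind"]
    if kind == "text_behind":
        return 100.0
    if kind == "ocr_failed":
        return 0.0
    if kind == "ocr":
        return float(page["confidence"])
    raise ValueError(f"unknown page kind: {kind}")


def average_confidence(pages):
    """Average the per-page scores for one document."""
    if not pages:
        raise ValueError("document has no pages")
    return sum(page_score(p) for p in pages) / len(pages)


def meets_minimum(pages, minimum=50):
    """True when the document average reaches the configured minimum (default 50)."""
    return average_confidence(pages) >= minimum


document = [
    {"kind": "text_behind"},
    {"kind": "ocr", "confidence": 80},
    {"kind": "ocr_failed"},
]
average = average_confidence(document)
```

For the three-page sample document the average is (100 + 80 + 0) / 3 = 60, so it would pass the default minimum of 50 but fail a minimum of 70.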

    Use OCR Workers

    Optional: Select to enable the OCR Worker Task Table drop-down list and select a task table. If a custom task table is selected, Enterprise OCR tasks are sent to those Workers assigned to the selected task table. See Assign Task Tables to Workers and Assign IPRO (Cloud) Workers for additional information.

    OCR Languages

    eCapture includes multi-language OCR capability. The QC document will contain the original OCR languages that were selected for the Discovery Job. A valid multi-language OCR license must be available in order to modify the original selected languages, if necessary.

    To reserve a portion of the multi-language OCR licenses for QC and to keep the Worker from consuming all available licenses, use the Multi-Language OCR License slider located in the Controller System Options dialog box.

    Click OCR Languages to display the Language OCR dialog box.

    After selecting the languages, click OK to close the dialog box. The selected languages display in the OCR Languages field. Place the mouse pointer on the OCR Languages field to display a tool tip that lists all the selected languages that were not visible in the OCR Languages field. The OCR Languages field is a read-only field.

    Click here to view a list of supported languages.

    • English

    • Arabic

    • Chinese Simplified

    • Chinese Traditional

    • Japanese

    • Korean

    • Afrikaans

    • Albanian

    • Basque

    • Belarusian

    • Bulgarian

    • Catalan

    • Croatian

    • Czech

    • Danish

    • Dutch

    • Estonian

    • Faroese

    • Finnish

    • French

    • Galician

    • German

    • Greek

    • Hungarian

    • Icelandic

    • Indonesian

    • Italian

    • Latvian

    • Lithuanian

    • Macedonian

     

    • Norwegian

    • Polish

    • Portuguese

    • Portuguese Brazil

    • Romanian

    • Russian

    • Serbian

    • Serbian Cyrillic

    • Slovak

    • Slovenian

    • Spanish

    • Swedish

    • Turkish

    • Ukrainian

    Click here to view some caveats to OCR Language handling.

    English is the only language that is selected by default. The more languages that are selected, the lower the confidence level will be for correctly identifying the languages in a document.

    • If English is selected, Arabic will not be available for selection.

    • If Arabic is selected, all other languages will not be available for selection.

    • If one of the CJK (Chinese, Japanese, Korean) languages is selected, then all remaining CJK languages will not be available for selection. Other languages (excluding Arabic) may be selected.

    • If Chinese Simplified is selected, Chinese Traditional, Japanese, and Korean will not be available for selection.

    • If Chinese Traditional is selected, Chinese Simplified, Japanese, and Korean will not be available for selection.

    • If Japanese is selected, Chinese Simplified, Chinese Traditional, and Korean will not be available for selection.

    • If Korean is selected, Chinese Simplified, Chinese Traditional, and Japanese will not be available for selection.
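The selection rules above can be summarized in a short Python sketch. The function name `validate_ocr_languages` and the message wording are hypothetical; in the product, conflicting languages are simply made unavailable for selection rather than reported as errors.

```python
CJK_LANGUAGES = {"Chinese Simplified", "Chinese Traditional", "Japanese", "Korean"}


def validate_ocr_languages(selected):
    """Return a list of rule violations for a proposed OCR language selection."""
    chosen = set(selected)
    problems = []
    # Arabic excludes every other language, including English.
    if "Arabic" in chosen and len(chosen) > 1:
        problems.append("Arabic cannot be combined with any other language")
    # At most one CJK language may be selected at a time.
    cjk_chosen = sorted(chosen & CJK_LANGUAGES)
    if len(cjk_chosen) > 1:
        problems.append("Only one CJK language may be selected: " + ", ".join(cjk_chosen))
    return problems


ok_selection = validate_ocr_languages(["English", "Japanese", "French"])
bad_selection = validate_ocr_languages(["Japanese", "Korean"])
```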

  5. If you selected Create Search Index and want to select an index location other than the default, click the button next to the Index Location field. The User-Specified Index Path Information dialog box displays and contains additional information about user-specified index paths. This option is useful if you want to place the load of indexing on an alternate file server that is not handling other eCapture activities.

    1. Click OK to close the User-Specified Index Path Information dialog box. The Directory Browser dialog box appears.
    2. Navigate to the index location and click OK.

Password Handling

Click the Password Handling Options tab to set Password handling options for the case. This tab allows you to add a list of passwords to the case to unlock password-protected documents encountered while processing jobs or reviewing documents in the QC application. A password-protected document is defined as a document in which a prompt asks for a password on attempting to open the document in its native application. Otherwise, if the document can be opened and viewed in its native application, the document is not considered password protected. The "Password Applied" flag, found in QC, is checked when the correct password is applied to a protected document.

To add individual passwords:

  1. Click Edit.

  2. Enter a password (one password on each line - do not include delimiters) and press Enter to go to the next line. Repeat this step for each password that must be added to the list.

  3. When finished, click Done.

To load a pre-defined list of passwords:

  1. Click Load. The Open dialog box appears.

  2. Navigate to the password list.

  3. Click Open. The password list loads.

Processing Job Options

The following sections describe how to set Processing Job options at the Case (Project) level and for individual Processing Jobs.

Processing: General Options

Processing: Excel Options

  1. Click the Excel tab to set the processing options for Excel files.

  2. Process with Outside-In (Stellent) - Select this option to:

    • Allow for faster and more consistent generation of images on the first pass
    • Reduce the amount of time spent manually QCing these document types

    When selected, only Outside-In (Stellent) is used to process images; the Microsoft related options are grayed out by default. Full metadata is extracted and time zone imaged output reflects the time-zone handling options configured for the Processing Job. All files processed by Outside-In (Stellent) receive the Stellent Processed flag in QC.

    The processing output differs when using Outside-In (Stellent) to view and image documents. However, the QC applied flags, metadata, and optional summary reports are similar to those produced when processing is done without Outside-In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside-In (Stellent).

  3. Comments - Set where you want comments displayed. Select from None, At end of sheet, or As displayed on sheet.
  4. Color Depth - Set the Color Depth options. Color processing for Excel files is handled separately from color processing of other types of files. This setting is independent of the General Color Depth.

    Single Page Output Type (General Color Depth Option - Rendered as):

    • Black&White (1-bit) - Group 4 TIFF
    • Grayscale (8-bit) - LZW TIFF
    • 256 Color (8-bit) - LZW TIFF
    • True Color (24-bit) - JPEG

    Multi-Page TIFF Output Type (General Color Depth Option - Rendered as):

    • Black&White (1-bit) - Group 4 TIFF
    • Grayscale (8-bit) - LZW TIFF
    • 256 Color (8-bit) - LZW TIFF
    • True Color (24-bit) - JTIFF (JPEG compressed TIFF)

  5. Paper Size - Click the drop-down menu and select an output paper size for documents during processing.

    Note: For Excel Only - For Custom[8.5x11.0in], the Custom Paper Size dialog box appears.

    The Custom Paper size defaults to 8.5x11 inches. The range values are shown for both inches and millimeters. Maximum size in inches is 50.00x70.00; for millimeters, it's 1270.00x1778.00. When this option is selected, the document will be processed through the PDF driver (Text-Based PDF creation) regardless of the Flex Processor option selected. OCRing is not applicable in this instance. Export settings will be limited to Text-Based PDF Output only, even if image format is selected. Non-Excel documents will export as usual.

  6. Center on Page - Determines where to center the image on the page.

    • Horizontally

    • Vertically

  7. Page Order - Determines the page order to be used for imaging.

    • As is

    • Down, and then over

    • Over, and then down

  8. Orientation - Determines the orientation of the page at the time of printing.

    • As is

    • Portrait

    • Landscape

  9. Scaling - Specifies whether the image should be scaled and how. If scaling is used, the options are adjusted to a percent of the current size or fit to page.

    • As is

    • Adjust to % normal size

    • Fit to page

  10. If you want to set more granular processing options for Excel files, click the Advanced Options button and the Advanced Excel Imaging dialog box appears.

  11. At the top of the dialog box, set the options for how to handle headers, footers, and other content in the Excel workbook. Click the Defaults button to revert to the default settings for these options, as shown in the following image:

    If you have trouble locating the referenced options in Excel, click here to view information about how to navigate in Excel to the option.

  12. Set the remaining settings in the Advanced Excel Imaging dialog box.

    The table below provides a list of the available options.

    Setting

    Options

    Date field handling:

    • Replace with date created - will replace with creation date.

    • Replace with date last saved - will replace the current date with the date last saved.

    • Replace with comments - displays the Date Field Comments field where you can enter the text that should replace the contents of the date field.

    • Replace with field code

    • Do not replace - will not replace the date (e.g., Macros)

    Header/Footer Filename field handling

    If path or filename options are found in an Excel header or footer, you can select from the following options to handle these occurrences.

    • Replace with filename (no path) - inserts the unqualified filename

    • Replace with filepath - inserts the fully-qualified path of the original file

    • Replace with comments - displays the Header/Footer Filename field comments field where you can enter your own comments

    • Replace with field code - outputs &[Path] and/or &[File]

    • Remove - removes the codes entirely

    Generate metadata

    Select Generate a metadata summary image for each Excel spreadsheet, and then under Spreadsheet Metadata Summary Options select the individual types of metadata to capture.

    • Document Properties

    • Comments

    • Formulas

    • Linked Content - The data collected will include hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag will be added. A separate page entitled Document Properties is generated and is placed at the end of each Microsoft Excel document.

    For more information about metadata, click here.

    Who creates the metadata? The native program (such as Microsoft Excel or Outlook) creates the metadata and maintains it with the native file (the letter or email).

    What does eCapture do with this data? When a document is processed, the metadata is collected from the document and stored in the database.

    How is metadata useful? It gives you valuable information as to “Who knew what, and when.” It can tell you who wrote a document and who edited it last. It also shows you a file’s revision number, the character count, and many other pieces of information about a file.

    Blank page removal

    This option is available if the Remove Blank Pages option is selected under the General Options tab. Select from the following two options to remove blank pages:

    • Based on selected Page Order: Down, then over or Over, then down.

      • If Down, then over is selected, all vertical page columns that are blank will be removed.

      • If Over, then down is selected, all horizontal page rows where all pages in a horizontal run are blank will be removed.

    • Based on both Page Order options: This bases the removal of blank pages on both horizontal page-rows and vertical page-columns.

    Example of Page Removal

    The following example pertains to using a spreadsheet with 12 pages that will be rendered.

      • If the sheet's page order is Over, then down, eCapture removes all horizontal page rows where all pages in a horizontal run are blank. In order to do that, eCapture steps through all HPageBreaks and makes sure the range from the first column to the last column is blank.

      • If eCapture determines that pages 1-3 are blank, they will be hidden. If eCapture determines that pages 4-6 are blank, they will be hidden, and so on.

      • If the sheet's page order is Down, then over, eCapture will remove all vertical page columns that are blank.

      • If eCapture determines that 1-A is blank, it will be hidden. If eCapture determines that 2-B is blank, it will be hidden, and so on.

    This algorithm does not eliminate every blank page, though it eliminates many of them.

    Note: All page-hiding is done by setting horizontal regions' RowHeight properties and vertical regions’ ColumnWidth properties to 0.
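The row and column rules above can be modeled with a short Python sketch. It treats the rendered sheet as a grid of pages and is only an approximation for illustration: the function name `pages_to_keep`, the mode names, and the grid representation are hypothetical, and the real implementation works on Excel page breaks rather than on a list of booleans.

```python
def pages_to_keep(blank, mode):
    """Return (row, column) positions of pages that remain visible.

    blank: 2-D list of booleans, True where a rendered page is blank.
    mode: 'over_then_down', 'down_then_over', or 'both' (hypothetical names).
    """
    if mode not in ("over_then_down", "down_then_over", "both"):
        raise ValueError(f"unknown mode: {mode}")
    rows = range(len(blank))
    cols = range(len(blank[0])) if blank else range(0)
    hidden_rows = set()
    hidden_cols = set()
    if mode in ("over_then_down", "both"):
        # Hide a horizontal page row only when every page in the run is blank.
        hidden_rows = {r for r in rows if all(blank[r][c] for c in cols)}
    if mode in ("down_then_over", "both"):
        # Hide a vertical page column only when every page in it is blank.
        hidden_cols = {c for c in cols if all(blank[r][c] for r in rows)}
    return [(r, c) for r in rows for c in cols
            if r not in hidden_rows and c not in hidden_cols]


# A 12-page sheet laid out as 4 page rows by 3 page columns.
grid = [
    [False, True, True],
    [True, True, True],
    [False, True, False],
    [True, True, True],
]
kept = pages_to_keep(grid, "both")
```

In this sample, page (0, 2) is blank but survives even in `both` mode, because neither its row nor its column is entirely blank. That is the reason some blank pages can remain after removal.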

  13. Click OK to exit the Advanced Excel Imaging dialog box.

Processing: Word Options

Processing: PowerPoint Options

Data Extract Options

The following steps describe how to set the options available for creating a Data Extract Job.

Set the General Options

Retry errors with Outside In (Stellent) - Used to image Microsoft Office (Excel, Word, and/or PowerPoint) documents.

When this check box is selected, only Outside In (Stellent) is used to process images; the Microsoft related options are grayed out by default. Full metadata is extracted and time zone imaged output reflects the time zone handling options configured for the Data Extract Job. All files processed by Outside In (Stellent) receive the Stellent Processed flag in QC.

The processing output differs when using Outside In (Stellent) to view and image documents. However, the QC applied flags, metadata, and optional summary reports are similar to those produced when processing is done without Outside In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside In (Stellent).

Replace tabs with spaces when extracting Excel text - When this check box is selected, the extracted Excel text will look similar to the following:

Column A Column B

Value1 Value2

The column data is separated by a space rather than a tab (which can be, for example, the equivalent of five spaces). Conversely, if the check box is cleared, the column data of the extracted Excel text is separated by a tab (shown here as five spaces) and looks similar to the following:

Column A     Column B

Value1     Value2
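As a rough illustration of the difference between the two settings, the sketch below joins the cells of one worksheet row with either a space or a tab. The function name `excel_row_text` and its parameter are hypothetical and are not part of eCapture.

```python
def excel_row_text(cells, replace_tabs_with_spaces):
    """Join the cell values of one row the way the option describes."""
    separator = " " if replace_tabs_with_spaces else "\t"
    return separator.join(cells)


header = ["Column A", "Column B"]
with_spaces = excel_row_text(header, replace_tabs_with_spaces=True)
with_tabs = excel_row_text(header, replace_tabs_with_spaces=False)
```

How wide a tab appears depends on the viewer that displays the extracted text.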

Expand Pivot Tables when extracting Excel text - By default, this check box is cleared. If pivot tables exist, then they will be expanded when this check box is selected. A flag is also set in QC to indicate that the Pivot table exists in the worksheet.

Set the OCR Options for a Specific Data Extract Job

Note: If you are setting Case (Project) Level options, OCR and Time Zone Handling options are defined on the Common Options tab because Discovery Jobs and Data Extract Jobs use the same OCR and Time Zone Handling options. For more information about setting options at the Case (Project) level, see Create a New Case (Project).

 

The OCR Settings available for Data Extract Jobs are outlined in the following table.

Option

Description

OCR images as necessary

Select this check box to OCR images. Images will be OCRed for indexing/language identification if necessary. The OCR text obtained from the image is then passed on to dtSearch for indexing. The OCR will be indexed and available to be searched on in the Flex Processor.

OCR PDF documents

PDFs with no embedded text: perform OCR before indexing or language identification. PDF pages with embedded text (text-behind) will have text extracted. Comments on a PDF file are also extracted.

  1. The OCR text is added to any extracted text from the PDF.

  2. The text obtained through OCR, along with the extracted text from the PDF, is passed to dtSearch for indexing.

  3. The OCR is then indexed and available to be searched in the Flex Processor.

OCR PowerPoint Documents

Select this check box to perform OCR on Microsoft PowerPoint files during Data Extract to get text from embedded content in the slides. This results in slower speeds for PowerPoint files, but more accurate text extraction.

PDF page character threshold

Select a PDF page character threshold and indicate a value. The default value is 25 characters. If a PDF page contains fewer characters than the threshold, eCapture sends the page to be OCRed. If necessary, enter a different value.

Minimum average OCR confidence [1-100]

The level range settings are from 1 to 100. The default is 50. The OCR Confidence Level is the average percentage of confidence for each document, for all pages within a document on which OCR was performed. Success or failure of a document for flagging is based on the average confidence level of the document. If the average confidence level is below the selected threshold, the document is flagged in QC with the OCR Low Confidence Flag.

Note: For calculating average document confidence, pages in PDF docs with text behind them are considered 100%. OCR failures are considered 0%.

OCR Languages

eCapture includes multi-language OCR capability. The QC document will contain the original OCR languages that were selected for the Data Extract Job. A valid multi-language OCR license must be available in order to modify the original selected languages, if necessary.

To reserve a portion of the multi-language OCR licenses for QC and to keep the Worker from consuming all available licenses, use the Multi-Language OCR License slider located in the Controller System Options dialog box.

Click OCR Languages to display the Language OCR dialog box.

After selecting the languages, click OK to close the dialog box. The selected languages display in the OCR Languages field. Place the mouse pointer on the OCR Languages field to display a tool tip that lists all the selected languages that were not visible in the OCR Languages field. The OCR Languages field is a read-only field.

The following languages are supported:

  • English

  • Arabic

  • Chinese Simplified

  • Chinese Traditional

  • Japanese

  • Korean

  • Afrikaans

  • Albanian

  • Basque

  • Belarusian

  • Bulgarian

  • Catalan

  • Croatian

  • Czech

  • Danish

  • Dutch

  • Estonian

  • Faroese

  • Finnish

  • French

  • Galician

  • German

  • Greek

  • Hungarian

  • Icelandic

  • Indonesian

  • Italian

  • Latvian

  • Lithuanian

  • Macedonian

 

  • Norwegian

  • Polish

  • Portuguese

  • Portuguese Brazil

  • Romanian

  • Russian

  • Serbian

  • Serbian Cyrillic

  • Slovak

  • Slovenian

  • Spanish

  • Swedish

  • Turkish

  • Ukrainian

Caveats to OCR Language handling:

English is the only language that is selected by default. The more languages that are selected, the lower the confidence level will be for correctly identifying the languages in a document.

  • If English is selected, Arabic will not be available for selection.

  • If Arabic is selected, all other languages will not be available for selection.

  • If one of the CJK (Chinese, Japanese, Korean) languages is selected, then all remaining CJK languages will not be available for selection. Other languages (excluding Arabic) may be selected.

  • If Chinese Simplified is selected, Chinese Traditional, Japanese, and Korean will not be available for selection.

  • If Chinese Traditional is selected, Chinese Simplified, Japanese, and Korean will not be available for selection.

  • If Japanese is selected, Chinese Simplified, Chinese Traditional, and Korean will not be available for selection.

  • If Korean is selected, Chinese Simplified, Chinese Traditional, and Japanese will not be available for selection.
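The selection caveats listed above can be summarized in a small sketch. It encodes only the rules stated in this topic. The function name `unavailable_languages` and the shortened language list are assumptions for illustration, and the real dialog box may apply further rules.

```python
CJK = {"Chinese Simplified", "Chinese Traditional", "Japanese", "Korean"}


def unavailable_languages(selected, all_languages):
    """Return the languages that can no longer be selected."""
    selected = set(selected)
    blocked = set()
    if "Arabic" in selected:
        # Arabic excludes every other language.
        blocked |= set(all_languages) - {"Arabic"}
    if "English" in selected:
        blocked.add("Arabic")
    chosen_cjk = selected & CJK
    if chosen_cjk:
        # Only one CJK language at a time, and never together with Arabic.
        blocked |= CJK - chosen_cjk
        blocked.add("Arabic")
    return blocked


# Shortened list for the example; the full list appears above.
LANGUAGES = ["English", "Arabic", "Chinese Simplified", "Chinese Traditional",
             "Japanese", "Korean", "French", "German"]

blocked_for_japanese = unavailable_languages({"English", "Japanese"}, LANGUAGES)
```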

Set the Appropriate Option for Lotus Notes

Set the Appropriate Option for Time Zone Handling

For more information about Time Zone Handling, see How eCapture Handles Dates and Time Zones.

Note: If you are setting Case (Project) Level options, OCR and Time Zone Handling options are defined on the Common Options tab because Processing and Data Extract Jobs use the same OCR and Time Zone Handling options. For more information about setting options at the Case (Project) level, see Create a New Case (Project).

Common Options

If you are setting Case (Project) Level options, OCR and Time Zone Handling options are defined on the Common Options tab because Processing and Data Extract jobs use the same OCR and Time Zone Handling options. However, if you are setting job options for a specific Processing or Data Extract job they are set on the General and Data Extraction options tabs, respectively. For more information, see:

  1. Set the OCR options.
    1. OCR pages missing text - Select OCR pages missing text to OCR pages within documents that are missing text. Optionally, select PDF page character threshold and indicate a value. The default value is 25 characters. The maximum value is 10000. If a page contains fewer characters than the threshold, eCapture sends the page to be OCRed. If necessary, enter a different value.
    2. PDF page character threshold - Select a PDF page character threshold and indicate a value. The default value is 25 characters. If a PDF page contains fewer characters than the threshold, eCapture sends the page to be OCRed. If necessary, enter a different value.
    3. Minimum average OCR confidence - The level range settings are from 1 up to 100. The default is 50. The OCR Confidence Level is the average of confidence per document, for all pages within a document on which OCR was performed. Success or failure of a document for flagging is based on the average confidence level of the document. If the average confidence level is below the selected threshold, the document will be flagged in QC with the OCR Low Confidence Flag.
    4. OCR languages - eCapture includes multi-language OCR capability. The QC document will contain the original OCR languages that were selected for the Data Extract job. A valid multi-language OCR license must be available in order to modify the original selected languages, if necessary.

      To reserve a portion of the multi-language OCR licenses for QC and to keep the Worker from consuming all available licenses, use the Multi-Language OCR License slider located in the Controller System Options dialog.

      Click OCR Languages to display the Language OCR dialog.

      After selecting the languages, click OK to close the dialog. The selected languages appear in the OCR Languages field. Place the mouse pointer on the OCR Languages field to display a tooltip that lists all the selected languages that were not visible in the OCR Languages field. The OCR Languages field is a read-only field.

      Caveats to OCR Language handling:

      English is the only language that is selected by default. The more languages that are selected, the lower the confidence level will be for correctly identifying the languages in a document.

      • If English is selected, Arabic will not be available for selection.

      • If Arabic is selected, all other languages will not be available for selection.

      • If one of the CJK (Chinese, Japanese, Korean) languages is selected, then all remaining CJK languages will not be available for selection. Other languages (excluding Arabic) may be selected.

      • If Chinese Simplified is selected, Chinese Traditional, Korean, and Japanese will not be available for selection.

      • If Chinese Traditional is selected, Chinese Simplified, Korean, and Japanese will not be available for selection.

      • If Korean is selected, Chinese Simplified, Chinese Traditional, and Japanese will not be available for selection.

      • If Japanese is selected, Chinese Simplified, Chinese Traditional, and Korean will not be available for selection.

  2. Set the Time Zone Handling options, as appropriate. The options are:

    • Convert all times to UTC
    • Specify Time Zone

    For more information on Time Zone Handling, see How eCapture Handles Dates and Time Zones.

Note: If you are configuring job options for a specific Processing or Data Extract job, you set these options on the Data Extract Job Options dialog for Data Extract Jobs and on the Processing Job Options > General Options tab, for Processing Jobs.
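The PDF page character threshold and Minimum average OCR confidence settings described in this section can be illustrated with a short sketch. It is not eCapture code: the function names are hypothetical, and it assumes that the threshold is compared against the number of characters of embedded text on a page. As noted earlier, pages in PDF documents with text behind them count as 100% and OCR failures count as 0%.

```python
def needs_ocr(page_text, char_threshold=25):
    """A page is sent to OCR when it has fewer characters than the threshold."""
    return len(page_text) < char_threshold


def average_confidence(page_confidences):
    """Average OCR confidence (0-100) over the pages of one document."""
    return sum(page_confidences) / len(page_confidences)


def low_confidence_flag(page_confidences, minimum=50):
    """True when the document would receive the OCR Low Confidence Flag."""
    return average_confidence(page_confidences) < minimum


# One page with text behind it (100), one OCR failure (0), one OCRed page (38).
doc_pages = [100, 0, 38]
flagged = low_confidence_flag(doc_pages)  # average is 46, below the default of 50
```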

Filtering Options

The following sections describe how to define a Flex Processor Rule. On the Filtering tab, click the Manage Flex Processor Rules button to display the Flex Processor dialog. On the Filtering tab, you can also create a rule using the New Rule Wizard. For more information, see Create Rules By Using the Flex Processor Rules Manager Wizard.

Define the Basic Action and Scope of a Flex Processor Rule

Define the General Criteria for a Flex Processor Rule

When you first create a Flex Processor Rule, you set basic rule information and then, if necessary, add general criteria for the rule.

To define General Criteria for a Flex Processor Rule:

  1. Check the All Files option if you want to apply the rule to all of the files in the Processing or Data Extract Job. This option is typically used for the first Rule in a Rule set so you can start with everything and then remove or placeholder certain files based on more specific criteria. From the Action drop-down list select Image (if a Processing Job) or Data Extract (if a Data Extract Job).

    The All Files option is an exclusive criterion (it cannot be combined with other criteria).

  2. (Optional) Select Process Job Duplicates and/or Data Extract Job Duplicates and then select the level from their respective drop-down lists. (Selecting one or both of these options enables de-duplication.) The options are:

    • Current: documents which are duplicates of the current document only will be removed

    • Custodian: documents which are duplicates of any document within the custodian will be removed

    • Case (Project): documents which are duplicates of any document within the case (project) will be removed

    • Client: documents which are duplicates of any document within the client will be removed

    Duplicates are determined by matching the MD5 hashes of files.

    • If Advanced Duplicate Checking is enabled, then MD5 hash matches are verified with bit-by-bit comparison before being flagged as a match.
    • File Name Match requires that the file names of the two files (loose files only, not e-mails) be the same. Bit-by-bit comparison and file name comparison do not occur for e-mail types.

      Note: If de-duplication is selected, all other criteria are not available.

    • A file is checked for duplication when a job starts. At this time, the SelectionIDs are assigned to the documents. These SelectionIDs are closely tied with the order that the documents were discovered. Documents are distributed to workers and it is at this time that the document is checked against all previously "processed" documents (the originals) in line with the selected scope and duplication options.
    • Ensure the appropriate Action is selected. If necessary, determine whether or not a de-duplication flag should be set.
  3. If you selected Process Job Duplicates and/or Data Extract Job Duplicates, set the Scope options:

    • Maintain Family Structure: The action will be performed on a file if the criteria match the file or the file's parent. To look at it from the other direction, if a parent file matches a Rule's criteria, the action of that Rule will be applied to that parent document and all of its children. Only an entire family of documents is considered a duplicate. If a parent document is not identified as a duplicate, but its child document is, no documents would be identified as duplicates and hence no documents would be removed.
      • Allow Child Originals: If the Process Job Duplicates or Data Extract Duplicates option is checked and the Scope is set to Maintain Family Structure, you have the option to check the Allow Child Originals check box. This option controls how child documents are compared during de-duplication. This allows documents, including loose files, to de-duplicate against child documents predicated on the order they are processed. For example, if two Word documents exist with the same MD5Hash value, one as a child attachment to an Email parent, the other as a loose Parent, the loose Parent (Word document) is removed. However, if the loose Parent (Word document) is encountered before the Email (parent) and its Word (child attachment) the Word (child attachment) is not removed. Leave this option unchecked to force duplicate checks at the parent level only.

        Note: A system-level default can be set by updating the DedupAllowChildOriginals column in the ConfigurationProperties table in the configuration database to either true or false. However, the setting in the Flex Processor rule takes precedence.

        If the Maintain Family Structure option is checked:

        Child items still inherit the status of the parent. If the parent is de-duplicated, the child is also de-duplicated.

        Loose (independent) files can still be filtered if they match the rule criteria or are not selected by rule criteria (no Effective Rule). With de-duplication enabled, loose files will always be checked against parent documents, but have the potential to be checked against child documents ONLY if the parent/child combination are marked as "originals". If the loose file is marked as an original the parent document will still be checked against the loose file, but the child document will not because it inherits its parent's status due to the selected Family Scope.

        For example:

        EM1 (e-mail) has 3 attachments: Doc1_Att, Tiff1_Att, and Excel1_Att. Two independent files, Tiff1 and Excel1, are duplicates of Tiff1_Att and Excel1_Att. The documents are selected in this order:

        EM1

        Doc1_Att

        Tiff1_Att

        Excel1_Att

        Tiff1

        Excel1

        Assuming the parent is not a duplicate, it is then considered an original, as are all of its children. When the loose documents are checked, they are checked against all files, including the children. Because they are duplicates of two of the attachments, they are removed.

        If the documents are selected in this order:

        Tiff1

        Excel1

        EM1

        Doc1_Att

        Tiff1_Att

        Excel1_Att

        the loose files are now considered originals. The parent is checked against these two files; it is not a duplicate, so it is not removed. The attachments, though duplicates of the loose files, inherit the status of the parent, and are also not removed.

    • Treat Documents Individually: The file is evaluated independent of its family. Any document can be considered a duplicate regardless of whether it is a parent document or a child document.

      EM1 (e-mail) selected for processing

      EM1 is selected to process.

      Doc1 is selected to process as child of EM1 unless a duplicate, not selected if a duplicate.

      Tiff1 is processed as child of EM1 unless a duplicate, not selected if a duplicate.

      Excel1 is processed as child of EM1 unless a duplicate, not selected if a duplicate.

      EM1 not selected (filtered, not a search result, or a duplicate)

      EM1 not selected to process.

      Doc1 is selected to process as normal document unless a duplicate, not selected if a duplicate.

      Tiff1 is selected to process as normal document unless a duplicate, not selected if a duplicate.

      Excel1 is selected to process as normal document unless a duplicate, not selected if a duplicate.

  4. (Optional) Check Allow Child Originals. This option allows documents, including loose files, to de-duplicate against child documents. If unchecked, it forces duplicate checks at the parent level only. This option is disabled when the Scope is set to Treat Documents Individually.
  5. (Optional) Check File Size. When File Size is selected for a rule, it applies to the files in the Processing or Data Extract Job which have sizes on disk either greater than or equal to, or less than or equal to, the size specified. The size is expressed in KB. For example, a 1 MB file will be entered as 1024 KB.

  6. (Optional) Check File Types. In the File Types section you can check the file types affected by the rule. eCapture recognizes documents by their actual content and not the file extension. Keep this in mind as you exclude/include file types for a Processing or Data Extract Job. You can filter (exclude) a myriad of file types by simply selecting the file type check box. When the Processing or Data Extract Job runs, it will process only those file types that you want and exclude all others that you selected in the Filters dialog box.

    For example, you discovered a directory containing 15 different types of files. Some of these files were word processing documents. You want to run a Processing Job that includes only Microsoft Word documents.

    There is a separate category for Microsoft Word documents (and subcategories of all the versions of Microsoft Word under the Microsoft Word category) as well as a separate generic Word Processing category which contains subcategories of all other word processing file types such as Lotus Word Pro, WordStar, .RTF, etc. If you check only the box next to Microsoft Word, you would automatically exclude any other type of word processing files that exist in the Discovery Job that you selected. The Processing Job will process those documents that it recognizes as Microsoft Word documents based on their actual content.

    These file types are based on the Oracle® Outside In Technology (formerly Stellent) identification criteria.

    Click Select All to select every file type.

    Click Clear All to clear all the selected file types.

  7. (Optional) You can also specify specific extensions of files you want to be affected by a given rule. Click the button to add the extension to the list. Repeat for each extension.

  8. (Optional) To import a list of file extensions from a .CSV file, click the button. Select the .CSV file and click Open.

    An Import From File progress bar appears. If any errors were encountered during the import, such as duplicates, an Information dialog box appears with the errors.

    • The .CSV file may contain extensions with or without . (period).
    • Make sure that the .CSV file contains only one column of file extensions with each extension occupying its own row, e.g. Range A1 through A50 or Range E1 through E50.
    • The file extensions are alphabetized upon import into the Flex Processor.
  9. If you want to remove a specific extension from the list, select the extension and click the button.

  10. Click the button to remove all extensions from the list.
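The order-dependent de-duplication example given for the Maintain Family Structure scope can be modeled with a simplified sketch. This is an assumption-laden illustration and not how eCapture is implemented: the function `deduplicate`, the tuple layout, and the placeholder hash strings (used instead of real MD5 values) are all hypothetical. A family is judged by its parent only, and children become originals only when Allow Child Originals is on.

```python
def deduplicate(families, allow_child_originals=True):
    """Return the names of removed documents.

    families is a list of (parent, children) in selection order, where each
    document is a (name, hash) tuple and a loose file has no children.
    """
    originals = set()  # hashes of documents already kept as originals
    removed = []
    for parent, children in families:
        parent_name, parent_hash = parent
        if parent_hash in originals:
            # The whole family inherits the parent's duplicate status.
            removed.append(parent_name)
            removed.extend(name for name, _ in children)
            continue
        originals.add(parent_hash)
        if allow_child_originals:
            originals.update(child_hash for _, child_hash in children)
    return removed


email = (("EM1", "h_em1"), [("Doc1_Att", "h_doc"),
                            ("Tiff1_Att", "h_tiff"),
                            ("Excel1_Att", "h_xls")])
tiff = (("Tiff1", "h_tiff"), [])
excel = (("Excel1", "h_xls"), [])

email_first = deduplicate([email, tiff, excel])
loose_first = deduplicate([tiff, excel, email])
```

With the e-mail selected first, the two loose files are removed. With the loose files selected first, nothing is removed because the attachments inherit the status of their non-duplicate parent.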

Define the Date Criteria for a Flex Processor Rule

You can set date criteria on a rule, which will narrow the discovery to files based on a specific date range.

Note: E-mails will use E-mail Date, while loose files will be filtered by Last Modified Date. For e-mails with no E-mail Date, you may select a behavior from the drop down list as described in step 3 below.

To define Date Filters:

  1. Select the Filter by Date option.
  2. Specify the date range (Start Date and End Date) for files that you want to select. Only files whose dates fall within the selected range will be selected during discovery sessions. Note: If the work is ongoing, use an end date as far into the future as possible so you may re-use the Rule, if necessary. The filter starts/ends at midnight on the selected date. If the Start Date is 2/12/2004, this includes files created on or after 2/12/2004. Similarly, if the End Date is 2/20/2004 this includes files created on or before 2/20/2004.

  3. (Optional) For e-mails with no E-mail Date, select from one of the following behaviors:

    • Use Creation Date

    • Use Last Modification Date

    • Always Include

    • Never Include
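
The date criteria above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the product's implementation: the item dictionary and its keys (is_email, email_date, created, last_modified) are hypothetical, and excluding an item that has no usable date at all is an assumption, since the text does not cover that case.

```python
from datetime import date
from enum import Enum


class NoEmailDate(Enum):
    """Behaviors offered for e-mails that have no E-mail Date."""
    USE_CREATION_DATE = "Use Creation Date"
    USE_LAST_MODIFICATION_DATE = "Use Last Modification Date"
    ALWAYS_INCLUDE = "Always Include"
    NEVER_INCLUDE = "Never Include"


def passes_date_filter(item, start, end, no_email_date):
    """Return True if the item falls inside the inclusive [start, end] range.

    `item` is a hypothetical dict with the keys is_email, email_date,
    created, and last_modified (date objects or None).
    """
    if item["is_email"]:
        effective = item.get("email_date")
        if effective is None:
            if no_email_date is NoEmailDate.ALWAYS_INCLUDE:
                return True
            if no_email_date is NoEmailDate.NEVER_INCLUDE:
                return False
            if no_email_date is NoEmailDate.USE_CREATION_DATE:
                effective = item.get("created")
            else:
                effective = item.get("last_modified")
    else:
        # Loose files are filtered by Last Modified Date.
        effective = item.get("last_modified")
    if effective is None:
        return False  # assumption: nothing to compare against, so exclude
    return start <= effective <= end


loose_file = {"is_email": False, "last_modified": date(2004, 2, 12)}
print(passes_date_filter(loose_file, date(2004, 2, 12), date(2004, 2, 20),
                         NoEmailDate.NEVER_INCLUDE))
```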

Define the Search Criteria for a Flex Processor Rule

You can define Search Criteria to be used when a Flex Processor Rule is executed. If you do not define a search, every item from the Discovery Job is selected. If you do, you specify the search criteria when creating Data Extract Jobs or Processing Jobs.

The search filters the Data Extract and Processing Job results according to text contained within the files.

Important: If the option, Create dtSearch index during initial discovery, was cleared for a new Discovery Job, then searching is not available for a new Processing or Data Extract Job that includes that non-indexed Discovery Job.

To define the Search Criteria for a Flex Processor Rule:

  1. In the Search Request box, enter the search phrase or the search words. During a word search, parents are automatically selected when a child meets a search requirement. The family settings determine this behavior.
  2. Click the button located in the upper-right portion of the Search Request box to display the Search Request dialog box. This dialog box lists the searches previously run for the Case's (Project’s) Processing and/or Data Extract Jobs, along with the search string for each job. The Search Request dialog box can be dragged around the desktop and resized if necessary.

    This feature allows you to use the same search options and search string for a new Processing and/or Data Extract Job rather than manually selecting the search options again and retyping in the same search string.

    Note: If you cancel out of this dialog box, then the search terms remain unchanged.

  3. Select the search item in the listview screen. When you select it, you will see its search string displayed in the text box below.

    Note: Clicking a search item in the listview will replace whatever is in the textbox with the search string of the selected search.

  4. Select one of the following options:

    • Use all search options - to use the search options that were selected for that search item.

    • Use search string only - to change only the search string.

  5. Click OK to replace the search form’s search string with the current contents of the search request textbox. When you click OK, the Search Criteria tab displays again. You can modify the search options, if necessary.

  6. Continue selecting additional options in the Flex Processor Rules Manager. The search will be added to the listview in the Search Request dialog box. You may then select that search item for a future search.
  7. Set the Search For option.

    There are 4 options under Search for: Any Words, All Words, Boolean-Search (and, or, not, ...), and Natural Language. Only one can be selected at a time.

    • Any Words: This search request is for unstructured natural language or "plain English" queries. The Boolean operators AND & OR are disregarded. Examples follow:

      • Quotation Marks: You may use "quotation marks" around phrases.

        For example, "personal computer". Quotes are used when the search requires that the words are contiguous and in the order they are indicated.

      • Plus + and Minus - Signs: Add + in front of any word or phrase to require it. Add - in front of any word or phrase to exclude it.

        Example: "personal computer" -monitor +"flash drive"

    • All Words: This search request is similar to Any Words (previous bullet item), with the exception that all of the words in the search request must be present for a document.
    • Boolean Search: Activates and, or, not, w/5, w/25, and fields under the Search Request box. Use these as you compose your search request. The following table describes Boolean examples/interpretations and additional search options.

      Examples of Boolean Search Terms

      Boolean Usage Example         Interpretation
      ----------------------------  ------------------------------------------------------
      computer and monitor          both words must be present
      computer or monitor           either word can be present
      computer w/5 monitor          computer must occur within 5 words of monitor
      computer not w/5 monitor      computer must occur, but not within 5 words of monitor
      computer not monitor          only computer must be present
      [fieldname] contains smith    the field name must contain smith
      computer w/5 xfirstword       computer must occur in the first five words
      computer w/5 xlastword        computer must occur in the last five words

  8. Use Special Characters, if necessary.

    Use ? to match any single character. For example, appl? matches apple or apply.

    Use * to match any number of characters. For example, m*g matches mustang, morning, mug, etc.

    Use ~~ to match a numeric range. For example, 14~~18 looks for 14, 15, 16, 17, or 18.

  9. Click to display the Search Fields dialog box.

  10. Select the metadata field from the list and click OK. For example, if you selected Filename, the Search Request box would contain the following:

    From the Search Request box: (Filename contains ( ))

    The cursor automatically appears between (  )) ready for an entry. Enter the filename. The finished result would look like this:

    From the Search Request box: (Filename contains (ProfessionalReport.doc))

  11. To select an additional metadata field, click and repeat the above instructions.
  12. To search for dates, email addresses, or credit card numbers:

    Ensure that the option, Recognize Dates, Email Addresses, and Credit Card Numbers, is selected under Search Indexing in the Discovery Options dialog box for the relevant Discovery Job(s). See Modify a Completed Discovery Job for more information.

    To search for dates (in various formats), email addresses (complete or partial addresses), or credit card numbers, enter:

    • date() - e.g. date(jan 15 2006) or date(15 Jan 06) or any of these other formats:

      date(2006/01/15)

      date(1/15/06)

      date(1-15-06)

      date(The fifteenth of January, two thousand six)

    • mail() -  e.g. mail(sales@iprotech.com) or mail(s*@iprotech.com)

    • creditcard() - e.g. creditcard(5555 6666 9999 3333) or any of these other formats:

      creditcard(5555666699993333)

      creditcard(5555-6666-9999-3333)

  13. Check the Natural Language option if you want to enter natural language text. This option automatically weights the words in an "Any Words" search to disregard words such as AND and OR and focus on the more relevant, less frequently found words. For example, enter the terms Find the memo on ski-induced paralysis to weight "ski-induced" and "paralysis" very high in the search results, helping to weed out hits for "memo".

  14. Check Stemming to extend a search to cover grammatical variations. Use ~ at the end of the word to search for stemming variations. For example, enter the terms fish~ swamp applied~ to find fish, fishing, swamp, as well as applying, applies, and apply.

    Stemming rules are designed to work with the English language. They are stored in the stemming.dat file in the dtSearch folder. The default path starts with the directory you indicated during the eCapture installation followed by \Shared\dtSearch.

  15. Check Phonic to look for words that sound like the word you entered in the search request. For example, enter #Smith to find Smith, Smithe, and Smythe.

    For best results, use a # in front of individual words to be searched phonically. If you simply select Phonic searching under Search Features, the search will apply phonic rules to all words and can return too many inappropriate results.

  16. Check Synonyms to find synonyms established by eCapture’s dtSearch function or user-defined. Use & at the end of the word to search for its synonyms. For example, enter watchful& monitor to search for the word watchful or its synonyms and/or the word monitor (without synonyms).
  17. Check the Related Words option to support synonym searches. Standard synonyms and related words are supplied by WordNet (supplied with dtSearch and built into eCapture).
  18. Check Fuzzy Searching to find words even if they are misspelled. A search for alphabet with a fuzziness of 1 would also find alphaqet. With a fuzziness of 3, the same search would find both alphaqet and alpkaqet. It is useful for text that may contain typographical errors or that has been scanned and OCRed. Use the slide meter to adjust the fuzzy search level.
  19. Check Include Non-indexed Files as Matches to include all files that dtSearch could not index and to which search hits therefore could not be applied. This option is useful because you can create and apply a flag, such as NON-Indexed File, and then export only that set of files for review to verify that no Privileged or Hot documents were missed. Examples of such files include PDFs, graphics, JPEGs, and TIFFs.
  20. Click Apply Language Analyzer and create a new rule if you have a job that requires multi-language capability handling. For example, CJK (Chinese, Japanese, Korean) text appears as lines of characters with no spaces between the words. The Language Analyzer provides a way to add customized word breaking and morphological analysis (components, morphemes, which comprise words) to the dtSearch engine. The ApplyLanguageAnalyzer field (FilterManager) carries over to rules for importing, exporting, and Master Rules operations. This option is disabled by default.
  21. Click to display the Search Status dialog. The Rule ID is displayed in the Title Bar. Immediately after the search progress completes, the Search Hits Preview dialog appears. (Note: Not available if the Discovery Job is not completed.) The Search Hits Preview dialog displays the following search results in a grid format for each file that meets the criteria:

    • ItemID

    • Name of the File

    • Score (Percentage Value)

    • Hits - total number of search terms that appear in a single document. For example, the number 7 may indicate that a single term appeared 7 times in the document or that 2 terms appeared a total of 7 times: one term 3 times and the other term 4 times.

    • Location (File’s path)

    • Size of the File

  22. Select an item and click to view the file in its native application. The native application must be installed on the workstation. If it is not, the Windows dialog box appears with a message stating that "Windows cannot open this file:" and offers additional options for opening the file.
  23. To save the results to a .CSV file, click to open the Save As a .CSV File dialog. Navigate to the location to save the file. Accept or change the default filename. Click Save.
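
The special characters (?, *, ~~) and the w/N proximity operator described in the steps above can be approximated in a few lines of Python. This is a rough sketch of the behavior shown in the examples, not dtSearch's matching engine: the function names term_matches and within are hypothetical, and splitting text into words with a simple regular expression is an assumption that will differ from how the search index actually breaks words.

```python
import re


def term_matches(term, word):
    """Check one word against a term that may use ?, *, or a ~~ numeric range."""
    low, sep, high = term.partition("~~")
    if sep:
        # Numeric range such as 14~~18, inclusive on both ends.
        if not all(part.isdecimal() for part in (low, word, high)):
            return False
        return int(low) <= int(word) <= int(high)
    # ? stands for exactly one character, * for any number of characters.
    pattern = "".join(
        r"\w" if ch == "?" else r"\w*" if ch == "*" else re.escape(ch)
        for ch in term
    )
    return re.fullmatch(pattern, word, flags=re.IGNORECASE) is not None


def within(text, first, second, distance):
    """True if `first` occurs within `distance` words of `second`."""
    words = re.findall(r"\w+", text)  # assumption: simple word splitting
    first_pos = [i for i, w in enumerate(words) if term_matches(first, w)]
    second_pos = [i for i, w in enumerate(words) if term_matches(second, w)]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)


print(term_matches("appl?", "apply"))
print(term_matches("14~~18", "16"))
print(within("The computer has a large monitor", "computer", "monitor", 5))
```

A request such as computer w/5 monitor corresponds to within(text, "computer", "monitor", 5) in this sketch.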

Define the Advanced Criteria for a Flex Processor Rule

You can define advanced criteria for a given Flex Processor Rule. These settings identify files for action mapping. These different selection types depend on hash values or Item IDs, which need to be identified in order to be used. NIST NSRL files have already been identified through NIST. The following procedure describes how to set the Advanced Criteria for a given rule.

Important: When loading or importing lists, the existing list is overwritten. If you want to import more than one list, create a separate, additional rule.

  1. If desired, click on the ItemIDs option or the ItemGUIDs option.

    • Filtering by ItemID is typically done when producing files that were part of previous jobs from the same Client. Because ItemIDs apply only within a given Client, importing ItemID lists from other Clients will lead to incorrect results. Importing of Item IDs is useful for targeted TIFFing.

      Note: Item ID list rules will not transfer to other jobs, master rule sets, or case (project) default options. The original item IDs associated with the native files that were included in the selected Discovery job or jobs can be loaded for use in a rule.

    • Filtering by ItemGUIDs (Globally Unique Identifiers) gives a more reliable method to positively identify eCapture Items records for a Client.
  2. Click either the Import From Another Job button or the Load from File button.
  3. When you select Import From Another Job, the Import from Job dialog displays.

    1. Select the job you want to import from.
    2. Select either:

      • Items Processed - Specify which statuses (e.g. Queued, Error, etc.) to import.

      • Items with no effective rule - This option allows for the capability of using all items not in the results of the selected job.

      The Flex Processor Rules Manager will then place the Item IDs that meet those criteria into the list.

  4. Select Load from File if you want to load a file of Item IDs into a rule. The file’s format should be one Item ID per line, with no punctuation. Only the ItemIDs that are already part of the selected Discovery Jobs of the current Job will be included. Use the Data Extract Import option when creating a new Job to automatically select Discovery Jobs based on the ItemIDs.
  5. If you want to import a list of IDs into the Flex Processor Rules Manager to produce just the desired files from the same PST, click the Load From File button below the E-mail Entry IDs box. A rule with a list of E-mail Entry IDs loaded will apply to the files in the Processing and Data Extract Jobs whose e-mail entry IDs are an exact match.

    • The file’s format is one EntryID per line, with no punctuation. If the PST from which the entry IDs were extracted is not part of the job, there will be no matches for the rule.
    • Flex Processor Rules Manager will match the filenames, without extensions, with the EntryID imported from the file.

    Note: This will not extract files from the containers; nor is it effective for removing e-mail.

  6. If desired, check the NIST NSRL Matches check box. The optional NIST database must be loaded and set up for use with eCapture in order to use this feature. A rule with this option selected applies to the files in the Processing or Data Extract Job whose MD5 hashes match those of files in the National Software Reference Library (NSRL) published by NIST. It is typically used in a Remove rule to eliminate non-responsive files such as OS files.

    The option is disabled unless the NIST match was completed on all Discovery Jobs that contribute to this Processing Job or Data Extract Job. If not all of the Discovery Jobs have been NIST matched, an information message displays when you hover over the exclamation point next to the NIST check box.

    Important: This is an exclusive criterion (it cannot be combined with other criteria).

  7. If desired, check the Custom Hash List Matches check box and then select the hash list from the drop-down menu. The hash lists must be loaded before using this feature.

    • In most cases, the Action will be either Remove or Placeholder. Multiple Custom Hash Lists can be used on one Job; however, a separate rule must be created for each list.
    • When the Job is processed, the MD5 hashes of the items in the job are matched against the MD5 hashes of the entries in the Custom Hash List. Any matching items have the appropriate action applied. When more than one rule matches an item, later rules supersede earlier rules.

    Important: This is an exclusive criterion (it cannot be combined with other criteria).

  8. Click the corresponding button to load all Parent item IDs or all Children item IDs. The Scope rule is automatically changed to Treat items in a family separately to ensure the desired output. Changing the scope rule may produce incorrect output.

    • A Parent item ID rule loads the item IDs for the parent documents. This essentially suppresses embedded file extraction items from being processed.
    • The Child item ID rule loads the item IDs for the attachments. This option allows for attachments to be exported or to be used as a last rule to remove attachments and maintain parent (top level) item IDs only. The processing would be matched to the original source media.

    These rule options are used in conjunction with the Export option, Use filename for Image Key (located in the last export wizard screen when running an export job), in order to maintain the original document numbering as the file goes through each phase in eCapture.

    Important: This feature is grayed out and not available until the Discovery Job has completed.
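
The Custom Hash List matching described in step 7 can be illustrated with a minimal Python sketch. It assumes that each rule carries one hash list and one action, and that rules are applied in order so that a later rule overrides an earlier one for the same item. The names md5_of and apply_hash_rules and the shape of the items and rules arguments are hypothetical, chosen only for this example; they do not reflect how eCapture stores items or rules.

```python
import hashlib


def md5_of(data):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def apply_hash_rules(items, rules):
    """Assign an action to each item whose MD5 appears in a rule's hash list.

    `items` maps a hypothetical item ID to the file's bytes; `rules` is an
    ordered list of (action, set_of_md5_hex) pairs, one hash list per rule.
    """
    actions = {}
    for action, hash_list in rules:
        # Compare hashes case-insensitively.
        normalized = {h.lower() for h in hash_list}
        for item_id, data in items.items():
            if md5_of(data) in normalized:
                # Later rules overwrite earlier ones for the same item.
                actions[item_id] = action
    return actions


items = {101: b"alpha", 102: b"beta", 103: b"gamma"}
rules = [
    ("Remove", {md5_of(b"alpha"), md5_of(b"beta")}),
    ("Placeholder", {md5_of(b"beta")}),
]
print(apply_hash_rules(items, rules))
```

Items that match no hash list, such as item 103 here, receive no action from these rules.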

  1. If desired, click on the ItemIDs option or the ItemGUIDs option.

    • Filtering by ItemID is typically done when producing files that were part of previous jobs from the same Client. Because ItemIDs apply only within a given Client, importing ItemID lists from other Clients will lead to incorrect results. Importing of Item IDs is useful for targeted TIFFing.

      Note: Item ID list rules will not transfer to other jobs, master rule sets, or case (project) default options. The original item IDs associated with the native files that were included in the selected Discovery job or jobs can be loaded for use in a rule.

    • Filtering by ItemGUIDs (Globally Unique Identifiers) gives a more reliable method to positively identify eCapture Items records for a Client.
  2. Click either the button or the button.
  3. When you select Import From Another Job, the Import from Job dialog displays.

    • Select the job you want to import from.
    • Select either:

      • Items Processed - Specify which statuses (e.g. Queued, Error, etc.) to import.

      • Items with no effective rule - This option allows for the capability of using all items not in the results of the selected job.

      The Flex Processor Rules Manager will then place the Item IDs that meet those criteria into the list.

  4. Select Load from File if you want to load a file of Item IDs into a rule. The file’s format should be one Item ID per line, with no punctuation. Only the ItemIDs that are already part of the selected Discovery Jobs of the current Job will be included. Use the Data Extract Import option when creating a new Job to automatically select Discovery Jobs based on the ItemIDs.
  5. If you want to import a list of IDs into the Flex Processor Rules Manager to produce just the desired files from the same PST, click the Load From File button below the E-mail Entry IDs box. A rule with a list of E-mail Entry IDs loaded will apply to the files in the Processing and Data Extract Jobs whose e-mail entry IDs are an exact match.

    • The file’s format is one EntryID per line, with no punctuation. If the PST from which the entry IDs were extracted is not part of the job, there will be no matches for the rule.
    • Flex Processor Rules Manager will match the filenames, without extensions, with the EntryID imported from the file.

    Note: This will not extract files from the containers; nor is it effective for removing e-mail.

  6. If desired, check the NIST NSRL Matches check box. The optional NIST database must be loaded and set up for use with eCapture in order to use this feature. A rule with this selected will apply to the files in the Processing or Data Extract Job whose MD5 hashes match those of files in the NSR Library published by NIST. It is typically used in a Remove rule to eliminate non-responsive files such as OS files.

    The option will be disabled unless the NIST match was completed on all Discovery Jobs that contribute to this Process Job/Data Extract Job. If not all of the discovery jobs have been NIST Matched, the following information message displays when you hover over the exclamation point next to the NIST check box.

    Important: This is an exclusive criterion (it cannot be combined with other criteria).

  7. If desired, check the Custom Hash List Matches check box and then select the HASH list from the drop down menu. The hash lists must be loaded before using this feature.

    • In most cases, the Action will either be Remove or Placeholder. Multiple Custom Hash Lists can be used on one Job; however, a separate rule must be created for each list.
    • When the Job is processed, the MD5 hashes of the times in the job will be matched against the MD5 hashes of the entries in the Custom Hash List. Any matching items will have the appropriate action applied. At this point, the later rules will supersede the earlier rules.
    • In most cases, this option is used with the action of either Remove or Placeholder. Multiple Custom Hash Lists can be used on one Job; however, a separate rule must be created for each list.

    Important: This is an exclusive criterion (it cannot be combined with other criteria).

  8. Click or to load all Parent item IDs or Children item IDs (respectively). The Scope rule is automatically changed to Treat items in a family separately to ensure desired output. Changing the scope rule may produce incorrect output.

    • A Parent item ID rule loads the item IDs for the parent documents. This essentially suppresses embedded file extraction items from being processed.
    • The Child item ID rule loads the item IDs for the attachments. This option allows for attachments to be exported or to be used as a last rule to remove attachments and maintain parent (top level) item IDs only. The processing would be matched to the original source media.

    These rule options are used in conjunction with the Export option, Use filename for Image Key (located in the last export wizard screen when running an export job), in order to maintain the original document numbering as the file goes through each phase in eCapture.

    Important: This feature is grayed out and not available until the Discovery Job has completed.
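
The hash matching described in step 7 can be illustrated with a minimal Python sketch. This is not eCapture's implementation: the `apply_hash_rules` function, the rule structure, and the sample items are assumptions made for illustration only. It shows the MD5 hash of each item being compared against each hash list in rule order, with a later matching rule superseding an earlier one.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the lowercase hexadecimal MD5 digest of an item's content."""
    return hashlib.md5(data).hexdigest()


def apply_hash_rules(items, rules):
    """Return {item_name: action} for items whose MD5 appears in a rule's hash list.

    items: dict mapping item name -> content bytes (hypothetical stand-in for job items)
    rules: ordered list of (action, set_of_md5_hex) pairs, one rule per hash list
    """
    actions = {}
    for name, content in items.items():
        digest = md5_hex(content)
        for action, hash_list in rules:
            if digest in hash_list:
                # A later matching rule overwrites the action of an earlier one.
                actions[name] = action
    return actions


# Hypothetical sample data: three items and two rules, one rule per hash list.
items = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
rules = [
    ("Remove", {md5_hex(b"alpha"), md5_hex(b"beta")}),
    ("Placeholder", {md5_hex(b"beta")}),
]
result = apply_hash_rules(items, rules)
print(result)
```

In this sketch, b.txt matches both lists and ends up with the action of the later rule, while c.txt matches neither list and receives no action.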

Advanced Options

Alternative system directories may be specified for the output files generated by Discovery, Data Extract, and/or Processing Jobs. This allows you to use larger capacity storage devices. This is also useful for organizing different Cases (Projects) under the same Client that may use different storage devices. The assignment of the system directories is done at the Case (Project) level.

  1. Click on the Advanced tab.

  2. On new and existing Cases (Projects), the paths shown are the defaults and are indicated with informational text in each field. These default paths are the paths that were specified at the time the Client was added. This makes it easy to locate the original paths.

    A directory must exist for each Job type whether it is the default directory or an assigned alternative directory. If an alternative path is cleared from any of the fields, the system reverts to the default path and displays the informational text.

  3. Optional: For the Discovery Job, click the browse button, select a location, and click OK. The specified path cannot exceed 100 characters.
  4. Optional: For the Data Extract Job, click the browse button, select a location, and click OK. The specified path cannot exceed 100 characters.
  5. Optional: For the Processing Job, click the browse button, select a location, and click OK. The specified path cannot exceed 100 characters.
  6. Check Save as system default to save the alternative system directories as the System Default. If this is done, new cases (projects) and jobs will be organized under dedicated directories per client.

The following directory structure is created for each alternative path when the Case (Project) settings are saved and/or a Job is created: Project\Job Type\Job Directory. For example, if the alternative directory specified for Discovery Jobs in ProjectID 7 is \\data\ecapture\ClientDirectory\LocationOne, then the directory structure for DiscoveryJobID 77 is: \\data\ecapture\ClientDirectory\LocationOne\PR000007\DiscoveryJobs\DJ000077

All subdirectories under the Job directory will remain the same for each Job type. When a Job is running, all output files will go to the alternative specified location.
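
As a rough illustration of the Project\Job Type\Job Directory layout, the following Python sketch composes a Discovery Job directory from an alternative path. The function name `build_job_directory` is hypothetical, and the six-digit zero padding and the DiscoveryJobs folder name are inferred from the example path above rather than taken from a documented specification. The 100-character check mirrors the limit stated for the path fields.

```python
MAX_ALT_PATH_LENGTH = 100  # limit stated for the alternative path fields


def build_job_directory(alt_path: str, project_id: int, job_id: int) -> str:
    """Compose the Project / Job Type / Job Directory layout under an alternative path."""
    if len(alt_path) > MAX_ALT_PATH_LENGTH:
        raise ValueError("alternative path exceeds 100 characters")
    project_dir = f"PR{project_id:06d}"  # assumed: six-digit, zero-padded project ID
    job_dir = f"DJ{job_id:06d}"          # assumed: six-digit, zero-padded job ID
    # Join with backslashes so UNC prefixes such as \\server\share are preserved.
    return "\\".join([alt_path.rstrip("\\"), project_dir, "DiscoveryJobs", job_dir])


# Hypothetical IDs chosen for illustration.
path = build_job_directory(r"\\data\ecapture\ClientDirectory\LocationOne", 12, 345)
print(path)
```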

Changes may be made to the assigned system directory at any time. Jobs in progress or completed are not affected by the changes made to the assigned system directories. Directories that are removed will not be deleted on disk. Only new Jobs will use the newly specified locations.

Note: In the Limited Controller, the options can only be modified at the time the Case (Project) is first created.

Different Clients using the same Alternate Paths

Scenario: If two Clients are created and use the same alternate Job paths, the system creates a Client directory named with a unique identifier (e.g., 5K82SZHA) for each Client. Therefore, job names may be identical for each Client, but the data will not be combined. For example, the paths for each identically named Discovery Job are:

C:\AltPath\5K82SZHA\PR000001\Discovery Jobs\DJ000001

and

C:\AltPath\5KRTOUIY\PR000001\Discovery Jobs\DJ000001

The unique identifier is stored in the Clients table SystemDirectoryName field.

This structure allows each Case (Project) to maintain its own directory when System Wide defaults are used.

Note: If a Client is deleted, the job directories and files are also deleted, but an empty directory structure down to the Job level remains.

Streaming Discovery Job Options

The following sections describe how to set Streaming Discovery Job options at the Case (Project) level and for individual Streaming Discovery Jobs.

Streaming Discovery: Discovery Options

Streaming Discovery: Filtering Options

Streaming Discovery: Imaging Options

Streaming Imaging options are defined on five tabs: General, Excel, Word, PowerPoint, and Placeholder. See the following sections for more information.

Important: To define imaging options for the Streaming Discovery Job, you must first select the check box Enable Imaging located on the General tab. Once selected, the imaging options display on all five tabs.

Streaming Discovery Imaging: General Options

Streaming Discovery Imaging: Excel Options

  1. Click the Excel tab to set the processing options for Excel files.

  2. Process with Outside-In (Stellent) - Select this option to:

    • Allow for faster and more consistent generation of images on the first pass
    • Reduce the amount of time spent manually QCing these document types

    When selected, only Outside-In (Stellent) is used to process images; the Microsoft related options are grayed out by default. Full metadata is extracted, and imaged output reflects the time-zone handling options configured for the Processing Job. All files processed by Outside-In (Stellent) receive the Stellent Processed flag in QC.

    The processing output differs when using Outside-In (Stellent) to view and image documents. However, the QC applied flags, metadata, and optional summary reports are similar to those produced when processing is done without Outside-In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside-In (Stellent).

  3. Comments - Set where you want comments displayed. Select from None, At end of sheet, or As displayed on sheet.
  4. Color Depth - Set the Color Depth options. Color processing for Excel files is handled separately from color processing of other types of files. This setting is independent of the General Color Depth.

    Single Page Output Type

    General Color Depth Options    Rendered as
    Black&White (1-bit)            Group 4 TIFF
    Grayscale (8-bit)              LZW TIFF
    256 Color (8-bit)              LZW TIFF
    True Color (24-bit)            JPEG

    Multi-Page TIFF Output Type

    General Color Depth Options    Rendered as
    Black&White (1-bit)            Group 4 TIFF
    Grayscale (8-bit)              LZW TIFF
    256 Color (8-bit)              LZW TIFF
    True Color (24-bit)            JTIFF (JPEG compressed TIFF)

  5. Paper Size - Click the drop-down menu and select an output paper size for documents during processing.

    Note: For Excel Only - For Custom[8.5x11.0in], the Custom Paper Size dialog box appears.

    The Custom Paper Size defaults to 8.5x11 inches. The range values are shown for both units: Inches and Millimeters. The maximum size is 50.00x70.00 inches (1270.00x1778.00 millimeters). When this option is selected, the document is processed through the PDF driver (Text-Based PDF creation) regardless of the Flex Processor option selected. OCRing is not applicable in this instance. Export settings are limited to Text-Based PDF Output only, even if an image format is selected. Non-Excel documents export as usual.

  6. Center on Page - Determines where to center the image on the page.

    • Horizontally

    • Vertically

  7. Page Order - Determines the page order to be used for imaging.

    • As is

    • Down, and then over

    • Over, and then down

  8. Orientation - Determines the orientation of the page at the time of printing.

    • As is

    • Portrait

    • Landscape

  9. Scaling - Specifies whether the image should be scaled and how. If scaling is used, the image is either adjusted to a percentage of its normal size or modified to fit the page.

    • As is

    • Adjust to % normal size

    • Fit to page

  10. If you want to set more granular processing options for Excel files, click the Advanced Options button. The Advanced Excel Imaging dialog box appears.

  11. At the top of the dialog box, set the options for how to handle headers, footers, and other content in the Excel workbook. Click the Defaults button to revert to the default settings for these options, as shown in the following image:

    If you have trouble locating the referenced options in Excel, click here to view information about how to navigate in Excel to the option.

  12. Set the remaining settings in the Advanced Excel Imaging dialog box.

    The following table provides a list of the available options.

    Setting

    Options

    Date field handling:

    • Replace with date created - will replace with creation date.

    • Replace with date last saved - will replace the current date with the last saved date.

    • Replace with comments - displays the Date Field Comments field where you can enter the text that should replace the contents of the date field.

    • Replace with field code

    • Do not replace - will not replace the date (e.g., Macros)

    Header/Footer Filename field handling

    If path or filename options are found in an Excel header or footer, you can select from the following options to handle these occurrences.

    • Replace with filename (no path) - inserts the unqualified filename

    • Replace with filepath - inserts the fully-qualified path of the original file

    • Replace with comments - displays the Header/Footer Filename field comments field where you can enter your own comments

    • Replace with field code - outputs the field codes &[Path] and/or &[File]

    • Remove - removes the codes entirely

    Generate metadata

    Select Generate a metadata summary image for each Excel spreadsheet, and then under Spreadsheet Metadata Summary Options select the individual types of metadata to capture.

    • Document Properties

    • Comments

    • Formulas

    • Linked Content - The data collected will include hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag will be added. A separate page entitled Document Properties is generated and is placed at the end of each Microsoft Excel document.

    For more information about metadata, click here.

    Who creates the metadata? The native program (such as Microsoft Excel or Outlook) creates the metadata and maintains it with the native file (the letter or email).

    What does eCapture do with this data? When a document is processed, the metadata is collected from the document and stored in the database.

    How is metadata useful? It gives you valuable information as to “Who knew what, and when.” It can tell you who wrote a document and who edited it last. It also shows you a file’s revision number, the character count, and many other pieces of information about a file.

    Blank page removal

    This option is available if the Remove Blank Pages option is selected under the General Options tab. Select from the following two options to remove blank pages:

    • Based on selected Page Order: Down, then over or Over, then down.

      • If Down, then over is selected, all vertical page columns that are blank will be removed.

      • If Over, then down is selected, all horizontal page rows where all pages in a horizontal run are blank will be removed.

    • Based on both Page Order options: This bases the removal of blank pages on both horizontal page-rows and vertical page-columns.

    Example of Page Removal

    The following example pertains to using a spreadsheet with 12 pages that will be rendered.

      • If the sheet's page order is Over, then down, eCapture removes all horizontal page rows where all pages in a horizontal run are blank. In order to do that, eCapture steps through all HPageBreaks and makes sure the range from the first column to the last column is blank.

      • If eCapture determines that 1-3 is blank, then they will be hidden. If eCapture determines that 4-6 is blank, then they will be hidden, and so on.

      • If the sheet's page order is Down, then over, eCapture will remove all vertical page columns that are blank.

      • If eCapture determines that 1-A is blank, then they will be hidden. If eCapture determines that 2-B is blank, then they will be hidden, and so on.

    This algorithm does not eliminate every blank page, though it eliminates many of them.

    Note: All page-hiding is done by setting horizontal regions' RowHeight properties and vertical regions’ ColumnWidth properties to 0.

  13. Click OK to exit the Advanced Excel Imaging dialog box.
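
The blank page removal rules in step 12 can be sketched in Python as follows. This is an illustration under stated assumptions, not eCapture's code: the `hidden_pages` function, the mode names, and the 4-by-3 layout of the 12 pages are hypothetical. A page row is hidden only when every page in that horizontal run is blank, and a page column is hidden only when every page in that vertical run is blank, which is why some blank pages can survive.

```python
def hidden_pages(blank, mode):
    """Return the set of (row, col) page positions that would be hidden.

    blank: list of page rows, each a list of booleans (True = page is blank)
    mode:  "over_then_down", "down_then_over", or "both" (hypothetical names)
    """
    n_rows, n_cols = len(blank), len(blank[0])
    hidden = set()
    if mode in ("over_then_down", "both"):
        # Hide horizontal page rows in which every page is blank.
        for r in range(n_rows):
            if all(blank[r]):
                hidden.update((r, c) for c in range(n_cols))
    if mode in ("down_then_over", "both"):
        # Hide vertical page columns in which every page is blank.
        for c in range(n_cols):
            if all(blank[r][c] for r in range(n_rows)):
                hidden.update((r, c) for r in range(n_rows))
    return hidden


# 12 pages arranged as 4 page rows by 3 page columns (layout assumed for illustration).
grid = [
    [False, True,  True],
    [True,  True,  True],   # entirely blank page row
    [False, False, True],
    [True,  False, True],   # last column is blank in every row
]

rows_only = hidden_pages(grid, "over_then_down")
cols_only = hidden_pages(grid, "down_then_over")
both = hidden_pages(grid, "both")

# Blank pages that remain visible even when both page orders are considered.
remaining_blank = sorted(
    (r, c)
    for r in range(len(grid))
    for c in range(len(grid[0]))
    if grid[r][c] and (r, c) not in both
)
print(len(rows_only), len(cols_only), len(both), remaining_blank)
```

In this sample grid, two blank pages remain because each shares its page row and its page column with at least one non-blank page.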

Streaming Discovery Imaging: Word Options

Streaming Discovery Imaging: PowerPoint Options

Streaming Discovery Imaging: Placeholder Options

Streaming Discovery: Export Options

Option

Description

Select Export Series (optional)

Select an existing Export Series from the drop-down menu. If an Export Series is not selected, the Enterprise Streaming Discovery Job will not be exported to a review application. However, the job may be manually exported if desired. For more information, see Re-Export a Streaming Discovery Job. If an Export Series is selected, the area below in the dialog box displays the options/settings from that Export Series.

Important: If you are creating images during a Streaming Discovery Job, you must create and select an export series.

Export Interval (min)

This export interval setting dictates how often documents are exported to the specified export destination (Ipro Eclipse or Relativity).

Note: This option is not available unless an Export Series is selected, or a new Export Series is created.

The default setting is 30 minutes for new Cases (Projects) where no System default options are in place. This change was made to reduce the number of exports created from large Streaming Discovery Jobs and to better manage the volume of exports.

For any Streaming Discovery Jobs initiated under Cases (Projects) created before version 2016.2.0, the five-minute default setting remains.

The maximum setting is 60 minutes. If an existing Export Series is selected and the export interval is set to 0, only one Export Job will be created on completing the Enterprise Streaming Discovery Job.

As documents are created, Export Jobs are continuously created (based on the export interval setting). Each Export Job is started immediately on creation regardless of job size.

Only completed families are considered for export. Generally, the longer the interval setting, the more documents in each Export Job. The Enterprise Streaming Discovery Job may complete before all the Export Jobs complete; however, it will not be marked as Complete until the last set of documents starts to export.

The Export Jobs inherit the settings from the parent Export Series, including the numbering schema. For direct export to the Review application (Eclipse or Relativity), the same eCapture auto-load rules apply: one load file for each volume.
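
The Export Interval rules described above can be summarized in a short Python sketch. The function names and the version-string comparison are assumptions for illustration; only the values (30-minute default, five-minute default for Cases created before version 2016.2.0, 60-minute maximum, and the meaning of 0) come from this topic. Rejecting negative values is also an assumption.

```python
DEFAULT_INTERVAL_MIN = 30  # default for new Cases (Projects) without System defaults
LEGACY_INTERVAL_MIN = 5    # default retained for Cases created before 2016.2.0
MAX_INTERVAL_MIN = 60      # maximum setting


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '2016.2.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def default_export_interval(case_created_in_version: str) -> int:
    """Return the default interval in minutes for a Case created in the given version."""
    if parse_version(case_created_in_version) < (2016, 2, 0):
        return LEGACY_INTERVAL_MIN
    return DEFAULT_INTERVAL_MIN


def describe_export_interval(minutes: int) -> str:
    """Validate an interval and describe the resulting export behavior."""
    if not 0 <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError("export interval must be between 0 and 60 minutes")
    if minutes == 0:
        return "one Export Job when the Streaming Discovery Job completes"
    return f"a new Export Job every {minutes} minutes"


legacy_default = default_export_interval("2015.4.1")
current_default = default_export_interval("2016.2.0")
print(legacy_default, current_default, describe_export_interval(0))
```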

Create New Export Series

Create a new Direct-to-Eclipse or Direct-to-Relativity Export Series. When a new Export Series is created (for Eclipse or Relativity), the criteria display in the Export Options dialog box as shown in the following figure:

In the previous figure, an existing Export Series was selected and shows the options/settings that were selected for that Export Series.

The bottom section shows the Export Fields that were selected for the Export Series Job. For more information about creating an export series, see Create an Export Series.

For Enterprise Streaming Discovery Job Export Series, the Export Series is Data Extract only.

Save settings as Case (Project) default

Displays when setting options at the Job Level. Select this option to retain these settings for future Enterprise Streaming Discovery Jobs created for the Case (Project).

Auto Publish Errors

Select this option to automatically publish Streaming Discovery node-level or item-level errors (if any) so that they can move forward to review without your having to modify the Streaming Discovery Job and visually inspect the errors. This option is cleared by default; if left cleared, no actions are performed.

When the job completes normally and node-level or item-level errors exist, the system re-queues those errors one time and sets the job to publish. The job displays in the Job Queue pane. Once the re-queue is completed, all remaining failures are published.

To see the nodes that were re-queued, open the AutoRequeue.TXT file stored in the Discovery Jobs folder. An example of the data is shown here:

NodeIDRequeue – NodeID: 1313

NodeIDRequeue – NodeID: 1314

NodeIDRequeue – NodeID: 1338

Errors are not published if the Auto Publish Errors option is selected at the Case (Project) level but cleared at the Job level.
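
If you need the re-queued node IDs as a list, the AutoRequeue.TXT lines shown above can be parsed with a few lines of Python. This is a minimal sketch that assumes every entry follows the sample format (a NodeID: label followed by a number); the `requeued_node_ids` function is hypothetical and is not part of eCapture.

```python
import re

# Matches the "NodeID: <number>" portion of each sample line.
NODE_ID_PATTERN = re.compile(r"NodeID:\s*(\d+)")


def requeued_node_ids(text: str) -> list:
    """Extract node IDs from AutoRequeue.TXT content, in file order."""
    ids = []
    for line in text.splitlines():
        match = NODE_ID_PATTERN.search(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


# Sample content copied from the example in this topic.
sample = """NodeIDRequeue – NodeID: 1313
NodeIDRequeue – NodeID: 1314
NodeIDRequeue – NodeID: 1338"""

node_ids = requeued_node_ids(sample)
print(node_ids)
```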

Save as system default

Displays when setting options at the Case (Project) Level. Select this option to retain these settings for future Cases (Projects) created for the Client. The settings are saved to the eCapture Configuration database.

Related Topics