Create a Streaming Discovery Job

Using eCapture, you can create a single, optimized Job type called an Enterprise Streaming Discovery Job. This Job type is unique because:

  • It combines both a traditional Discovery Job and a Data Extract Job.
  • It creates a single Job to push data through to the review process and reduces the number of starts and stops required by traditional methods.
  • Images may be generated during a Streaming Discovery Job, allowing for automatic loading into a review platform such as Ipro Eclipse.
  • Document families are available for the review process sooner due to the use of family-based task distribution. All document families keep moving forward through the Enterprise Streaming Discovery phases as soon as they are ready. The Enterprise Streaming Workers are constantly processing data so the data can move through the filtering phase and then to the export phase.

The following procedure describes how to create a Streaming Discovery Job when you have already created your Clients, Cases (Projects), and Custodians.

Note: if you have not created Clients, Cases (Projects), or Custodians, see:

Procedure: Create a Streaming Discovery Job

To create a Streaming Discovery Job:

  1. In the Client Management Tree View, select a Client, then select a Case (Project) for the Client, and select a Custodian. Under the Custodian, right-click the Streaming Discovery Jobs folder and select New Streaming Discovery Job.

  2. You may see a warning indicating that there are standard Data Extract Jobs and Processing Jobs in the system, and that Streaming Discovery Jobs do not de-duplicate against eCapture Jobs. However, the Controller includes a Case Hash Conversion utility that converts the legacy Jobs (Data Extract and Processing) by examining the exported documents and rehashing them using streaming. Conversion is restricted to emails. For more information, see Modify Streaming Discovery Jobs. When you have finished reading the message, click OK.

    The New Streaming Discovery Job dialog box appears.

  3. Enter a Streaming Discovery Job Name.
  4. Enter a Description.
  5. Enter a Batch ID. A maximum of 20 characters is permitted. This field can be selected for export load files, endorsements, and custom placeholders.
  6. Click the button to open the Directory Browser dialog box.
  7. Select the directory to discover. Use the UNC path to ensure consistent drive mappings for your site configuration. The selected directory displays in the Directories list. If you selected the incorrect directory, simply select it, and click the button. The directory is removed from the Directories list.
  8. Repeat steps 6 and 7 to select additional directories.
  9. Select a task table from the drop-down menu. The task table that displays in the field is based on the last task table selected for the Custodian. For more information about creating task tables, see Create Task Tables.
  10. Select Expedite Job if you want the job moved to the front of the queue. Otherwise, it displays at the end of the queue.
  11. Clear Show Job Options after creation if you do not want the Job Options to appear.
  12. Click OK. The Streaming Discovery Job Options dialog box appears.
  13. Set the Streaming Discovery Job Options, including Discovery, Imaging, Export, and Filtering Options.

    The following sections provide information about each of the four main settings tabs used when configuring a Streaming Discovery Job.

    Streaming Discovery: Discovery Options

    Streaming Discovery: Filtering Options

    Streaming Discovery: Imaging Options

    Streaming Imaging options are defined on five different tabs: General, Excel, Word, PowerPoint, and Placeholder. See the following sections for more information.

    Important: To define imaging options for the Streaming Discovery Job, you must first select the check box Enable Imaging located on the General tab. Once selected, the imaging options display on all five tabs.

    Streaming Discovery Imaging: General Options

    Streaming Discovery Imaging: Excel Options

    1. Click the Excel tab to set the processing options for Excel files.

    2. Process with Outside-In (Stellent) - Select this option to:

      • Allow for faster and more consistent generation of images on the first pass
      • Reduce the amount of time spent manually QCing these document types

      When selected, only Outside-In (Stellent) is used to process images; the Microsoft-related options are grayed out by default. Full metadata is extracted, and the time zone shown in imaged output reflects the time-zone handling options configured for the Processing Job. All files processed by Outside-In (Stellent) receive the Stellent Processed flag in QC.

      The processing output differs when Outside-In (Stellent) is used to view and image documents. However, the QC flags applied, the metadata, and the optional summary reports are similar to those produced when processing without Outside-In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside-In (Stellent).

    3. Comments - Set where you want comments displayed. Select from None, At end of sheet, or As displayed on sheet.
    4. Color Depth - Set the Color Depth options. Color processing for Excel files is handled separately from color processing of other types of files. This setting is independent of the General Color Depth.

      Single Page Output Type

      General Color Depth Options    Rendered as
      Black&White (1-bit)            Group 4 TIFF
      Grayscale (8-bit)              LZW TIFF
      256 Color (8-bit)              LZW TIFF
      True Color (24-bit)            JPEG

      Multi-Page TIFF Output Type

      General Color Depth Options    Rendered as
      Black&White (1-bit)            Group 4 TIFF
      Grayscale (8-bit)              LZW TIFF
      256 Color (8-bit)              LZW TIFF
      True Color (24-bit)            JTIFF (JPEG compressed TIFF)
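The two color depth tables amount to a lookup from output type and color depth to the rendered image format. The following Python sketch restates that lookup for quick reference. The dictionary keys "single" and "multi" and the function name are illustrative assumptions and are not part of eCapture.

```python
# Illustrative lookup built from the color depth tables; all names are hypothetical.
RENDERED_AS = {
    "single": {  # Single Page Output Type
        "Black&White (1-bit)": "Group 4 TIFF",
        "Grayscale (8-bit)": "LZW TIFF",
        "256 Color (8-bit)": "LZW TIFF",
        "True Color (24-bit)": "JPEG",
    },
    "multi": {  # Multi-Page TIFF Output Type
        "Black&White (1-bit)": "Group 4 TIFF",
        "Grayscale (8-bit)": "LZW TIFF",
        "256 Color (8-bit)": "LZW TIFF",
        "True Color (24-bit)": "JTIFF (JPEG compressed TIFF)",
    },
}


def rendered_as(output_type: str, color_depth: str) -> str:
    """Return the image format the tables list for this combination."""
    return RENDERED_AS[output_type][color_depth]
```

Only the True Color (24-bit) row differs between the two output types; the other three color depths render the same way in both.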

    5. Paper Size - Click the drop-down menu and select an output paper size for documents during processing.

      Note: For Excel Only - For Custom[8.5x11.0in], the Custom Paper Size dialog box appears.

      The Custom Paper Size defaults to 8.5x11 inches. The range values are shown for both Units: Inches and Millimeters. The maximum size is 50.00x70.00 inches, or 1270.00x1778.00 millimeters. When this option is selected, the document is processed through the PDF driver (Text-Based PDF creation) regardless of the Flex Processor option selected. OCR is not applicable in this instance. Export settings are limited to Text-Based PDF Output only, even if an image format is selected. Non-Excel documents export as usual.
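The inch and millimeter limits are the same size in two units (1 inch = 25.4 mm). The following Python sketch shows the conversion and a simple range check. The function names are hypothetical, and the requirement that both dimensions be greater than zero is an assumption, because the text states only the maximum.

```python
MM_PER_INCH = 25.4

# Maximum width x height as stated in the text, per unit.
MAX_SIZE = {"in": (50.00, 70.00), "mm": (1270.00, 1778.00)}


def to_millimeters(inches: float) -> float:
    """Convert inches to millimeters, rounded to two decimals like the dialog values."""
    return round(inches * MM_PER_INCH, 2)


def is_valid_custom_size(width: float, height: float, units: str = "in") -> bool:
    """Check a custom paper size against the documented maximum.

    Assumption: dimensions must be positive; the documentation gives no minimum.
    """
    if units not in MAX_SIZE:
        raise ValueError("units must be 'in' or 'mm'")
    max_width, max_height = MAX_SIZE[units]
    return 0 < width <= max_width and 0 < height <= max_height
```

For example, `to_millimeters(8.5)` and `to_millimeters(11.0)` give the default size in millimeters, and `is_valid_custom_size(51, 70)` returns `False` because the width exceeds 50 inches.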

    6. Center on Page - Determines where to center the image on the page.

      • Horizontally

      • Vertically

    7. Page Order - Determines the page order to be used for imaging.

      • As is

      • Down, and then over

      • Over, and then down

    8. Orientation - Determines the orientation of the page at the time of printing.

      • As is

      • Portrait

      • Landscape

    9. Scaling - Specifies whether the image should be scaled and how. If scaling is used, the image is either adjusted to a percentage of its current size or resized to fit the page.

      • As is

      • Adjust to % normal size

      • Fit to page

    10. If you want to set more granular processing options for Excel files, click the Advanced Options button. The Advanced Excel Imaging dialog box appears.

    11. At the top of the dialog box, set the options for how to handle headers, footers, and other content in the Excel workbook. Click the Defaults button to revert to the default settings for these options, as shown in the following image:

      If you have trouble locating the referenced options in Excel, click here to view information about how to navigate in Excel to the option.

    12. Set the remaining settings in the Advanced Excel Imaging dialog box.

      The following table provides a list of the available options.

      Setting

      Options

      Date field handling:

      • Replace with date created - replaces the date field with the creation date.

      • Replace with date last saved - replaces the current date with the last saved date.

      • Replace with comments - displays the Date Field Comments field where you can enter the text that should replace the contents of the date field.

      • Replace with field code

      • Do not replace - will not replace the date (e.g., Macros)

      Header/Footer Filename field handling

      If path or filename options are found in an Excel header or footer, you can select from the following options to handle these occurrences.

      • Replace with filename (no path) - inserts the unqualified filename

      • Replace with filepath - inserts the fully-qualified path of the original file

      • Replace with comments - displays the Header/Footer Filename field comments field where you can enter your own comments

      • Replace with field code - outputs the field codes &[Path] and/or &[File]

      • Remove - removes the codes entirely

      Generate metadata

      Select Generate a metadata summary image for each Excel spreadsheet, and then under Spreadsheet Metadata Summary Options select the individual types of metadata to capture.

      • Document Properties

      • Comments

      • Formulas

      • Linked Content - The data collected will include hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag will be added. A separate page entitled Document Properties is generated and is placed at the end of each Microsoft Excel document.

      For more information about metadata, click here.

      Who creates the metadata? The native program (such as Microsoft Excel or Outlook) creates the metadata and maintains it with the native file (the letter or email).

      What does eCapture do with this data? When a document is processed, the metadata is collected from the document and stored in the database.

      How is metadata useful? It gives you valuable information as to “Who knew what, and when.” It can tell you who wrote a document and who edited it last. It also shows you a file’s revision number, the character count, and many other pieces of information about a file.

      Blank page removal

      This option is available if the Remove Blank Pages option is selected under the General Options tab. Select from the following two options to remove blank pages:

      • Based on selected Page Order: Down, then over or Over, then down.

        • If Down, then over is selected, all vertical page columns that are blank will be removed.

        • If Over, then down is selected, all horizontal page rows where all pages in a horizontal run are blank will be removed.

      • Based on both Page Order options: This bases the removal of blank pages on both horizontal page-rows and vertical page-columns.

      Example of Page Removal

      The following example pertains to using a spreadsheet with 12 pages that will be rendered.

        • If the sheet's page order is Over, then down, eCapture removes all horizontal page rows where all pages in a horizontal run are blank. In order to do that, eCapture steps through all HPageBreaks and makes sure the range from the first column to the last column is blank.

        • If eCapture determines that 1-3 are blank, they are hidden. If eCapture determines that 4-6 are blank, they are hidden, and so on.

        • If the sheet's page order is Down, then over, eCapture removes all vertical page columns that are blank.

        • If eCapture determines that 1-A are blank, they are hidden. If eCapture determines that 2-B are blank, they are hidden, and so on.

      This algorithm does not eliminate every blank page, though it removes many of them.

      Note: All page-hiding is done by setting horizontal regions' RowHeight properties and vertical regions’ ColumnWidth properties to 0.
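The blank page removal options can be modeled as a small grid exercise. The following Python sketch is an illustration only and is not eCapture's implementation. It assumes the 12 rendered pages are laid out as 4 page rows by 3 page columns, represents each page as a Boolean (True means blank), and uses hypothetical mode names for the three removal behaviors.

```python
from typing import List, Set, Tuple

Grid = List[List[bool]]  # blank[r][c] is True when the page at row r, column c is blank


def blank_rows(blank: Grid) -> Set[int]:
    """Horizontal page rows in which every page is blank."""
    return {r for r, row in enumerate(blank) if all(row)}


def blank_cols(blank: Grid) -> Set[int]:
    """Vertical page columns in which every page is blank."""
    return {c for c in range(len(blank[0])) if all(row[c] for row in blank)}


def remaining_pages(blank: Grid, mode: str) -> List[Tuple[int, int]]:
    """Pages left after hiding, for mode 'over_then_down', 'down_then_over', or 'both'."""
    if mode not in ("over_then_down", "down_then_over", "both"):
        raise ValueError("unknown mode")
    hide_rows = blank_rows(blank) if mode in ("over_then_down", "both") else set()
    hide_cols = blank_cols(blank) if mode in ("down_then_over", "both") else set()
    return [
        (r, c)
        for r in range(len(blank))
        for c in range(len(blank[0]))
        if r not in hide_rows and c not in hide_cols
    ]


# Hypothetical 12-page sheet: 4 page rows x 3 page columns.
EXAMPLE: Grid = [
    [False, False, True],
    [True, True, True],
    [False, True, True],
    [True, True, True],
]
```

With this example, Over, then down hides the two fully blank rows and leaves 6 pages, Down, then over hides the one fully blank column and leaves 8 pages, and using both leaves 4 pages. One of those 4 pages (row 2, column 1) is still blank, because it shares its row and its column with non-blank pages, which is why not every blank page is removed.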

    13. Click OK to exit the Advanced Excel Imaging dialog box.

    Streaming Discovery Imaging: Word Options

    Streaming Discovery Imaging: PowerPoint Options

    Streaming Discovery Imaging: Placeholder Options

    Streaming Discovery: Export Options

    Option

    Description

    Select Export Series (optional)

    Select from an existing Export Series from the drop-down menu. If an Export Series is not selected, the Enterprise Streaming Discovery Job will not be exported to a review application. However, the job may be manually exported if desired. For more information, see Re-Export a Streaming Discovery Job. If an Export Series is selected, the area below in the dialog box displays the options/settings from that Export Series.

    Important: If you are creating images during a Streaming Discovery Job, you must create and select an export series.

    Export Interval (min)

    This export interval setting dictates how often documents are exported to the specified export destination (Ipro Eclipse or Relativity).

    Note: This option is not available unless an Export Series is selected, or a new Export Series is created.

    The default setting is 30 minutes for new Cases (Projects) where no System default options are in place. This change was made to reduce the number of exports created from large Streaming Discovery Jobs and to better manage the volume of exports.

    For Streaming Discovery Jobs initiated under Cases (Projects) created before version 2016.2.0, the five-minute default setting remains.

    The maximum setting is 60 minutes. If an existing Export Series is selected and the export interval is set to 0, only one Export Job will be created on completing the Enterprise Streaming Discovery Job.

    As documents are created, Export Jobs are continuously created (based on the export interval setting). Each Export Job is started immediately on creation regardless of job size.

    Only completed families are considered for export. Generally, the longer the interval setting, the more documents each Export Job contains. The Enterprise Streaming Discovery Job may complete before all the Export Jobs complete; however, it is not marked as Complete until the last set of documents starts to export.

    The Export Jobs inherit the settings from the parent Export Series, including the numbering schema. For direct export to the Review application (Eclipse or Relativity), the same eCapture auto-load rules apply: one load file for each volume.
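The export interval rules described above can be summarized in a short Python sketch. It is a restatement of the documented values only (30-minute default, five-minute default for Cases created before version 2016.2.0, 60-minute maximum, and 0 meaning a single Export Job at completion). The function names and the version tuple format are illustrative assumptions, and treating negative values as invalid is also an assumption.

```python
from typing import Tuple

DEFAULT_INTERVAL_MIN = 30        # new Cases (Projects) with no System default in place
LEGACY_DEFAULT_INTERVAL_MIN = 5  # Cases (Projects) created before version 2016.2.0
MAX_INTERVAL_MIN = 60


def default_interval(case_created_version: Tuple[int, int, int]) -> int:
    """Default export interval for a Case, based on the version it was created under."""
    if case_created_version < (2016, 2, 0):
        return LEGACY_DEFAULT_INTERVAL_MIN
    return DEFAULT_INTERVAL_MIN


def describe_interval(minutes: int) -> str:
    """Describe how exports are created for a given interval setting."""
    if not 0 <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError("interval must be between 0 and 60 minutes")
    if minutes == 0:
        return "one Export Job when the Streaming Discovery Job completes"
    return f"a new Export Job every {minutes} minutes for completed families"
```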

    Create New Export Series

    Create a new Direct-to-Eclipse or Direct-to-Relativity Export Series. When a new Export Series is created (for Eclipse or Relativity), the criteria display in the Export Options dialog box as shown in the following figure:

    In the previous figure, an existing Export Series was selected and shows the options/settings that were selected for that Export Series.

    The bottom section shows the Export Fields that were selected for the Export Series Job. For more information about creating an export series, see Create an Export Series.

    For Enterprise Streaming Discovery Job Export Series, the Export Series is Data Extract only.

    Save settings as Case (Project) default

    Displays when setting options at the Job Level. Select this option to retain these settings for future Enterprise Streaming Discovery Jobs created for the Case (Project).

    Auto Publish Errors

    Select this option to automatically publish Streaming Discovery node-level or item-level errors (if any) so they can move forward to review without your having to modify the Streaming Discovery Job and visually inspect the errors. This option is cleared by default; if it is left cleared, no action is performed.

    Once the job completes normally, and if there are node level or item level errors, it will re-queue those errors one time and set the job to publish. The job displays in the Job Queue pane. Once the re-queue is completed, all remaining failures are published.

    To see the nodes that were re-queued, open the AutoRequeue.TXT file stored in the Discovery Jobs folder. An example of the data is shown here:

    NodeIDRequeue – NodeID: 1313

    NodeIDRequeue – NodeID: 1314

    NodeIDRequeue – NodeID: 1338

    Errors are not published if the auto publish option is selected at the Case (Project) level and cleared at the Job level.
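If you need the re-queued node IDs as a list, for example to cross-check them against a report, the AutoRequeue.TXT entries can be parsed with a short script. The following Python sketch assumes every entry follows the pattern of the three sample lines shown above. The function name is hypothetical, and the file's encoding and any other line types it may contain are not documented here.

```python
import re
from typing import List

# Matches the numeric ID at the end of entries such as "NodeIDRequeue – NodeID: 1313".
NODE_ID_PATTERN = re.compile(r"NodeID:\s*(\d+)")


def parse_requeued_node_ids(text: str) -> List[int]:
    """Return the node IDs found in AutoRequeue.TXT content, in file order."""
    return [int(match.group(1)) for match in NODE_ID_PATTERN.finditer(text)]


# Sample content copied from the example entries.
SAMPLE = """NodeIDRequeue – NodeID: 1313
NodeIDRequeue – NodeID: 1314
NodeIDRequeue – NodeID: 1338
"""

print(parse_requeued_node_ids(SAMPLE))
```

The function takes the file content as a string so that you can choose how to open the file for your environment.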

    Save as system default

    Displays when setting options at the Case (Project) Level. Select this option to retain these settings for future Cases (Projects) created for the Client. The settings are saved to the eCapture Configuration database.

  14. When you have finished setting up the job, click OK.

    The Enterprise Streaming Discovery Job displays in the eCapture Controller Job Queue and is ready to be started. When the Job finishes, it disappears from the Job Queue.

  15. In the Client Management Tree View, select the Streaming Discovery Job to view the Status and Summary Panel, the pane on the right.

    In the following figure, the Job Status indicates Complete.

    The value next to Processed Items indicates the number of items found; in this example, 18 items were found. Other counts include:

    • Processing Errors – Indicates the number of encountered processing errors for the Job.
    • OCR Applied [Errors] – If OCR was enabled for the Job, the first value represents the number of pages that had OCR. The value in the brackets indicates the number of OCR errors.
    • Unextractable Items – Indicates the number of items that could not be extracted.
    • Unpublished Errors – This field indicates the number of errors that can be published either individually for the job or as part of the entire case.
    • Filtered Items [Duplicates] – Shows total filtered items count (e.g. date ranges, file type filters, de-duplication and de-NIST) and the number of duplicates in brackets.
    • Exported - Indicates the number of documents that were exported for the Job, the number of Export Errors, and the number of Loaded [Errors]. The previous figure indicates exporting options were included for this Job. If they were not, then exporting may take place manually. For more information, see Re-Export a Streaming Discovery Job. In the previous figure, the Export Grid column indicates 4 Ipro (Cloud) Streaming Discovery Jobs with a total of 6 exported documents without any Export or Loaded errors.

      • Exported [Errors] - Shows the total number of exported documents with any encountered errors in brackets.
      • Loaded [Errors] - Shows the total number of exported documents with any autoload errors (for either Ipro Eclipse or Relativity) in brackets.

        Note: For Streaming Discovery and Export Jobs that are not autoloading into Eclipse or Relativity, and for incomplete Export Jobs, the Loaded field is blank.

      Exported and Loaded document counts reflect completed Export jobs only. As Export Jobs connected to Streaming Discovery Jobs are completed, the counts increase. The Loaded count matches the Exported documents count unless errors occur during an Eclipse or Relativity autoload.

Use the Export Jobs Grid

The Export Jobs grid is useful for associating Enterprise Streaming Discovery Jobs with Data Extract Export Jobs and for determining when multiple Enterprise Streaming Discovery Jobs are associated with a single Export Series. In addition, the Export Jobs grid shows the ID number, name of the Enterprise Streaming Discovery Job, its Completion status, Exported [Errors], and Loaded [Errors].

  1. Double-click a Streaming Discovery Job in the Export Jobs grid.

    The Client Management Tree View refreshes, highlighting the selected Export Job, and the Status and Summary Panel on the right displays the details about the completed job.

  2. Click View Settings to view the setting information for the selected job.

  3. Click View Output to view the directory containing the extracted data.

  4. Click the View Report button to view the Export Summary Report.

The Enterprise Streaming Discovery Job may be exported as many times as required. Each time it is exported, the Exported documents count increases by the number of unfiltered/non-duplicate documents in the Enterprise Streaming Discovery Job.

When the Enterprise Streaming Discovery Job is deleted, the Export Series is also deleted.

The ‘CustomExport’ stored procedure in the Client database includes a parameter whose value indicates whether the Export Job is the last in the Export Series. If associated Export Jobs are pending or in progress, the parameter is set to ‘0’, indicating that it is not the last Export Job in the Export Series. If the completed Export Job is the last job in the Export Series, the parameter is set to ‘1’, indicating that it is the final Export Job of the Export Series. If errors are re-queued and/or published, the parameter is not modified. For more information about adding custom stored procedures during the installation database update phase, see Work with Custom Stored Procedures.
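The rule for the last-export value can be expressed as a simple check. The following Python sketch only models the documented logic; the actual stored procedure runs in the Client database, and the status strings and function name used here are hypothetical.

```python
from typing import Iterable

# Hypothetical status labels for Export Jobs that have not finished.
UNFINISHED = {"pending", "in progress"}


def last_export_flag(associated_job_statuses: Iterable[str]) -> int:
    """Return 0 if any associated Export Job is pending or in progress, otherwise 1."""
    if any(status.lower() in UNFINISHED for status in associated_job_statuses):
        return 0
    return 1
```

For example, `last_export_flag(["Complete", "In Progress"])` returns 0, and `last_export_flag(["Complete", "Complete"])` returns 1.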

 

Related Topics

Overview: Enterprise Streaming Discovery