Create a Streaming Discovery Job
Using eCapture, you can create a single, optimized Job type called an Enterprise Streaming Discovery Job. This Job type is unique because:
- It combines both a traditional Discovery Job and a Data Extract Job.
- It creates a single Job to push data through to the review process and reduces the number of starts and stops required by traditional methods.
- Images may be generated during a Streaming Discovery Job, allowing for automatic loading into a review platform such as Ipro Eclipse.
- Document families are available for the review process sooner due to the use of family-based task distribution. All document families keep moving forward through the Enterprise Streaming Discovery phases as soon as they are ready. The Enterprise Streaming Workers are constantly processing data so the data can move through the filtering phase and then to the export phase.
The following procedure describes how to create a Streaming Discovery Job when you have already created your Clients, Cases (Projects), and Custodians.
Note: If you have not yet created Clients, Cases (Projects), or Custodians, create them before you start this procedure.
Procedure: Create a Streaming Discovery Job
To create a Streaming Discovery Job:
-
In the Client Management Tree View, select a Client, then select a Case (Project) for the Client, and select a Custodian. Under the Custodian, right-click the Streaming Discovery Jobs folder and select New Streaming Discovery Job.
-
You may see a warning indicating that standard Data Extract Jobs and Processing Jobs exist in the system and that Streaming Discovery Jobs do not de-duplicate against those eCapture Jobs. However, the Controller includes the Case Hash Conversion utility, which converts the legacy Jobs (Data Extract and Processing) by examining the exported documents and rehashing those items using streaming. Conversion is restricted to emails. For more information, see Modify Streaming Discovery Jobs. When you have finished reading the message, click OK.
The New Streaming Discovery Job dialog box appears.
- Enter a Streaming Discovery Job Name.
- Enter a Description.
- Enter a Batch ID. A maximum of 20 characters is permitted. This field can be selected for export load files, endorsements, and custom placeholders.
- Click the button to open the Directory Browser dialog box.
- Select the directory to discover. Use the UNC path to ensure consistent drive mappings for your site configuration. The selected directory displays in the Directories list. If you selected the incorrect directory, simply select it, and click the button. The directory is removed from the Directories list.
- Repeat steps 6 and 7 to select additional directories.
- Select a task table from the drop-down menu. The task table that displays in the field is based on the last task table selected for the Custodian. For more information about creating task tables, see Create Task Tables.
- Select Expedite Job if you want the job moved to the front of the queue. Otherwise, it displays at the end of the queue.
- Clear Show Job Options after creation if you do not want the Job Options to appear.
- Click OK. The Streaming Discovery Job Options dialog box appears.
-
Set the Streaming Discovery Job Options, including Discovery, Imaging, Export, and Filtering Options.
Click the following options for information about each of the four main settings tabs used when configuring a Streaming Discovery Job.
Streaming Discovery: Discovery Options
Option
Description
Container Handling
PDF Portfolio files allow email boxes to be stored/converted within a folder structure. As of 2018.5.2, this folder structure information is extracted and available for export in the existing ‘MailFolder’ metadata field.
-
Treat archives as directories: Select this option if you want the files in the archived folder to be treated as parent and child docs when running a Discovery Job. In addition, WINMAIL.DAT attachments are treated like archives and will be processed like .ZIP files. The following are treated like archive files:
FI_ZIP = 1802
FI_ZIPEXE = 1803
FI_ARC = 1804
FI_TAR = 1807
FI_STUFFIT = 1812
FI_LZH = 1813
FI_LZH_SFX = 1814
FI_GZIP = 1815
IPRO_FI_RAR = 13000
FI_TNEF = 1197
-
Treat PDF Portfolios/Packages as Containers: This option is selected by default. The PDF Portfolio file is treated as a directory and its contents extracted and treated as loose files (except children of the contained PDFs). The PDF Portfolio will not be treated as an item, only as a container in the Nodes table. Documents inside the PDF package are treated as parent files. If this option is not selected, the PDF Portfolio file will be treated as a file parent and its contents extracted and treated as attachments in the items table. The PDF Portfolio will be treated as an item and can be processed/filtered/exported.
Enable File Extraction
The Enable File Extraction check box is selected by default. The related Extract options are also selected by default and may be cleared independently, if desired.
If the Enable File Extraction check box is cleared (the related Extract options are also cleared) and data is submitted for extraction, no extraction occurs from container file types such as mail stores and archives. This lets you send documents through Streaming Discovery when you know that all documents, including file parents (e.g., emails and edocs), were already extracted.
Note: Node records are generated for container files such as .PST, .NSF, and archives; however, no items are extracted. The status indicator states: "No Content extracted, file extraction disabled by user".
-
Extract email inline images: When enabled, inline images in email messages (e.g., signature files) are extracted as attachments and treated as child documents. Apple Mail Message (EMLX) files are supported: attachments are extracted from EMLX emails, and inline images are recognized and handled. When EMLX files are processed or data extracted, they are treated as emails. The output resembles an email displayed in Outlook Express or Outlook.
When disabled, inline images are not extracted as children. The images are not treated as separate documents, and therefore are not OCRed, language-identified, or indexed. The images are rendered inline as they would look in the native file.
Black Ice™ does not return text for any images that are printed. Therefore, extracted text for the (parent) document does not include text from the inline image.
An inline image is OCRed only if the page image on which it is printed does not contain any text and the option OCR Pages Missing Text is enabled under the Processing Job, General Options tab.
-
Extract Embedded Files: An embedded file is an object that has been inserted into a document and, if extracted, can act as a standalone document. This option consolidates Excel documents, Word documents, PowerPoint documents, Email File Attachments (Outlook.FileAttach), Visio drawings, Package-Embedded documents, Acrobat documents, Email Message Attachments (MailMsgATT), and Email File Attachments (MailFileAtt).
When selected, the embedded files are extracted as separate documents and treated as child documents. If this option is not selected, then the embedded files are not extracted as separate documents.
When this option is selected, all files embedded inside of non-emails (edocs) are extracted. These files are sent through discovery, text extraction, metadata extraction, and export with their parent. When this option is not selected, files embedded inside of non-emails (edocs) are not extracted. They are ignored, and only the parent document is processed.
OCR
-
OCR images: Images are OCRed to retrieve any available text from the image. The OCR is available for indexing and searching in the Review application.
-
OCR PDF Pages Missing Text: PDF pages with no embedded text are OCRed before indexing or language identification. PDF pages with embedded text (text-behind) will have text extracted. Comments on a PDF file are also extracted. The OCR text is added to any extracted text from the PDF. All text is available for indexing and searching in the Review application.
Optionally, select the option OCR any page with fewer than n characters and indicate a value for n. The default value is 25 characters. The maximum value is 10000. If a page contains fewer characters than this value, eCapture sends the page to be OCRed; otherwise, the text is simply extracted. If necessary, enter a different value.
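The character threshold check can be pictured with a short Python sketch. This is only an illustration of the rule as described here, not eCapture code: the function name is hypothetical, and how eCapture counts characters (for example, whether whitespace counts) is an assumption.

```python
def page_needs_ocr(embedded_text: str, threshold: int = 25) -> bool:
    """Return True when a PDF page has fewer than `threshold` characters.

    Assumption: surrounding whitespace is ignored when counting characters.
    The lower bound of 1 is also an assumption; the guide states only the
    maximum of 10000.
    """
    if not 1 <= threshold <= 10000:
        raise ValueError("threshold must be between 1 and 10000")
    return len(embedded_text.strip()) < threshold


# A page with a short stamp of text falls under the default threshold of 25.
print(page_needs_ocr("Page 3 of 10"))
# A page with a full paragraph of embedded text is simply extracted.
print(page_needs_ocr("This page already carries a full paragraph of embedded text."))
```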
-
OCR PowerPoint Documents: Turn this option on to perform OCR on Microsoft PowerPoint files during indexing to get text from embedded content in the slides. This results in slower indexing speeds for PowerPoint files, but more accurate search results.
-
Minimum average OCR confidence (1-100): Valid values range from 1 to 100. The default is 50. The confidence level is the average confidence percentage across all pages in a document on which OCR was performed. Success or failure of OCR results is based on this average confidence level of the document. If the average confidence level is below the selected threshold, the page is considered an OCR error.
The Discovery Job Status and Summary Panel displays OCR Applied[Errors], where Applied shows the number of pages that required OCR (not OCRed) and where [Errors] shows the number of those pages that did not meet the specified average confidence level.
Note: For calculating average page confidence, pages in PDF docs with text behind them are considered 100%. OCR failures are considered 0%.
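As a rough illustration of the averaging rule, the following Python sketch treats text-behind PDF pages as 100% and OCR failures as 0%, then compares the document average to the minimum. The `PageResult` type and function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PageResult:
    """Hypothetical per-page record used only for this sketch."""
    has_text_behind: bool = False           # PDF page with embedded text
    ocr_confidence: Optional[float] = None  # None means OCR failed


def page_confidence(page: PageResult) -> float:
    if page.has_text_behind:
        return 100.0   # text-behind pages count as 100%
    if page.ocr_confidence is None:
        return 0.0     # OCR failures count as 0%
    return page.ocr_confidence


def average_confidence(pages: Sequence[PageResult]) -> float:
    if not pages:
        raise ValueError("at least one page is required")
    return sum(page_confidence(p) for p in pages) / len(pages)


def meets_minimum(pages: Sequence[PageResult], minimum: float = 50.0) -> bool:
    # An average below the minimum is treated as an OCR error.
    return average_confidence(pages) >= minimum


pages = [PageResult(has_text_behind=True), PageResult(ocr_confidence=80.0), PageResult()]
print(average_confidence(pages))  # (100 + 80 + 0) / 3
```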
-
Use OCR Workers: Select this option to simultaneously create an Enterprise OCR job with the Enterprise Streaming Discovery Job. The Job remains active until the Enterprise Streaming Discovery Job is complete.
OCR must be complete before the document is eligible for export. Workers that are Enterprise Eligible or Enterprise Exclusive will accept OCR tasks if licensing is available. A different task table may be specified for Enterprise OCR Workers.
Selecting this option can improve performance. If the Use OCR Workers option is not selected, OCR tasks are assigned to licensed Enterprise Streaming Discovery Workers.
OCR Worker Task Table: If a custom task table is selected from the drop-down menu, Enterprise OCR tasks are sent to those Workers assigned to the selected task table.
Note: For information about the OCR Worker Task Table, see Create Task Tables and Assign Task Tables to Workers.
-
OCR Languages: eCapture includes multi-language OCR capability. The QC document contains the original OCR languages that were selected for the Data Extract Job. A valid multi-language OCR license must be available in order to modify the original selected languages, if necessary.
To reserve a portion of the multi-language OCR licenses for QC and to keep the Worker from consuming all available licenses, use the Multi-Language OCR License slider located in the Controller System Options dialog box.
Click OCR Languages to display the Language OCR dialog box.
After selecting the languages, click OK to close the dialog box. The selected languages appear in the OCR Languages field. Place the mouse pointer on the OCR Languages field to display a tool tip that lists all the selected languages that were not visible in the OCR Languages field. The OCR Languages field is a read-only field.
The following languages are supported:
- English
- Arabic
- Chinese Simplified
- Chinese Traditional
- Japanese
- Korean
- Afrikaans
- Albanian
- Basque
- Belarusian
- Bulgarian
- Catalan
- Croatian
- Czech
- Danish
- Dutch
- Estonian
- Faroese
- Finnish
- French
- Galician
- German
- Greek
- Hungarian
- Icelandic
- Indonesian
- Italian
- Latvian
- Lithuanian
- Macedonian
- Norwegian
- Polish
- Portuguese
- Portuguese Brazil
- Romanian
- Russian
- Serbian
- Serbian Cyrillic
- Slovak
- Slovenian
- Spanish
- Swedish
- Turkish
- Ukrainian
Caveats to OCR language handling:
English is the only language that is selected by default. The more languages that are selected, the lower the confidence level will be for correctly identifying the languages in a document.
- If English is selected, Arabic will not be available for selection.
- If Arabic is selected, no other language will be available for selection.
- If one of the CJK (Chinese, Japanese, Korean) languages is selected, then all remaining CJK languages will not be available for selection. Other languages (excluding Arabic) may be selected.
- If Chinese Simplified is selected, Chinese Traditional, Japanese, and Korean will not be available for selection.
- If Chinese Traditional is selected, Chinese Simplified, Japanese, and Korean will not be available for selection.
- If Japanese is selected, Chinese Simplified, Chinese Traditional, and Korean will not be available for selection.
- If Korean is selected, Chinese Simplified, Chinese Traditional, and Japanese will not be available for selection.
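The selection caveats above reduce to two rules: Arabic cannot be combined with any other language, and at most one CJK language may be chosen. The Python sketch below checks a proposed selection against those rules. It is an illustration only; the function name and message wording are hypothetical, and the dialog box itself enforces these rules by disabling choices.

```python
CJK_LANGUAGES = {"Chinese Simplified", "Chinese Traditional", "Japanese", "Korean"}


def validate_ocr_languages(selected):
    """Return a list of rule violations for a proposed language selection."""
    chosen = set(selected)
    problems = []
    # Arabic excludes every other language, including English.
    if "Arabic" in chosen and len(chosen) > 1:
        problems.append("Arabic cannot be combined with any other language")
    # Only one of the CJK languages may be selected at a time.
    cjk_chosen = sorted(chosen & CJK_LANGUAGES)
    if len(cjk_chosen) > 1:
        problems.append("Only one CJK language may be selected: " + ", ".join(cjk_chosen))
    return problems


print(validate_ocr_languages(["English", "Japanese", "German"]))  # no violations
print(validate_ocr_languages(["Arabic", "English"]))              # one violation
```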
Time Zone Handling
-
Convert all times to UTC: Default setting.
-
Specify Time Zone: Select this option to specify a time zone to convert original times to the times for the selected time zone. For example, you might select the time zone of the workstation where the files originated. The selected time zone will be applied to Metadata output from the IPRO (Cloud) Streaming Discovery worker. Updates to extracted text will only be applied to the header of emails (the Sent Date).
For more information about Time Zone Handling, see How eCapture Handles Dates and Time Zones.
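The difference between the two Time Zone Handling options can be sketched in Python. This is a simplified illustration, not eCapture's implementation: the function names are hypothetical, and the fixed UTC offsets used here ignore daylight saving time, which a real time zone database (such as Python's `zoneinfo` module) would handle.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Interpret a naive timestamp at a fixed UTC offset and convert it to UTC."""
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=source_zone).astimezone(timezone.utc)


def to_zone(utc_time: datetime, utc_offset_hours: float) -> datetime:
    """Express a UTC timestamp in a zone with a fixed UTC offset."""
    return utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# "Convert all times to UTC": a 9:30 AM sent time recorded at UTC-5.
sent_utc = to_utc(datetime(2018, 3, 1, 9, 30), -5)
print(sent_utc.isoformat())

# "Specify Time Zone": the same instant expressed at UTC+1.
print(to_zone(sent_utc, 1).isoformat())
```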
De-duplication
A list of matching hash values is retrieved for each parent document. The de-duplication scope is determined by grouping the results either by Case (Project), that is, across all documents, or by Custodian.
De-duplication occurs after Date, File Type and File Extension filters are applied.
De-duplication is always performed at the parent level. If a parent is marked as a duplicate, then it, along with the rest of its family, is not exported.
From the de-duplication drop-down menu, select one of the following:
-
Custodian: (default option) Documents that are duplicates of any documents within the Custodian are removed.
-
Case (Project): Documents that are duplicates of any documents within the Case (Project) are removed.
-
None: All documents including duplicates are exported.
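The three scopes can be illustrated with a small Python sketch that de-duplicates document families by parent hash. The `Family` type and the choice to keep the first family encountered are assumptions made for this example; the guide does not state which duplicate eCapture retains.

```python
from dataclasses import dataclass


@dataclass
class Family:
    """Hypothetical document family: a parent hash plus where it came from."""
    parent_hash: str
    custodian: str
    case: str


def deduplicate(families, scope="Custodian"):
    """Return the families that would be exported under the given scope."""
    seen = set()
    kept = []
    for family in families:
        if scope == "None":
            kept.append(family)  # duplicates are exported too
            continue
        if scope == "Custodian":
            key = (family.case, family.custodian, family.parent_hash)
        elif scope == "Case (Project)":
            key = (family.case, family.parent_hash)
        else:
            raise ValueError(f"unknown scope: {scope}")
        if key in seen:
            continue  # a duplicate parent removes its whole family
        seen.add(key)
        kept.append(family)
    return kept


families = [
    Family("h1", "alice", "case1"),
    Family("h1", "bob", "case1"),    # duplicate across custodians
    Family("h1", "alice", "case1"),  # duplicate within a custodian
    Family("h2", "bob", "case1"),
]
for scope in ("Custodian", "Case (Project)", "None"):
    print(scope, len(deduplicate(families, scope)))
```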
The Custom Email Hash dialog box provides the following options:
Some emails may have identical values in the properties that eCapture uses to generate hashes; however, the values may differ in the attachment contents. Family hash accounts for this by using the hash values of the extracted attachments to calculate a second hash for the email parent.
De-duplication may be performed on parent hash values rather than family hash values for newly created Streaming Discovery Jobs that are using version 2016.3.3 only. (Note: Existing Streaming Discovery Jobs retain the family hash setting.) The default setting uses family hash.
This setting is found in the DedupUseFamilyHash field of the ConfigurationProperties table for the eCapture Configuration database. The default value is 1. To switch to parent hash value, change the value from 1 to 0 in the DedupUseFamilyHash field. If the value is set to 0 in the ConfigurationProperties table, then family hashes will not be considered when applying de-duplication.
The method of gathering and creating the MD5 hash values changed for newly created Cases (Projects). Hashing of emails uses Coordinated Universal Time (UTC) to ensure proper de-duplication across time zones.
In most cases, MD5 hash values are calculated on the file itself. For more reliable de-duplication of emails, however, the hash must be calculated on the information contained within the email and not on the file itself. There are many reasons for this; the simplest is that when an email is saved out of its container (PST, NSF, etc.), the file created contains information that would change the hash value of the same email each time the email is saved out.
When an email is discovered within eCapture, it is assigned a hash value based on fields chosen by the user. The values of these fields are concatenated, and the text is hashed. Select from the following email fields to generate the hash value:
-
Subject
-
From/Author
-
Attachment Count
-
Body Whitespace - Whitespace in the email body could cause slight differences between the same emails, which could result in different hashes being generated.
On the Body Whitespace drop-down menu, select either Remove or Include (default). Remove - removes all whitespace between lines of text in the email body before hashing. Include - keeps the whitespace.
-
Email Date: The following message types use the specified date values: Outlook: Sent Date, IBM Notes: Posted Date, RFC822: Date, and GroupWise: Delivered Date.
-
Attachment Names
-
Recipients
-
CC
-
BCC
Alternate Email Date: Select either Creation Date or Last Modification Date. The selected value will be used when calculating the MD5 hash value if the normal Email Date value is not present. This commonly occurs for Draft messages that have not been sent.
Start Time is always used if it exists.
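The field-based hashing described above (concatenate the selected field values, then hash the text) can be sketched in Python. This is a minimal illustration under stated assumptions: the field names, concatenation order, separator, text encoding, and the exact whitespace-removal behavior are all hypothetical, so the resulting values will not match hashes produced by eCapture.

```python
import hashlib

# Hypothetical field names standing in for Subject, From/Author, and Email Date.
DEFAULT_FIELDS = ("subject", "from_author", "email_date")


def email_hash(email, fields=DEFAULT_FIELDS, alternate_date="creation_date",
               remove_body_whitespace=False):
    """Concatenate the selected field values and return their MD5 hex digest."""
    parts = []
    for name in fields:
        value = email.get(name)
        if name == "email_date" and not value:
            # Fall back to the alternate date, e.g. for unsent drafts.
            value = email.get(alternate_date)
        if name == "body" and value and remove_body_whitespace:
            # Assumption: "Remove" drops line breaks and per-line padding.
            value = "".join(line.strip() for line in value.splitlines())
        parts.append("" if value is None else str(value))
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()


sent = {"subject": "Q1 report", "from_author": "a@example.com",
        "email_date": "2018-03-01T14:30:00Z"}
draft = {"subject": "Q1 report", "from_author": "a@example.com",
         "email_date": None, "creation_date": "2018-03-01T14:30:00Z"}
print(email_hash(sent) == email_hash(draft))
```

Because the hash depends only on the chosen field values, two copies of the same email saved out of different containers produce the same value even when the files differ byte for byte.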
By default, Subject, From/Author, Email Date, and an Alternate Email Date of Creation Date are used for email hash generation.
Save settings as Case (Project) default
Displays when setting options at the Job level. Select this option to retain these settings for future Enterprise Streaming Discovery Jobs created for the Case (Project).
Save as system default
Displays when setting options at the Case (Project) level. Select this option to retain these settings for future Cases (Projects) created for the Client. The settings are saved to the eCapture Configuration database.
Streaming Discovery: Filtering Options
Option
Description
Dates
Date filters are applied to parent documents only. Date filter dates are in Coordinated Universal Time (UTC). For emails, Sent Date is used. There is no alternate date fallback. Emails that do not have a Sent Date (draft emails, etc.) are also included. For non-emails, Last Modified Date is used. Any item that does not have a Last Modified Date is included.
All Documents: By default, all document families are exported unless the option Specify Date Range/s is chosen.
-
Specify Date Range/s: When this option is chosen, a single date-range pick list displays and defaults to the date range of the document set. The default time for the beginning date in the range is 12:00AM and the default time for the ending date in the range is 11:59PM. These default times apply to any date ranges that are added when filtering.
From: Click the button to display the calendar and select a month, day, and year.
To: Click the button to display the calendar and select a month, day, and year.
To specify an additional date range, click the button. Each time the button is clicked, another date range appears.
Multiple date ranges allow document families with specific date ranges to be included. Those document families whose dates do not fall within the designated ranges are excluded from export.
To remove a date range filter, click the button. If there is only one date range, the date range closes and reverts to All Documents.
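The date filter rules in this section can be summarized in a short Python sketch: a parent document passes if its date falls inside any range, and a parent with no date is always included. The function names are hypothetical, and treating the 11:59PM end time as exactly 23:59:00 is an assumption.

```python
from datetime import date, datetime, time, timezone


def day_range(first_day: date, last_day: date):
    """Build a UTC range using the default times of 12:00AM and 11:59PM."""
    start = datetime.combine(first_day, time(0, 0), tzinfo=timezone.utc)
    end = datetime.combine(last_day, time(23, 59), tzinfo=timezone.utc)
    return start, end


def passes_date_filter(parent_date, ranges):
    """parent_date is the Sent Date (emails) or Last Modified Date (non-emails)."""
    if not ranges:
        return True   # All Documents: no date ranges specified
    if parent_date is None:
        return True   # items without a date are always included
    return any(start <= parent_date <= end for start, end in ranges)


ranges = [day_range(date(2018, 1, 1), date(2018, 1, 31)),
          day_range(date(2018, 6, 1), date(2018, 6, 30))]
print(passes_date_filter(datetime(2018, 1, 15, 12, 0, tzinfo=timezone.utc), ranges))
print(passes_date_filter(datetime(2018, 3, 1, 0, 0, tzinfo=timezone.utc), ranges))
```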
File Types/Extensions
Export these File Types: Filters determine the types of files that you can bring into an electronic discovery job during an Enterprise Streaming Discovery session. The settings made here determine the file types you will be able to export in an Enterprise Streaming Discovery Job.
File Type and File Extension Filters are applied only to the matching files. These filters are inclusive; only selected file types or specified file extensions are exported. If at least one file in a document family is being included, then the entire family gets exported.
eCapture recognizes documents by their actual content and not the file extension.
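The family rule above (if at least one file in a family is included, the whole family is exported) can be sketched as follows. The type labels are hypothetical placeholders for content-based file types, and combining the type filter and the extension filter with a logical OR is an assumption; the guide does not spell out how the two filters interact.

```python
def family_is_exported(family, selected_types, selected_extensions):
    """family is a list of (detected_type, file_name) pairs.

    Returns True when any file matches a selected type or extension,
    in which case the entire family is exported.
    """
    wanted = {ext.lower().lstrip(".") for ext in selected_extensions}
    for detected_type, file_name in family:
        extension = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
        if detected_type in selected_types or extension in wanted:
            return True
    return False


family = [("Outlook Message", "note.msg"), ("Microsoft Word", "report.doc")]
# The Word attachment matches, so the email parent is exported with it.
print(family_is_exported(family, {"Microsoft Word"}, set()))
```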
You can filter (exclude) a myriad of file types by simply selecting the file type check box. When the processing job runs, it will process only those file types that you want and exclude all others that you selected in the Filters dialog box.
For example, you discovered a directory containing 15 different types of files. Some of these files were word processing documents. You want to run a Streaming Discovery Job that includes only Microsoft Word documents.
There is a separate category for Microsoft Word documents (and subcategories of all the versions of Microsoft Word under the Microsoft Word category) as well as a separate generic Word Processing category that contains subcategories of all other word processing file types such as Lotus Word Pro, WordStar, .RTF, and so on.
If you ask for only Microsoft Word DOC files then you would also select the generic Word Processing category to automatically exclude any other type of word processing file that exists in the Discovery Job that you selected. The Processing Job will process those documents that it recognizes as Microsoft Word documents based on their actual content.
The following file types are based on Oracle’s Outside-In identification criteria.
Select All: Select every file type.
Clear All: Clear all the selected file types.
Export these File Extensions: You can specify specific extensions of files you want to export. Click Add to add the extension to the list. Repeat for each extension.
Load From File: To import a list of file extensions from a CSV file, click Load From File. Select the CSV file and click Open. An Import From File progress bar appears. If any errors, such as duplicates, were encountered during the import, an Information dialog box displays and contains the errors. The CSV file may contain extensions with or without a "." (period). Ensure that the CSV file contains only one column of file extensions, with each extension occupying its own row, e.g., Range A1 through A50 or Range E1 through E50. The file extensions are alphabetized when imported into the Flex Processor.
If you want to remove a specific extension from the list, select the extension and click Remove.
Clear removes all the extensions from the list.
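The Load From File behavior described above (one column of extensions, with or without a leading period, duplicates reported, result alphabetized) might look like the following Python sketch. It is not the product's import code; the function name, the error wording, and case-insensitive duplicate detection are assumptions.

```python
import csv
import io


def load_extensions(csv_text):
    """Parse a one-column CSV of file extensions.

    Returns (sorted_extensions, errors). The single column may sit in any
    position (for example column A or column E), so empty cells are skipped.
    """
    extensions = set()
    errors = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # blank row
        if len(cells) > 1:
            errors.append(f"row {row_number}: more than one column has a value")
            continue
        extension = cells[0].lstrip(".").lower()
        if extension in extensions:
            errors.append(f"row {row_number}: duplicate extension '{extension}'")
            continue
        extensions.add(extension)
    return sorted(extensions), errors


extensions, errors = load_extensions(".pdf\ndocx\nPDF\n,,,,xls\n")
print(extensions)
print(errors)
```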
Remove NIST Matches
NIST removal matching applies only to the parent document or loose documents. It does not apply to child documents. If a parent document is a NIST match, the entire family is then removed including its children.
During the filtering phase, document hashes are compared to the hashes in the NIST database. If the document hash is found, it is marked as a NIST match and will be excluded from Export Jobs.
NIST match removal is applied to documents that were slated to be exported after applying the date, file type, and extension filters.
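NIST match removal operates on parent (or loose) document hashes only, which the following Python sketch illustrates. The dictionary layout and function name are hypothetical; in eCapture the hashes are compared against the NIST database rather than an in-memory set.

```python
def remove_nist_matches(families, nist_hashes):
    """Split families into (kept, removed) based on the parent hash only.

    Each family is a dict with a 'parent_hash' and a list of 'child_hashes'.
    Child hashes are never compared against the NIST list.
    """
    kept, removed = [], []
    for family in families:
        if family["parent_hash"] in nist_hashes:
            removed.append(family)  # the whole family is excluded from export
        else:
            kept.append(family)
    return kept, removed


nist_hashes = {"aaa111", "ccc333"}
families = [
    {"parent_hash": "aaa111", "child_hashes": ["zzz999"]},  # parent matches
    {"parent_hash": "bbb222", "child_hashes": ["ccc333"]},  # only a child matches
]
kept, removed = remove_nist_matches(families, nist_hashes)
print(len(kept), len(removed))
```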
For information about installing and using the optional NIST databases and the Ipro NIST Loader, see Use the NIST Loader Utility.
For more information about using hash lists and configuring eCapture to use NIST, see Load Custom Hash Lists and Establish a Connection with the SQL Server and Set the System Options.
Save settings as Case (Project) default
Displays when setting options at the Job level. Select this option to retain these settings for future Enterprise Streaming Discovery Jobs created for the Case (Project).
Save as system default
Appears when setting options at the Case (Project) level. Select this option to retain these settings for future Cases (Projects) created for the Client. The settings are saved to the eCapture Configuration database.
Streaming Discovery: Imaging Options
Streaming Imaging options are defined on five different tabs: General, Excel, Word, PowerPoint, and Placeholder. See the following sections for more information.
Important: To define imaging options for the Streaming Discovery Job, you must first select the check box Enable Imaging located on the General tab. Once selected, the imaging options display on all five tabs.
Streaming Discovery Imaging: General Options
- Click the General Options tab.
-
Set the OCR options.
Note: If you are setting Case (Project) Level options, OCR and Time Zone Handling options are defined on the Common Options tab because Processing Jobs and Data Extract Jobs use the same OCR and Time Zone Handling options. For more information about setting options at the Case (Project) level, see Create a New Case (Project).
- OCR pages missing text - Select OCR Pages missing text to OCR pages within documents that are missing text.
- PDF page character threshold - Optionally, select PDF page character threshold and indicate a value. The default value is 25 characters. The maximum value is 10000. If a PDF page contains fewer characters than this value, eCapture sends the page to be OCRed. If necessary, enter a different value.
- Minimum average OCR confidence - The level range settings are from 1 up to 100. The default is 50. The OCR Confidence Level is the average of confidence per document, for all pages within a document on which OCR was performed. Success or failure of a document for flagging is based on the average confidence level of the document. If the average confidence level is below the selected threshold, the document will be flagged in QC with the OCR Low Confidence Flag.
-
OCR languages - eCapture includes multi-language OCR capability. The QC document will contain the original OCR languages that were selected for the Data Extract job. A valid multi-language OCR license must be available in order to modify the original selected languages, if necessary.
To reserve a portion of the multi-language OCR licenses for QC and to keep the Worker from consuming all available licenses, use the Multi-Language OCR License slider located in the Controller System Options dialog box.
Click OCR Languages to display the Language OCR dialog box.
After selecting the languages, click OK to close the dialog box. The selected languages appear in the OCR Languages field. Place the mouse pointer on the OCR Languages field to display a tool tip that lists all the selected languages that were not visible in the OCR Languages field. The OCR Languages field is a read-only field.
Caveats to OCR language handling:
English is the only language that is selected by default. The more languages that are selected, the lower the confidence level will be for correctly identifying the languages in a document.
- If English is selected, Arabic will not be available for selection.
- If Arabic is selected, no other language will be available for selection.
- If one of the CJK (Chinese, Japanese, Korean) languages is selected, then all remaining CJK languages will not be available for selection. Other languages (excluding Arabic) may be selected.
- If Chinese Simplified is selected, Chinese Traditional, Japanese, and Korean will not be available for selection.
- If Chinese Traditional is selected, Chinese Simplified, Japanese, and Korean will not be available for selection.
- If Japanese is selected, Chinese Simplified, Chinese Traditional, and Korean will not be available for selection.
- If Korean is selected, Chinese Simplified, Chinese Traditional, and Japanese will not be available for selection.
- Set the Color Depth, Paper Size, and other basic options.
General Color Depth - Applies to everything else outside of the five types (Word, Excel, PowerPoint, PDF, and Native TIFF) that eCapture does not process through Oracle (formerly Stellent). There are three exceptions to this rule: Lotus Notes, Internet Explorer, and Outlook Express, which also fall under the General type. All other email, except for Lotus Notes and Outlook Express at this time, is always rendered as Group 4 TIFF because it is rendered from text.
Single Page Output Type
General Color Depth Options    Rendered as
Black&White (1-bit)            Group 4 TIFF
Grayscale (8-bit)              LZW TIFF
256 Color (8-bit)              LZW TIFF
True Color (24-bit)            JPEG
Multi-Page TIFF Output Type
General Color Depth Options    Rendered as
Black&White (1-bit)            Group 4 TIFF
Grayscale (8-bit)              LZW TIFF
256 Color (8-bit)              LZW TIFF
True Color (24-bit)            JTIFF (JPEG compressed TIFF)
Image Color Depth - Applies to: BMP, TIFF, PCX, GIF, WPG, WINDOWSICON, WINDOWSCURSOR, MACPAINT, CGM, DCX, SUNRASTER, KODAKPCD, PNG, DGN, PBM, and ADOBE PHOTOSHOP. However, if Lead fails to open a file, it then goes to Oracle (formerly Stellent) and uses the General Color Depth options.
Image Color Depth Options      Rendered as
As Is                          If Original is Black&White, then Group 4 TIFF; otherwise, it will be a JPG matching bit depth.
Black&White (1-bit)            Group 4 TIFF
Grayscale (8-bit)              LZW TIFF
True Color (24-bit)            JPG
PDF Color Depth - Select a PDF Color Depth. A PDF always uses the selected color depth setting in the PDF area. There are two possible outcomes:
Successful Use of the Adobe Library
PDF Color Depth Options        Rendered as
As Is                          If Original is Black&White, then Group 4 TIFF; otherwise, it will be a JPG matching bit depth.
Black&White (1-bit)            Group 4 TIFF
Grayscale (8-bit)              JPG (8-bit)
True Color (24-bit)            JPG
Unsuccessful Extraction of the Adobe Library
PDF Color Depth Options        Rendered as
As Is                          Always 24-bit JPG
Black&White (1-bit)            Group 4 TIFF
Grayscale (8-bit)              JPG (8-bit)
True Color (24-bit)            JPG
- PDF Paper Size - Select an output paper size for PDFs. When the As Is option is selected, the internal PDF document size is used to draw the image.
Paper Size - Click the drop-down menu and select an output paper size for documents during processing.
- Image to PDF - When this option is selected, the system creates a PDF of the reprocessed document and places it in the Output directory with a .PDF extension.
Max Page Threshold - Set a Max Page Threshold (1 to 10000) if you want to limit the number of pages produced by larger files. By default, this option is not selected. If the Page Threshold is reached, the items are not flagged as exceptions, but flagged as Page Threshold Exceeded. All pages processed up until the threshold is reached are included in the document. The first page is the Page Threshold Exceeded placeholder, and subsequent pages are those that were processed within the Max Page Threshold setting.
Placeholder pages over threshold - Select this option to apply a placeholder to pages exceeding the threshold value indicated in the Max Page Threshold field.
Text handling - On the drop-down menu, choose either:
Truncate text to max pages - text is truncated to match the output of pages that fall under the threshold (existing behavior).
Retain all text for document - document text is associated to the number of pages below the set threshold value and all subsequent pages are blank.
-
Set the Time Zone Handling, as appropriate.
- Convert all times to UTC
- Specify Time Zone
For more information about Time Zone Handling, see How eCapture Handles Dates and Time Zones.
-
Click Advanced Options to set more complex General Options rules. The Advanced Imaging dialog box appears.
- Remove blank pages - Select this option and then set the Blank page threshold (1 to 2000) to a value that eliminates the speckles without eliminating any punctuation marks from the pages. eCapture removes any images that have fewer "dots" than this threshold. If this setting is too high, you may lose images with a few short words. We suggest a setting of 50 as a starting point.
- Process CSV files with Microsoft Excel - Select this option to process CSV files by using Microsoft Excel instead of Oracle (formerly Stellent).
- Process HTML files with Internet Explorer - Select this option to process HTML files by using Internet Explorer instead of Oracle (formerly Stellent).
-
Enable internet links in emails - This option controls whether inline images are downloaded from the internet. Clearing this option can improve performance on environments that do not have internet access.
-
Set Lotus Notes options, as appropriate:
- High Speed (Optimized for speed)
- Medium Speed (Balance of speed and quality)
- Low Speed (Optimized for highest quality output)
-
Click the Outlook/EML link, Select Handling/Order. The Outlook/EML Text Cutoff Handling dialog box appears. Select an option and use the arrow buttons to move it to the desired position in the order. Repeat for additional options. Options include:
-
Attempt in Landscape w Shrink to Fit
-
Attempt in Portrait w Shrink to Fit
-
Attempt in RTF
-
Attempt in Text
-
Assign Text Cutoff Flag and Manage in QC - This is the default setting. It cannot be repositioned.
-
-
Click the Lotus Notes link, Select Handling/Order. The Lotus Notes Text Cutoff Handling dialog box appears. Select an option and use the arrow buttons to move it to the desired position in the order. Repeat for additional options. Options include:
-
Attempt in Landscape
-
Attempt in Text
-
Assign Text Cutoff Flag and Manage in QC - This is the default setting. It cannot be repositioned.
-
- Click OK to exit the Advanced General Options dialog box.
Streaming Discovery Imaging: Excel Options
-
Click the Excel tab to set the processing options for Excel files.
-
Process with Outside-In (Stellent) - Select this option to:
- Allow for faster and more consistent generation of images on the first pass
- Reduce the amount of time spent manually QCing these document types
When selected, only Outside-In (Stellent) is used to process images; the Microsoft related options are grayed out by default. Full metadata is extracted and time zone imaged output reflects the time-zone handling options configured for the Processing Job. All files processed by Outside-In (Stellent) receive the Stellent Processed flag in QC.
The processing output differs when using Outside-In (Stellent) to view and image documents. However, the QC applied flags, metadata, and optional summary reports are similar to those produced when processing is done without Outside-In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside-In (Stellent).
- Comments - Set where you want comments displayed. Select from None, At end of sheet, or As displayed on sheet.
-
Color Depth - Set the Color Depth options. Color processing for Excel files is handled separately from color processing of other types of files. This setting is independent of the General Color Depth.
Single Page Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JPEG
Multi-Page TIFF Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JTIFF (JPEG compressed TIFF)
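The Color Depth tables above amount to a lookup from output type and color depth to the rendered format. The following sketch encodes that lookup; the dictionary layout and the function name `rendered_as` are illustrative only and are not part of eCapture.

```python
# Values copied from the Color Depth tables above; the structure is an assumption.
RENDERED_AS = {
    "Single Page": {
        "Black&White (1-bit)": "Group 4 TIFF",
        "Grayscale (8-bit)": "LZW TIFF",
        "256 Color (8-bit)": "LZW TIFF",
        "True Color (24-bit)": "JPEG",
    },
    "Multi-Page TIFF": {
        "Black&White (1-bit)": "Group 4 TIFF",
        "Grayscale (8-bit)": "LZW TIFF",
        "256 Color (8-bit)": "LZW TIFF",
        "True Color (24-bit)": "JTIFF (JPEG compressed TIFF)",
    },
}


def rendered_as(output_type: str, color_depth: str) -> str:
    """Return the rendered format for an output type and color depth."""
    return RENDERED_AS[output_type][color_depth]


print(rendered_as("Single Page", "True Color (24-bit)"))      # JPEG
print(rendered_as("Multi-Page TIFF", "True Color (24-bit)"))  # JTIFF (JPEG compressed TIFF)
```

Only the True Color (24-bit) row differs between the two output types.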
-
Paper Size - Click the drop-down menu and select an output paper size for documents during processing.
Note: For Excel Only - For Custom[8.5x11.0in], the Custom Paper Size dialog box appears.
The Custom Paper size defaults to 8.5x11 inches. The range values are shown for both Units: Inches and Millimeters. Maximum size in Inches 50.00x70.00; for Millimeters 1270.00x1778.00. When this option is selected, the document will be processed through the PDF driver (Text-Based PDF creation) regardless of the Flex Processor option selected. OCRing is not applicable in this instance. Export settings will be limited to Text-Based PDF Output only, even if image format is selected. Non-Excel documents will export as usual.
-
Center on Page - Determines where to center the image on the page.
-
Horizontally
-
Vertically
-
-
Page Order - Determines the page order to be used for imaging.
-
As is
-
Down, and then over
-
Over, and then down
-
-
Orientation - Determines the orientation of the page at the time of printing.
-
As is
-
Portrait
-
Landscape
-
-
Scaling - Specifies whether and how the image is scaled. If scaling is used, the image is either adjusted to a percentage of its current size or modified to fit the page.
-
As is
-
Adjust to % normal size
-
Fit to page
-
-
If you want to set more granular processing options for Excel files, click the Advanced Options button. The Advanced Excel Imaging dialog box appears.
-
At the top of the dialog box, set the options for how to handle headers, footers, and other content in the Excel workbook. Click the Defaults button to revert to the default settings for these options, as shown in the following image:
If you have trouble locating the referenced options in Excel, the following list shows how to navigate to each option in Excel.
-
Do not include headers - View > Header and Footer: Header/Footer Tab, Header drop-down list, None
-
Do not include footers - View > Header and Footer: Header/Footer Tab, Footer drop-down list, None
-
Reveal hidden columns - Format > Column > Unhide
-
Reveal hidden rows - Format > Row > Unhide
-
Unhide worksheets - Format > Sheet > Unhide
-
Unhide very hidden worksheets - Unhides worksheets that were hidden by a Microsoft Visual Basic for Applications program that assigned the property xlSheetVeryHidden. (From the Microsoft Excel Help File: If sheets are hidden by a Microsoft Visual Basic for Applications program that assigns the property xlSheetVeryHidden, you cannot use the Unhide command to display the sheets. If you are using a workbook with Visual Basic macros and have problems with hidden sheets, contact the owner of the workbook for more information.)
-
Autofit columns - Double-click the right boundary of the column heading.
-
Autofit rows - Double-click the boundary below the row heading.
- Wrap text - Format > Cells: Alignment Tab, Wrap Text Option.
-
Print gridlines - File > Page Setup: Sheet Tab, Under Print, select Gridlines check box.
-
Unhide windows - Window > Unhide.
-
Apply Autofilter - Data > Filter > AutoFilter
- No fill color (for cells) - Format > Cells: Patterns Tab, Under Color, click No Color.
-
Clear print area - File > Print Area > Clear Print Area.
-
Clear print title columns - File > Page Setup: Sheet Tab, under Print Titles select the columns to repeat range.
-
Clear print title rows - File > Page Setup: Sheet Tab, under Print Titles select the rows to repeat range.
-
Display headings - File > Page Setup: Sheet Tab, under Print, select the Row and column headings check box.
-
Expand Pivot Tables - Right click Pivot Table to display context menu. Choose Expand/Collapse > Expand.
-
-
Set the remaining settings in the Advanced Excel Imaging dialog box.
The following table provides a list of the available options.
Setting
Options
Date field handling:
-
Replace with date created - will replace with creation date.
-
Replace with date last saved - will replace current date with last saved date.
-
Replace with comments - displays the Date Field Comments field where you can enter the text that should replace the contents of the date field.
-
Replace with field code
-
Do not replace - will not replace the date (e.g., Macros)
Header/Footer Filename field handling
If path or filename options are found in an Excel header or footer, you can select from the following options to handle these occurrences.
-
Replace with filename (no path) - inserts the unqualified filename
-
Replace with filepath - inserts the fully-qualified path of the original file
-
Replace with comments - displays the Header/Footer Filename field comments field where you can enter your own comments
-
Replace with field code - outputs the field codes &[Path] and/or &[File]
-
Remove - removes the codes entirely
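The Header/Footer Filename field handling options above can be illustrated with a short sketch. It assumes, for illustration only, that adjacent &[Path] and &[File] codes are treated as one occurrence; the function name and option keywords are hypothetical and do not correspond to an eCapture API.

```python
import ntpath
import re

# One or more adjacent &[Path] / &[File] codes count as a single occurrence (assumption).
FILENAME_CODES = re.compile(r"(?:&\[Path\]|&\[File\])+")


def handle_filename_field(text: str, option: str, original_path: str, comment: str = "") -> str:
    """Apply one filename-handling option to header or footer text."""
    if option == "field code":
        return text  # the codes themselves are output unchanged
    replacements = {
        "filename": ntpath.basename(original_path),  # unqualified filename
        "filepath": original_path,                   # fully qualified path
        "comments": comment,                         # user-entered text
        "remove": "",                                # drop the codes entirely
    }
    if option not in replacements:
        raise ValueError(f"unknown option: {option}")
    value = replacements[option]
    # A function is used as the replacement so backslashes in paths stay literal.
    return FILENAME_CODES.sub(lambda match: value, text)


header = "Source: &[Path]&[File]"
path = r"C:\Cases\Budget.xlsx"
print(handle_filename_field(header, "filename", path))   # Source: Budget.xlsx
print(handle_filename_field(header, "filepath", path))   # Source: C:\Cases\Budget.xlsx
print(handle_filename_field(header, "comments", path, "See native file"))
print(handle_filename_field(header, "field code", path))  # Source: &[Path]&[File]
```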
Generate metadata
Select Generate a metadata summary image for each Excel spreadsheet, and then under Spreadsheet Metadata Summary Options select the individual types of metadata to capture.
-
Document Properties
-
Comments
-
Formulas
-
Linked Content - The data collected will include hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag will be added. A separate page entitled Document Properties is generated and is placed at the end of each Microsoft Excel document.
More information about metadata:
Who creates the metadata? The native program (such as Microsoft Excel or Outlook) creates the metadata and maintains it with the native file (the letter or email).
What does eCapture do with this data? When a document is processed, the metadata is collected from the document and stored in the database.
How is metadata useful? It gives you valuable information as to “Who knew what, and when.” It can tell you who wrote a document and who edited it last. It also shows you a file’s revision number, the character count, and many other pieces of information about a file.
Blank page removal
This option is available if the Remove Blank Pages option is selected under the General Options tab. Select from the following two options to remove blank pages:
-
Based on selected Page Order: Down, then over or Over, then down.
-
If Down, then over is selected, all vertical page columns that are blank will be removed.
-
If Over, then down is selected, all horizontal page rows where all pages in a horizontal run are blank will be removed.
-
-
Based on both Page Order options: This bases the removal of blank pages on both horizontal page-rows and vertical page-columns.
Example of Page Removal
The following example pertains to using a spreadsheet with 12 pages that will be rendered.
-
If the sheet's page order is Over, then down, eCapture removes all horizontal page rows where all pages in a horizontal run are blank. In order to do that, eCapture steps through all HPageBreaks and makes sure the range from the first column to the last column is blank.
-
If eCapture determines that pages 1-3 are all blank, those pages are hidden. If eCapture determines that pages 4-6 are all blank, those pages are hidden, and so on.
-
If the sheet's page order is Down, then over, eCapture will remove all vertical page columns that are blank.
-
If eCapture determines that 1-A is blank, then they will be hidden. If eCapture determines that 2-B is blank, then they will be hidden, and so on.
This algorithm does not eliminate every blank page, though it eliminates many of them.
Note: All page-hiding is done by setting horizontal regions' RowHeight properties and vertical regions’ ColumnWidth properties to 0.
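The page removal rules above can be sketched as follows. This is a simplified model, not eCapture's code: it assumes the 12 rendered pages form a grid of 4 page rows by 3 page columns, with `True` marking a blank page, and the function name `visible_pages` is hypothetical.

```python
from typing import List, Tuple


def visible_pages(blank: List[List[bool]], mode: str) -> List[Tuple[int, int]]:
    """Return (page_row, page_column) positions that remain after blank page removal.

    mode is "over_then_down", "down_then_over", or "both".
    """
    rows = range(len(blank))
    cols = range(len(blank[0]))
    hidden_rows, hidden_cols = set(), set()
    if mode in ("over_then_down", "both"):
        # Hide horizontal page rows in which every page is blank.
        hidden_rows = {r for r in rows if all(blank[r])}
    if mode in ("down_then_over", "both"):
        # Hide vertical page columns in which every page is blank.
        hidden_cols = {c for c in cols if all(blank[r][c] for r in rows)}
    return [(r, c) for r in rows for c in cols
            if r not in hidden_rows and c not in hidden_cols]


grid = [
    [False, False, True],
    [True,  True,  True],   # fully blank page row
    [False, True,  True],
    [True,  True,  True],   # fully blank page row
]                           # the third page column is blank in every row

print(len(visible_pages(grid, "over_then_down")))  # 6
print(len(visible_pages(grid, "down_then_over")))  # 8
print(visible_pages(grid, "both"))  # [(0, 0), (0, 1), (2, 0), (2, 1)]
```

Even with both options applied, the blank page at position (2, 1) survives because neither its page row nor its page column is entirely blank.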
-
- Click OK to exit the Advanced Excel Imaging dialog box.
Streaming Discovery Imaging: Word Options
-
Process with Outside-In (Stellent) - Selecting this option:
- Allows for faster and more consistent generation of images on the first pass
- Reduces the amount of time spent manually QCing these document types
When selected, only Outside-In (Stellent) is used to process images; the Microsoft related options are grayed out by default. Full metadata is extracted and time zone imaged output reflects the time zone handling options configured for the Processing Job. All files processed by Outside-In (Stellent) receive the Stellent Processed flag in QC.
The processing output differs when using Outside-In (Stellent) to view and image documents. However, the QC applied flags, metadata, and optional summary reports are similar to those produced when processing is done without Outside-In (Stellent). Other processing options, including Flex Processor processing options, are respected when using Outside-In (Stellent).
-
Select the option Show Hidden Text to see hidden text, if any, contained in Word documents.
-
Select the appropriate revision option. The option you select determines how the system handles revisions within Word documents.
-
As is - Print the document as it is according to the Office Settings on the computer.
-
Detail Revisions - Print the document with revisions shown.
-
Final Copy (hide revisions) - Print the document with no revisions shown.
-
Both Copies - Documents are printed. If a document has revisions, it's printed again with the revisions shown. Documents with revisions will then have two sets of images, one right after the other.
-
-
Select the appropriate orientation option. The option you select determines how the system orients images of Word documents.
-
As is
-
Portrait
-
Landscape
-
-
Select the Scale to Page option to scale the contents of the page to fit in the printable area. This sets the PrintZoomPageWidth and PrintZoomPageHeight to the paper size of the printer when printing Word documents.
-
Color Depth - Color processing for Word documents is handled separately from color processing of other types of files. This setting is independent of the General Color Depth options located in the Processing Options: General Options tab.
Single Page Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JPEG
Multi-Page TIFF Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JTIFF (JPEG compressed TIFF)
- Select the appropriate Paper Size for Word documents.
- If you want to set more granular options for handling of Word documents, click the Advanced Options button.
In the Field Handling section, select the Date Field Handling options:
Replace with date created - will replace with creation date.
Replace with date last saved - will replace current date with last saved date.
Replace with comments - displays the Date Field Comments field where you can enter the text that should replace the contents of the date field.
Replace with field code
Do not replace - will not replace the date (e.g., Macros)
Remove - removes the codes entirely.
In the Field Handling section, select the Filename handling options:
Replace with filename (no path)
Replace with filepath
Replace with comments - displays the Filename Comments field where you can enter the text that should replace the filename
Replace with field code
Do not replace
Set the metadata options for Word documents
Select Generate metadata. The native program, in this case Word, creates the metadata and maintains it with the native file. When a document is processed, the metadata is collected from the document and stored in the database. Metadata gives you valuable information as to “Who knew what, and when.” It can tell you who wrote a document and who edited it last. It also shows you a file’s revision number, the character count, and many other pieces of information about a file.
Select the individual types of metadata to capture under Document Metadata Summary Options:
Document Properties
Revisions
Comments
Routing Slips
Linked Content - The data collected will include hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag will be added.
A separate page entitled Document Properties is generated and is placed at the end of each Microsoft Word document. For example, the Document Properties page may contain the following data:
Title, Author, Company, Attached Template, Page count, Paragraph Count, Line Count, Word Count, Character Count (spaces excluded), and Character Count (spaces included).
- When finished setting Advanced Options, click OK to exit the Advanced Word Imaging dialog box.
- When finished setting Word Options, click OK to exit the Options for Processing dialog box or click one of the other tabs to set options for other types of files.
Streaming Discovery Imaging: PowerPoint Options
-
Select Original Settings (As Is) to use Microsoft PowerPoint’s default settings.
-
Select the Page Orientation. The options are: As is, Portrait, and Landscape.
-
Select the Slide Orientation. The options are: As is, Portrait, and Landscape.
-
Select the Color Depth to be used for processing PowerPoint presentations. Color processing for PowerPoint presentations is handled separately from color processing of other types of files. This setting is independent of the General Color Depth options located in the Processing Options: General Options tab.
Single Page Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JPEG
Multi-Page TIFF Output Type
General Color Depth Options - Rendered as
Black&White (1-bit) - Group 4 TIFF
Grayscale (8-bit) - LZW TIFF
256 Color (8-bit) - LZW TIFF
True Color (24-bit) - JTIFF (JPEG compressed TIFF)
-
Select the Output Type. The options are: Slides, Outline, Notes Pages (notes and slide on one page), Notes Pages Split (notes and slide on separate page), or Handouts.
-
Select a Slide Size. Choose a slide size or As Is from the drop-down menu.
-
Select an output Paper Size or As Is from the drop-down menu.
-
To select more complex PowerPoint options, click the Advanced Options button.
- Print Hidden Slides - Select this option to print slides that are hidden from the slide show.
- Print Comments - Select this option to print comments for your slides.
- Frame Slides - Selecting this option prints a border around each slide.
- Scale to Fit Page - Select this option to ensure all available text appears on the slide that was imaged from eCapture.
-
Handouts - Select the desired handout options:
-
Slides per Page
-
Order (if generating 4 or more slides per page)
-
- Include Linked Content Summary - Select this option to ensure that the data collected includes hyperlinks and OLE linked files. If any linked content exists in a document, a QC flag is added.
-
Headers and Footers - For Headers and Footers, you can set options for Slides or Notes & Handouts. The tabs that display are based on the Output Type selected on the basic PowerPoint Options tab. The options are: Slides, Outline, Notes Pages (notes and slide on one page), Notes Pages Split (notes and slide on separate page), or Handouts.
Slides: For the Output Type of Slides, select from the following options from the Slide Tab:
- Select Date and Time if you want the page header to list the Date last saved or the Date created at the top of the image.
-
If Date and Time is selected, you can select the Update Automatically option. Select Date last saved or Date created.
-
Format: Select a format option for the date and time.
-
Select Fixed if you want to manually enter a fixed date and time in the image header.
- Select Footer if you want a footer at the bottom of the image.
-
If Footer is selected, enter static text that you want printed at the bottom of the image or check As is to maintain the existing footer for the slide.
- If Footer is selected, select a Slide Number option to define whether a slide number should show on the image. The options are: As is, Show, Do not show.
-
If Footer is selected, select a Show on Title Slide option to define whether to show the footer on the title slide image. The options are: As is, Show, Do not show.
Other than Slides: If, on the basic PowerPoint imaging options tab you set the Output Type to anything other than Slides, select from the following options on the Notes and Handouts tab:
- Select Date and Time if you want the notes/handouts to list the date/time.
-
If Date and Time is selected, select the Update Automatically option: Select Date last saved or Date created.
- Format: Select a format option for the date and time.
-
Select Fixed if you want to manually enter a fixed date and time in the image header.
-
Select Header if you want a header at the top of the image. You can either enter a fixed text to add or check the As Is option to maintain the existing headers.
-
Select Footer if you want a footer at the bottom of the image.
-
If Footer is selected, you can enter static text that you want printed at the bottom of the image.
- If Footer is selected, select a Page Number option to define whether or not a page number should show on the image. The options are: As is, Show, Do not show.
- Click OK to exit the Advanced PowerPoint Options dialog box.
- Click OK to exit the Options for Processing dialog box, or click one of the other tabs to set options for other types of files.
Streaming Discovery Imaging: Placeholder Options
- Click the Placeholder tab.
-
Click the button to create a new placeholder. The Create New Placeholder dialog box appears.
-
Enter a Placeholder Name. When you are finished creating your placeholder, the Placeholder Name will display in the Placeholder grid located in the Placeholder tab.
-
Select the check boxes next to the File Types/Extensions for which you want to have placeholders when a Streaming Discovery Imaging Job runs. By default, all File Types/Extensions are unselected.
- Click Select All to select all file types.
- Click Clear All to clear the selections and individually select the desired file types.
-
Expand a file type to view its subcategories. Filtering may be done on specific subcategories of a file type.
eCapture recognizes documents by their actual content and not the file extension. Keep this in mind as you exclude/include file types. You can exclude file types by leaving their File Types/Extensions unselected. When the Job runs, it creates placeholders for only those file types that are selected. These file types are based on Oracle’s Outside-In identification criteria.
- If you want to add more file extensions to the placeholder definition you are creating, in the Placeholder these File Extensions list box, click to add the extension to the list. At least one file type or category must be selected. Repeat this step for each extension. File extensions are automatically alphabetized.
- If you want to remove a file extension, in the Placeholder these File Extensions list box, select the extension and click the remove button.
- If you want to clear all the extensions from the list, in the Placeholder these File Extensions list box, click the clear button.
-
To import a list of file extensions from a CSV file, in the Placeholder these File Extensions list box:
- Click the import button.
- Select the CSV file.
- Click Open. An Import From File progress bar appears. If any errors were encountered during the import, such as duplicates, an Information dialog box displays with the errors. The CSV file may contain extensions with or without a "." (period). Ensure the CSV file contains only one column of file extensions with each extension occupying its own row, for example, Range A1 through A50 or Range E1 through E50. The file extensions are alphabetized as they are imported.
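The CSV rules above (one extension per row, a single populated column, an optional leading period, duplicates reported, alphabetized results) can be sketched with Python's standard `csv` module. The function name `import_extensions`, the error wording, and the lowercasing of extensions are assumptions made for illustration.

```python
import csv
import io
from typing import List, Tuple


def import_extensions(csv_text: str) -> Tuple[List[str], List[str]]:
    """Return (alphabetized extensions, error messages) for a one-column CSV."""
    seen = set()
    errors = []
    for row_number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        # The single column may be any column (A, E, ...), so ignore empty cells.
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue
        if len(cells) > 1:
            errors.append(f"row {row_number}: more than one column")
            continue
        extension = cells[0].lstrip(".").lower()  # leading "." is optional
        if not extension:
            continue
        if extension in seen:
            errors.append(f"row {row_number}: duplicate extension '{extension}'")
            continue
        seen.add(extension)
    return sorted(seen), errors


sample = ".pdf\nDOCX\npdf\n,,,,xls\n"
extensions, problems = import_extensions(sample)
print(extensions)  # ['docx', 'pdf', 'xls']
print(problems)    # ["row 3: duplicate extension 'pdf'"]
```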
- Set the File Size parameters. The default setting is None. If specified, file sizes may be Over or Under a specified amount. The selected file size applies to the files in the Imaging Job that have sizes on disk that are either greater than or equal to, or less than or equal to, the size specified. The size is expressed in KB. For example, a 1 MB file is entered as 1024 KB.
- Select the Extract Text of Document check box to extract the document text. By default, this check box is cleared.
- Select the Apply Max Page Threshold check box and indicate a threshold value (1 to 10000) to limit the number of pages produced by larger files. By default, this check box is cleared. If the page threshold is reached, the items are flagged as Page Threshold Exceeded. All pages imaged up until the threshold is reached are included in the document. The first page is the Page Threshold Exceeded placeholder, and subsequent pages will be those that were processed within the Max Page Threshold setting.
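The Max Page Threshold behavior above can be sketched as follows. The function name and placeholder text are hypothetical, and the sketch assumes that a document is flagged only when it has more pages than the threshold; the source does not say how a document with exactly the threshold number of pages is treated.

```python
from typing import List, Tuple

PLACEHOLDER_PAGE = "<Page Threshold Exceeded placeholder>"  # illustrative stand-in


def apply_max_page_threshold(pages: List[str], threshold: int) -> Tuple[List[str], bool]:
    """Return (output pages, flagged) for a document under a page threshold."""
    if not 1 <= threshold <= 10000:
        raise ValueError("threshold must be between 1 and 10000")
    if len(pages) <= threshold:
        return list(pages), False
    # The placeholder comes first, followed by the pages imaged within the threshold.
    return [PLACEHOLDER_PAGE] + list(pages[:threshold]), True


document = ["p1", "p2", "p3", "p4", "p5"]
print(apply_max_page_threshold(document, 3))
# (['<Page Threshold Exceeded placeholder>', 'p1', 'p2', 'p3'], True)
print(apply_max_page_threshold(document, 10))
# (['p1', 'p2', 'p3', 'p4', 'p5'], False)
```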
-
To use a predefined placeholder:
- Click to display the Open dialog box.
- Select a placeholder image. File type options include JPG and TIF.
-
Click Open. The selected image displays in the view box underneath the option.
-
If you want to use a custom placeholder, click the custom placeholder button.
The Custom Placeholder Configuration dialog box appears.
Complete the necessary fields in the Custom Placeholder Configuration dialog box.
-
Click the drop-down menu located above the Available Fields list and select a specific field type. By default, All Fields is displayed.
-
To narrow the field list:
- To display all fields, delete the value (in this example, the word date), leaving the field empty, and click .
- Click to move a selected field from the Available Fields list to the Selected Fields list.
- Click to move a selected field from the Selected Fields list to the Available Fields list.
- Click to open the Insert Custom Field dialog box in which you can create new group fields and new user fields.
- Use the arrow buttons to change the order of the fields in the Selected Fields list. Select a field (or contiguous fields) and then use either arrow to reposition the selected field(s).
-
Select a field in the Selected Fields list. The selected field appears in the Font section. Click to open the Font dialog box.
- Select the desired Font, Font Style, and Size; then click OK to return to the Custom Placeholder Configuration dialog box. Repeat this step for each additional field.
-
In the Field Options section, if necessary, select Include labels with values. When selected, both the field label and its value are included.
-
Click the date field formatting button. The Date Field Formatting Options dialog box appears.
-
Select the Date Field Formatting and Time Format for the custom placeholder.
-
If you want to change the date field to a different format, click the drop-down menu arrow and select from the following date formats:
-
YYYYMMDD
-
YYYY/MM/DD
-
MMDDYYYY
-
MM/DD/YYYY
-
DD/MM/YYYY
- Otherwise, select the option, Do Not Convert Date Fields.
-
-
If you want to change the Time Format, click the drop-down menu and select from the following options:
-
12-hour [displays time in 12-hour format, e.g., 1:04]
-
24-hour [displays time in 24-hour format, e.g., 13:04]
-
Regional [formats the time according to the “default” regional settings of the Worker on which the document is being exported]
Note: Changing the format strings by using the Customize button of Regional Settings will have no effect; the actual region must be changed to see any effect.
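The date and time formats listed above map naturally onto format strings. The following sketch uses Python's `datetime` to show what each choice produces for one sample value; the mapping table and function names are illustrative, the Regional option is omitted, and the 12-hour output follows the 1:04 example above without an AM/PM marker.

```python
from datetime import datetime

# Date format names from the list above mapped to strftime patterns (assumed mapping).
DATE_FORMATS = {
    "YYYYMMDD": "%Y%m%d",
    "YYYY/MM/DD": "%Y/%m/%d",
    "MMDDYYYY": "%m%d%Y",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
}


def format_date(value: datetime, date_format: str) -> str:
    """Format the date part using one of the named date formats."""
    return value.strftime(DATE_FORMATS[date_format])


def format_time(value: datetime, time_format: str) -> str:
    """Format the time part as 12-hour or 24-hour."""
    if time_format == "12-hour":
        hour = value.hour % 12 or 12  # 0 and 12 both display as 12
        return f"{hour}:{value.minute:02d}"
    if time_format == "24-hour":
        return f"{value.hour:02d}:{value.minute:02d}"
    raise ValueError(f"unsupported time format: {time_format}")


sample = datetime(2016, 3, 9, 13, 4)
print(format_date(sample, "YYYYMMDD"))    # 20160309
print(format_date(sample, "DD/MM/YYYY"))  # 09/03/2016
print(format_time(sample, "12-hour"))     # 1:04
print(format_time(sample, "24-hour"))     # 13:04
```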
-
-
Select Resolve times to second precision if you want to add seconds to all metadata date fields that have time. This does not apply to the images.
- By default, the Legacy Date Field Formatting check box is cleared. Clear this option to select from the Invalid date options and to select fields for date format handling.
-
-
If you cleared the Legacy Date Field Formatting check box, set the Invalid date options:
- Treat date values outside of specified range as invalid dates - Select this check box and then select a Start Date and End Date range. Any dates outside of the selected range are considered as invalid dates. The start date default is set to SQL minimum date. The end date default is set to SQL maximum date.
-
Choose one of the following options:
-
Invalid date field output value - enter text to display if an invalid date is encountered. This field may be left blank.
-
Invalid date field output do not convert - invalid dates will be output as a text field.
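The Invalid date options above can be sketched as a range check followed by one of two outputs. The function name, the parameters, and the MM/DD/YYYY output format are assumptions; passing `None` for `invalid_output` stands in for the "do not convert" choice.

```python
from datetime import date
from typing import Optional


def export_date_value(raw: str, parsed: Optional[date], start: date, end: date,
                      invalid_output: Optional[str]) -> str:
    """Return the exported value for one date field.

    raw is the original field text; parsed is None when the text is not a date.
    """
    if parsed is not None and start <= parsed <= end:
        return parsed.strftime("%m/%d/%Y")
    if invalid_output is None:
        return raw          # do not convert: output the original text
    return invalid_output   # fixed output value, which may be blank


start, end = date(1980, 1, 1), date(2030, 12, 31)
print(export_date_value("06/01/2015", date(2015, 6, 1), start, end, ""))       # 06/01/2015
print(export_date_value("01/01/1601", date(1601, 1, 1), start, end, "N/A"))    # N/A
print(export_date_value("01/01/1601", date(1601, 1, 1), start, end, None))     # 01/01/1601
```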
-
-
From the Available Fields list, select the fields you want to use for date formatting and move them to the Fields Selected for Date Format Handling list. There are a few considerations about date fields to keep in mind:
-
The only fields that are not present in the list are *DATE_ONLY* and *TIME_ONLY*. The fields in the Available Fields list comprise those that are marked as valid for date formatting. This is determined by the value of TRUE in the ExportAttemptDateParse field located in the EncounteredMetatdataFieldList table.
-
Date field formatting options affect only those fields in the Fields Selected for Date Format Handling list.
-
Date field formatting options are set at the Job level.
To select fields for date format handling:
-
Select a field for date format handling by selecting the field from the Fields Available for Date Format Handling list and clicking to move the single field to the Fields Selected for Date Format Handling list.
-
For two or more fields, Ctrl-click to select non-contiguous fields or Shift-click to select contiguous fields. After selecting the fields, click to move them to the Fields Selected for Date Format Handling list.
eCapture creates two additional fields that “split” the date and time into a Date Only field and a Time Only field. These two additional fields are displayed in the Available Fields list in the Export Wizard, Select Export Fields screen. For example, if the DueDate field was moved to the Fields Selected for Date Format Handling list, the following additional DueDate fields would display in the Available Fields list: DueDate*DATE ONLY* and DueDate*TIME ONLY*.
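The split into Date Only and Time Only fields can be pictured with a short sketch. The field-name suffixes follow the DueDate example above; the function name and the date and time formats are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict


def split_date_field(name: str, value: datetime) -> Dict[str, str]:
    """Return the two additional fields derived from one date/time field."""
    return {
        f"{name}*DATE ONLY*": value.strftime("%m/%d/%Y"),
        f"{name}*TIME ONLY*": value.strftime("%H:%M"),
    }


fields = split_date_field("DueDate", datetime(2016, 3, 9, 13, 4))
for field_name, field_value in fields.items():
    print(field_name, "=", field_value)
# DueDate*DATE ONLY* = 03/09/2016
# DueDate*TIME ONLY* = 13:04
```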
-
- When you are finished setting the Date Field Formatting options, click OK. The Custom Placeholder dialog box appears.
-
In the Placement Options dialog box, select the placement settings for the placeholder.
-
Set the alignment positioning for the placeholder.
-
Vertical Alignment: Determines placement along the vertical axis. Options include Top, Center, or Bottom. Top is the default.
-
Horizontal Alignment: Determines placement along the horizontal axis. Options include Left, Center, or Right. Left is the default.
-
- Set the Indentation (Left and Right) for the placeholder. This setting determines the horizontal spacing to the left or right of the page margins.
- Set the Truncation for the placeholder. Truncation determines the number of characters at which the field value will be truncated. The default value is 128 characters.
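The Truncation setting, combined with the Include labels with values option, can be sketched as follows. The function name and the "label: value" layout are hypothetical; only the 128-character default comes from the text above.

```python
def render_field(label: str, value: str, include_label: bool = True,
                 truncation: int = 128) -> str:
    """Render one placeholder field, truncating the value (not the label)."""
    if truncation < 0:
        raise ValueError("truncation must not be negative")
    shown = value[:truncation]
    return f"{label}: {shown}" if include_label else shown


long_subject = "x" * 200
print(len(render_field("Subject", long_subject, include_label=False)))  # 128
print(render_field("Author", "J. Smith"))                               # Author: J. Smith
```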
-
-
If you want to save your Custom Placeholder formatting to a file, to be used later, click Save, enter a Description for the placeholder, and then click OK. You will also be prompted to save the custom placeholder definition before you exit the Custom Placeholder Configuration dialog box.
-
When you are finished creating the custom placeholder, click OK. The Save Changes dialog box appears.
- Click Yes to save the custom placeholder definition.
- Enter a Description for the placeholder.
-
Click OK.
-
Use the zoom in or zoom out buttons to view the image before finalizing.
- To remove the selected image from the view box, click the remove button. The existing image must be removed before selecting a new image.
-
To exit the Custom Placeholder Configuration dialog box, click OK. The placeholder displays in the Placeholder grid.
Note: More than one placeholder may be created for the imaging job. When two or more placeholders exist for a Streaming Imaging job, rule functionality, similar to the Flex Processor, is used. Each placeholder’s document criteria selection is applied in placeholder order with the last placeholder rule (applied to the document) determining the processing output. The Placeholder rule order may be changed before starting the job.
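The rule ordering described in the note above can be sketched as "the last matching rule wins." The class and function names are hypothetical, and the way criteria combine (file type or extension, and then file size in KB) is an assumption for illustration rather than documented eCapture behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PlaceholderRule:
    name: str
    file_types: Set[str] = field(default_factory=set)
    extensions: Set[str] = field(default_factory=set)
    size_mode: str = "None"  # "None", "Over", or "Under"
    size_kb: int = 0

    def matches(self, file_type: str, extension: str, size_kb: int) -> bool:
        """Check one document against this rule's criteria."""
        ext = extension.lower().lstrip(".")
        if file_type not in self.file_types and ext not in self.extensions:
            return False
        if self.size_mode == "Over":
            return size_kb >= self.size_kb
        if self.size_mode == "Under":
            return size_kb <= self.size_kb
        return True


def select_placeholder(rules: List[PlaceholderRule], file_type: str,
                       extension: str, size_kb: int) -> Optional[str]:
    """Apply rules in grid order; the last rule that matches determines the output."""
    chosen = None
    for rule in rules:
        if rule.matches(file_type, extension, size_kb):
            chosen = rule.name
    return chosen


rules = [
    PlaceholderRule("All spreadsheets", file_types={"Spreadsheet"}),
    PlaceholderRule("Large files", file_types={"Spreadsheet", "Database"},
                    size_mode="Over", size_kb=1024),  # 1 MB entered as 1024 KB
]
print(select_placeholder(rules, "Spreadsheet", ".xls", 2048))      # Large files
print(select_placeholder(rules, "Spreadsheet", ".xls", 100))       # All spreadsheets
print(select_placeholder(rules, "Word Processing", ".doc", 2048))  # None
```

Reordering the two rules changes the result for the 2048 KB spreadsheet, which is why the placeholder order matters before the job starts.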
-
The Description field will contain the following values based on the selected Placeholder criteria:
- File Types
- File Types, Extensions
- File Types, Extensions, File Size
-
File Types, File Size
To edit the Placeholder criteria, double-click the Description field of the desired Placeholder. The Edit Placeholder dialog box appears. Make the changes and click to return to the Placeholder grid.
- To delete a Placeholder from the grid, select it and click the delete button. A prompt displays to confirm the deletion. Confirm the deletion to remove the Placeholder.
- To change the order of the placeholders in the grid, select a placeholder and use the arrow buttons to move it into the correct position. Repeat this step for all placeholders until they are in the desired order.
Streaming Discovery: Export Options
Option
Description
Select Export Series (optional)
Select from an existing Export Series from the drop-down menu. If an Export Series is not selected, the Enterprise Streaming Discovery Job will not be exported to a review application. However, the job may be manually exported if desired. For more information, see Re-Export a Streaming Discovery Job. If an Export Series is selected, the area below in the dialog box displays the options/settings from that Export Series.
Important: If you are creating images during a Streaming Discovery Job, you must create and select an export series.
Export Interval (min)
This export interval setting dictates how often documents are exported to the specified export destination (Ipro Eclipse or Relativity).
Note: This option is not available unless an existing Export Series is selected or a new Export Series is created.
The default setting is 30 minutes for new Cases (Projects) where no system default options are in place. This default was chosen to reduce the number of exports created by large Streaming Discovery Jobs and so make the export volume easier to manage.
For Streaming Discovery Jobs initiated under Cases (Projects) created before version 2016.2.0, the five-minute default setting remains.
The maximum setting is 60 minutes. If an existing Export Series is selected and the export interval is set to 0, only one Export Job will be created on completing the Enterprise Streaming Discovery Job.
As documents are created, Export Jobs are continuously created (based on the export interval setting). Each Export Job is started immediately on creation regardless of job size.
Only completed families are considered for export. Generally, the longer the interval setting, the more documents each Export Job contains. The Enterprise Streaming Discovery Job may complete before all the Export Jobs complete; however, it is not marked as Complete until the last set of documents starts to export.
The Export Jobs inherit the settings from the parent Export Series, including the numbering schema. For direct export to the review application (Eclipse or Relativity), the same eCapture auto-load rules apply: one load file for each volume.
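To make the interval behavior described above more concrete, the following Python sketch models it under stated assumptions. It is not how eCapture schedules exports internally: the function names, the use of minutes since job start, and the choice to create no Export Job for an interval in which no family completed are all hypothetical. Only the limits (0 to 60), the defaults (30 and 5 minutes), and the "0 means one export on completion" rule come from the text.

```python
DEFAULT_INTERVAL_MIN = 30   # new Cases (Projects) with no system default
LEGACY_INTERVAL_MIN = 5     # Cases (Projects) created before version 2016.2.0
MAX_INTERVAL_MIN = 60


def validate_interval(minutes: int) -> int:
    """Reject values outside the documented 0..60 range."""
    if not 0 <= minutes <= MAX_INTERVAL_MIN:
        raise ValueError(f"export interval must be 0..{MAX_INTERVAL_MIN} minutes")
    return minutes


def plan_exports(family_done_at, job_done_at, interval_min):
    """Group completed families into Export Jobs.

    family_done_at maps a family ID to the minute it completed;
    job_done_at is the minute the Streaming Discovery Job finished.
    """
    validate_interval(interval_min)
    pending = sorted(family_done_at, key=family_done_at.get)
    if interval_min == 0:
        # Interval 0: a single Export Job when the job completes.
        return [pending] if pending else []

    batches = []
    tick = interval_min
    while pending:
        # The first tick at or after job completion takes everything left.
        cutoff = tick if tick < job_done_at else float("inf")
        batch = [f for f in pending if family_done_at[f] <= cutoff]
        if batch:
            batches.append(batch)
            pending = [f for f in pending if f not in batch]
        tick += interval_min
    return batches
```

For example, with families finishing at minutes 3, 12, 14, and 41 and a 10-minute interval, this model produces three Export Jobs, while an interval of 0 produces one Export Job containing all four families.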
Create New Export Series
Create a new Direct-to-Eclipse or Direct-to-Relativity Export Series. When a new Export Series is created (for Eclipse or Relativity), the criteria display in the Export Options dialog box as shown in the following figure:
In the previous figure, an existing Export Series was selected and shows the options/settings that were selected for that Export Series.
The bottom section shows the Export Fields that were selected for the Export Series Job. For more information about creating an export series, see Create an Export Series.
For Enterprise Streaming Discovery Job Export Series, the Export Series is Data Extract only.
Save settings as Case (Project) default
Displays when setting options at the Job Level. Select this option to retain these settings for future Enterprise Streaming Discovery Jobs created for the Case (Project).
Auto Publish Errors
Select this option to automatically publish Streaming Discovery node-level or item-level errors (if any) so they can move forward to review without your having to modify the Streaming Discovery Job and visually inspect the errors. This option is cleared by default; if it is left cleared, no action is performed.
When the job completes normally and node-level or item-level errors exist, eCapture re-queues those errors one time and sets the job to publish. The job displays in the Job Queue pane. When the re-queue completes, all remaining failures are published.
To see the nodes that were re-queued, open the AutoRequeue.TXT file stored in the Discovery Jobs folder. An example of the data is shown here:
NodeIDRequeue – NodeID: 1313
NodeIDRequeue – NodeID: 1314
NodeIDRequeue – NodeID: 1338
Errors are not published if the Auto Publish Errors option is selected at the Case (Project) level but cleared at the Job level.
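If you need the re-queued node IDs in a script, the AutoRequeue.TXT lines shown above can be read with a short parser. The following Python sketch assumes the file contains lines in exactly the format of the sample; the function name `parse_requeued_nodes` is hypothetical, and the file's location and encoding are not specified here, so the function takes the text as a string.

```python
import re

# Matches lines such as "NodeIDRequeue – NodeID: 1313" (format taken from the sample).
_NODE_LINE = re.compile(r"NodeIDRequeue\b.*?NodeID:\s*(\d+)")


def parse_requeued_nodes(text: str) -> list[int]:
    """Return the node IDs listed in the contents of AutoRequeue.TXT."""
    node_ids = []
    for line in text.splitlines():
        match = _NODE_LINE.search(line)
        if match:                      # lines in any other format are skipped
            node_ids.append(int(match.group(1)))
    return node_ids
```

Matching on the `NodeID:` label rather than on the separator character keeps the sketch tolerant of the dash being stored as a hyphen or an en dash.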
Save as system default
Displays when setting options at the Case (Project) Level. Select this option to retain these settings for future Cases (Projects) created for the Client. The settings are saved to the eCapture Configuration database.
-
When you have finished setting up the job, click OK.
The Enterprise Streaming Discovery Job displays in the eCapture Controller Job Queue and is ready to be started. When the Job finishes, it disappears from the Job Queue.
-
In the Client Management Tree View, select the Streaming Discovery Job to view the Status and Summary Panel, the pane on the right.
In the following figure, the Job Status indicates Complete.
The value next to Processed Items indicates the number of items found; in this example, 18 items were found. Other counts include:
- Processing Errors – Indicates the number of encountered processing errors for the Job.
- OCR Applied [Errors] – If OCR was enabled for the Job, the first value represents the number of pages that had OCR. The value in the brackets indicates the number of OCR errors.
- Unextractable Items – Indicates the number of items that could not be extracted.
- Unpublished Errors – This field indicates the number of errors that can be published either individually for the job or as part of the entire case.
- Filtered Items [Duplicates] – Shows total filtered items count (e.g. date ranges, file type filters, de-duplication and de-NIST) and the number of duplicates in brackets.
- Exported - Indicates the number of documents that were exported for the Job, the number of Export Errors, and the number of Loaded [Errors]. The previous figure indicates exporting options were included for this Job. If they were not, then exporting may take place manually. For more information, see Re-Export a Streaming Discovery Job. In the previous figure, the Export Grid column indicates 4 Ipro (Cloud) Streaming Discovery Jobs with a total of 6 exported documents without any Export or Loaded errors.
- Exported [Errors] - Shows the total number of exported documents with any encountered errors in brackets.
- Loaded [Errors] - Shows the total number of exported documents with any autoload errors (for either Ipro Eclipse or Relativity) in brackets.
Note: For Streaming Discovery and Export Jobs that are not autoloading into Eclipse or Relativity, and for incomplete Export Jobs, the Loaded field is blank.
Exported and Loaded document counts reflect completed Export jobs only. As Export Jobs connected to Streaming Discovery Jobs are completed, the counts increase. The Loaded count matches the Exported documents count unless errors occur during an Eclipse or Relativity autoload.
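The counting rules above (only completed Export Jobs contribute, and the Loaded field is blank when a job is not autoloading or is incomplete) can be illustrated with a small Python model. Everything here is a sketch under assumptions: the `ExportJob` class and its fields are hypothetical, the `N [M]` text format is inferred from the column labels, and treating documents that fail to autoload as not loaded is one plausible reading of "matches the Exported documents count unless errors occur".

```python
from dataclasses import dataclass


@dataclass
class ExportJob:
    """Hypothetical row of the Export Jobs grid."""
    complete: bool
    exported: int
    export_errors: int = 0
    autoload: bool = False      # loading into Eclipse or Relativity
    load_errors: int = 0


def loaded_field(job: ExportJob) -> str:
    """Text for the Loaded [Errors] column; blank when it does not apply."""
    if not job.autoload or not job.complete:
        return ""
    # Assumption: documents that fail to autoload are not counted as loaded.
    return f"{job.exported - job.load_errors} [{job.load_errors}]"


def panel_totals(jobs: list[ExportJob]) -> dict[str, int]:
    """Totals for the Status and Summary Panel; completed jobs only."""
    done = [j for j in jobs if j.complete]
    return {
        "exported": sum(j.exported for j in done),
        "export_errors": sum(j.export_errors for j in done),
        "loaded": sum(j.exported - j.load_errors for j in done if j.autoload),
        "load_errors": sum(j.load_errors for j in done if j.autoload),
    }
```

In this model the totals grow as each Export Job completes, and an in-progress Export Job contributes nothing until it finishes.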
Use the Export Jobs Grid
The Export Jobs grid is useful for associating Enterprise Streaming Discovery Jobs with Data Extract Export Jobs and for determining when multiple Enterprise Streaming Discovery Jobs are associated with a single Export Series. In addition, the Export Jobs grid shows the ID number, name of the Enterprise Streaming Discovery Job, its Completion status, Exported [Errors], and Loaded [Errors].
-
Double-click a Streaming Discovery Job in the Export Jobs grid.
The Client Management Tree View refreshes and highlights the selected Export Job, and the Status and Summary Panel on the right displays the details of the completed job.
-
Click View Settings to view the setting information for the selected job.
-
Click View Output to view the directory containing the extracted data.
-
Click the View Report button to view the Export Summary Report.
The Enterprise Streaming Discovery Job may be exported as many times as required. Each time it is exported, the Exported documents count increases by the number of unfiltered/non-duplicate documents in the Enterprise Streaming Discovery Job.
When the Enterprise Streaming Discovery Job is deleted, the Export Series is also deleted.
|
The ‘CustomExport’ stored procedure in the Client database includes a parameter whose value indicates whether the Export Job is the last in the Export Series. The stored procedure is empty as installed. If associated Export Jobs are pending or in progress, the parameter is set to ‘0’, indicating that this is not the last Export Job in the Export Series. If the completed Export Job is the last job in the Export Series, the parameter is set to ‘1’, indicating that it is the final Export Job of the Export Series. If errors are re-queued and/or published, the parameter is not modified. For more information about adding custom stored procedures during the installation database update phase, see Work with Custom Stored Procedures. |
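The rule for the parameter value described in the note above can be expressed as a short decision function. The Python sketch below only illustrates that logic; it is not the stored procedure itself, and the function name and the status strings are hypothetical (the actual status values stored by eCapture are not given here).

```python
# Hypothetical status names for Export Jobs that have not finished.
UNFINISHED = {"pending", "in progress"}


def last_export_flag(other_job_statuses: list[str]) -> int:
    """Return 1 if the Export Job that just completed is the last one in
    its Export Series, or 0 if associated Export Jobs are still pending
    or in progress."""
    still_running = any(s.lower() in UNFINISHED for s in other_job_statuses)
    return 0 if still_running else 1
```

A custom procedure could use a value of 1 as the signal to run any step that should happen only once per Export Series, such as a final notification.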
Related Topics