Release Notes 2019.6.2

IPRO Tech, LLC is pleased to announce the 2019.6.2 release of eCapture.

This release includes several enhancements to the Streaming Discovery workflow, including the ability to re-hash existing legacy cases so that they are compatible with de-duplication in Streaming. Workflow improvements have been introduced for publishing errors and exporting of previously completed jobs. This release also introduces OCR support for PowerPoint in Streaming Discovery jobs, as well as extraction of metadata from photographs and PDF Comments. Password handling has been expanded and now covers all job types (Discovery, Streaming Discovery, Data Extract, Process, and Enterprise Imaging). This includes extraction from PST as well as support within the QC application. In addition, a dependency on TLS 1.0\1.1 has been removed from installation and from the creation of Configuration databases.

Important Notes

  • This release supports upgrades from eCapture 2015.2.2 and higher.
  • As of the 2017.3.4 release, database upgrades are now completed using IPRO E.A.S.E. 2017.4.1 and the latest IPRO SQL Generator Utility.
  • The IPRO ADD Job Status Updater Utility should be run immediately after the install has completed, if upgrading from eCapture 2016.3.1 or earlier to version 2019.6.2. This utility restores job status values for previously run Media Manager jobs using earlier versions of eCapture. See ReadMe.doc in the installer directory for additional information about this utility.
  • IPRO ADD version 2018.6.0 or later is required for compatibility with eCapture 2019.6.2.
  • If you run IPRO Eclipse, it must be upgraded to version 2018.6.0 to ensure compatibility with eCapture 2019.6.2.
  • The Authorization Service (Licensing Server) must be upgraded to 2017.3.4 or later to support all new product features.
  • All eCapture databases should be backed up before performing the upgrade.
  • Ensure that no jobs are running in the job queue while applying the upgrade.

Enhancements

  • Re-generate hash for Streaming de-duplication compatibility in legacy cases – Case Hash Conversion allows a user to generate new hashes in legacy cases to enable de-duplication compatibility with Streaming Discovery. This enables cases to migrate and continue utilizing the increased speed and efficiency of Streaming Discovery.

    Note: Original hash values are maintained with the use of this process.

  • Publish errors workflow – Publishing errors is easier now that users can see the number of documents being held back by these errors. Errors can also be published at the case level on eligible jobs, and exporting can be configured to auto-publish errors to review.
  • Streaming Discovery re-export – Existing Streaming Discovery jobs can now be re-exported with full configuration of all export settings. A carbon copy of a previous export, which maintains all original settings and numbering, can also be generated. If errors are encountered on an export, a subset of those errors and their families can be re-exported, avoiding long wait times in large exports. When deleting the most recent export, an option to roll back numbering is provided. Finally, QC has been enhanced to aid in generating Export Sets by allowing ‘in’ and ‘not in’ search parameters, which accept a comma-separated list of Item IDs.
  • PowerPoint handling – A new Streaming Discovery option exists to enable OCR on embedded content within slides. The OCR text is appended to the end of the extracted text. Media objects such as audio, video, and embedded images are also now extracted from these files.
  • Extraction of PDF comments – Comments stored within PDF files are now included in the extracted text and the flag ‘PDF Comments’ is available for identifying files with this characteristic when using Streaming Discovery.
  • Metadata extraction from photographs – Information stored within a photograph, such as the camera, associated settings, and GPS information, is extracted and available as fielded data on export.
  • Ability to remove the [NATIVEFILETYPE] when exporting natives – This field is no longer forced on export when native files are included.
  • Support of IBM Notes 9.0.1 FP 10 on Worker machines – The most recent fix pack for Notes 9.0.1 is certified for use on Worker machines.
  • Password handling and flagging – Password handling has been expanded to support all job types (Discovery, Streaming Discovery, Data Extract, Process, and Enterprise Imaging). This includes extraction from protected PST. Support has also been introduced within the QC application. Application of the ‘Password Applied’ QC flag is also improved.
  • Storage of directory, file count, and size information on ingestion to SQL – The source compressed size along with file and directory counts are now stored in the Client database for reporting purposes.
  • Efficiency improvements to ‘Streaming Deduplication Overlay With Paths’ Reporting – Path information is written to the Client database in order to reduce time when generating the report.

    Note: Upon upgrade, there will be little noticeable improvement the first time the report is generated. Subsequent reporting will reflect improvements; however, this is highly dependent on storage of the local machine generating the report. Significant savings of time have been associated with an SSD being present.

  • SQL Performance Improvements – Various improvements have been made to increase efficiency.
  • Eliminate dependency on TLS 1.0\1.1 – A utility used to generate the Configuration database has been updated to remove its dependency on TLS 1.0\1.1.
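
The ‘in’ and ‘not in’ search parameters described under Streaming Discovery re-export accept a comma-separated list of Item IDs. As an illustration only, the Python sketch below shows how such a list might be parsed and applied. The function names and sample IDs are hypothetical, Item IDs are assumed to be integers, and none of this reflects the QC application's actual implementation.

```python
def parse_item_ids(raw):
    """Split a comma-separated string into a set of integer Item IDs.

    Blank entries (for example, from a trailing comma) are ignored.
    """
    return {int(part) for part in raw.split(",") if part.strip()}


def filter_items(item_ids, raw_list, mode="in"):
    """Return the IDs from item_ids that match an 'in' or 'not in' search."""
    wanted = parse_item_ids(raw_list)
    if mode == "in":
        return [i for i in item_ids if i in wanted]
    if mode == "not in":
        return [i for i in item_ids if i not in wanted]
    raise ValueError("mode must be 'in' or 'not in'")


# Hypothetical Item IDs, for illustration only.
all_items = [101, 102, 103, 104, 105]
print(filter_items(all_items, "102, 104,", mode="in"))      # [102, 104]
print(filter_items(all_items, "102, 104,", mode="not in"))  # [101, 103, 105]
```

Parsing the list into a set keeps each membership check cheap, which matters when the pasted list of Item IDs is long.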

Resolved Issues

Database

  • Issue AP-113 – Export intervals not followed when Queue Manager and SQL machines are set to different time zones.

    Resolution – Export intervals are followed when the Queue Manager and SQL machines are set to different time zones.

  • Issue AP-824 – SQL Server CPU utilization spikes occur on Enterprise Jobs with OCR.

    Resolution – A data type mismatch in the MD5Hash field between tables managing OCR tasking caused high CPU usage that could dramatically affect OCR performance and resources of the SQL server. The mismatch has been corrected.
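
Issue AP-113 above concerns interval checks across machines set to different time zones. One general way to make such a check independent of local clock settings is to compare timezone-aware timestamps. The sketch below uses hypothetical names and sample times, and is not a description of how eCapture implements the fix.

```python
from datetime import datetime, timedelta, timezone


def interval_elapsed(last_export, now, interval):
    """Return True when `interval` has passed since `last_export`.

    Both timestamps must be timezone-aware, so the subtraction is
    done on absolute instants rather than on local wall-clock values.
    """
    if last_export.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return now - last_export >= interval


# Hypothetical example: SQL machine at UTC-5, Queue Manager at UTC-8.
sql_tz = timezone(timedelta(hours=-5))
qm_tz = timezone(timedelta(hours=-8))

last_export = datetime(2019, 6, 1, 12, 0, tzinfo=sql_tz)  # 17:00 UTC
now = datetime(2019, 6, 1, 9, 20, tzinfo=qm_tz)           # 17:20 UTC

print(interval_elapsed(last_export, now, timedelta(minutes=15)))  # True
print(interval_elapsed(last_export, now, timedelta(minutes=30)))  # False
```

Comparing the raw wall-clock values here (12:00 and 09:20) would suggest that time had run backwards, which is the kind of mismatch that can cause an interval to be skipped.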

Ingestion

  • Issue AP-96 – Failure to extract files from MSG which are themselves attached to an MSG.

    Resolution – MSG files which are contained within parent MSG files are properly identified and files found within are extracted as expected.

  • Issue AP-116 – Re-queue of Streaming Discovery node/item-level exceptions may cause a new Enterprise OCR job to be created.

    Resolution – Logic has been improved so that the originally-associated Enterprise OCR job is utilized during the re-queue of any node/item-level exceptions, preventing the possibility of an additional job being created, which could lead to the job’s ‘…_StageDiscovery…’ database being deleted prior to resolving all exceptions.

  • Issue AP-127 – ‘Password Applied’ flag being improperly set.

    Resolution – The ‘Password Applied’ flag was found to be set for any document that a password was attempted on rather than those where a password was successfully applied and processed. This has been updated for proper handling.

  • Issue AP-144 – Inconsistent file-extraction counts for archives within archives when ‘Treat Archives as Directories’ is disabled during standard Discovery.

    Resolution – Handling has been updated to maintain consistency of file-extraction counts when archives containing archives are encountered and the ‘Treat Archives as Directories’ option is disabled.

  • Issue AP-216 – Certain date formatting (Mon, 01 Jan 2001 12:12:12 +0000) within an MBOX could cause it to be identified as a single EML.

    Resolution – Identification of MBOX stores has been updated to account for the additional date formatting scenario.

  • Issue AP-240 – EML files containing tab characters in the header field ‘X-ZANTAZ-RECIP:’ can be misidentified during ingestion.

    Resolution – Identification of these messages has been revised for proper file extraction.

  • Issue AP-724 – Identification of date formatting codes stored in Microsoft Word header/footer values.

    Resolution – Identification of date formatting codes has been implemented to better identify and handle these values stored in the header/footer of Microsoft Word files.

  • Issue AP-725 – OLE-embedded Bitmap Image Object attachments not extracted during standard Discovery.

    Resolution – Embedded items using Bitmap Image Object attachment methodology are extracted during a standard Discovery job.

  • Issue AP-738 – Identification of multi-part archive naming conventions during ingestion.

    Resolution – Improved identification of multi-part archive naming conventions has been implemented so that a wider range of these container types is recognized and their content extracted.

  • Issue AP-746 – Third-level MSG files fail to extract email body text when the originating top-level parent MSG is a loose file.

    Resolution – MSG files extracted from an MSG which, itself, was extracted from another MSG could have no text extracted for the email body when the top-level parent MSG is a loose file on disk. Text is now extracted in this scenario.

    Note: The ‘EmailBody’ field in the Items table will remain blank in this scenario.
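
Issue AP-216 above involves an MBOX whose dates use the format ‘Mon, 01 Jan 2001 12:12:12 +0000’. The sketch below assumes, purely for illustration, that the date appears in the ‘From ’ separator line that begins each message in an MBOX, and shows one way to accept both the traditional asctime-style date and the RFC 2822-style date. The function names are hypothetical and this is not eCapture's identification logic.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_separator_date(text):
    """Parse a date in asctime or RFC 2822 style; return None if neither fits."""
    # asctime pads single-digit days with an extra space; collapse it.
    normalized = " ".join(text.split())
    try:
        # asctime style, e.g. "Mon Jan 1 12:12:12 2001"
        return datetime.strptime(normalized, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        pass
    try:
        # RFC 2822 style, e.g. "Mon, 01 Jan 2001 12:12:12 +0000"
        return parsedate_to_datetime(normalized)
    except (TypeError, ValueError):
        return None


def looks_like_mbox_separator(line):
    """Return True for a line of the form 'From <sender> <date>'."""
    if not line.startswith("From "):
        return False
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 3:
        return False
    return parse_separator_date(parts[2]) is not None


# Hypothetical sample lines, for illustration only.
print(looks_like_mbox_separator("From sender@example.com Mon Jan  1 12:12:12 2001"))
print(looks_like_mbox_separator("From sender@example.com Mon, 01 Jan 2001 12:12:12 +0000"))
print(looks_like_mbox_separator("From the desk of Bob"))
```

A detector that accepts only the first date style would find no separator lines in a store that uses the second, and could then treat the whole file as a single message.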

Metadata

  • Issue AP-66 – Auto-populated paths present in Word documents could be replaced with the processing output location.

    Resolution – Auto-populated paths present in Word documents are retained.

    Note: Documents which are natively re-processed via the QC application will retain the current handling of updating to the present location.

  • Issue AP-75 – Certain MSG files processed with Outlook 2010 could extract empty body text.

    Resolution – Identification of body formats has been improved to extract body text.

  • Issue AP-101 – ‘Last Modified’ and ‘Date Created’ metadata differ between Streaming Discovery and Data Extract/Process results.

    Resolution – Streaming Discovery extraction has been updated to more closely match that of Data Extract/Process job results.

  • Issue AP-130 – Emails containing internet-sourced inline images time out when internet access is disabled on the Worker machine.

    Resolution – The option ‘Enable internet links in emails’ has been added, which the user can disable when access to the internet is not available in the environment.

Text\OCR

  • Issue AP-215 – PowerPoint 2016 is not extracting OCR text.

    Resolution – Handling of PowerPoint 2016 files has been adjusted in order to extract OCR text.

  • Issue AP-726 – Text not extracted from some shape objects in Microsoft Word.

    Resolution – Text is extracted from certain shape objects in Microsoft Word.

  • Issue AP-727 – Text not extracted from some shape objects in Microsoft Excel.

    Resolution – Text is extracted from certain shape objects in Microsoft Excel.

Imaging

  • Issue AP-182 – PowerPoint files reporting a page number greater than 1 encounter an ‘index out of range exception’ during imaging.

    Resolution – PowerPoint can report the starting page as 2 rather than 1 in some instances. Handling has been improved to avoid the error in these situations.

  • Issue AP-183 – PowerPoint ‘Notes Pages’ fail to image to PDF.

    Resolution – PowerPoint ‘Notes Pages’ are successfully converted to PDF.

  • Issue AP-748 – PowerPoint files with ‘Mark as Final’ option selected in native fail to image.

    Resolution – PowerPoint files with ‘Mark as Final’ option selected in native are handled and images are generated as expected.

  • Issue AP-750 – PowerPoint fails to output images when using ‘Notes Pages Split’ with ‘Print Hidden Slides’ disabled when hidden slides are present in the native.

    Resolution – Images are properly generated when using ‘Notes Pages Split’ with ‘Print Hidden Slides’ disabled when hidden slides are present in the native.

  • Issue AP-829 – Certain Excel Worksheets containing shape objects could have images not accounted for.

    Resolution – Some Excel Worksheets that contain shape objects could have images that are not reflected in the document.txt file written to disk that logs the document’s associated images. This has been resolved.

  • Issue AP-895 – Excel Worksheets containing comments may be missing from imaged output.

    Resolution – Excel Worksheets containing comments are handled as expected.

Export

  • Issue AP-77 – Extraction of PDF Portfolio files attached to an e-mail could cause improper family member identification, resulting in a duplicate-insertion error.

    Resolution – PDF Portfolio files attached to an e-mail are properly identified within their family relationship during extraction, preventing a duplicate-insertion error.

  • Issue AP-78 – Utilizing the ‘Multiple page PDF files per family’ option on export could result in more documents than the count of families.

    Resolution – Handling has been improved in order to better identify family boundaries and generate a single PDF per family as expected.

  • Issue AP-97 – Quick searches accessed via right-click from within the document list in a QA session would not auto-load to Review after re-process.

    Resolution – Handling of re-process actions after performing a search has been updated so that results are properly loaded back to Review.

  • Issue AP-98 – Overlay load file output does not include header row when selected for a direct to disk export.

    Resolution – Overlay load file output now includes the header row when the option is selected for a direct to disk export.

  • Issue AP-412 – Existing images could be overwritten when an Export Series Overlay was generated to load additional metadata fields.

    Resolution – Existing images are no longer overwritten when an Export Series Overlay is run to load additional metadata fields.

  • Issue AP-679 – Endorsing the top-left region of the page in a Process Export would result in error.

    Resolution – Endorsements applied to the top-left region of a page in a Process Export no longer result in an error.

  • Issue AP-723 – Time format changes not included when generating overlay data.

    Resolution – Time format changes (e.g., 12-hour to 24-hour) are now accounted for during overlay generation.
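
For readers unfamiliar with the time format change mentioned in Issue AP-723, the following Python sketch shows a conversion between 12-hour and 24-hour representations. The function names and format strings are illustrative assumptions and are not taken from eCapture's export settings.

```python
from datetime import datetime


def to_24_hour(value):
    """Convert a 12-hour time string such as '01:05:09 PM' to 24-hour form."""
    return datetime.strptime(value, "%I:%M:%S %p").strftime("%H:%M:%S")


def to_12_hour(value):
    """Convert a 24-hour time string such as '13:05:09' to 12-hour form."""
    return datetime.strptime(value, "%H:%M:%S").strftime("%I:%M:%S %p")


print(to_24_hour("01:05:09 PM"))  # 13:05:09
print(to_24_hour("12:00:00 AM"))  # 00:00:00
print(to_12_hour("13:05:09"))     # 01:05:09 PM
```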