Run Streaming Discovery De-duplication Detailed Report
A new report for Streaming Discovery and Streaming Imaging jobs, called De-duplication Detailed, was added in version 2018.5.2. Streaming is the process of automatically copying, processing, filtering and loading data into review systems.
De-duplication is the process of identifying and separating identical electronic documents. In eCapture, the MD5 hash value of each document is generated during the discovery phase. When de-duplication is performed, a look-up for the same MD5 hash is performed across the specified de-duplication scope (Current Job, Custodian, Project and Client) for all previously processed data. If a match is found, the item is marked as a duplicate; if not, it is marked as an original. Additional scope options within eCapture allow families of documents to be maintained through de-duplication, such that if the top-level parent document is marked as a duplicate, the entire family is marked as duplicates. Alternatively, items within a family can be de-duplicated individually. Only items selected for processing are eligible for de-duplication, and only non-filtered (i.e., processed) items are marked as an original. If two items have matching MD5 hashes, the SHA-1 hash value is checked as well. If those values still match and the documents are parents, a family hash is generated by hashing the concatenated MD5 hash values of the entire family. This allows for a thorough hash comparison of the entire family in the event of differences between child documents. Bit-by-bit comparisons between files can also be performed during de-duplication, and matching file names can also be made a requirement for de-duplication.
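The hash checks described above can be illustrated with a minimal Python sketch. This is not eCapture's implementation: the `Item` class, the function names, and the sample data are hypothetical. The sketch also assumes that the family hash is itself an MD5 digest, which is not stated above. It treats a parent as a duplicate only when its MD5 hash, SHA-1 hash, and family hash all match a previously seen parent.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Item:
    """Hypothetical stand-in for a discovered document and its children."""
    name: str
    content: bytes
    children: List["Item"] = field(default_factory=list)


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def family_members(item: Item) -> Iterator[Item]:
    """Yield the parent followed by all of its descendants, depth-first."""
    yield item
    for child in item.children:
        yield from family_members(child)


def family_hash(parent: Item) -> str:
    # Hash the concatenated MD5 values of the entire family.
    # Assumption: the outer hash is MD5 as well.
    joined = "".join(md5_hex(m.content) for m in family_members(parent))
    return md5_hex(joined.encode("ascii"))


def mark_duplicates(parents: List[Item]) -> Dict[str, str]:
    """Label each top-level item as "Original" or "Duplicate"."""
    seen = set()
    result = {}
    for parent in parents:
        key = (md5_hex(parent.content), sha1_hex(parent.content), family_hash(parent))
        if key in seen:
            result[parent.name] = "Duplicate"
        else:
            seen.add(key)
            result[parent.name] = "Original"
    return result


# Three emails with identical bodies; the third has a different attachment.
email_a = Item("a.msg", b"hello", [Item("a.pdf", b"attachment v1")])
email_b = Item("b.msg", b"hello", [Item("b.pdf", b"attachment v1")])
email_c = Item("c.msg", b"hello", [Item("c.pdf", b"attachment v2")])
labels = mark_duplicates([email_a, email_b, email_c])
print(labels)
# {'a.msg': 'Original', 'b.msg': 'Duplicate', 'c.msg': 'Original'}
```

In this example the third email is kept as an original even though its own MD5 and SHA-1 values match the first, because the differing child document changes the family hash.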
This report contains detailed de-duplication information, similar to the pre-existing reports for Data Extract and Processing jobs, and indicates the original file that a document was a duplicate of. In eCapture, data refers to the electronic files that are discovered and processed.
The De-duplication setting (Custodian, Project, or None) in the Streaming Discovery Options dialog determines the report output. In eDiscovery, the data custodian is usually the person responsible for, or the person with administrative control over, granting access to an organization's documents or electronic files while protecting the data as defined by the organization's security policy or its standard IT practices. A Project is the level beneath Client in the ADD hierarchy and can have one or more Custodians. Discovery is the process used to determine the file types to be processed later; it makes data known to the eCapture system and assigns an index value to this data.
The report, in the form of a .CSV (comma-separated values) file, contains the following information:
- Relationship (Original and its Duplicate)
- DiscoveryPath (original location where the file was located at time of discovery)
- ItemFileName (original document filename and the duplicate copy of the original document)
- FamilyHash (for Streaming, this accounts for the additional layer of de-duplication for identifying duplicate families)
- ItemHashValue (this value is the equivalent of the MD5 Hash; however, Streaming uses SHA1 for de-duplication)
- To access the report for a Streaming Discovery job or a Streaming Imaging job, right-click the desired job in the Client Management tree view to display the context menu. (Client is the highest level in the ADD hierarchy; a Client is required to create a case.)
- Choose Reporting > Deduplication Detailed. Windows Explorer appears and displays a .CSV file whose name contains the job type and job name.
- Accept or change the directory location.
- Click Save. The report is generated, and Windows Explorer displays the saved .CSV file. (Note: If there is no data, a prompt appears stating: Report contains no data. Click OK to close the prompt. No report is generated.)
- Open the .CSV file to view the report data.
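Once saved, the report can also be inspected programmatically. The sketch below is a hypothetical illustration in Python. It assumes that the .CSV file has a header row using the field names listed above, that the Relationship column holds the values Original and Duplicate, and that an original and its duplicates share the same ItemHashValue. None of these details are confirmed here, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_hash(csv_text):
    """Map each ItemHashValue to its original and duplicate file names."""
    groups = defaultdict(lambda: {"original": None, "duplicates": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = groups[row["ItemHashValue"]]
        # Assumption: the Relationship column reads "Original" or "Duplicate".
        if row["Relationship"].strip().lower() == "original":
            entry["original"] = row["ItemFileName"]
        else:
            entry["duplicates"].append(row["ItemFileName"])
    return dict(groups)


# Made-up sample rows for illustration only; not real report output.
sample = """Relationship,DiscoveryPath,ItemFileName,FamilyHash,ItemHashValue
Original,Custodian1/Inbox,report.docx,fam1,hash1
Duplicate,Custodian1/Sent,report_copy.docx,fam1,hash1
Original,Custodian1/Notes,notes.txt,fam2,hash2
"""

groups = group_by_hash(sample)
for item_hash, entry in groups.items():
    print(item_hash, entry["original"], entry["duplicates"])
```

To work with a saved report instead of the inline sample, read the file's text first (for example, with `open(path, newline="")`, where `path` is the location chosen in the Save dialog) and pass it to `group_by_hash`.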