Near Duplicates

This Ipro Analytics function identifies textual near-duplicate groups in item text. Textual near-duplicates are items in which most of the text appears in other items in the group and in the same order.
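To make the definition above concrete, here is a minimal sketch of one way to measure how much of an item's text appears in another item in the same order. It is not the Ipro Analytics algorithm, which is not described here. The function names `ordered_overlap` and `is_near_duplicate` and the `threshold` value are assumptions for illustration only, and the sketch uses Python's standard `difflib` module for word-level matching.

```python
from difflib import SequenceMatcher


def ordered_overlap(text_a, text_b):
    """Return the fraction of words in text_a that also occur in text_b in the same order.

    Illustrative only: this is a generic word-sequence comparison, not the
    method Ipro Analytics uses internally.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    if not words_a:
        return 0.0
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(words_a)


def is_near_duplicate(text_a, text_b, threshold=0.8):
    """Treat two items as near-duplicates when most words of each appear in the other.

    The threshold of 0.8 is a hypothetical value chosen for this sketch.
    """
    score = min(ordered_overlap(text_a, text_b), ordered_overlap(text_b, text_a))
    return score >= threshold


original = "the quick brown fox jumps over the lazy dog"
revised = "the quick brown fox jumps over the lazy dog today"
reversed_text = "dog lazy the over jumps fox brown quick the"

print(is_near_duplicate(original, revised))
print(is_near_duplicate(original, reversed_text))
```

The second comparison shares every word with the original but in reverse order, so it scores low. This illustrates why word order matters in the definition.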

The metadata of each item identified as part of a near-duplicate group is updated with a group ID that identifies the near-duplicate group to which the item belongs.


For Ipro Analytics:

The documents are grouped by IA ND Family.

The documents are sorted by IA ND Sort.

The selected fields are BEGDOC, IA ND Family, IA ND Sort, IA ND Words, and IA ND Score.
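The grouping, sorting, and field selection listed above can be sketched outside the product as follows. This is an illustration under stated assumptions, not Ipro code: the function name `near_duplicate_view` is hypothetical, the sample records and all of their values are invented, and only the field names come from this page.

```python
from itertools import groupby
from operator import itemgetter

# Field names as listed on this page.
FIELDS = ["BEGDOC", "IA ND Family", "IA ND Sort", "IA ND Words", "IA ND Score"]


def near_duplicate_view(records):
    """Keep only the selected fields, sort by IA ND Sort, and group by IA ND Family."""
    rows = [{field: record.get(field) for field in FIELDS} for record in records]
    # Sort by family first so groupby sees each family as one contiguous run,
    # then by IA ND Sort within the family.
    rows.sort(key=itemgetter("IA ND Family", "IA ND Sort"))
    return {
        family: list(members)
        for family, members in groupby(rows, key=itemgetter("IA ND Family"))
    }


# Invented sample records; the values are placeholders, not real output.
sample = [
    {"BEGDOC": "DOC0003", "IA ND Family": "F2", "IA ND Sort": 1,
     "IA ND Words": 120, "IA ND Score": 100, "Custodian": "A"},
    {"BEGDOC": "DOC0002", "IA ND Family": "F1", "IA ND Sort": 2,
     "IA ND Words": 310, "IA ND Score": 94, "Custodian": "B"},
    {"BEGDOC": "DOC0001", "IA ND Family": "F1", "IA ND Sort": 1,
     "IA ND Words": 305, "IA ND Score": 100, "Custodian": "A"},
]

view = near_duplicate_view(sample)
for family, members in view.items():
    print(family, [member["BEGDOC"] for member in members])
```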

