Active Learning Frequently Asked Questions
The following is a list of frequently asked questions (FAQ) about Active Learning. For instructions on setting up Active Learning, see the Related pages links at the end of this topic.
What should I know about the Active Learning Index? Are there any preventative measures I should take to optimize my Active Learning index?
The index for an Active Learning project is created by default when the Active Learning-enabled review pass is saved. Before creating the review pass, it is recommended to validate that the records in your scope have proper extracted text and that any needed OCR has been created and loaded. Removing very large documents from your scope can also be beneficial, especially when their content is common within the case but is not required for review.
Why is the default Batch Size 10 and can it be modified?
A default batch size of 10 was selected to give the Active Learning engine coded example documents as close to real time as possible. Every example helps train the engine, so larger batches delay training and, in turn, delay the prioritization of positive records into batches. Although the batch size can be modified, smaller batches are better for optimization because they return examples to the engine more quickly. If the batch size must be changed, this can be done on the Settings tab of the review pass under the Batch Size option.
Must review/training be done by subject matter experts?
No. Subject matter experts (SMEs) are not required for an Active Learning project. Documents are coded as part of batches in the review pass, and predictions are made immediately and throughout the course of the review pass.
What if I added records to my case and must add them to my Active Learning project on which I’ve already started review?
If new records are loaded to your case, or the scope criteria for your project change, a banner displays at the top of the review pass screen. You can then update the review pass index and predictions by clicking the Update Documents button at the bottom of the Overview tab. Note that documents can only be added this way; they cannot be removed from the review pass scope or index. If the Active Learning project no longer applies to some documents, their predictions can simply be ignored.
On what is the prediction algorithm based?
The prediction algorithm uses k-nearest neighbors (k-NN) with k = 40. This means the 40 manually coded documents most similar to a record are used to make its prediction.
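The idea can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names, the use of cosine similarity, and the representation of documents as numeric vectors are not the product's actual implementation.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two document vectors (illustrative).
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def predict_knn(doc_vec, coded_docs, k=40):
    """coded_docs: list of (vector, label) pairs, where label is 1 for a
    positive (tagged) example and 0 for a negative one. The k most
    similar coded documents vote; the fraction of positive votes can be
    read as a relevancy score in [0, 1]."""
    ranked = sorted(coded_docs, key=lambda d: cosine(doc_vec, d[0]), reverse=True)
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

With two positive neighbors closest to a document, `predict_knn` returns a score of 1.0; a document nearest only negative examples scores 0.0.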
When do predictions start?
Predictions start being applied as soon as a regular (non-QC) batch is checked in that contains a positive example.
Does every document in the Active Learning Review pass receive a prediction and relevancy score?
Documents in the Review Pass that have a “normal” document treatment receive a prediction and relevancy score. Documents that have a “short” or “binary” document treatment do not receive a prediction or relevancy score and must be reviewed outside of Active Learning.
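A rough sketch of how such a treatment classification could work is below. The thresholds and rules are purely illustrative assumptions; the product does not publish its actual criteria for "short" and "binary" treatments.

```python
def document_treatment(text, min_tokens=6, max_binary_ratio=0.3):
    """Hypothetical classifier: flag documents as 'binary' when too much
    of the content is non-printable, 'short' when there are too few
    words to predict from, and 'normal' otherwise. All thresholds here
    are assumptions for illustration."""
    printable = sum(ch.isprintable() for ch in text)
    binary_ratio = 1 - printable / max(len(text), 1)
    if binary_ratio > max_binary_ratio:
        return "binary"
    if len(text.split()) < min_tokens:
        return "short"
    return "normal"
```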
I see the Omitted Records category in my Review Metrics – what should I be aware of with these documents?
Omitted records are documents that were part of your review pass scope, but a prediction could not be applied because the document was mostly binary data or contained little or no text. Generally, omitted documents should be reviewed manually outside of the Active Learning process. They can be identified by clicking the hyperlinked Count on the Overview tab of the review pass, which opens the document set within Visual Search.
Can I provide pre-coding document examples?
Yes. Pre-coded documents can be pulled into a QC batch, where reviewers tag them with the appropriate tag and set the "reviewed" status. For very large numbers of documents, the "reviewed" status can be updated within SQL. For more information, contact IPRO Support.
If a document is manually tagged as a Positive example, how could the prediction be the opposite? Or vice versa?
A document's prediction is not influenced by how you tag the document itself, but rather by how manually tagged documents similar to it have been coded. This helps prevent bias and is known as "leave-one-out cross-validation". It is also why you might see "conflicts" (the reviewer tagged a document one way, but the system predicted it another). It is usually a good idea to review conflicts, as there may have been discrepancies when tagging other similar documents.
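The leave-one-out idea can be sketched as follows. This is an illustrative toy, not the product's implementation: document IDs, vectors, and the cosine-similarity voting are all assumptions.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two document vectors (illustrative).
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def loo_prediction(doc_id, coded, k=2):
    """coded: {doc_id: (vector, label)}. The document's own coding is
    dropped from the neighbor pool before voting, so a document never
    influences its own prediction -- which is exactly how a manual tag
    and a prediction can disagree (a 'conflict')."""
    vec, _ = coded[doc_id]
    pool = [(v, lab) for d, (v, lab) in coded.items() if d != doc_id]
    pool.sort(key=lambda p: cosine(vec, p[0]), reverse=True)
    votes = [lab for _, lab in pool[:k]]
    return sum(votes) / len(votes)
```

In the test below, document "a" is tagged positive, yet its nearest coded neighbors are negative, so its prediction is 0.0: a conflict.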
Conflicts are a common part of Active Learning, but what should I know about conflicts and how to manage them?
Conflicts are manually tagged records for which the reviewer's tag and the prediction disagree. The earlier stages of an Active Learning project often produce more conflicts, which tend to resolve as more training is applied and the engine becomes more accurate. Conflict records can be isolated and reviewed separately by submitting them to a QC batch for your existing review pass: click the hyperlinked Count on the Overview tab of the review pass, then choose the Create QC Batch mass action and select your Active Learning-enabled review pass from the selector. See QC Batches.
What if my Primary Review Purpose tag is part of a multi-tag group?
If your Primary Review Purpose tag is part of a multi-tag tag group (for example, the tag group has tags for Relevant, Not Relevant, and Technical Issue, and your Primary Review Purpose is set to Relevant), any tag other than the Relevant tag is recorded as a Negative example.
What happens if a reviewer doesn’t know how to code a document?
Once a document is marked as Reviewed, if the Primary Review Purpose tag is selected, then the document is recorded as a “positive” example. If the Primary Review Purpose tag is not selected, the document is recorded as a “negative” example. Therefore, reviewers should be familiar with using the OnHold review status for a document if they are unsure whether the document is a Positive or Negative example. This prevents the document from being added as an example until a review decision can be made.
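The recording rule described above amounts to a simple decision function. The sketch below uses hypothetical names ("Reviewed", "OnHold", and the parameters) to illustrate the logic; it is not the product's API.

```python
def record_example(applied_tags, primary_purpose_tag, review_status):
    """Sketch of the example-recording rule: a document only becomes a
    training example once its status is 'Reviewed'; OnHold documents
    are deferred. All names here are illustrative assumptions."""
    if review_status != "Reviewed":
        return None  # e.g. OnHold: not yet added as a training example
    if primary_purpose_tag in applied_tags:
        return "positive"
    return "negative"
```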
My Recall seems to stop around 70%. Can I expect it to improve, or go above 70%?
The desired Recall level is set to 70% within the system. Documents above the 70% recall threshold (higher relevancy scores) are predicted as relevant; documents below it (lower relevancy scores) are predicted as non-relevant. If you want 100% recall, you can simply review all documents in the case and still receive the benefits of Active Learning prioritization.
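A simplified sketch of the recall-threshold idea: rank documents by score, then find the lowest score at which the documents above it capture the target fraction of known relevant documents. This is only an illustration of the concept using an assumed validation sample; the product's actual estimator is not published.

```python
def recall_cutoff(scored_docs, recall_target=0.70):
    """scored_docs: list of (score, is_relevant) pairs, e.g. from a
    validation sample. Returns the lowest score s such that documents
    with score >= s contain at least recall_target of all relevant
    documents (illustrative sketch only)."""
    total_relevant = sum(rel for _, rel in scored_docs)
    found = 0
    for score, rel in sorted(scored_docs, reverse=True):
        found += rel
        if found / total_relevant >= recall_target:
            return score
    return min(score for score, _ in scored_docs)
```

Raising `recall_target` pushes the cutoff score lower, so more documents are predicted relevant; at 100% the cutoff drops to the lowest-scoring relevant document.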
Does the “Poor”, “Mediocre”, “Good”, or “Excellent” Active Learning status indicate a stopping point in the review?
The Active Learning status is related to how well the engine is trained, and if it could benefit from seeing more training example documents. The Active Learning status should not be used to gauge when to stop review, as that is often based on other metrics (Recall, for example).
I received an opposing party production and want to use Active Learning to review it. To use Active Learning, must the data be processed with IPRO?
As long as you have extracted text for your documents, you can generate an Analytics index and use Active Learning. The data does not require processing with IPRO.
Can I use Active Learning on documents with OCR text?
Documents with OCR text may require review outside of Active Learning because OCR text quality varies. However, it depends on the case and the data.
Can I use Active Learning for privileged review, review for confidentiality, or other review workflows?
Most clients review documents for privilege outside of Active Learning, as there are often subtle distinctions that determine whether a document is privileged or not (for example, the recipient of a document may determine whether it is privileged), and it can be difficult for Active Learning to identify those distinctions. Similarly, it can be difficult to use Active Learning to distinguish different confidentiality levels. Active Learning is best used to prioritize and assist with identifying relevancy and/or specific issues, but you can combine Active Learning as a QC step on most linear reviews.
I tagged documents outside of the Active Learning-enabled review pass. Will those be used by the engine to help train the system?
No. Documents must be part of the specific Active Learning review pass to train the engine. Tagging a document outside of the review pass (even if it is the same tag used for Active Learning) will not train or influence the engine and associated reports or metrics.
What if I normally review with the Tag Family option turned on?
The tag family rule can be used with Active Learning. However, only the primary document, which is manually tagged, is added as a training example to your Active Learning project. Family documents are tagged, but they are not included in training unless they are later reviewed as part of an Active Learning batch. It is recommended to consider only the document you are reviewing when utilizing Active Learning; users should not consider documents in other relationships when making tagging decisions. After the Active Learning project is completed, a typical next step is to resolve prediction discrepancies within a family before production.
Can a tag be used more than one time as a “Review Purpose”? What other things should I think about concerning my Review Purpose tags?
A tag can be used only one time as the primary review purpose. Once it is designated as the primary review purpose within a review pass, the tag can no longer be used across other review passes (this applies to both Active Learning and non-Active Learning review passes). One way to manage this is to name tags by using part of the review pass name. During a First Level Review for responsiveness, you may name the tag “FLR Responsive”. During a later review pass for QC purposes, you could name the tag as “QC Responsive”. Keep in mind that it may be easier to manage the review in a single review pass. However, following a consistent tag-naming convention can also be useful when multiple review passes are used for similar purposes.
Can reviewers apply other coding decisions to documents while reviewing for an Active Learning review pass?
Yes. Other tags, tag groups, and fields can be applied or edited during review for an Active Learning review pass. They will not affect the prioritization of the primary review purpose. Keep in mind that only one tag can be designated as the primary review purpose for each review pass; that tag is considered the "positive" example, while any other tag in the same tag group is considered a "negative" example. The most common example is a tag group called "Responsiveness" that has a tag named "Responsive" and another named "Non-Responsive". In this example, an optional step could be to tag for specific issues within another tag group, but no Active Learning or prioritization would take place when tagging those issues.
What actually happens when I switch a review pass from Active Learning‑disabled to Active Learning‑enabled after documents have been reviewed?
Even in an Active Learning-disabled state, an Analytics index is created for the review pass, and training occurs based on the selected review purpose; however, documents are batched out in the default sort order and no prioritization occurs. If you later enable Active Learning on the review pass, documents start being prioritized and batched out based on the highest likelihood of being positive. When Active Learning is enabled, batches are created on demand as reviewers check out a batch.
What benefit does Content Filtering have for Active Learning?
Content filtering allows common phrases, sentences, email disclaimers, and similar text to be ignored by the Analytics index. This leads to better clustering of documents based on their actual content. Otherwise, dissimilar documents could be considered conceptually similar simply because they share a lengthy email disclaimer or footer.
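The effect of content filtering can be sketched as stripping known boilerplate before indexing. The function name and phrase list below are illustrative assumptions, not the product's implementation.

```python
def strip_boilerplate(text, boilerplate_phrases):
    """Remove known repeated phrases (disclaimers, footers) before the
    text would be indexed, so shared boilerplate cannot make otherwise
    unrelated documents look conceptually similar. Illustrative only."""
    for phrase in boilerplate_phrases:
        text = text.replace(phrase, " ")
    # Collapse the whitespace left behind by removed phrases.
    return " ".join(text.split())
```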
Are “short” cluster centers prioritized in the beginning (TAR 3.0), or are just the “normal” cluster centers prioritized?
No. All short documents are excluded entirely and must be reviewed outside of the Active Learning review pass.
Where can I get an estimate of the prevalence (estimated percentage of relevant docs in the review pass)?
In the review pass under the Insights area, there is a prevalence estimate as well as a range of estimated relevant documents. This will fluctuate throughout the review, especially at the beginning of an Active Learning project.
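The idea behind a prevalence estimate with a range can be sketched with a simple sample proportion and a normal-approximation 95% interval. This is only an illustration of the statistical concept; the product's actual estimator and interval method are not published, and all names below are assumptions.

```python
from math import sqrt

def prevalence_estimate(sample_relevant, sample_size, population):
    """Point estimate of prevalence from a coded sample, plus a rough
    95% normal-approximation range of relevant documents in the full
    population (illustrative sketch only)."""
    p = sample_relevant / sample_size
    margin = 1.96 * sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)
    return p, (round(low * population), round(high * population))
```

As more documents are coded, `sample_size` grows, the margin shrinks, and the estimated range narrows, which matches the FAQ's note that the estimate fluctuates most at the beginning of a project.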