Negative Data Suppression in Credit Scoring: Removals Yield an Advancement in Credit Scoring Accuracy
In June 2016, Equifax, Experian and TransUnion announced a series of initiatives, known as the National Consumer Assistance Plan (NCAP), to enhance the accuracy of credit data reporting. Key elements included the removal of all civil judgments, a substantial reduction in the number of tax liens, and the removal of medical-related agency collections less than 180 days old.
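To make the suppression rules concrete, the short Python sketch below applies the two criteria this paragraph states precisely: drop every civil judgment, and drop medical collections younger than 180 days. The record layout, the field names and the date from which an item's age is measured are assumptions for illustration, not bureau specifications. Tax liens pass through untouched, because the text gives no criteria for their reduction.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical, simplified record layout; real bureau files are far richer.
@dataclass
class Record:
    kind: str          # e.g. "civil_judgment", "tax_lien", "collection"
    is_medical: bool   # only meaningful for collections
    age_start: date    # assumed reference date for measuring an item's age

MEDICAL_MIN_AGE_DAYS = 180  # threshold stated in the NCAP description

def suppress_ncap(records: List[Record], as_of: date) -> List[Record]:
    """Return the records that survive the two NCAP rules modeled here.

    Tax liens are kept unchanged: the article gives no criteria for
    the reduction, so this sketch does not attempt to model it.
    """
    kept = []
    for r in records:
        if r.kind == "civil_judgment":
            continue  # all civil judgments are removed
        if (r.kind == "collection" and r.is_medical
                and (as_of - r.age_start).days < MEDICAL_MIN_AGE_DAYS):
            continue  # medical collections under 180 days old are removed
        kept.append(r)
    return kept
```

Running each model-development sample through a filter of this kind is one way to build the "without" data set for the with/without comparison described next.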
What are the implications of removing these items for credit-scoring models that traditionally rely on credit bureau data? Can a model built without these data maintain its predictive strength?
To find answers, data scientists at VantageScore Solutions, LLC, developed two nonsegmented models: one with and one without the data entries that NCAP removed. They used two million anonymized U.S. consumer credit files, randomly selected from one of the nationwide credit bureau databases and covering the years 2013–2015. Each model drew on a total of 900 data points built from bureau credit file data, including accounts, inquiries, public records and collections, and the study verified that all attributes met compliance and regulatory guidelines.
Both models proved essentially equal in predictive strength. The analysis of the model built in anticipation of public record and collection trade removals did, however, reveal an interesting finding: it lifted a small percentage of consumers above the hypothetical scoring cut-off, consumers who had fallen below that cut-off under the traditional model. Looking more deeply at this newly approved segment, analysts found that these consumers held a more stable mix of credit products and demonstrated superior credit management skills. In other words, they represented an attractive opportunity for lenders.
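A common way to run this kind of comparison is a swap-set analysis: score the same consumers with both models and, at a single cut-off, collect those approved only by the new model ("swap-ins") and those approved only by the old one ("swap-outs"). The Python sketch below is a minimal illustration under stated assumptions; the consumer IDs, scores and the cut-off of 600 are hypothetical values, not study data.

```python
def swap_sets(old_scores, new_scores, cutoff):
    """Compare approval decisions from two models at one score cut-off.

    old_scores, new_scores: dicts mapping a (hypothetical) consumer ID
    to a score. A consumer counts as approved when their score is at or
    above `cutoff`. Returns (swap_in, swap_out): consumers approved only
    by the new model, and consumers approved only by the old model.
    """
    common = old_scores.keys() & new_scores.keys()  # dict views support set ops
    swap_in = {c for c in common if old_scores[c] < cutoff <= new_scores[c]}
    swap_out = {c for c in common if new_scores[c] < cutoff <= old_scores[c]}
    return swap_in, swap_out


# Illustrative scores only; not study data.
old = {"a": 590, "b": 640, "c": 700, "d": 610}
new = {"a": 605, "b": 595, "c": 710, "d": 612}
swap_in, swap_out = swap_sets(old, new, cutoff=600)
print(sorted(swap_in), sorted(swap_out))  # ['a'] ['b']
```

The swap-in set is the "newly approved segment"; profiling it (product mix, repayment behavior) is what tells a lender whether those extra approvals are attractive.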
Negative Data Suppression and VantageScore 4.0
VantageScore 4.0 is the first and only patent-protected, tri-bureau credit-scoring model developed after NCAP. VantageScore’s data scientists designed it with specific provisions for the suppression of public record and collection trades.
For example, the model distinguishes medical collections from other types of collection accounts, ignores medical collections less than six months old (to allow time for insurance-payment processing) and penalizes medical collections less than non-medical ones.
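The three rules in this paragraph can be expressed as a simple penalty function. The Python sketch below is a hypothetical illustration, not VantageScore 4.0’s actual scorecard: the count-based penalty and the 0.5 and 1.0 weights are assumptions, since the text says only that medical collections are penalized less than non-medical ones.

```python
from dataclasses import dataclass

@dataclass
class Collection:
    is_medical: bool
    age_months: int  # months since the collection was reported (assumption)

# Hypothetical weights: the article gives only their ordering, not values.
NON_MEDICAL_WEIGHT = 1.0
MEDICAL_WEIGHT = 0.5
MEDICAL_GRACE_MONTHS = 6  # medical collections younger than this are ignored

def collection_penalty(collections):
    """Sum a per-item penalty over a consumer's collection accounts."""
    total = 0.0
    for c in collections:
        if c.is_medical:
            if c.age_months < MEDICAL_GRACE_MONTHS:
                continue  # ignored: leaves time for insurance payment
            total += MEDICAL_WEIGHT  # medical: smaller penalty
        else:
            total += NON_MEDICAL_WEIGHT  # non-medical: full penalty
    return total
```

For example, a file with a 3-month-old medical collection, a 12-month-old medical collection and a 2-month-old non-medical collection scores 0 + 0.5 + 1.0 = 1.5 under these assumed weights.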
VantageScore 4.0 also relies less on derogatory collection and public record data when scoring consumers who have those records in their credit files. This reduced reliance ensures that the model will not lose substantial predictive strength in the likely event that these records fail to meet NCAP’s enhanced data-quality standards and are therefore removed from consumer credit files.
These provisions are only highlights of the model’s many new features. VantageScore 4.0 is also the first and only tri-bureau credit-scoring model to incorporate newly available trended credit data from all three credit-reporting companies (CRCs).
The model also applies machine-learning techniques to develop scorecards for consumers with dormant credit histories (i.e., those with scoreable trades but no update to their credit file in the last six months). This approach substantially improved scoring accuracy for consumers whom traditional scoring models cannot score, strengthening VantageScore’s ability to accurately score approximately 30 million more consumers than traditional models can.
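The dormant-history definition in this paragraph amounts to a routing rule that decides which scorecard a file is sent to. The Python sketch below covers only that rule; it does not attempt the machine-learning scorecards themselves. The `CreditFile` fields, the "dormant" and "active" scorecard labels and the whole-month counting convention are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

DORMANT_MONTHS = 6  # "no update to their credit file in the last six months"

@dataclass
class CreditFile:
    scoreable_trades: int  # number of trades usable for scoring
    last_update: date      # most recent update to the file (assumed field)

def full_months_between(earlier: date, later: date) -> int:
    """Count whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months

def pick_scorecard(f: CreditFile, as_of: date) -> Optional[str]:
    """Route a file to a hypothetical scorecard label."""
    if f.scoreable_trades == 0:
        return None  # no scoreable trades: outside this sketch
    if full_months_between(f.last_update, as_of) >= DORMANT_MONTHS:
        return "dormant"  # scoreable trades, but no recent update
    return "active"
```

Segmenting first and then fitting a separate scorecard per segment lets the dormant population get a model tuned to its sparser, older data rather than inheriting one built mostly on active files.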
Given the changes driven by NCAP, VantageScore 4.0 is an important advancement for lenders. Lenders should evaluate how their incumbent models perform with and without the suppression of some negative data, and how they compare with VantageScore 4.0, to determine whether suppression harms predictive performance and whether there is an opportunity for growth.