Machine learning techniques inform credit scores
Mention machine learning these days and it conjures up anything from self-driving cars to the robot apocalypse.
Here’s the truth: machine learning has been in use in the financial services industry for many years. That said, applying what is essentially just a mathematical technique to the development of a tri-bureau credit scoring model is actually unique.
In fact, for our latest model, VantageScore 4.0, our data scientists leveraged machine learning techniques in their development of scorecards for those with dormant credit histories (i.e., those with scoreable trades but with no update to their credit file within the last six months).
But what is machine learning and what did our data scientists actually do? Glad you asked!
When seeking to improve the performance of credit score models, two approaches are typically used: incorporating additional data (e.g., rent, utility and cell phone information) or using enhanced mathematical techniques to ascertain the predictive relationships within the existing behavioral credit data. We simply paired these mathematical techniques with massive computing capability to produce significantly improved predictive performance.
Normally, a credit scoring model is built using attributes that only incorporate one or two dimensions of the existing behavioral credit data. This approach is generally adequate when a model is used to score those with conventional credit histories, i.e., thick files which contain, for example, a combination of auto loans, credit cards, etc. But for those with thin files (or those who have not used credit within the last six months), this approach does not adequately measure their risk of default or otherwise provide enough predictive power. That’s where machine learning comes in handy.
For these thin-file consumers, multiple behavioral dimensions may be evaluated to ascertain predictive relationships that might otherwise be ignored but can be used to produce a model that is both predictive and able to score these consumers. But machine learning must be used both to discover those relationships and to dig deeply into them to yield improved performance. Even the fantastic data scientists at VantageScore don’t have enough bandwidth to combine the myriad of behavioral relationships needed to create these attributes, so they used a machine learning technique – an algorithm – to do the data mining for them.
Once these relationships were discovered, that’s when the data scientists went to work. The highest performing relationships, or “nodes” in data speak, were identified and combined to provide an estimate of the optimal predictive performance. The goal then became to convert these unstructured high-performing nodes into structured, static attributes that can be reason coded and incorporated into a credit scoring model that is 100% compliant.
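For readers who like to see ideas in code, here is a minimal Python sketch of the discover-then-freeze idea described above. It is not VantageScore's actual algorithm or data: the flag names, the twelve toy records, the minimum node size and the helper functions are all hypothetical, and a simple brute-force search stands in for whatever machine learning technique the data scientists used.

```python
from itertools import combinations

# Hypothetical behavioral flags (assumed names, not real VantageScore attributes).
FLAGS = ("has_old_account", "low_utilization", "recent_inquiry")

# Hypothetical toy records: one tuple per consumer, in FLAGS order,
# followed by the outcome (1 = defaulted, 0 = did not default).
ROWS = [
    (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 0),
    (1, 0, 0, 1), (1, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1),
    (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 0, 1), (0, 0, 1, 0),
]
CONSUMERS = [dict(zip(FLAGS + ("defaulted",), row)) for row in ROWS]

MIN_NODE_SIZE = 3  # assumed threshold so tiny, unreliable nodes are ignored


def default_rate(group):
    """Share of consumers in the group who defaulted."""
    return sum(c["defaulted"] for c in group) / len(group)


def node_gap(flags, consumers):
    """Default-rate gap between consumers inside the node (all flags set)
    and everyone else. Returns 0.0 for nodes that are too small."""
    inside = [c for c in consumers if all(c[f] for f in flags)]
    outside = [c for c in consumers if not all(c[f] for f in flags)]
    if len(inside) < MIN_NODE_SIZE or not outside:
        return 0.0
    return abs(default_rate(inside) - default_rate(outside))


def best_node(consumers, max_size=3):
    """Discovery step: try every combination of up to max_size flags
    and keep the one that separates defaulters best."""
    candidates = [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(FLAGS, size)
    ]
    return max(candidates, key=lambda combo: node_gap(combo, consumers))


def make_static_attribute(flags):
    """Freeze step: turn a discovered node into a fixed, explainable
    attribute (1 if the consumer meets every condition, else 0)."""
    def attribute(consumer):
        return int(all(consumer[f] for f in flags))
    return attribute


best = best_node(CONSUMERS)
attribute = make_static_attribute(best)
print(best, round(node_gap(best, CONSUMERS), 2))
# prints: ('has_old_account', 'low_utilization') 0.75
```

On this toy data the search selects the combination of has_old_account and low_utilization. Its default-rate gap (0.75) is larger than that of any single flag (about 0.33), which is the kind of multi-dimensional relationship a one-dimensional attribute would miss. Once frozen, the attribute is an ordinary fixed rule that no longer depends on the search that found it.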
Underneath my signature is a really great illustration of this process. Obviously, it is a simplified explanation of how VantageScore 4.0 uses machine learning to build scorecards for the consumers that other models consider to be unscoreable. To learn more, please download and read our recent whitepaper, which digs deeper into these machine learning techniques.
But the results speak for themselves: According to our latest validation study, VantageScore 4.0, with the use of machine learning, outperforms VantageScore 3.0 (which did not leverage machine learning) by 10% for new account originations and 4.2% for existing account management trades.
More on our model validation results next month. For now, we hope our July newsletter offers you some light beach reading: a “Did You Know” article about how most consumers actually have great credit scores, tips for small business entrepreneurs, and the latest results from our annual credit knowledge survey with the Consumer Federation of America.
CEO and President, VantageScore Solutions