Scorecard Validation


Gini & ROC Curve
Kolmogorov-Smirnov Curve
Risk Segments
Strategy Curve
Approval vs “Bad” Rate
“Good” and “Bad” distribution
Cumulative Event Rates
Chi-squared Scorecard Stability Test
Score Points Distribution

Gini & ROC Curve

The most widely used measures of scorecard quality are the Gini coefficient and the ROC curve.

A ROC curve that lies higher and further to the left indicates better scorecard quality.

The quality of classification indicated by the Gini coefficient can be assessed using the following tables:

Application Scoring

scr_8_1

Collection Scoring

scr_8_3

Behavioral Scoring

scr_8_2

Fraud Scoring

The Gini approach is not relevant for fraud scoring because the number of fraudsters in a typical dataset is too small, and scorecard quality should be analyzed with other methods.

ROC curve values are usually calculated not only for the dataset that was used to create the scorecard (training set), but also for a separate out-of-sample validation dataset. The values for the training and validation datasets should be close to each other. When several scorecards are compared, preference is given to the one with the highest Gini value.
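As an illustration of the training versus validation comparison, the following minimal sketch computes the area under the ROC curve (AUC) by pairwise comparison and converts it to Gini using the standard relation Gini = 2 × AUC − 1. It assumes that a higher score means lower risk; the sample scores and the function names `auc` and `gini` are hypothetical and are not taken from any particular scoring product.

```python
def auc(scores, is_bad):
    """Share of (Good, Bad) pairs in which the Good has the higher score; ties count as 0.5."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(goods) * len(bads))


def gini(scores, is_bad):
    """Gini coefficient derived from the AUC."""
    return 2.0 * auc(scores, is_bad) - 1.0


# Hypothetical data: 1 marks a "Bad" borrower, 0 a "Good" one.
train_scores = [620, 580, 700, 540, 660, 500, 720, 600]
train_bad = [0, 1, 0, 1, 1, 0, 0, 0]
valid_scores = [610, 590, 690, 530, 650, 560]
valid_bad = [0, 0, 0, 1, 1, 1]

gini_train = gini(train_scores, train_bad)
gini_valid = gini(valid_scores, valid_bad)
print(f"Gini (training):   {gini_train:.3f}")
print(f"Gini (validation): {gini_valid:.3f}")
print(f"Absolute gap:      {abs(gini_train - gini_valid):.3f}")
```

The pairwise loop is quadratic in the number of borrowers, which is acceptable for a small illustration; a rank-based calculation would be preferable on a real portfolio.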

Unacceptable ROC curve performance.
The scorecard needs to be improved.

scr_8_4

Realistic case.
Acceptable ROC curve performance.

scr_8_5

Perfect ROC curve performance.

scr_8_6



Kolmogorov-Smirnov Curve

Shows how strongly the distributions of “Bad” and “Good” borrowers differ

The Kolmogorov-Smirnov curve shows the difference between the cumulative distributions of “Goods” and “Bads”. The maximum difference between the two distributions is known as the Kolmogorov-Smirnov value, which is often used together with the Gini value to assess scorecard quality.

scr_9_1

Kolmogorov-Smirnov values are usually calculated not only for the dataset that was used to create the scorecard (training set), but also for a separate out-of-sample validation dataset. The values for the training and validation datasets should be close to each other.

When several scorecards are compared, preference is given to the one with the highest Kolmogorov-Smirnov value.
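The Kolmogorov-Smirnov value can be sketched as the largest gap between the cumulative shares of “Bads” and “Goods” taken over all observed scores. The code below is a minimal illustration under the assumption that lower scores are riskier; the data and the function name `ks_statistic` are hypothetical.

```python
def ks_statistic(scores, is_bad):
    """Return (largest gap between cumulative Bad and Good shares, score where it occurs)."""
    n_good = sum(1 for b in is_bad if not b)
    n_bad = sum(1 for b in is_bad if b)
    best_gap, best_score = 0.0, None
    for cut in sorted(set(scores)):
        # Cumulative share of each group scoring at or below the current score.
        cum_good = sum(1 for s, b in zip(scores, is_bad) if not b and s <= cut) / n_good
        cum_bad = sum(1 for s, b in zip(scores, is_bad) if b and s <= cut) / n_bad
        gap = abs(cum_bad - cum_good)
        if gap > best_gap:
            best_gap, best_score = gap, cut
    return best_gap, best_score


# Hypothetical data: 1 marks a "Bad" borrower, 0 a "Good" one.
scores = [500, 540, 580, 600, 620, 660, 700, 720]
is_bad = [1, 1, 1, 0, 0, 1, 0, 0]

ks_value, ks_score = ks_statistic(scores, is_bad)
print(f"KS value: {ks_value:.2f} at score {ks_score}")
```

Running the same function on the training and validation datasets gives the two values that should be close to each other.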

Unacceptable Kolmogorov-Smirnov curves

scr_9_2



Risk Segments

Allows evaluating whether risk is distributed logically across segments, how large it is, and what odds are expected for each risk segment

Additionally, the Risk Segments Graph helps to select a cut-off point based on “Good : Bad” odds.
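The odds behind such a graph can be sketched as follows. The segment boundaries, the counts, and the target odds of 10:1 are hypothetical values chosen for illustration; the sketch assumes segments are listed from the lowest to the highest score band.

```python
# Hypothetical counts per score band: (band label, number of Goods, number of Bads).
segments = [
    ("400-499", 120, 60),
    ("500-599", 300, 60),
    ("600-699", 450, 30),
    ("700-799", 400, 10),
]


def segment_odds(segments):
    """Return (band, Good:Bad odds) per band; odds are infinite if a band has no Bads."""
    return [(band, goods / bads if bads else float("inf")) for band, goods, bads in segments]


def odds_rise_with_score(odds):
    """True if the odds strictly increase from each band to the next."""
    values = [o for _, o in odds]
    return all(a < b for a, b in zip(values, values[1:]))


def lowest_band_meeting_odds(odds, target_odds):
    """Return the first (lowest) band whose odds reach the target, or None."""
    for band, o in odds:
        if o >= target_odds:
            return band
    return None


odds = segment_odds(segments)
for band, o in odds:
    print(f"{band}: {o:.1f} : 1")
print("Odds rise with score:", odds_rise_with_score(odds))
print("Candidate cut-off band for 10:1 odds:", lowest_band_meeting_odds(odds, 10.0))
```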

Unacceptable Risk Segments Graph

scr_10_1

Acceptable Risk Segments Graph

scr_10_2



Strategy Curve

Allows evaluating the discrepancy between actual odds and the odds predicted by the scorecard, and identifying the score ranges where the scorecard makes most of its mistakes

Additionally, the Strategy Curve helps to decide whether to use the developed scorecard as is or to restructure it completely or partially, depending on the size of the discrepancy found.
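One possible way to quantify the discrepancy is the relative difference between actual and predicted odds in each score band. The sketch below assumes the predicted odds per band are already available from the scorecard scaling and that every band contains at least one “Bad”; the 25% tolerance, the data, and the function names are hypothetical.

```python
TOLERANCE = 0.25  # Hypothetical threshold for an unacceptable relative discrepancy.

# Hypothetical bands: (band label, predicted Good:Bad odds, actual Goods, actual Bads).
bands = [
    ("400-499", 2.0, 120, 60),
    ("500-599", 5.0, 300, 60),
    ("600-699", 15.0, 400, 40),
    ("700-799", 40.0, 390, 10),
]


def odds_discrepancy(bands):
    """Return (band, relative difference between actual and predicted odds)."""
    result = []
    for label, predicted, goods, bads in bands:
        actual = goods / bads  # Assumes at least one Bad in every band.
        result.append((label, (actual - predicted) / predicted))
    return result


def flagged_bands(discrepancies, tolerance):
    """Bands where the scorecard misses the actual odds by more than the tolerance."""
    return [label for label, d in discrepancies if abs(d) > tolerance]


discrepancies = odds_discrepancy(bands)
for label, d in discrepancies:
    print(f"{label}: {d:+.1%}")
print("Bands to review:", flagged_bands(discrepancies, TOLERANCE))
```

A single flagged band may point to a partial restructuring, while discrepancies across most bands suggest the scorecard as a whole needs rework.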

Unacceptable Strategy Curve

scr_11_1

Acceptable Strategy Curve

scr_11_2



Approval vs “Bad” Rate

Shows the relationship between the share of approved borrowers and the corresponding share of “Bad” borrowers for each score

The Approval vs “Bad” Rate chart allows setting an initial cut-off point that minimizes the share of “Bad” borrowers while keeping the share of approved borrowers at a permissible level.
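The trade-off can be sketched by treating every observed score as a candidate cut-off and computing the approval rate and the “Bad” rate among approved borrowers. The sketch assumes borrowers scoring at or above the cut-off are approved; the data, the 60% minimum approval rate, and the function names are hypothetical.

```python
def approval_vs_bad_rate(scores, is_bad):
    """Return (cut-off, approval rate, Bad rate among approved) for each observed score."""
    total = len(scores)
    rows = []
    for cut in sorted(set(scores)):
        approved = [b for s, b in zip(scores, is_bad) if s >= cut]
        rows.append((cut, len(approved) / total, sum(approved) / len(approved)))
    return rows


def pick_cutoff(rows, min_approval):
    """Among cut-offs that keep approval at or above min_approval, pick the lowest Bad rate."""
    eligible = [r for r in rows if r[1] >= min_approval]
    return min(eligible, key=lambda r: r[2])


# Hypothetical data: 1 marks a "Bad" borrower, 0 a "Good" one.
scores = [500, 540, 580, 600, 620, 660, 700, 720]
is_bad = [1, 1, 1, 0, 0, 1, 0, 0]

rows = approval_vs_bad_rate(scores, is_bad)
for cut, approval, bad_rate in rows:
    print(f"cut-off {cut}: approval {approval:.1%}, Bad rate {bad_rate:.1%}")

cutoff, approval, bad_rate = pick_cutoff(rows, min_approval=0.6)
print(f"Initial cut-off: {cutoff} (approval {approval:.1%}, Bad rate {bad_rate:.1%})")
```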

Unacceptable Approval vs “Bad” Rate chart

scr_12_1

Acceptable Approval vs “Bad” Rate chart

scr_12_2



“Good” and “Bad” distribution

Allows visual assessment of the “Good” and “Bad” distributions that result from using the scorecard

A typical “hill-like” shape of the peaks and an easily seen difference between the “Good” and “Bad” distributions indicate that the scorecard performs properly and is able to differentiate between “Goods” and “Bads”.

Unacceptable distribution graphs

The scorecard does not help to differentiate “Goods” from “Bads”

scr_13_1

Acceptable distribution graphs

scr_13_2



Cumulative Event Rates

Shows how the rates of “Good” and “Bad” change as the score changes

An increase in the share of “Good” outcomes, accompanied by a decrease in the number of “Bad” accounts, confirms that the scorecard’s performance is logical.

A monotonic decrease in the share of “Bad” borrowers in the upper score range indicates that the scorecard performs correctly and concentrates “Bad” borrowers in the lower part of the working range.
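The monotonic decrease can be checked numerically as well as visually. The sketch below is a simplification: it works on the share of “Bads” per score band rather than on the cumulative chart itself, and reports every band where that share rises compared with the previous, lower band. The counts and function names are hypothetical.

```python
# Hypothetical counts per score band, lowest band first: (band label, Goods, Bads).
bands = [
    ("400-499", 120, 60),
    ("500-599", 300, 60),
    ("600-699", 400, 100),
    ("700-799", 390, 10),
]


def bad_shares(bands):
    """Return (band, share of Bads among all borrowers in the band)."""
    return [(label, bads / (goods + bads)) for label, goods, bads in bands]


def monotonicity_breaks(shares):
    """Bands whose Bad share is higher than in the preceding lower-score band."""
    return [
        label
        for (_, previous), (label, current) in zip(shares, shares[1:])
        if current > previous
    ]


shares = bad_shares(bands)
for label, share in shares:
    print(f"{label}: Bad share {share:.1%}")
print("Bands breaking the monotonic decrease:", monotonicity_breaks(shares))
```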

Acceptable Cumulative Event Rates graph

scr_14_1

Unacceptable Cumulative Event Rates graph

scr_14_2



Chi-squared Scorecard Stability Test

Scorecard stability is the ability of a scorecard to perform with the expected quality even when the customer base drifts.

The Chi-squared Test calculates the difference between the actual and predicted distributions for all borrower characteristics. The lower the difference, the more stable the developed scorecard is.

chi1

The Chi-squared Test is based on the Hosmer-Lemeshow (HL) factor, which is calculated for each scorecard characteristic as the sum of chi-square values over its attributes (sub-characteristics).

chi2

The HL factor is then compared with a reference value of the chi-squared distribution for the corresponding number of degrees of freedom. The number of degrees of freedom is equal to the number of attributes minus 1.

chi3
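The comparison with the reference value can be sketched for a single characteristic as follows. This is an illustration only: it uses the plain Pearson form, the sum of (actual − expected)² / expected over attributes, which may differ from the exact formula shown in the figures above, and it assumes a 5% significance level. The counts and function names are hypothetical; the critical values are the standard 5% values of the chi-squared distribution.

```python
# 5% critical values of the chi-squared distribution for 1 to 5 degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_sum(expected_counts, actual_counts):
    """Sum of per-attribute chi-square terms (Pearson form, assumed for this sketch)."""
    return sum((a - e) ** 2 / e for e, a in zip(expected_counts, actual_counts))


def stability_check(expected_counts, actual_counts):
    """Return (statistic, degrees of freedom, True if the statistic is below the 5% critical value)."""
    statistic = chi_square_sum(expected_counts, actual_counts)
    dof = len(expected_counts) - 1  # Number of attributes minus 1.
    return statistic, dof, statistic < CRITICAL_5PCT[dof]


# Hypothetical characteristic with three attributes.
expected = [200, 300, 500]  # Counts predicted from the development sample.
actual = [220, 290, 490]    # Counts observed in the recent population.

statistic, dof, stable = stability_check(expected, actual)
print(f"Statistic: {statistic:.3f}, degrees of freedom: {dof}, stable: {stable}")
```

Repeating the check for each characteristic shows which ones have drifted the most.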



Score Points Distribution

Allows visual evaluation of the scorecard working range. The columns in the diagram correspond to the number of borrowers who got assigned a certain score.

d_chart1

The score distribution should have a “hill-like” shape that resembles a normal distribution as closely as possible.

d_chart2

If the distribution has a “hill-like” shape that is concentrated in a very narrow score range, it will cause problems with risk control. The more borrowers fall within a small range, the smaller the possibility of preventing changes in the credit portfolio and the more change-sensitive the risk control system becomes. In this case, the scorecard needs to be re-calibrated by setting broader limits of the score range.

d_chart3

If the distribution is divided into separate segments, there are borrowers whose credit behavior differs fundamentally from that of the main population.
In this case, we need to find the categories of borrowers whose behavior is “very good” or “very bad” and change the way in which the corresponding characteristics are categorized.
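The data behind such a diagram can be sketched by grouping scores into bands of equal width and counting borrowers per band. The band width of 50 points, the sample scores, and the function name are hypothetical; the share of borrowers in the most populated band gives a rough indication of how concentrated the distribution is.

```python
from collections import Counter

BAND_WIDTH = 50  # Hypothetical band width in score points.


def score_histogram(scores, width):
    """Count borrowers per score band; each band is keyed by its lower bound."""
    return Counter((s // width) * width for s in scores)


# Hypothetical scores assigned to 14 borrowers.
scores = [512, 530, 548, 561, 575, 583, 590, 604, 611, 618, 627, 650, 689, 701]

histogram = score_histogram(scores, BAND_WIDTH)
for lower in sorted(histogram):
    count = histogram[lower]
    print(f"{lower}-{lower + BAND_WIDTH - 1}: {'#' * count} ({count})")

peak_share = max(histogram.values()) / len(scores)
print(f"Share of borrowers in the most populated band: {peak_share:.1%}")
```

Empty bands between populated ones in the printed output would correspond to the separated segments described above.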



 

