This function combines results from weight of evidence, information value and summary statistics.

ExpCatStat(
  data,
  Target = NULL,
  result = "Stat",
  clim = 10,
  nlim = 10,
  bins = 10,
  Pclass = NULL,
  plot = FALSE,
  top = 20,
  Round = 2
)

Arguments

data

dataframe or matrix

Target

target variable

result

"Stat" - summary statistics, "IV" - information value

clim

maximum unique levles for categorical variable. Variables will be dropped if unique levels is higher than clim for class factor/character variable

nlim

maximum unique values for numeric variable.

bins

number of bins (default is 10)

Pclass

reference category of target variable

plot

Inforamtion value barplot (default FALSE)

top

for plotting top information values (default value is 20)

Round

round of value

Value

This function provides summary statistics for categorical variable

  • Stat - Summary statistics includes Chi square test scores, p value, Information values, Cramers V and Degree if association

  • IV - Weight of evidence and Information values

Columns description:

  • Variable variable name

  • Target - Target variable

  • class - name of bin (variable value otherwise)

  • out0 - number of good observations

  • out1 - number of bad observations

  • Total - Total values for each category

  • pct1 - good observations / total good observations

  • pct0 - bad observations / total bad observations

  • odds - Odds ratio [(a/b)/(c/d)]

  • woe - Weight of Evidence – calculated as ln(odds)

  • iv - Information Value - ln(odds) * (pct0 – pct1)

Details

Criteria used for categorical variable predictive power classification are

  • If information value is < 0.03 then predictive power = "Not Predictive"

  • If information value is 0.3 to 0.1 then predictive power = "Somewhat Predictive"

  • If information value is 0.1 to 0.3 then predictive power = "Meidum Predictive"

  • If information value is >0.3 then predictive power = "Highly Predictive"

Author

dubrangala

Examples

# Example 1 ## Read mtcars data # Target variable "am" - Transmission (0 = automatic, 1 = manual) # Summary statistics ExpCatStat(mtcars,Target="am",result = "Stat",clim=10,nlim=10,bins=10, Pclass=1,plot=FALSE,top=20,Round=2)
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Variable Target Unique Chi-squared p-value df IV Value Cramers V #> 1 cyl am 3 8.741 0.013 2 1.32 0.52 #> 2 vs am 2 0.348 0.556 1 0.11 0.10 #> 3 gear am 3 20.945 0.000 2 0.44 0.81 #> 4 carb am 6 6.237 0.284 5 0.17 0.44 #> 5 mpg am 10 20.945 0.013 9 0.14 0.81 #> 6 disp am 10 21.636 0.010 9 0.35 0.82 #> 7 hp am 10 17.490 0.042 9 0.47 0.74 #> 8 drat am 10 21.497 0.011 9 0.12 0.82 #> 9 wt am 10 20.254 0.016 9 0.46 0.80 #> 10 qsec am 10 11.824 0.223 9 0.54 0.61 #> Degree of Association Predictive Power #> 1 Strong Highly Predictive #> 2 Weak Somewhat Predictive #> 3 Strong Highly Predictive #> 4 Strong Somewhat Predictive #> 5 Strong Somewhat Predictive #> 6 Strong Highly Predictive #> 7 Strong Highly Predictive #> 8 Strong Somewhat Predictive #> 9 Strong Highly Predictive #> 10 Strong Highly Predictive
# Information value plot ExpCatStat(mtcars,Target="am",result = "Stat",clim=10,nlim=10,bins=10, Pclass=1,plot=TRUE,top=20,Round=2)
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Warning: Chi-squared approximation may be incorrect
#> Variable Target Unique Chi-squared p-value df IV Value Cramers V #> 1 cyl am 3 8.741 0.013 2 1.32 0.52 #> 2 vs am 2 0.348 0.556 1 0.11 0.10 #> 3 gear am 3 20.945 0.000 2 0.44 0.81 #> 4 carb am 6 6.237 0.284 5 0.17 0.44 #> 5 mpg am 10 20.945 0.013 9 0.14 0.81 #> 6 disp am 10 21.636 0.010 9 0.35 0.82 #> 7 hp am 10 17.490 0.042 9 0.47 0.74 #> 8 drat am 10 21.497 0.011 9 0.12 0.82 #> 9 wt am 10 20.254 0.016 9 0.46 0.80 #> 10 qsec am 10 11.824 0.223 9 0.54 0.61 #> Degree of Association Predictive Power #> 1 Strong Highly Predictive #> 2 Weak Somewhat Predictive #> 3 Strong Highly Predictive #> 4 Strong Somewhat Predictive #> 5 Strong Somewhat Predictive #> 6 Strong Highly Predictive #> 7 Strong Highly Predictive #> 8 Strong Somewhat Predictive #> 9 Strong Highly Predictive #> 10 Strong Highly Predictive
# Inforamtion value for categorical Independent variables ExpCatStat(mtcars,Target="am",result = "IV",clim=10,nlim=10,bins=10, Pclass=1,plot=FALSE,top=20,Round=2)
#> Variable Class Out_1 Out_0 TOTAL Per_1 Per_0 Odds WOE IV Ref_1 #> 1 cyl.1 4 8 3 11 0.62 0.16 8.53 1.36 0.63 1 #> 2 cyl.2 6 3 4 7 0.23 0.21 1.12 0.10 0.00 1 #> 3 cyl.3 8 2 12 14 0.15 0.63 0.11 -1.43 0.69 1 #> 4 vs.1 0 6 12 18 0.46 0.63 0.50 -0.31 0.05 1 #> 5 vs.2 1 7 7 14 0.54 0.37 2.00 0.38 0.06 1 #> 6 gear.1 3 0 15 15 0.00 0.79 0.00 0.00 0.00 1 #> 7 gear.2 4 8 4 12 0.62 0.21 6.00 1.08 0.44 1 #> 8 gear.3 5 5 0 5 0.38 0.00 0.00 0.00 0.00 1 #> 9 carb.1 1 4 3 7 0.31 0.16 2.37 0.66 0.10 1 #> 10 carb.2 2 4 6 10 0.31 0.32 0.96 -0.03 0.00 1 #> 11 carb.3 3 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 12 carb.4 4 3 7 10 0.23 0.37 0.51 -0.48 0.07 1 #> 13 carb.5 6 1 0 1 0.08 0.00 0.00 0.00 0.00 1 #> 14 carb.6 8 1 0 1 0.08 0.00 0.00 0.00 0.00 1 #> 15 mpg.1 [10.4,13.3] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 16 mpg.2 (13.3,15] 1 2 3 0.08 0.11 0.71 -0.31 0.01 1 #> 17 mpg.3 (15,15.8] 1 3 4 0.08 0.16 0.44 -0.69 0.06 1 #> 18 mpg.4 (15.8,17.8] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 19 mpg.5 (17.8,19.2] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 20 mpg.6 (19.2,21] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 21 mpg.7 (21,21.4] 1 1 2 0.08 0.05 1.50 0.47 0.01 1 #> 22 mpg.8 (21.4,24.4] 1 3 4 0.08 0.16 0.44 -0.69 0.06 1 #> 23 mpg.9 (24.4,30.4] 4 0 4 0.31 0.00 0.00 0.00 0.00 1 #> 24 mpg.10 (30.4,33.9] 2 0 2 0.15 0.00 0.00 0.00 0.00 1 #> 25 disp.1 [71.1,78.7] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 26 disp.2 (78.7,108] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 27 disp.3 (108,140.8] 2 2 4 0.15 0.11 1.55 0.31 0.01 1 #> 28 disp.4 (140.8,160] 3 1 4 0.23 0.05 5.40 1.53 0.28 1 #> 29 disp.5 (160,167.6] 0 2 2 0.00 0.11 0.00 0.00 0.00 1 #> 30 disp.6 (167.6,275.8] 0 5 5 0.00 0.26 0.00 0.00 0.00 1 #> 31 disp.7 (275.8,301] 1 0 1 0.08 0.00 0.00 0.00 0.00 1 #> 32 disp.8 (301,351] 1 3 4 0.08 0.16 0.44 -0.69 0.06 1 #> 33 disp.9 (351,400] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 34 disp.10 (400,472] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 35 hp.1 [52,65] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 36 hp.2 (65,91] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 37 hp.3 (91,105] 1 3 4 0.08 0.16 0.44 -0.69 0.06 1 #> 38 hp.4 (105,110] 3 1 4 0.23 0.05 5.40 1.53 0.28 1 #> 39 hp.5 (110,123] 1 2 3 0.08 0.11 0.71 -0.31 0.01 1 #> 40 hp.6 (123,150] 0 2 2 0.00 0.11 0.00 0.00 0.00 1 #> 41 hp.7 (150,175] 1 2 3 0.08 0.11 0.71 -0.31 0.01 1 #> 42 hp.8 (175,205] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 43 hp.9 (205,245] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 44 hp.10 (245,335] 2 0 2 0.15 0.00 0.00 0.00 0.00 1 #> 45 drat.1 [2.76,2.93] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 46 drat.2 (2.93,3.07] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 47 drat.3 (3.07,3.15] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 48 drat.4 (3.15,3.23] 0 2 2 0.00 0.11 0.00 0.00 0.00 1 #> 49 drat.5 (3.23,3.69] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 50 drat.6 (3.69,3.77] 1 2 3 0.08 0.11 0.71 -0.31 0.01 1 #> 51 drat.7 (3.77,3.9] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 52 drat.8 (3.9,4.08] 2 3 5 0.15 0.16 0.97 -0.06 0.00 1 #> 53 drat.9 (4.08,4.22] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 54 drat.10 (4.22,4.93] 2 0 2 0.15 0.00 0.00 0.00 0.00 1 #> 55 wt.1 [1.513,1.835] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 56 wt.2 (1.835,2.2] 3 0 3 0.23 0.00 0.00 0.00 0.00 1 #> 57 wt.3 (2.2,2.77] 3 1 4 0.23 0.05 5.40 1.53 0.28 1 #> 58 wt.4 (2.77,3.15] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 59 wt.5 (3.15,3.215] 1 2 3 0.08 0.11 0.71 -0.31 0.01 1 #> 60 wt.6 (3.215,3.44] 0 4 4 0.00 0.21 0.00 0.00 0.00 1 #> 61 wt.7 (3.44,3.52] 0 2 2 0.00 0.11 0.00 0.00 0.00 1 #> 62 wt.8 (3.52,3.78] 1 3 4 0.08 0.16 0.44 -0.69 0.06 1 #> 63 wt.9 (3.78,4.07] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 64 wt.10 (4.07,5.424] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 65 qsec.1 [14.5,15.41] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 66 qsec.2 (15.41,16.46] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 67 qsec.3 (16.46,17.02] 3 2 5 0.23 0.11 2.55 0.74 0.09 1 #> 68 qsec.4 (17.02,17.3] 0 2 2 0.00 0.11 0.00 0.00 0.00 1 #> 69 qsec.5 (17.3,17.6] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 70 qsec.6 (17.6,18] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> 71 qsec.7 (18,18.6] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 72 qsec.8 (18.6,19.44] 2 2 4 0.15 0.11 1.55 0.31 0.01 1 #> 73 qsec.9 (19.44,20] 2 1 3 0.15 0.05 3.27 1.10 0.11 1 #> 74 qsec.10 (20,22.9] 0 3 3 0.00 0.16 0.00 0.00 0.00 1 #> Ref_0 Target #> 1 0 am #> 2 0 am #> 3 0 am #> 4 0 am #> 5 0 am #> 6 0 am #> 7 0 am #> 8 0 am #> 9 0 am #> 10 0 am #> 11 0 am #> 12 0 am #> 13 0 am #> 14 0 am #> 15 0 am #> 16 0 am #> 17 0 am #> 18 0 am #> 19 0 am #> 20 0 am #> 21 0 am #> 22 0 am #> 23 0 am #> 24 0 am #> 25 0 am #> 26 0 am #> 27 0 am #> 28 0 am #> 29 0 am #> 30 0 am #> 31 0 am #> 32 0 am #> 33 0 am #> 34 0 am #> 35 0 am #> 36 0 am #> 37 0 am #> 38 0 am #> 39 0 am #> 40 0 am #> 41 0 am #> 42 0 am #> 43 0 am #> 44 0 am #> 45 0 am #> 46 0 am #> 47 0 am #> 48 0 am #> 49 0 am #> 50 0 am #> 51 0 am #> 52 0 am #> 53 0 am #> 54 0 am #> 55 0 am #> 56 0 am #> 57 0 am #> 58 0 am #> 59 0 am #> 60 0 am #> 61 0 am #> 62 0 am #> 63 0 am #> 64 0 am #> 65 0 am #> 66 0 am #> 67 0 am #> 68 0 am #> 69 0 am #> 70 0 am #> 71 0 am #> 72 0 am #> 73 0 am #> 74 0 am