Provides summary statistics for all numerical variables. The function automatically scans each variable and selects only numeric/integer variables. If a target variable is specified, the function also summarizes the relationship between the target variable and each independent variable.

ExpNumStat(
  data,
  by = "A",
  gp = NULL,
  Qnt = NULL,
  Nlim = 10,
  MesofShape = 2,
  Outlier = FALSE,
  round = 3,
  dcast = FALSE,
  val = NULL
)

Arguments

data

dataframe or matrix

by

grouping level: "A" (summary statistics for all observations), "G" (summary statistics by group), or "GA" (summary statistics by group and overall). Default is "A"

gp

target variable if any, default NULL

Qnt

quantiles to compute, default NULL. For example, Qnt = c(0.25, 0.75) returns the 25th and 75th percentiles

Nlim

numeric variable limit. The default value of 10 means that only variables of type numeric/integer with more than 10 unique values are considered

MesofShape

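Measures of shape (skewness and kurtosis), default 2. In the examples on this page, MesofShape = 2 adds the Skewness and Kurtosis columns and MesofShape = 1 omits them.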

Outlier

if TRUE, calculate the lower hinge, upper hinge and number of outliers (default FALSE)

round

number of decimal places to round the results to (default 3)

dcast

if TRUE, cast the output to wide format using fast dcast from data.table (default FALSE)

val

Name of the column whose values will be filled to cast (see the Details section for the list of column names)
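As a rough illustration of the variable selection described for Nlim, the base R sketch below keeps the columns of a data frame that are numeric/integer and have more than Nlim unique values. The helper name select_numeric is hypothetical and this is not the package's internal code; it only mirrors the rule stated in the argument description.

```r
# Hypothetical helper (not part of the package): keep data frame columns
# that are numeric/integer and have more than `nlim` unique values.
select_numeric <- function(data, nlim = 10) {
  keep <- vapply(
    data,
    function(x) is.numeric(x) && length(unique(x)) > nlim,
    logical(1)
  )
  names(data)[keep]
}

selected <- select_numeric(mtcars, nlim = 10)
print(selected)
```

On mtcars this keeps mpg, disp, hp, drat, wt and qsec, the same six variables that appear in the example output on this page.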

Value

summary statistics for numeric independent variables

Summary by:

  • Only overall level

  • Only group level

  • Both overall and group level
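To make the three summary levels concrete, here is a minimal base R sketch that computes the mean of one variable at the overall level and at the group level. It uses mtcars with gear standing in for the target variable; it is an illustration of the idea only, not how ExpNumStat is implemented.

```r
# Overall ("A"-style) and per-group ("G"-style) means of one variable,
# using base R only. `gear` plays the role of the target variable `gp`.
overall_mean <- mean(mtcars$mpg)
group_mean <- tapply(mtcars$mpg, mtcars$gear, mean)
group_n <- table(mtcars$gear)

print(round(overall_mean, 3))
print(round(group_mean, 3))
print(group_n)
```

A "GA"-style summary simply reports both sets of rows together.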

Details

Column descriptions

  • Vname is Variable name

  • Group is Target variable

  • TN is Total sample (including NA observations)

  • nNeg is Total negative observations

  • nPos is Total positive observations

  • nZero is Total zero observations

  • NegInf is Negative infinite count

  • PosInf is Positive infinite count

  • NA_Value is count of missing (NA) values

  • Per_of_Missing is Percentage of missing values

  • Min is minimum value

  • Max is maximum value

  • Mean is average value

  • Median is median value

  • SD is Standard deviation

  • CV is coefficient of variation, SD/mean (the example output on this page shows it as a ratio, not multiplied by 100)

  • IQR is Inter quartile range

  • Qnt is quantile values

  • MesofShape is Skewness and Kurtosis

  • Outlier is Number of outliers

  • Cor is Correlation between target and independent variables
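Several of these columns can be reproduced with base R for a single variable. The sketch below is a hypothetical helper, not the package's code. It assumes that the counts ignore NA and infinite values, and that the outlier bounds are the usual 1.5 × IQR fences around the 25th and 75th percentiles; that assumption is consistent with the LB.25%, UB.75% and nOutliers values shown in the examples on this page, but it is not confirmed by the documentation.

```r
# Hypothetical helper (not package code): a few of the columns above
# computed with base R for one numeric vector.
describe_one <- function(x) {
  ok <- x[is.finite(x)]                  # drop NA and +/-Inf (assumption)
  q <- quantile(ok, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lb <- q[1] - 1.5 * iqr                 # assumed lower bound (1.5 * IQR fence)
  ub <- q[2] + 1.5 * iqr                 # assumed upper bound (1.5 * IQR fence)
  list(
    TN = length(x),                      # total sample, NA included
    nNeg = sum(ok < 0),
    nZero = sum(ok == 0),
    nPos = sum(ok > 0),
    NegInf = sum(x == -Inf, na.rm = TRUE),
    PosInf = sum(x == Inf, na.rm = TRUE),
    NA_Value = sum(is.na(x)),
    mean = mean(ok),
    median = median(ok),
    SD = sd(ok),
    CV = sd(ok) / mean(ok),              # ratio, not multiplied by 100
    IQR = iqr,
    LB = lb,
    UB = ub,
    nOutliers = sum(ok < lb | ok > ub)
  )
}

s <- describe_one(mtcars$mpg)
print(round(unlist(s), 3))
```

For mtcars$mpg this gives a CV of about 0.300, bounds of about 4.363 and 33.862, and one outlier, in line with the overall mpg row in the examples.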

Examples

# Descriptive summary of numeric variables: summary by target variable
ExpNumStat(mtcars,by="G",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=TRUE,round=3)
#>    Vname  Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing
#> 2   disp gear:4 12    0     0   12      0      0        0              0
#> 8   disp gear:3 15    0     0   15      0      0        0              0
#> 14  disp gear:5  5    0     0    5      0      0        0              0
#> 4   drat gear:4 12    0     0   12      0      0        0              0
#> 10  drat gear:3 15    0     0   15      0      0        0              0
#> 16  drat gear:5  5    0     0    5      0      0        0              0
#> 3     hp gear:4 12    0     0   12      0      0        0              0
#> 9     hp gear:3 15    0     0   15      0      0        0              0
#> 15    hp gear:5  5    0     0    5      0      0        0              0
#> 1    mpg gear:4 12    0     0   12      0      0        0              0
#> 7    mpg gear:3 15    0     0   15      0      0        0              0
#> 13   mpg gear:5  5    0     0    5      0      0        0              0
#> 6   qsec gear:4 12    0     0   12      0      0        0              0
#> 12  qsec gear:3 15    0     0   15      0      0        0              0
#> 18  qsec gear:5  5    0     0    5      0      0        0              0
#> 5     wt gear:4 12    0     0   12      0      0        0              0
#> 11    wt gear:3 15    0     0   15      0      0        0              0
#> 17    wt gear:5  5    0     0    5      0      0        0              0
#>         sum     min     max    mean  median      SD    CV     IQR Skewness
#> 2  1476.200  71.100 167.600 123.017 130.900  38.909 0.316  81.075   -0.200
#> 8  4894.500 120.100 472.000 326.300 318.000  94.853 0.291 104.200   -0.267
#> 14 1012.400  95.100 351.000 202.480 145.000 115.491 0.570 180.700    0.408
#> 4    48.520   3.690   4.930   4.043   3.920   0.312 0.077   0.188    1.994
#> 10   46.990   2.760   3.730   3.133   3.080   0.274 0.087   0.145    1.017
#> 16   19.580   3.540   4.430   3.916   3.770   0.390 0.099   0.600    0.386
#> 3  1074.000  52.000 123.000  89.500  94.000  25.893 0.289  44.250   -0.077
#> 9  2642.000  97.000 245.000 176.133 180.000  47.689 0.271  60.000   -0.196
#> 15  978.000  91.000 335.000 195.600 175.000 102.834 0.526 151.000    0.337
#> 1   294.400  17.800  33.900  24.533  22.800   5.277 0.215   7.075    0.611
#> 7   241.600  10.400  21.500  16.107  15.500   3.372 0.209   3.900   -0.082
#> 13  106.900  15.000  30.400  21.380  19.700   6.659 0.311  10.200    0.373
#> 6   227.580  16.460  22.900  18.965  18.755   1.614 0.085   1.113    0.891
#> 12  265.380  15.410  20.220  17.692  17.420   1.350 0.076   0.955    0.437
#> 18   78.200  14.500  16.900  15.640  15.500   1.130 0.072   2.100    0.113
#> 5    31.400   1.615   3.440   2.617   2.700   0.633 0.242   1.026   -0.157
#> 11   58.389   2.465   5.424   3.893   3.730   0.833 0.214   0.508    0.714
#> 17   13.163   1.513   3.570   2.633   2.770   0.819 0.311   1.030   -0.276
#>    Kurtosis     10%     20%   LB.25%  UB.75% nOutliers
#> 2    -1.615  76.000  78.760  -42.688 281.612         0
#> 8    -0.235 238.200 272.240  119.500 536.300         0
#> 14   -1.647 105.180 115.260 -150.750 572.050         0
#> 4     3.640   3.855   3.900    3.619   4.369         1
#> 10    0.707   2.828   2.986    2.818   3.397         4
#> 16   -1.555   3.572   3.604    2.720   5.120         0
#> 3    -1.558  62.300  65.200   -0.625 176.375         0
#> 9    -0.910 107.000 142.000   60.000 300.000         0
#> 15   -1.418  99.800 108.600 -113.500 490.500         0
#> 1    -0.946  19.380  21.000   10.387  38.688         0
#> 7    -0.640  11.560  14.100    8.650  24.250         0
#> 13   -1.457  15.320  15.640    0.500  41.300         0
#> 6     1.323  17.148  18.344   16.796  21.246         2
#> 12   -0.262  16.252  16.990   15.602  19.423         4
#> 18   -1.729  14.540  14.580   11.450  19.850         0
#> 5    -1.300   1.845   1.988    0.594   4.699         0
#> 11   -0.161   3.303   3.439    2.689   4.719         4
#> 17   -1.273   1.764   2.015    0.595   4.715         0

# Descriptive summary of numeric variables: summary at the overall level
ExpNumStat(mtcars,by="A",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=TRUE,round=3)
#>   Vname Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing      sum
#> 2  disp   All 32    0     0   32      0      0        0              0 7383.100
#> 4  drat   All 32    0     0   32      0      0        0              0  115.090
#> 3    hp   All 32    0     0   32      0      0        0              0 4694.000
#> 1   mpg   All 32    0     0   32      0      0        0              0  642.900
#> 6  qsec   All 32    0     0   32      0      0        0              0  571.160
#> 5    wt   All 32    0     0   32      0      0        0              0  102.952
#>      min     max    mean  median      SD    CV     IQR Skewness Kurtosis    10%
#> 2 71.100 472.000 230.722 196.300 123.939 0.537 205.175    0.400   -1.090 80.610
#> 4  2.760   4.930   3.597   3.695   0.535 0.149   0.840    0.279   -0.565  3.007
#> 3 52.000 335.000 146.688 123.000  68.563 0.467  83.500    0.761    0.052 66.000
#> 1 10.400  33.900  20.091  19.200   6.027 0.300   7.375    0.640   -0.201 14.340
#> 6 14.500  22.900  17.849  17.710   1.787 0.100   2.008    0.387    0.554 15.534
#> 5  1.513   5.424   3.217   3.325   0.978 0.304   1.029    0.444    0.172  1.956
#>       20%   LB.25%  UB.75% nOutliers
#> 2 120.140 -186.938 633.763         0
#> 4   3.072    1.820   5.180         0
#> 3  93.400  -28.750 305.250         1
#> 1  15.200    4.363  33.862         1
#> 6  16.734   13.881  21.911         1
#> 5   2.349    1.038   5.153         3

# Descriptive summary of numeric variables: summary at the overall and group level
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=seq(0,1,.1),MesofShape=1, Outlier=TRUE,round=2)
#>    Vname    Group TN nNeg nZero nPos NegInf PosInf NA_Value Per_of_Missing
#> 2   disp gear:All 32    0     0   32      0      0        0              0
#> 8   disp   gear:4 12    0     0   12      0      0        0              0
#> 14  disp   gear:3 15    0     0   15      0      0        0              0
#> 20  disp   gear:5  5    0     0    5      0      0        0              0
#> 4   drat gear:All 32    0     0   32      0      0        0              0
#> 10  drat   gear:4 12    0     0   12      0      0        0              0
#> 16  drat   gear:3 15    0     0   15      0      0        0              0
#> 22  drat   gear:5  5    0     0    5      0      0        0              0
#> 3     hp gear:All 32    0     0   32      0      0        0              0
#> 9     hp   gear:4 12    0     0   12      0      0        0              0
#> 15    hp   gear:3 15    0     0   15      0      0        0              0
#> 21    hp   gear:5  5    0     0    5      0      0        0              0
#> 1    mpg gear:All 32    0     0   32      0      0        0              0
#> 7    mpg   gear:4 12    0     0   12      0      0        0              0
#> 13   mpg   gear:3 15    0     0   15      0      0        0              0
#> 19   mpg   gear:5  5    0     0    5      0      0        0              0
#> 6   qsec gear:All 32    0     0   32      0      0        0              0
#> 12  qsec   gear:4 12    0     0   12      0      0        0              0
#> 18  qsec   gear:3 15    0     0   15      0      0        0              0
#> 24  qsec   gear:5  5    0     0    5      0      0        0              0
#> 5     wt gear:All 32    0     0   32      0      0        0              0
#> 11    wt   gear:4 12    0     0   12      0      0        0              0
#> 17    wt   gear:3 15    0     0   15      0      0        0              0
#> 23    wt   gear:5  5    0     0    5      0      0        0              0
#>        sum    min    max   mean median     SD   CV    IQR     0%    10%    20%
#> 2  7383.10  71.10 472.00 230.72 196.30 123.94 0.54 205.18  71.10  80.61 120.14
#> 8  1476.20  71.10 167.60 123.02 130.90  38.91 0.32  81.08  71.10  76.00  78.76
#> 14 4894.50 120.10 472.00 326.30 318.00  94.85 0.29 104.20 120.10 238.20 272.24
#> 20 1012.40  95.10 351.00 202.48 145.00 115.49 0.57 180.70  95.10 105.18 115.26
#> 4   115.09   2.76   4.93   3.60   3.70   0.53 0.15   0.84   2.76   3.01   3.07
#> 10   48.52   3.69   4.93   4.04   3.92   0.31 0.08   0.19   3.69   3.86   3.90
#> 16   46.99   2.76   3.73   3.13   3.08   0.27 0.09   0.14   2.76   2.83   2.99
#> 22   19.58   3.54   4.43   3.92   3.77   0.39 0.10   0.60   3.54   3.57   3.60
#> 3  4694.00  52.00 335.00 146.69 123.00  68.56 0.47  83.50  52.00  66.00  93.40
#> 9  1074.00  52.00 123.00  89.50  94.00  25.89 0.29  44.25  52.00  62.30  65.20
#> 15 2642.00  97.00 245.00 176.13 180.00  47.69 0.27  60.00  97.00 107.00 142.00
#> 21  978.00  91.00 335.00 195.60 175.00 102.83 0.53 151.00  91.00  99.80 108.60
#> 1   642.90  10.40  33.90  20.09  19.20   6.03 0.30   7.38  10.40  14.34  15.20
#> 7   294.40  17.80  33.90  24.53  22.80   5.28 0.22   7.08  17.80  19.38  21.00
#> 13  241.60  10.40  21.50  16.11  15.50   3.37 0.21   3.90  10.40  11.56  14.10
#> 19  106.90  15.00  30.40  21.38  19.70   6.66 0.31  10.20  15.00  15.32  15.64
#> 6   571.16  14.50  22.90  17.85  17.71   1.79 0.10   2.01  14.50  15.53  16.73
#> 12  227.58  16.46  22.90  18.96  18.75   1.61 0.09   1.11  16.46  17.15  18.34
#> 18  265.38  15.41  20.22  17.69  17.42   1.35 0.08   0.96  15.41  16.25  16.99
#> 24   78.20  14.50  16.90  15.64  15.50   1.13 0.07   2.10  14.50  14.54  14.58
#> 5   102.95   1.51   5.42   3.22   3.33   0.98 0.30   1.03   1.51   1.96   2.35
#> 11   31.40   1.61   3.44   2.62   2.70   0.63 0.24   1.03   1.61   1.84   1.99
#> 17   58.39   2.46   5.42   3.89   3.73   0.83 0.21   0.51   2.46   3.30   3.44
#> 23   13.16   1.51   3.57   2.63   2.77   0.82 0.31   1.03   1.51   1.76   2.01
#>       30%    40%    50%    60%    70%    80%    90%   100%  LB.25% UB.75%
#> 2  142.06 160.00 196.30 275.80 303.10 350.80 396.00 472.00 -186.94 633.76
#> 8   87.70 113.20 130.90 144.34 156.01 160.00 166.84 167.60  -42.69 281.61
#> 14 275.80 292.72 318.00 354.00 360.00 408.00 452.00 472.00  119.50 536.30
#> 20 125.24 135.12 145.00 207.40 269.80 311.00 331.00 351.00 -150.75 572.05
#> 4    3.15   3.35   3.70   3.82   3.91   4.05   4.21   4.93    1.82   5.18
#> 10   3.91   3.92   3.92   4.02   4.08   4.10   4.21   4.93    3.62   4.37
#> 16   3.07   3.07   3.08   3.11   3.15   3.21   3.51   3.73    2.82   3.40
#> 22   3.65   3.71   3.77   3.95   4.13   4.26   4.35   4.43    2.72   5.12
#> 3  106.20 110.00 123.00 165.00 178.50 200.00 243.50 335.00  -28.75 305.25
#> 9   66.00  76.80  94.00 103.40 109.70 110.00 121.70 123.00   -0.62 176.38
#> 15 155.00 175.00 180.00 180.00 200.00 218.00 239.00 245.00   60.00 300.00
#> 21 125.40 150.20 175.00 210.60 246.20 278.20 306.60 335.00 -113.50 490.50
#> 1   15.98  17.92  19.20  21.00  21.47  24.08  30.09  33.90    4.36  33.86
#> 7   21.12  21.96  22.80  23.76  26.43  29.78  32.20  33.90   10.39  38.69
#> 13  14.80  15.20  15.50  16.76  17.94  18.80  20.52  21.50    8.65  24.25
#> 19  16.58  18.14  19.70  22.22  24.74  26.88  28.64  30.40    0.50  41.30
#> 6   17.02  17.34  17.71  18.18  18.61  19.33  19.99  22.90   13.88  21.91
#> 12  18.54  18.60  18.75  18.90  19.30  19.81  19.99  22.90   16.80  21.25
#> 18  17.10  17.36  17.42  17.69  17.95  18.29  19.78  20.22   15.60  19.42
#> 24  14.78  15.14  15.50  15.98  16.46  16.74  16.82  16.90   11.45  19.85
#> 5    2.77   3.16   3.33   3.44   3.55   3.77   4.05   5.42    1.04   5.15
#> 11   2.24   2.44   2.70   2.84   3.07   3.18   3.42   3.44    0.59   4.70
#> 17   3.47   3.55   3.73   3.80   3.84   4.31   5.31   5.42    2.69   4.72
#> 23   2.27   2.52   2.77   2.93   3.09   3.25   3.41   3.57    0.60   4.72
#>    nOutliers
#> 2          0
#> 8          0
#> 14         0
#> 20         0
#> 4          0
#> 10         1
#> 16         4
#> 22         0
#> 3          1
#> 9          0
#> 15         0
#> 21         0
#> 1          1
#> 7          0
#> 13         0
#> 19         0
#> 6          1
#> 12         2
#> 18         4
#> 24         0
#> 5          3
#> 11         0
#> 17         4
#> 23         0

# Summary by specific statistics for all numeric variables
ExpNumStat(mtcars,by="GA",gp="gear",Qnt=c(0.1,0.2),MesofShape=2, Outlier=FALSE,round=2,dcast = TRUE,val = "IQR")
#>   Stat Vname gear.3 gear.4 gear.5 gear.All
#> 1  IQR  disp 104.20  81.08 180.70   205.18
#> 2  IQR  drat   0.14   0.19   0.60     0.84
#> 3  IQR    hp  60.00  44.25 151.00    83.50
#> 4  IQR   mpg   3.90   7.08  10.20     7.38
#> 5  IQR  qsec   0.96   1.11   2.10     2.01
#> 6  IQR    wt   0.51   1.03   1.03     1.03
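
The last example casts a single statistic into a wide table with one column per group. A base R sketch of a comparable computation for the IQR is given here, assuming the six numeric variables that appear in the output above. It uses neither data.table nor ExpNumStat, so it only illustrates the shape of the result and should agree with the table above up to rounding.

```r
# Wide table of per-group and overall IQR, base R only.
# `vars` is taken from the example output; `gear` is the grouping variable.
vars <- c("disp", "drat", "hp", "mpg", "qsec", "wt")

# One column per gear level, one row per variable
by_gear <- sapply(
  split(mtcars[vars], mtcars$gear),
  function(d) sapply(d, IQR)
)

# Append the overall column and label columns like the cast output
wide <- cbind(by_gear, All = sapply(mtcars[vars], IQR))
colnames(wide) <- paste0("gear.", colnames(wide))

print(round(wide, 2))
```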