1. Exploratory analysis - Custom tables, summary statistics

In this vignette we will discuss about how to customize the summary statistics using ExpCustomStat function from SmartEDA. The output of this function returns matrix object containing descriptive information on all input variables for each level/combination of levels in categorical/group variable. Also, while running the analysis we can filter row/cases of the data. We can apply the filters at individual variable level or complete data like base subsetting.

Function definition:

ExpCustomStat(data,Cvar=NULL,Nvar=NULL,stat=NULL,gpby=TRUE,filt=NULL,dcast=FALSE)

Key functionalities of ExpCustomStat are:

  1. Categorical data descriptive statistics (Frequencies, Proportions)
  2. Numerical data descriptive statistics (Mean, Median, Sum, Variance etc..)
  3. Comparison of numerical data based on categorical data
  4. Filter rows/cases where conditions are true. Options to apply filters at variable level or complete data set like base subsetting
  5. Options to calculate basic statistics like Mean, Median, Std.Dev, Variance, Count, Proportions, Quantiles, IQR, Percentages of Shares (PS) for numerical data

1.1 Usage of ExpCustomStat function

Will open the carseats data from ISLR package and drive different types of use cases using ExpCustomStat function.

In this vignette, we will be using a simulated data set containing sales of child car seats at 400 different stores.

Data Source ISLR package.

Function source SmartEDA package

Carseats data from ISLR package:

options(width = 150)
CData = ISLR::Carseats
head(CData,5)
##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
## 1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
## 2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
## 3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
## 4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes
## 5  4.15       141     64           3        340   128       Bad  38        13   Yes  No

2. Categorical summaries

Categorical summaries to describe the distribution for a qualitative variables.

  • Count (Frequency) the number of subjects in each group
  • The proportions in each group (Relative frequency)
  • Tabulation or custom tables for more than two categorical variables

2.1. Frequency table

The number of observations for particular category

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US", "Urban" and "ShelveLoc" variable
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc","Education"),gpby=FALSE)
##      Level  Group_by Count  Prop
##  1:    Yes        US   258 64.50
##  2:     No        US   142 35.50
##  3:    Yes     Urban   282 70.50
##  4:     No     Urban   118 29.50
##  5:    Bad ShelveLoc    96 24.00
##  6:   Good ShelveLoc    85 21.25
##  7: Medium ShelveLoc   219 54.75
##  8:     17 Education    49 12.25
##  9:     10 Education    48 12.00
## 10:     12 Education    49 12.25
## 11:     14 Education    40 10.00
## 12:     13 Education    43 10.75
## 13:     16 Education    47 11.75
## 14:     15 Education    36  9.00
## 15:     18 Education    40 10.00
## 16:     11 Education    48 12.00

OR we can use similar analysis using ExpCTable function from same package, this functions includes cumulative percentages and Total

ExpCTable(Carseats,Target=NULL,clim=5,nlim=15,round=2,bin=NULL,per=F)
##     Variable  Valid Frequency Percent CumPercent
## 1  ShelveLoc    Bad        96   24.00      24.00
## 2  ShelveLoc   Good        85   21.25      45.25
## 3  ShelveLoc Medium       219   54.75     100.00
## 4  ShelveLoc  TOTAL       400      NA         NA
## 5      Urban     No       118   29.50      29.50
## 6      Urban    Yes       282   70.50     100.00
## 7      Urban  TOTAL       400      NA         NA
## 8         US     No       142   35.50      35.50
## 9         US    Yes       258   64.50     100.00
## 10        US  TOTAL       400      NA         NA
## 11 Education     10        48   12.00      12.00
## 12 Education     11        48   12.00      24.00
## 13 Education     12        49   12.25      36.25
## 14 Education     13        43   10.75      47.00
## 15 Education     14        40   10.00      57.00
## 16 Education     15        36    9.00      66.00
## 17 Education     16        47   11.75      77.75
## 18 Education     17        49   12.25      90.00
## 19 Education     18        40   10.00     100.00
## 20 Education  TOTAL       400      NA         NA
  • Base: All carseats data
  • Table: Calculating the count and proportions of "US", "Urban" and "ShelveLoc" variable
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)
##     Level  Group_by Count  Prop
## 1:    Yes        US   258 64.50
## 2:     No        US   142 35.50
## 3:    Yes     Urban   282 70.50
## 4:     No     Urban   118 29.50
## 5:    Bad ShelveLoc    96 24.00
## 6:   Good ShelveLoc    85 21.25
## 7: Medium ShelveLoc   219 54.75

2.2. Crosstabulation (more than one categorical variable)

To produce cross tables which calculate counts and proportions for each combination of categorical variables we can use ExpCustomStat

NOTE: For crosstabulation change input gpby=TRUE

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US" Vs "Urban"
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
##     US Urban Count Prop
## 1: Yes   Yes   186 46.5
## 2:  No   Yes    96 24.0
## 3: Yes    No    72 18.0
## 4:  No    No    46 11.5

We can also produce multidimensional tables based on three or more categorical variables

  • Base: All carseats data
  • Table: Calculating the count and proportions of "US" Vs "Urban" Vs "ShelveLoc"
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)
##      US Urban ShelveLoc Count  Prop
##  1: Yes   Yes       Bad    51 12.75
##  2: Yes   Yes      Good    39  9.75
##  3: Yes   Yes    Medium    96 24.00
##  4:  No   Yes       Bad    23  5.75
##  5: Yes    No       Bad    11  2.75
##  6:  No   Yes    Medium    55 13.75
##  7:  No    No    Medium    29  7.25
##  8: Yes    No    Medium    39  9.75
##  9:  No   Yes      Good    18  4.50
## 10: Yes    No      Good    22  5.50
## 11:  No    No      Good     6  1.50
## 12:  No    No       Bad    11  2.75

2.3. Adding filters to tables

If we want to understand the number of stores in US and location is Urban for Population size greater than 150

  • Base: Those stores having their average age of the local Population is > 150
  • Table: Calculating the count and proportions of "US" Vs "Urban"
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
##     US Urban         Filter Count  Prop
## 1: Yes   Yes Population>150   138 48.25
## 2:  No   Yes Population>150    63 22.03
## 3: Yes    No Population>150    53 18.53
## 4:  No    No Population>150    32 11.19
  • Base: Those stores having their average age of the local Population > 150 and Urban==“Yes”
  • Table: Calculating the count and proportions of "US" Vs "ShelveLoc"
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")
##     US ShelveLoc                        Filter Count  Prop
## 1: Yes       Bad Urban=='Yes' & Population>150    40 19.90
## 2: Yes      Good Urban=='Yes' & Population>150    31 15.42
## 3: Yes    Medium Urban=='Yes' & Population>150    67 33.33
## 4:  No       Bad Urban=='Yes' & Population>150    13  6.47
## 5:  No    Medium Urban=='Yes' & Population>150    37 18.41
## 6:  No      Good Urban=='Yes' & Population>150    13  6.47

3. Numerical summaries

Numerical summaries to describe the distribution for quantitative variables.

  • Measure of location (Mean, median)
  • Measure of variation (Variance, Std.Dev, Range)
  • Measure of shapes (Skewness and Kurtosis)
  • Other summary statistics

3.1. Numerical variable summary

  • Base: All carseats data
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','sd','min','max','IQR'))
##     Attribute Count       mean       sum          var         sd min    max    IQR
## 1: Population   400 264.840000 105936.00 21719.813935 147.376436  10 509.00 259.50
## 2:      Sales   400   7.496325   2998.53     7.975626   2.824115   0  16.27   3.93
## 3:  CompPrice   400 124.975000  49990.00   235.147243  15.334512  77 175.00  20.00
## 4:     Income   400  68.657500  27463.00   783.218239  27.986037  21 120.00  48.25
  • Base: All carseats data
  • Table: Tukey’s five number summary (minimum, lower-hinge, median, upper-hinge, maximum) for Price, Population, Sales, CompPrice, Income
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))
##     Attribute min  p0.25 median  p0.75    max
## 1: Population  10 139.00 272.00 398.50 509.00
## 2:      Sales   0   5.39   7.49   9.32  16.27
## 3:  CompPrice  77 115.00 125.00 135.00 175.00
## 4:     Income  21  42.75  69.00  91.00 120.00

3.2. Adding filters to complete data (like base subset)

Filter rows/cases of complete dataset where conditions are true

  • Base: Whether the store is in urban (Urban ==Yes)
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income.
options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','median','max'),filt="Urban=='Yes'")
##     Attribute       Filter Count       mean      sum          var   min median    max
## 1: Population Urban=='Yes'   282 259.886525 73288.00 20913.488857 12.00 274.00 508.00
## 2:      Sales Urban=='Yes'   282   7.468191  2106.03     8.044141  0.37   7.42  16.27
## 3:  CompPrice Urban=='Yes'   282 125.634752 35429.00   246.545834 77.00 125.00 175.00
## 4:     Income Urban=='Yes'   282  69.343972 19555.00   743.706885 21.00  70.00 120.00
  • Base: Those store having their average age of the local population > 150 and Urban==“Yes”
  • Table: Descriptive summary for Price, Population, Sales, CompPrice, Income.
options(width=150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','median','IQR'),filt="Urban=='Yes' & Population>150")
##     Attribute                        Filter Count       mean      sum median    IQR
## 1: Population Urban=='Yes' & Population>150   201 332.641791 66861.00 331.00 165.00
## 2:      Sales Urban=='Yes' & Population>150   201   7.573383  1522.25   7.49   3.74
## 3:  CompPrice Urban=='Yes' & Population>150   201 125.134328 25152.00 125.00  19.00
## 4:     Income Urban=='Yes' & Population>150   201  68.223881 13713.00  69.00  47.00

3.3. Filter out unique value from all the numeric variables

This will be useful when we need to exclude redundant values like ‘999’ or ‘9999’ or ‘-9’ or ‘-1111’, or ‘888’ etc from each selected variable.

Eg:dat = data.frame(x = c(23,24,34,999,12,12,23,999,45), y = c(1,3,4,999,0,999,0,8,999,0)

Exclude 999:

x = c(23,24,34,12,12,23,45) y = c(1,3,4,0,0,8,0)

data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 999
data_sam[sample(1:400,20),"CompPrice"] <- -9
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")
##     Attribute                   Filter Count       mean       sum min
## 1: Population Population%ni% c(999,-9)   400 264.840000 105936.00  10
## 2:      Sales      Sales%ni% c(999,-9)   370   7.484919   2769.42   0
## 3:  CompPrice  CompPrice%ni% c(999,-9)   380 124.963158  47486.00  77
## 4:     Income     Income%ni% c(999,-9)   355  68.974648  24486.00  21

3.4. Adding filters at variable level

Different filters for each numeric variable. For example, below are the conditions (logic) for each variable summary analysis.

"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'

"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)

"CompPrice" - Exclude Price is greater than 150

"Education" - All stores

"Income" - Inculde only stores in US (US==Yes)

Table: Descriptive summary for Price, Population, Sales, CompPrice, Income based on the filters.

options(width = 150)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^All^US=='Yes'"))
##     Attribute            Filter Count       mean      sum          var         sd      IQR median
## 1: Population ShelveLoc=='Good'    85 267.047059 22699.00 16193.473950 127.253581 177.0000 272.00
## 2:      Sales      Urban=='Yes'   282   7.468191  2106.03     8.044141   2.836219   3.9175   7.42
## 3:  CompPrice        Price>=150    32 141.875000  4540.00   181.596774  13.475785  18.2500 142.50
## 4:  Education               All   400  13.900000  5560.00     6.867168   2.620528   4.0000  14.00
## 5:     Income         US=='Yes'   258  70.515504 18193.00   782.849953  27.979456  48.0000  70.00

4. Numerical summaries by category

Descriptive summary for numerical variable by group level.

4.1. Variable summary report (One group variable)

  • Base: All carseats data
  • Table: Descriptive summary for Price, Population, Sales by group (Urban, ShelveLoc)
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)
##      Level  Attribute  Group_by Count  Prop       mean   min    P0.25 median    p0.75    max
##  1:    Yes Population     Urban   282 70.50 259.886525 12.00 139.0000 274.00 376.7500 508.00
##  2:     No Population     Urban   118 29.50 276.677966 10.00 144.0000 271.00 408.0000 509.00
##  3:    Yes      Sales     Urban   282 70.50   7.468191  0.37   5.3750   7.42   9.2925  16.27
##  4:     No      Sales     Urban   118 29.50   7.563559  0.00   5.4400   7.67   9.3350  14.90
##  5:    Bad Population ShelveLoc    96 24.00 275.291667 10.00 145.5000 296.00 400.5000 501.00
##  6:   Good Population ShelveLoc    85 21.25 267.047059 14.00 176.0000 272.00 353.0000 503.00
##  7: Medium Population ShelveLoc   219 54.75 259.401826 12.00 124.0000 261.00 405.0000 509.00
##  8:    Bad      Sales ShelveLoc    96 24.00   5.522917  0.37   4.0525   5.21   7.4625  11.67
##  9:   Good      Sales ShelveLoc    85 21.25  10.214000  3.58   8.3300  10.50  11.9600  16.27
## 10: Medium      Sales ShelveLoc   219 54.75   7.306575  0.00   5.6250   7.38   8.7750  13.36

4.2. Variable summary report (More than One group variable)

  • Base: All carseats data
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)
##     Urban  US ShelveLoc Attribute Count  Prop      mean   sum    PS min max   IQR        sd
##  1:   Yes Yes       Bad CompPrice    51 12.75 124.88235  6369 12.74  93 161 17.00 14.768408
##  2:   Yes Yes      Good CompPrice    39  9.75 127.00000  4953  9.91  89 156 22.00 15.998355
##  3:   Yes Yes    Medium CompPrice    96 24.00 125.05208 12005 24.01  85 175 20.00 15.410434
##  4:   Yes  No       Bad CompPrice    23  5.75 125.52174  2887  5.78  86 157 21.50 17.367340
##  5:    No Yes       Bad CompPrice    11  2.75 122.45455  1347  2.69  88 149  8.50 17.494934
##  6:   Yes  No    Medium CompPrice    55 13.75 126.00000  6930 13.86  77 159 20.00 16.611018
##  7:    No  No    Medium CompPrice    29  7.25 121.86207  3534  7.07  89 154 21.00 16.012772
##  8:    No Yes    Medium CompPrice    39  9.75 126.33333  4927  9.86  96 150 18.50 14.411862
##  9:   Yes  No      Good CompPrice    18  4.50 126.94444  2285  4.57  95 157 21.00 16.049454
## 10:    No Yes      Good CompPrice    22  5.50 122.36364  2692  5.39 107 147 16.25 11.986284
## 11:    No  No      Good CompPrice     6  1.50 126.50000   759  1.52  96 142 13.75 16.610238
## 12:    No  No       Bad CompPrice    11  2.75 118.36364  1302  2.60 106 131 15.00  9.080449
## 13:   Yes Yes       Bad    Income    51 12.75  76.31373  3892 14.17  28 119 40.50 26.657449
## 14:   Yes Yes      Good    Income    39  9.75  66.89744  2609  9.50  21 117 49.00 28.163802
## 15:   Yes Yes    Medium    Income    96 24.00  68.83333  6608 24.06  25 120 41.25 27.070538
## 16:   Yes  No       Bad    Income    23  5.75  67.52174  1553  5.65  25 114 34.00 25.414508
## 17:    No Yes       Bad    Income    11  2.75  72.72727   800  2.91  21 120 52.50 33.400871
## 18:   Yes  No    Medium    Income    55 13.75  66.14545  3638 13.25  22 118 42.50 26.857248
## 19:    No  No    Medium    Income    29  7.25  62.44828  1811  6.59  22 120 60.00 32.077569
## 20:    No Yes    Medium    Income    39  9.75  69.05128  2693  9.81  21 119 55.00 30.884291
## 21:   Yes  No      Good    Income    18  4.50  69.72222  1255  4.57  24 113 62.25 31.765326
## 22:    No Yes      Good    Income    22  5.50  72.31818  1591  5.79  22 115 35.00 27.425026
## 23:    No  No      Good    Income     6  1.50  53.83333   323  1.18  30  78 40.25 22.542552
## 24:    No  No       Bad    Income    11  2.75  62.72727   690  2.51  25 105 17.00 23.524842
##     Urban  US ShelveLoc Attribute Count  Prop      mean   sum    PS min max   IQR        sd

4.3. Variable summary report (More than One group variable) with filter

  • Base: Stores in Urban location only
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','median','IQR'), gpby = TRUE,filt="Urban=='Yes'")
##     Urban  US ShelveLoc Attribute       Filter Count  Prop      mean   sum    PS median   IQR
##  1:   Yes Yes       Bad CompPrice Urban=='Yes'    51 18.09 124.88235  6369 17.98  125.0 17.00
##  2:   Yes Yes      Good CompPrice Urban=='Yes'    39 13.83 127.00000  4953 13.98  128.0 22.00
##  3:   Yes Yes    Medium CompPrice Urban=='Yes'    96 34.04 125.05208 12005 33.88  125.0 20.00
##  4:   Yes  No       Bad CompPrice Urban=='Yes'    23  8.16 125.52174  2887  8.15  125.0 21.50
##  5:   Yes  No    Medium CompPrice Urban=='Yes'    55 19.50 126.00000  6930 19.56  127.0 20.00
##  6:   Yes  No      Good CompPrice Urban=='Yes'    18  6.38 126.94444  2285  6.45  122.0 21.00
##  7:   Yes Yes       Bad    Income Urban=='Yes'    51 18.09  76.31373  3892 19.90   81.0 40.50
##  8:   Yes Yes      Good    Income Urban=='Yes'    39 13.83  66.89744  2609 13.34   69.0 49.00
##  9:   Yes Yes    Medium    Income Urban=='Yes'    96 34.04  68.83333  6608 33.79   69.0 41.25
## 10:   Yes  No       Bad    Income Urban=='Yes'    23  8.16  67.52174  1553  7.94   67.0 34.00
## 11:   Yes  No    Medium    Income Urban=='Yes'    55 19.50  66.14545  3638 18.60   69.0 42.50
## 12:   Yes  No      Good    Income Urban=='Yes'    18  6.38  69.72222  1255  6.42   73.5 62.25
  • Base: Removing redundant values
  • Table: Descriptive summary for CompPrice, Income by group (Urban){PS: Percentages of shares}
options(width = 150)
data_sam = Carseats[,]
data_sam[sample(1:400,30),"Sales"] <- 888
data_sam[sample(1:400,20),"CompPrice"] <- 999
data_sam[sample(1:400,45),"Income"] <- 999
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
##     Urban  US ShelveLoc Attribute                   Filter Count  Prop       mean      sum    PS
##  1:   Yes Yes       Bad     Sales     Sales%ni% c(888,999)    49 13.24   5.474898   268.27  9.69
##  2:   Yes Yes      Good     Sales     Sales%ni% c(888,999)    37 10.00  10.901081   403.34 14.58
##  3:   Yes Yes    Medium     Sales     Sales%ni% c(888,999)    90 24.32   7.530889   677.78 24.49
##  4:    No Yes       Bad     Sales     Sales%ni% c(888,999)    10  2.70   5.654000    56.54  2.04
##  5:   Yes  No    Medium     Sales     Sales%ni% c(888,999)    50 13.51   6.956000   347.80 12.57
##  6:    No  No    Medium     Sales     Sales%ni% c(888,999)    27  7.30   6.430741   173.63  6.27
##  7:    No Yes    Medium     Sales     Sales%ni% c(888,999)    35  9.46   7.778000   272.23  9.84
##  8:   Yes  No      Good     Sales     Sales%ni% c(888,999)    16  4.32   9.601250   153.62  5.55
##  9:    No Yes      Good     Sales     Sales%ni% c(888,999)    20  5.41  10.381000   207.62  7.50
## 10:    No  No      Good     Sales     Sales%ni% c(888,999)     5  1.35   7.782000    38.91  1.41
## 11:    No  No       Bad     Sales     Sales%ni% c(888,999)    10  2.70   5.302000    53.02  1.92
## 12:   Yes  No       Bad     Sales     Sales%ni% c(888,999)    21  5.68   5.446190   114.37  4.13
## 13:   Yes Yes       Bad CompPrice CompPrice%ni% c(888,999)    50 13.16 124.780000  6239.00 13.10
## 14:   Yes Yes      Good CompPrice CompPrice%ni% c(888,999)    38 10.00 126.710526  4815.00 10.11
## 15:   Yes Yes    Medium CompPrice CompPrice%ni% c(888,999)    94 24.74 125.180851 11767.00 24.72
## 16:   Yes  No       Bad CompPrice CompPrice%ni% c(888,999)    21  5.53 125.190476  2629.00  5.52
## 17:    No Yes       Bad CompPrice CompPrice%ni% c(888,999)    11  2.89 122.454545  1347.00  2.83
## 18:   Yes  No    Medium CompPrice CompPrice%ni% c(888,999)    52 13.68 126.365385  6571.00 13.80
## 19:    No  No    Medium CompPrice CompPrice%ni% c(888,999)    25  6.58 123.640000  3091.00  6.49
## 20:    No Yes    Medium CompPrice CompPrice%ni% c(888,999)    36  9.47 127.055556  4574.00  9.61
## 21:   Yes  No      Good CompPrice CompPrice%ni% c(888,999)    17  4.47 126.764706  2155.00  4.53
## 22:    No Yes      Good CompPrice CompPrice%ni% c(888,999)    21  5.53 122.714286  2577.00  5.41
## 23:    No  No      Good CompPrice CompPrice%ni% c(888,999)     6  1.58 126.500000   759.00  1.59
## 24:    No  No       Bad CompPrice CompPrice%ni% c(888,999)     9  2.37 120.555556  1085.00  2.28
## 25:   Yes Yes       Bad    Income    Income%ni% c(888,999)    47 13.24  75.382979  3543.00 14.69
## 26:   Yes Yes      Good    Income    Income%ni% c(888,999)    32  9.01  66.437500  2126.00  8.81
## 27:   Yes Yes    Medium    Income    Income%ni% c(888,999)    87 24.51  67.310345  5856.00 24.28
## 28:    No Yes       Bad    Income    Income%ni% c(888,999)    10  2.82  72.700000   727.00  3.01
## 29:    No  No    Medium    Income    Income%ni% c(888,999)    26  7.32  62.538462  1626.00  6.74
## 30:    No Yes    Medium    Income    Income%ni% c(888,999)    37 10.42  69.621622  2576.00 10.68
## 31:   Yes  No    Medium    Income    Income%ni% c(888,999)    49 13.80  65.795918  3224.00 13.37
## 32:    No Yes      Good    Income    Income%ni% c(888,999)    19  5.35  72.263158  1373.00  5.69
## 33:   Yes  No      Good    Income    Income%ni% c(888,999)    15  4.23  68.666667  1030.00  4.27
## 34:    No  No      Good    Income    Income%ni% c(888,999)     4  1.13  55.750000   223.00  0.92
## 35:    No  No       Bad    Income    Income%ni% c(888,999)    10  2.82  59.200000   592.00  2.45
## 36:   Yes  No       Bad    Income    Income%ni% c(888,999)    19  5.35  64.368421  1223.00  5.07
##     Urban  US ShelveLoc Attribute                   Filter Count  Prop       mean      sum    PS

Different base for each numeric variable.

"Population" - Consider only Good ShelveLoc (the quality of the shelving location for the car seats at each site) ShelveLoc=='Good'

"Sales" - Inculde only those store belongs to Urban location (Urban==Yes)

"CompPrice" - Exclude Price is greater than 150

ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','IQR'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
##     Urban  US  Attribute            Filter Count  Prop       mean      sum          var      IQR
##  1:   Yes Yes Population ShelveLoc=='Good'    39 45.88 277.538462 10824.00 18259.676113 176.0000
##  2:   Yes  No Population ShelveLoc=='Good'    18 21.18 219.888889  3958.00 13625.633987 155.5000
##  3:    No Yes Population ShelveLoc=='Good'    22 25.88 283.318182  6233.00 15122.512987 205.5000
##  4:    No  No Population ShelveLoc=='Good'     6  7.06 280.666667  1684.00 13183.066667 125.2500
##  5:   Yes Yes      Sales      Urban=='Yes'   186 65.96   7.710968  1434.24     8.734665   4.0775
##  6:   Yes  No      Sales      Urban=='Yes'    96 34.04   6.997813   671.79     6.445127   3.4175
##  7:   Yes Yes  CompPrice        Price>=150    16 50.00 142.187500  2275.00   223.362500  17.5000
##  8:    No Yes  CompPrice        Price>=150     7 21.88 138.428571   969.00   180.952381  16.0000
##  9:   Yes  No  CompPrice        Price>=150     7 21.88 143.285714  1003.00   156.571429  14.5000
## 10:    No  No  CompPrice        Price>=150     2  6.25 146.500000   293.00   112.500000   7.5000

5. Resahpe data

Reshapes a grouped data

options(width = 150)
ExpCustomStat(Carseats,Cvar = c("Urban"), Nvar=c("Population","Sales"), stat = c('Count','Prop'),gpby=TRUE,dcast=TRUE)
## Row value   : Attribute
## Column value: Urban
## Statistics  : Count + Prop
##     Attribute Count_No Count_Yes Prop_No Prop_Yes
## 1: Population      118       282    29.5     70.5
## 2:      Sales      118       282    29.5     70.5
Example scripts
##Frequency table for categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=FALSE)

##Crosstabulation between categorical variables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt=NULL)
ExpCustomStat(Carseats,Cvar=c("US","Urban","ShelveLoc"),gpby=TRUE,filt=NULL)

##Adding filters for custom tables
ExpCustomStat(Carseats,Cvar=c("US","Urban"),gpby=TRUE,filt="Population>150")
ExpCustomStat(Carseats,Cvar=c("US","ShelveLoc"),gpby=TRUE,filt="Urban=='Yes' & Population>150")

## Numeric variable summary
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var','min','max'))
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('min','p0.25','median','p0.75','max'))

## Adding filters for complete data (like base Subset)
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','var'),filt="Urban=='Yes'")
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum'),filt="Urban=='Yes' & Population>150")

## Filter unique value from all the numeric variables
ExpCustomStat(data_sam,Nvar=c("Population","Sales","CompPrice","Income"),stat = c('Count','mean','sum','min'),filt="All %ni% c(999,-9)")

## Adding filters at variable level
ExpCustomStat(Carseats,Nvar=c("Population","Sales","CompPrice","Education","Income"),stat = c('Count','mean','sum','var','sd','IQR','median'),filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150^ ^US=='Yes'"))

##Numerical summaries by category
##Variable summary report (One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","ShelveLoc"), Nvar=c("Population","Sales"), stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'),gpby=FALSE)

##Variable summary report (More than One group variable)
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','min','max','IQR','sd'), gpby = TRUE)

##Variable summary report (More than One group variable) with filter
ExpCustomStat(Carseats,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS','P0.25','median','p0.75'), gpby = TRUE,filt="Urban=='Yes'")
ExpCustomStat(data_sam,Cvar = c("Urban","US","ShelveLoc"), Nvar=c("Sales","CompPrice","Income"), stat = c('Count','Prop','mean','sum','PS'), gpby = TRUE,filt="All %ni% c(888,999)")
ExpCustomStat(Carseats,Cvar = c("Urban","US"), Nvar=c("Population","Sales","CompPrice"), stat = c('Count','Prop','mean','sum','var','min','max'), filt=c("ShelveLoc=='Good'^Urban=='Yes'^Price>=150"))
References