Table of descriptive statistics. The output contains descriptive information on all input variables for each level, or combination of levels, of the categorical/group variables. While running the analysis, the user can also filter the data either at the individual variable level or across the whole data set.

ExpCustomStat(
  data,
  Cvar = NULL,
  Nvar = NULL,
  stat = NULL,
  gpby = TRUE,
  filt = NULL,
  dcast = FALSE,
  value = NULL
)

Arguments

data

data frame or matrix

Cvar

qualitative variables on which to stratify / subgroup or run categorical summaries

Nvar

quantitative variables on which to run summary statistics

stat

descriptive statistics. Specify which summary statistics are required. All base stat functions are included, such as 'mean', 'median', 'max', 'min', 'sum', 'IQR', 'sd', 'var', and quantiles such as 'P0.1', 'P0.2', etc. Two additional statistics are available: 'PS' is the percentage of shares and 'Prop' is the column percentage

gpby

default value is TRUE. A group-level summary is created based on the list of categorical variables. If a summary is required at each categorical variable level, set this option to FALSE

filt

filter the data while running the summary statistics. The filter can be applied across the whole data set or at the individual variable level. If there are multiple filters, separate the conditions with '^'. Ex: with Nvar = c("X1","X2","X3","X4"), suppose we need to apply the condition X1>900 to the X1 variable, X2==10 to the X2 variable, Gender!='Male' to the X3 variable, and use all data for X4; then filt should be filt = c("X1>900"^"X2==10"^"Gender!='Male'"^all) or c("X1>900"^"X2==10"^"Gender!='Male'"^ ^). To keep all data for some of the variables listed in Nvar, specify ^all^ or ^ ^ (single space) in the corresponding position inside filt

dcast

fast dcast from data.table

value

If dcast is TRUE, pass the name of the variable whose values should be spread across the columns
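To make the 'PS' statistic described under stat more concrete, here is a minimal base R sketch. It assumes that 'PS' is each group's sum expressed as a percentage of the overall sum, which is consistent with the example output on this page but is not taken from the package source. The object name `agg` is only illustrative.

```r
# Assumption: PS = 100 * (group sum) / (overall sum), rounded to 2 decimals.
# This is a base R illustration, not the internal code of ExpCustomStat.

# Sum of disp within each level of gear
agg <- aggregate(disp ~ gear, data = mtcars, FUN = sum)

# Share of each group's sum in the overall sum, as a percentage
agg$PS <- round(100 * agg$disp / sum(mtcars$disp), 2)

print(agg)
```

Because each share is rounded separately, the PS column may add up to slightly less or more than 100.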

Value

summary statistics as a data frame. Usage of this function is detailed in the user guide vignette.

Details

Filter out unique values from all the numeric variables

Case1: Excluding unique values or outlier values such as '999', '9999' or '888' from each selected variable.

Eg: dat = data.frame(x = c(23,24,34,999,12,12,23,999,45,999), y = c(1,3,4,999,0,999,0,8,999,0))

Exclude 999:

x = c(23,24,34,12,12,23,45)

y = c(1,3,4,0,0,8,0)
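The exclusion in Case1 can be sketched in base R as follows. This only illustrates the idea of dropping a sentinel value from each variable independently; it is not the package's own filtering code, and the helper name `drop_code` is hypothetical.

```r
# Vectors from the Case1 example above
x <- c(23, 24, 34, 999, 12, 12, 23, 999, 45)
y <- c(1, 3, 4, 999, 0, 999, 0, 8, 999, 0)

# Hypothetical helper: drop every occurrence of a sentinel code from a vector
drop_code <- function(v, code = 999) v[v != code]

# Each variable is cleaned on its own, so the results may differ in how
# many values were removed from each
x_clean <- drop_code(x)
y_clean <- drop_code(y)

print(x_clean)
print(y_clean)
```

Summary statistics computed on `x_clean` and `y_clean` are then unaffected by the 999 codes.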

Case2: Summarise the data with selected descriptive statistics, such as 'mean' and 'median', or 'sum' and 'variance'.

Case3: Aggregate the data with different statistics using a group-by statement.

Case4: Reshape the summary statistics.

The complete functionality of the `ExpCustomStat` function is detailed in the vignette help page, with example code.
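As a rough illustration of Case2 and Case3 combined, the base R sketch below filters mtcars to am == 1 and then computes a few statistics of disp for each level of gear. It is written with base functions only and is an assumption about what the grouped summary amounts to, not the implementation used by `ExpCustomStat`. The names `sub` and `res` are illustrative.

```r
# Keep only the rows that satisfy the filter condition am == 1
sub <- mtcars[mtcars$am == 1, ]

# Split disp by gear and compute the selected statistics for each group
res <- do.call(rbind, lapply(split(sub$disp, sub$gear), function(v) {
  data.frame(Count = length(v), sum = sum(v), mean = mean(v), median = median(v))
}))

# The group labels end up as row names; move them into a regular column
res <- cbind(gear = as.numeric(rownames(res)), res)
rownames(res) <- NULL

print(res)
```

The disp rows of the filtered example in the Examples section (filt = "am==1", by gear) report the same Count, sum, mean and median values.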

Examples

## Selected summary statistics 'Count, sum, percentage of shares' for
## disp and mpg variables by vs, am and gear
ExpCustomStat(mtcars, Cvar = c("vs","am","gear"), Nvar = c("disp","mpg"),
              stat = c("Count","sum","PS"), gpby = TRUE, filt = NULL)
#>     vs am gear Attribute Count    sum    PS
#>  1:  0  1    4      disp     2  320.0  4.33
#>  2:  1  1    4      disp     6  533.5  7.23
#>  3:  1  0    3      disp     3  603.1  8.17
#>  4:  0  0    3      disp    12 4291.4 58.12
#>  5:  1  0    4      disp     4  622.7  8.43
#>  6:  0  1    5      disp     4  917.3 12.42
#>  7:  1  1    5      disp     1   95.1  1.29
#>  8:  0  1    4       mpg     2   42.0  6.53
#>  9:  1  1    4       mpg     6  168.2 26.16
#> 10:  1  0    3       mpg     3   61.0  9.49
#> 11:  0  0    3       mpg    12  180.6 28.09
#> 12:  1  0    4       mpg     4   84.2 13.10
#> 13:  0  1    5       mpg     4   76.5 11.90
#> 14:  1  1    5       mpg     1   30.4  4.73
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp","mpg"),
              stat = c("Count","sum","var"), gpby = TRUE, filt = "am==1")
#>    gear Attribute Filter Count    sum         var
#> 1:    4      disp  am==1     8  853.5  1381.08696
#> 2:    5      disp  am==1     5 1012.4 13338.08700
#> 3:    4       mpg  am==1     8  210.2    29.31643
#> 4:    5       mpg  am==1     5  106.9    44.34200
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp","mpg"),
              stat = c("Count","sum","mean","median"), gpby = TRUE, filt = "am==1")
#>    gear Attribute Filter Count    sum     mean median
#> 1:    4      disp  am==1     8  853.5 106.6875  93.50
#> 2:    5      disp  am==1     5 1012.4 202.4800 145.00
#> 3:    4       mpg  am==1     8  210.2  26.2750  25.05
#> 4:    5       mpg  am==1     5  106.9  21.3800  19.70
## Selected summary statistics 'Count' and fivenum stats for disp and mpg
## variables by gear
ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
              stat = c("Count",'min','p0.25','median','p0.75','max'), gpby = TRUE)
#>    gear Attribute Count   min   p0.25 median   p0.75   max
#> 1:    4      disp    12  71.1  78.925  130.9 160.000 167.6
#> 2:    3      disp    15 120.1 275.800  318.0 380.000 472.0
#> 3:    5      disp     5  95.1 120.300  145.0 301.000 351.0
#> 4:    4       mpg    12  17.8  21.000   22.8  28.075  33.9
#> 5:    3       mpg    15  10.4  14.500   15.5  18.400  21.5
#> 6:    5       mpg     5  15.0  15.800   19.7  26.000  30.4