Table of descriptive statistics. The output is a table containing descriptive statistics for all input variables, for each level or combination of levels of the categorical (group) variables. While running the analysis, the user can also filter the data for an individual variable or across the whole data.
ExpCustomStat( data, Cvar = NULL, Nvar = NULL, stat = NULL, gpby = TRUE, filt = NULL, dcast = FALSE, value = NULL )
Argument | Description |
---|---|
data | data frame or matrix |
Cvar | qualitative variables on which to stratify / subgroup or run categorical summaries |
Nvar | quantitative variables on which to run summary statistics |
stat | descriptive statistics to compute. Specify the required summary statistics: base statistic functions such as 'mean', 'median', 'max', 'min', 'sum', 'IQR', 'sd' and 'var' are supported, as are quantiles such as P0.1, P0.2 (the examples use 'p0.25' and 'p0.75'). Two additional statistics are available: 'PS' (percentage of shares) and 'Prop' (column percentage). The examples also use 'Count' (number of observations) |
gpby | logical, default TRUE. Summaries are created for each combination of levels of the categorical variables listed in Cvar. Set to FALSE to create a separate summary at each categorical variable level |
filt | filter applied to the data while running the summary statistics. A filter can apply across the whole data or to an individual variable. Each condition selects the rows that are kept (for example, filt = "am==1" summarises only rows where am equals 1). For multiple filters, separate the conditions with '^' inside a single string. Ex: for Nvar = c("X1","X2","X3","X4"), to keep X1>900 for variable X1, X2==10 for X2, Gender!='Male' for X3 and all data for X4, use filt = c("X1>900^X2==10^Gender!='Male'^all") or filt = c("X1>900^X2==10^Gender!='Male'^ "). To keep all data for any variable listed in Nvar, put all or a single space at its position, i.e. ^all^ or ^ ^ |
dcast | logical, default FALSE. If TRUE, reshape the summary table using the fast dcast from data.table |
value | if dcast is TRUE, the name of the variable whose values should appear as columns |
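The '^' convention for filt can be illustrated with a small base-R sketch. It is not the SmartEDA implementation: the helpers `split_filters()` and `filtered_sums()` are hypothetical, and the sketch assumes, as the filt = "am==1" example suggests, that each condition selects the rows to keep, while an empty or `all` entry keeps every row.

```r
# Hypothetical helpers for illustration only; not part of SmartEDA.

# Split a '^'-separated filter string into one condition per variable.
split_filters <- function(filt, n) {
  conds <- trimws(strsplit(filt, "^", fixed = TRUE)[[1]])
  # Missing trailing conditions mean "no filter" for those variables
  if (length(conds) < n) conds <- c(conds, rep("all", n - length(conds)))
  conds[seq_len(n)]
}

# Sum each variable in Nvar after applying its own row filter.
filtered_sums <- function(data, Nvar, filt) {
  conds <- split_filters(filt, length(Nvar))
  sums <- vapply(seq_along(Nvar), function(i) {
    if (conds[i] %in% c("", "all")) {
      keep <- rep(TRUE, nrow(data))   # ^all^ or ^ ^: keep every row
    } else {
      keep <- eval(parse(text = conds[i])[[1]], envir = data)
      keep[is.na(keep)] <- FALSE      # rows where the condition is NA are dropped
    }
    sum(data[[Nvar[i]]][keep])
  }, numeric(1))
  setNames(sums, Nvar)
}

# disp restricted to manual cars (am == 1); mpg uses all rows
filtered_sums(mtcars, c("disp", "mpg"), "am==1^all")
```

The disp total here equals 853.5 + 1012.4, the two disp sums shown in the filt = "am==1" example.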
Summary statistics as a data frame (printed as a data.table in the examples below). Usage of this function is detailed in the user guide vignette.
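By default the summary is in long format, with one row per group and Attribute. The sketch below illustrates the kind of long-to-wide reshape that the dcast option refers to, using the am==1 sums from the examples. It uses base R's `reshape()` so that it runs without data.table; treating gear as the row identifier and Attribute as the column variable is an assumption for illustration, not ExpCustomStat's own behaviour.

```r
# Long summary: one row per gear x Attribute
# (sum values copied from the filt = "am==1" example output)
long <- data.frame(
  gear      = c(4, 5, 4, 5),
  Attribute = c("disp", "disp", "mpg", "mpg"),
  sum       = c(853.5, 1012.4, 210.2, 106.9)
)

# Spread Attribute into columns: one row per gear, one column per variable
wide <- reshape(long, idvar = "gear", timevar = "Attribute",
                v.names = "sum", direction = "wide")
print(wide)
```

With data.table installed, a comparable reshape is `data.table::dcast(data.table::as.data.table(long), gear ~ Attribute, value.var = "sum")`.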
Filter specific values (for example, special codes) from the numeric variables
Case 1: Exclude special or outlier values, such as 999, 9999 or 888, from each selected variable.
Eg: dat = list(x = c(23,24,34,999,12,12,23,999,45), y = c(1,3,4,999,0,999,0,8,999,0))
Exclude 999:
x = c(23,24,34,12,12,23,45)
y = c(1,3,4,0,0,8,0)
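Case 1 can be reproduced in base R. This sketch keeps x and y as separate vectors (they have different lengths) and assumes the codes to drop are 999, 9999 and 888, as listed above; it shows the intended result, not how ExpCustomStat performs the exclusion internally.

```r
# Special codes to exclude (taken from the Case 1 description)
codes <- c(999, 9999, 888)

dat <- list(x = c(23, 24, 34, 999, 12, 12, 23, 999, 45),
            y = c(1, 3, 4, 999, 0, 999, 0, 8, 999, 0))

# Drop the codes from each variable independently
cleaned <- lapply(dat, function(v) v[!v %in% codes])
cleaned$x  # 23 24 34 12 12 23 45
cleaned$y  # 1 3 4 0 0 8 0
```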
Case 2: Summarise the data with selected descriptive statistics, such as 'mean' and 'median' or 'sum' and 'var'.
Case 3: Aggregate the data with different statistics by group (gpby).
Case 4: Reshape the summary statistics (dcast).
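The difference between gpby = TRUE and gpby = FALSE can be pictured with base R's `aggregate()`. This sketch only illustrates the two grouping schemes, using observation counts of mpg by vs and am; it does not reproduce ExpCustomStat's output layout.

```r
# gpby = TRUE analogue: one row per combination of vs and am
combo <- aggregate(mpg ~ vs + am, data = mtcars, FUN = length)

# gpby = FALSE analogue: a separate summary for each grouping variable
separate <- lapply(c("vs", "am"), function(g) {
  aggregate(mtcars["mpg"], by = mtcars[g], FUN = length)
})
names(separate) <- c("vs", "am")

combo     # 4 rows: every observed vs x am combination
separate  # 2 tables: counts by vs alone and by am alone
```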
The complete functionality of the `ExpCustomStat` function is detailed, with example code, in the vignette help page.
    ## Selected summary statistics 'Count', 'sum' and percentage of shares ('PS')
    ## for disp and mpg variables by vs, am and gear
    ExpCustomStat(mtcars, Cvar = c("vs","am","gear"), Nvar = c("disp","mpg"),
                  stat = c("Count","sum","PS"), gpby = TRUE, filt = NULL)
    #>     vs am gear Attribute Count    sum    PS
    #>  1:  0  1    4      disp     2  320.0  4.33
    #>  2:  1  1    4      disp     6  533.5  7.23
    #>  3:  1  0    3      disp     3  603.1  8.17
    #>  4:  0  0    3      disp    12 4291.4 58.12
    #>  5:  1  0    4      disp     4  622.7  8.43
    #>  6:  0  1    5      disp     4  917.3 12.42
    #>  7:  1  1    5      disp     1   95.1  1.29
    #>  8:  0  1    4       mpg     2   42.0  6.53
    #>  9:  1  1    4       mpg     6  168.2 26.16
    #> 10:  1  0    3       mpg     3   61.0  9.49
    #> 11:  0  0    3       mpg    12  180.6 28.09
    #> 12:  1  0    4       mpg     4   84.2 13.10
    #> 13:  0  1    5       mpg     4   76.5 11.90
    #> 14:  1  1    5       mpg     1   30.4  4.73

    ## 'Count', 'sum' and 'var' by gear, manual-transmission cars only (am==1)
    ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp","mpg"),
                  stat = c("Count","sum","var"), gpby = TRUE, filt = "am==1")
    #>    gear Attribute Filter Count    sum         var
    #> 1:    4      disp  am==1     8  853.5  1381.08696
    #> 2:    5      disp  am==1     5 1012.4 13338.08700
    #> 3:    4       mpg  am==1     8  210.2    29.31643
    #> 4:    5       mpg  am==1     5  106.9    44.34200

    ## 'Count', 'sum', 'mean' and 'median' by gear, manual-transmission cars only (am==1)
    ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp","mpg"),
                  stat = c("Count","sum","mean","median"), gpby = TRUE, filt = "am==1")
    #>    gear Attribute Filter Count    sum     mean median
    #> 1:    4      disp  am==1     8  853.5 106.6875  93.50
    #> 2:    5      disp  am==1     5 1012.4 202.4800 145.00
    #> 3:    4       mpg  am==1     8  210.2  26.2750  25.05
    #> 4:    5       mpg  am==1     5  106.9  21.3800  19.70

    ## Selected summary statistics 'Count' and the five-number summary (fivenum)
    ## for disp and mpg variables by gear
    ExpCustomStat(mtcars, Cvar = c("gear"), Nvar = c("disp", "mpg"),
                  stat = c("Count",'min','p0.25','median','p0.75','max'), gpby = TRUE)
    #>    gear Attribute Count   min   p0.25 median   p0.75   max
    #> 1:    4      disp    12  71.1  78.925  130.9 160.000 167.6
    #> 2:    3      disp    15 120.1 275.800  318.0 380.000 472.0
    #> 3:    5      disp     5  95.1 120.300  145.0 301.000 351.0
    #> 4:    4       mpg    12  17.8  21.000   22.8  28.075  33.9
    #> 5:    3       mpg    15  10.4  14.500   15.5  18.400  21.5
    #> 6:    5       mpg     5  15.0  15.800   19.7  26.000  30.4
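The example outputs can be cross-checked with base R. This sketch assumes, consistent with the numbers printed above, that 'PS' is each group's sum as a percentage of the overall sum, that 'p0.25' matches R's default `quantile()`, and that filt = "am==1" keeps only manual-transmission cars. It is a check on the results, not the package's own code.

```r
# 1) 'PS': group sum of disp as a percentage of the total disp
ps_tab <- aggregate(disp ~ vs + am + gear, data = mtcars, FUN = sum)
ps_tab$PS <- round(100 * ps_tab$disp / sum(ps_tab$disp), 2)

# 2) 'p0.25' for disp in gear 4, using quantile()'s default (type 7)
p25_g4 <- unname(quantile(mtcars$disp[mtcars$gear == 4], probs = 0.25))

# 3) Count, sum, mean and median of mpg by gear with filt = "am==1"
manual <- subset(mtcars, am == 1)
mpg_stats <- t(sapply(split(manual$mpg, manual$gear), function(v)
  c(Count = length(v), sum = sum(v), mean = mean(v), median = median(v))))

ps_tab     # PS column matches the first example (e.g. 4.33, 58.12)
p25_g4     # 78.925, as in the fivenum example
mpg_stats  # rows "4" and "5" match the mpg rows of the am==1 example
```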