Awesome Data Science Resources
Statistics
- Multicollinearity check in R
- Data transformations
- Outlier
- Missing value
- Skewness vs Kurtosis
- Sampling for imbalanced sample
- Degrees of freedom
- t-test
- ANOVA
- AUC
- Levene's Test
- Introduction to Principal Component Analysis
- Mahalanobis distance
- 11 dimensionality reduction techniques
Data Science Interview Tips
- Want to Become a Data Scientist
- Difference between Data Science, Machine Learning and Data Mining
- Data Science, Machine Learning, BI Explained in Amazing Paragraphs
- What is Predictive Analytics
Feature Engineering
Statistical Test Interpretations
Deep Learning
- Getting Started with TensorFlow
- How to implement deep learning in R using Keras and TensorFlow
- Deep Learning Research
- Trending DS deep learning methods
R, Python and AutoML packages
- Data science ecosystem: R vs Python vs Substitutes
- R for ML
- PyCaret
- AutoML frameworks
- Task scheduling with Python
- RStudio Blog: R vs Python
- Quora Blog: R vs Python
- Why R for data science and not Python
- XanderHorn autoML
- How can R users learn Python for data science
- R NumPy
- pandas vs datatable
- Equivalents in R, Python and Perl
- Exploratory data analysis with R
- Exploratory data analysis (EDA)
- Python standard env
- Python read/write tables
- psycopg2 tutorial
Data Science Plots
- ggplot2 package
- How to add a background image to ggplot2 graphs
- 7 visualizations you should learn in R
- Analyzing the 8 best visualization techniques
- Data Visualization Techniques
- Plotly
- Matplotlib
- Matplotlib org
- Shiny vs Dash: a side-by-side comparison
R Shiny
ML Classifiers
Decision Tree
Random Forest
- Machine learning: random forest from scratch with Python
- Random forests from scratch
- An implementation and explanation of the random forest in Python
- Understanding random forest and hyperparameter tuning
- Bootstrapping and OOB samples in random forests
- Hyperparameter tuning the random forest
- Random forests in H2O
GBM
XGBoost
- XGBoost Read the Docs
- XGBoost GitHub
- Introduction to XGBoost
- Ensemble r machine learning
- Ensemble learning
- XGBoost tuning of regularization
- XGBoost algorithm
- Fraud Detection
LightGBM
- LGBM - Laurae
- LightGBM Parameters Guide
- TalkingData AdTracking fraud detection
- L1 and L2 Regularisation
- Microsoft malware prediction
- Titanic voting pipeline stack and guide
GAM model
Text Mining
- Text Classification
- Sentiment analysis: CountVectorizer and TF-IDF
- Sentiment analysis: TF-IDF
- Transformers: BERT, RoBERTa
- RoBERTa with fastai and Hugging Face Transformers
- Transformer with LSTM
- How to preprocess for GloVe, part 1: EDA
- How to preprocess for GloVe, part 2: usage
- How to preprocess when using embeddings
Cohort Analysis
- What is cohort analysis and how should I use it
- A beginners guide to cohort analysis
- Cohort and multi touch attribution
- What can you do with a cohort analysis
- How to use cohort data to analyze user behavior
- Benefits of performing a cohort analysis
- RFM segmentation
- Behavioral Cohorts
- Cohorts git
Time Series
- Uber Orbit python library
- Facebook releases Prophet, its free forecasting tool for Python and R
- Weather forecast with regression models
- ARIMA model with statsmodels in Python
- Time series analysis
- sklearn model selection TimeSeriesSplit
- Ensemble of trees for forecasting time series
- TSrepr time series representations
- Time series analysis using ARIMA model in R
- ARIMA model time series forecasting in Python
- Time series model of forecasting future power demand
- Timeseries classification
- Statsmodels tsa ARIMA
- Prediction task with Multivariate Time Series and VAR model
- Neural networks for algorithmic trading
- Awesome deep trading
- PAA
- Web traffic time series forecast
Model Interpretations
Clustering
- Clustering: a tutorial for cluster analysis with R
- The 5 clustering algorithms data scientists need to know
Customer Churn Predictions
- Why churn
- Customer churn prediction for subscription businesses using machine learning
- Customer churn Logistic Regression with R
- Churn classification
- Survival prediction using cost sensitive learning
- 6 factors to consider before building a predictive model for life insurance
- Predicting customer churn
- Project - detecting early Alzheimer's
- What drives B2B customer attrition
- Define customer churn B2B
- Churn Rate
- Identify churn at risk customers
- Customer retention metrics
- New customers vs return customers
- Customer purchase behaviour analytics
Deep Learning Method
Web Scraping
- Web Scraping Machine Learning using Python
- Web Scraping Hackmageddon
- Web Scraping Data Preprocessing machine learning model
- Recommender systems in python
- Content based filtering
Data Science Cheat Sheets
ML and heuristic data labeling
Further Reading
- Arun Jagota - Published in Towards Data Science
- List of awesome ML Learning
- BERT
- PDF table extractions
- SQL
R package Development
Useful web links
- [quantide - Chapter 6 Creating R packages](http://www.quantide.com/ramarro-chapter-06/)
- analyticsvidhya - How I created a package in R & published it on CRAN / GitHub
- hvitfeldt - usethis-workflow-for-package-development
- kbroman - Writing vignettes
- r-pkgs - Releasing a package
- r-bio - An introduction to Git and how to use it with RStudio
- stackoverflow - R CMD check --as-cran warning
- mjdenny - R Package Development Pictorial
Code snippet to submit library to CRAN
Build package
devtools::document(roclets=c('rd', 'collate', 'namespace', 'vignette'))
devtools::build()
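# These helpers moved from devtools to usethis (devtools 2.0.0 and later)
usethis::use_news_md()
# The contact address is a placeholder; usethis 2.0.0 and later require it
usethis::use_code_of_conduct(contact = "maintainer@example.com")
usethis::use_cran_badge()
usethis::use_cran_comments()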
Upload it to CRAN
devtools::submit_cran()
Create websites for you and your projects using GitHub Pages
Build the GitHub web page with the pkgdown package:
pkgdown::build_site()
Write R markdown
- basic writing and formatting syntax
- pdf document format
- authoring basics
- kableExtra
- knitr-markdown
- rmarkdown.rstudio.com
Create your own R Package Logo
library(hexSticker)
library(UCSCXenaTools)
library(magick)  # provides image_read()
library(extrafont)
font_import()
loadfonts(device = "win")  # Windows only
library(showtext)
## Loading Google fonts (http://www.google.com/fonts)
font_add_google("Archivo Black", "arb")
showtext_auto()  # turn on showtext so the "arb" font is used when rendering
imgurl <- image_read("man/figures/dml_icon3.png")
## Design Stickers
sticker(imgurl,
        s_width = 1,
        s_height = 1.2,
        package = "MyRpackage", p_size = 20, s_x = 1, s_y = .75,
        url = "https://CRAN.R-project.org/package=MyRpackage",
        u_color = "white", u_size = 3,
        h_fill = "dodgerblue4", h_color = "blue",
        p_family = "arb",
        filename = "man/figures/dml_logo.png")
Get the Number of Downloads of your R package from CRAN
For example, to check the number of downloads of the ‘data.table’ R package, use these badge URLs:
- Month - https://cranlogs.r-pkg.org/badges/data.table
- Day - https://cranlogs.r-pkg.org/badges/last-day/data.table
- Total - https://cranlogs.r-pkg.org/badges/grand-total/data.table
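The three badge URLs above follow one pattern, so the Markdown for a README can be generated for any package. This is a minimal sketch: the helper name `cran_download_badges` is hypothetical (not part of any package), and it only builds strings from the URL pattern listed above without contacting the server.

```r
# Hypothetical helper: build Markdown badge lines for a CRAN package,
# following the cranlogs badge URL pattern listed above.
cran_download_badges <- function(pkg) {
  base <- "https://cranlogs.r-pkg.org/badges"
  urls <- c(
    month = sprintf("%s/%s", base, pkg),
    day   = sprintf("%s/last-day/%s", base, pkg),
    total = sprintf("%s/grand-total/%s", base, pkg)
  )
  # Wrap each URL in Markdown image syntax so it renders in a README
  md <- sprintf("![%s downloads](%s)", names(urls), urls)
  names(md) <- names(urls)
  md
}

badges <- cran_download_badges("data.table")
cat(badges, sep = "\n")
```

Paste the printed lines into README.md, replacing "data.table" with your own package name.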