# Links

### Topological Data Analysis and Machine Learning reinforce each other. Good read on how to visualize!

http://www.ayasdi.com/blog/bigdata/how-tda-and-machine-learning-enhance-each-other/

### This is how to do large scale data science!

http://www.unofficialgoogledatascience.com/2015/09/causal-attribution-in-era-of-big-time.html?m=1

### [Must read] Fantastically clear post on vector models and k-NN.

http://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1/

### [Python] Great tutorial on word2vec and doc2vec

http://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis

### Great intro on Random Forests in Python and R

http://www.analyticsvidhya.com/blog/2015/09/random-forest-algorithm-multiple-challenges/

### Continue reading this after you read the post on calculus on computational graphs.

http://outlace.com/Computational-Graph/

### Nicely explained: model based clustering, with examples in R.

http://exploringdatablog.blogspot.com/2011/08/fitting-mixture-distributions-with-r.html?m=1

### [must read] Calculus on computational graphs.

http://colah.github.io/posts/2015-08-Backprop/

### [R] Simple tutorial on building word clouds

http://datascienceplus.com/building-wordclouds-in-r/

### If you always wanted to understand LSTMs, this is your chance!

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

### Light intro to some concepts of optimization.

http://horicky.blogspot.sg/2015/08/common-techniques-in-optimization.html

### [A good read] Controlling for confounding variables.

http://janhove.github.io/design/2015/08/24/caveats-confounds-correlational-designs/

### Nice intro to multilevel models - linear mixed models - random effect models in R.

http://datascienceplus.com/analysing-longitudinal-data-multilevel-growth-models-i/

### Fairly technical paper, but the technique is new and fascinating! This could well be the next generation of models.

http://arxiv.org/abs/1503.03585

### Good tips on advancing in data science.

### Read this extremely clearly written article on Generalized Additive Models (GAMs) and how to fit them in R.

http://multithreaded.stitchfix.com/blog/2015/07/30/gam/

### Very nice overview of the basic data mining algorithms with R and Python code.

http://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/

### [Mind-blowing video] How the brain does backpropagation

https://youtu.be/kxp7eWZa-2M?t=38m13s

### Cool article about Bayesian optimization of hyperparameters with Gaussian processes.

http://betatim.github.io/posts/bayesian-hyperparameter-search/

### Interesting take on the maturity of the different categories of artificial intelligence.

http://insights.venturescanner.com/2015/08/06/which-artificial-intelligence-category-is-most-mature/

### [Just for fun] 5 cool and unusual datasets to play around with

http://www.bytesandstitches.com/blog/5-weird-and-wonderful-data-sets-you-can-use/

### Great post on matrix factorization and the relation between k-means and PCA.

http://joelcadwell.blogspot.com/2015/08/matrix-factorization-comes-in-many.html?m=1

### Survival analysis tutorial in R.

http://mathminers.com/index.php/2015/08/06/survival-analysis-in-r-step-by-step-guide/

### Simple tutorial on deep learning with the Keras framework

http://smerity.com/articles/2015/keras_qa.html

### Nice overview of Watson trade-off analytics.

### Great read on recommendation systems and the technicalities behind the Netflix challenge.

http://www.mmds.org/mmds/v2.1/ch09-recsys2.pdf

### Great article on Generalized Additive Models

http://multithreaded.stitchfix.com/blog/2015/07/30/gam/

### 15 Questions about plots in R.

http://blog.datacamp.com/15-questions-about-r-plots/

### Good post on preventing model leakage, illustrated by a cross validation example in Python.

http://www.alfredo.motta.name/cross-validation-done-wrong/

### [What a jewel!] If you want to get into deep learning, read this extremely accessible book.

http://neuralnetworksanddeeplearning.com

### Great article on word embeddings

http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

### Nice, well-formulated tutorial on the R functions apply, mapply and sapply.

http://blog.datacamp.com/r-tutorial-apply-family/

### Nice data science competition model write up

https://medium.com/@nickrgmills/what-i-learned-from-my-first-data-science-competition-a00cadcba52a

### Great read on detecting fraud in online games

https://www.snellman.net/blog/archive/2015-07-22-cheater-detection-in-async-online-game/

### Just because you can: R and the location of letters in words.

http://www.56n.dk/where-do-letters-occur-in-words/

### Some good insights into feature creation using machine learning models.

http://blog.kaggle.com/2015/07/24/taxi-trip-time-winners-interview-3rd-place-bluetaxi/

### Humor that only makes data scientists smile.

http://www.oneweirdkerneltrick.com

### Free data science training on the web.

### [R code] Simple example: intro to gradient descent by deriving it for a linear model.

http://alexhwoods.com/2015/07/19/guide-to-linear-regression/

### Well-explained and useful introduction to graph databases, with an application to building a recommendation algorithm.

### Stop hiring data scientists until ready!

http://www.kdnuggets.com/2015/07/stop-hiring-data-scientists-until-ready.html#.VakaYm7e5uM.linkedin

### A great set of data science tutorials on Git (including an explanation of GitHub)

http://www.analyticsvidhya.com/blog/2015/07/github-special-data-scientists-to-follow-best-tutorials/

### Nice illustration of decision boundaries for various machine learning models.

http://freakonometrics.hypotheses.org/20002

### The ultimate data science cheatsheet collection

http://www.kdnuggets.com/2015/07/good-data-science-machine-learning-cheat-sheets.html

### Very useful tutorial on how to use Git with R.

http://www.r-bloggers.com/rstudio-and-github/

### Readable article on deep neural networks for vision and the recent ability of these networks to 'dream'.

http://engineer.abeja.asia/?p=173

### Nice small writeup on an R model for a Kaggle competition.

http://www.analyticsvidhya.com/blog/2015/07/top-10-kaggle-fb-recruiting-competition/

### Very useful R viz cheatsheet

### [Technical] Tough read on uncertainty in deep learning models, but well worth it.

http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html

### First steps: getting started with SparkR.

http://pingax.com/sparkr-with-rstudio-ubuntu-12-04/

### Practical guide to visualize high dimensional data

http://blog.applied.ai/visualising-high-dimensional-data/

### Play with this tool to show how deep neural nets 'dream'

https://github.com/google/deepdream/blob/master/dream.ipynb

### Fantastic practical insight into model building

http://blog.kaggle.com/2015/05/07/profiling-top-kagglers-kazanovacurrently-2-in-the-world/

### Great article about the difference between machine learning and statistical modeling.

http://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/

### Important article that discusses how to visualize what a deep neural network learns.

http://arxiv.org/pdf/1506.06579v1.pdf

### Nice Kaggle coding walkthrough.

http://www.analyticsvidhya.com/blog/2015/06/solution-kaggle-competition-bike-sharing-demand/

### Fantastic to see how neural networks are equipped with episodic memory to give them the power of reasoning.

http://arxiv.org/pdf/1506.07285v1.pdf

### Understanding boosting, with nice visualizations

http://www.r-bloggers.com/an-attempt-to-understand-boosting-algorithms/

### Great way to explain the complexity of an algorithm.

### Informative read on a model building journey.

### Quick walkthrough of machine learning models and deep learning.

http://www.slideshare.net/mobile/TerryTaewoongUm/introduction-to-machine-learning-and-deep-learning

### [Nerd] Just because it's funny

http://www.cafepress.com/mf/96703427/i-support-vector-machines_tshirt?productId=1502159657

### Nice showcase of modeling on Spark: Word2Vec & Gradient Boosting Machines.

http://h2o.ai/blog/2015/06/ask-craig-sparkling-water/

### Insightful post on modeling human behavior.

http://joelcadwell.blogspot.com/2015/06/looking-for-preference-in-all-wrong.html?m=1

### This is how Facebook knows who you are, even without seeing your face.

### Inspiring and insightful interview with a top Kaggler

http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/

### Large-scale flash memory failures: a good look at analytics at work to understand the life cycle of hardware components. What is missing is the forward-looking part: can you see how to include that?

http://users.ece.cmu.edu/~omutlu/pub/flash-memory-failures-in-the-field-at-facebook_sigmetrics15.pdf

### If you have nothing better to do today, analyze this terabyte dataset with your favorite Click Through Rate models.

http://labs.criteo.com/downloads/download-terabyte-click-logs/

### Read up on ROC and AUC, with an application to predicting deaths on the Titanic (so useful).

http://www.r-bloggers.com/illustrated-guide-to-roc-and-auc/

### Interesting read on hyper-parameter optimization.

https://medium.com/@D33B/smarter-parameter-sweeps-or-why-grid-search-is-plain-stupid-c17d97a0e881

### The never ending possibilities of the neural network: teaching the computer to have conversations.

http://arxiv.org/pdf/1506.05869v1.pdf

### Simple (technical) read on Random Forests

http://www.r-bloggers.com/variable-importance-plot-and-variable-selection/

### Brilliant article on R in the IBM Cloud.

http://www.ibm.com/developerworks/library/ba-bluemix-trs-predictive-analytics-with-dashdb/index.html

### Cool automation in R

http://www.r-bloggers.com/connecting-r-to-everything-with-ifttt/

### This is how machine translation works. Cool stuff!

http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/

### [YouTube+code] Neural network evolves to play Super Mario World.

https://www.youtube.com/watch?v=qv6UVOQ0F44

### Fascinating result! A machine learning method beats humans on the verbal comprehension questions of an IQ test. Technical paper.

http://arxiv.org/pdf/1505.07909v1

### Mapping example in R, good example code, with an application to crime analytics.

http://www.r-bloggers.com/introductory-point-pattern-analysis-of-open-crime-data-in-london/

### A must-read on model mixing! Well written, with lots of examples, and a collection and overview you will not find in the literature.

http://mlwave.com/kaggle-ensembling-guide/

### Basic R: getting familiar with data frames. An easy-to-follow, well-illustrated tutorial.

http://www.r-bloggers.com/15-easy-solutions-to-your-data-frame-problems-in-r/

### Interesting, non-technical read on a recommendation system for an online retailer.

http://www.www2015.it/documents/proceedings/companion/p1269.pdf

### How to become a data scientist: a nice guide with lots of detail.

http://www.mastersindatascience.org/careers/data-scientist/

### This makes me smile: logistic regression to find out the value of chess pieces.

http://www.r-bloggers.com/big-data-and-chess-what-are-the-predictive-point-values-of-chess-pieces/

### Clearly written article on A/B testing and proving your analytical model by setting up an experiment.

http://blog.dato.com/how-to-evaluate-machine-learning-models-the-pitfalls-of-ab-testing

### I love the thinking! This is what we need to do more in data science.

### Insightful paper on characteristics of fraud that are detectable in data by using analytics.

http://info.neo4j.com/rs/neotechnology/images/Fraud%20Detection%20Using%20GraphDB%20-%202014.pdf

### Good hints: speeding up your R code

http://rstatistics.net/strategies-to-speed-up-r-code/

### Tuning the parameters of your Random Forest model

http://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/

### Simple introduction to text mining: bag of words and term frequency / inverse document frequency (TF-IDF)

http://fastml.com/classifying-text-with-bag-of-words-a-tutorial/

### A useful pointer: lessons learned in high-performance R.

http://www.r-bloggers.com/lessons-learned-in-high-performance-r/

### Awesome paper from Google about predicting energy efficiency in their data centers. Well written, and includes some examples of how the predictions can be used to make a data center more efficient.

### Long, but worth the read: Hofstadter, the author of Gödel, Escher, Bach, on intelligence, AI and machine learning.

http://hardforkit.com/articles/the-man-who.html

### Simple introduction to the top 10 data mining algorithms. The real trick is to start using them :)

http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english

### Excellent series on how various machine learning models work, explained through their decision boundaries and shown in simple R code.

### Airbnb rocks! Check out the nerd section on their home-grown modeling tool (and check out the article on handling missing data in Random Forests).

http://nerds.airbnb.com/airflow/

http://nerds.airbnb.com/overcoming-missing-values-in-a-rfc/

### Simple Random Forest explanation and coding example.

http://tjo-en.hatenablog.com/entry/2015/06/04/190000

### An overlooked area in Machine Learning: prediction intervals

http://blog.datadive.net/prediction-intervals-for-random-forests/

### Great read to expand your intuition on high dimensional spaces

http://isomorphism.es/post/120539470124/hacks-for-thinking-about-high-dimensional-space

### Insightful tips to improve your model

https://medium.com/@D33B/7-ways-to-improve-your-predictive-models-753705eba3d6

### Nice overview of the different data scientist skills.

http://dataconomy.com/the-22-skills-of-a-data-scientist/

### How to keep your data scientists: I like the first point (the other points make sense too)

http://channels.theinnovationenterprise.com/articles/how-to-keep-your-data-scientists

### Great article on how unstructured data became important to data science

http://www.hadoop360.com/blog/how-nosql-fundamentally-changed-machine-learning

### Thought-provoking examples of how different datasets give rise to the same regression equations

http://en.m.wikipedia.org/wiki/Anscombe%27s_quartet

### Good post on interpreting categorical regression coefficients

http://www.r-bloggers.com/using-and-interpreting-different-contrasts-in-linear-models-in-r/

### Interesting post on A/B testing and the need for statistically sound criteria

http://kadavy.net/blog/posts/aa-testing/

### Questions from Data Science interviews

http://blog.udacity.com/2015/04/data-science-interview-questions.html

### A simple post on model evaluation. The picture is especially useful for explaining this process to the business

http://www.datasciencecentral.com/m/blogpost?id=6448529:BlogPost:275475

### A great post on handling Twitter responses using R

http://www.r-bloggers.com/analyzing-r-bloggers-posts-via-twitter/

### A good showcase of meta-modeling: a three-layer model is built from many model ensembles

### How to filter out relevant predictors for your model

https://www.knime.org/blog/seven-techniques-for-data-dimensionality-reduction