A deep dive into the tools to use for building a more reproducible ML project
David Beauchemin
8 minutes read
Over the past years, I’ve worked on various machine learning projects (mostly research ones), and I’ve faced numerous problems along the way that impacted the reproducibility of my results. I regularly had to take a lot of time (not without hating myself) to figure out which experiment was the best and which settings were associated with those results. Even worse, finding where the heck my results were was painful. All these situations made my work difficult to reproduce and also…
In this article, we will train an RNN, or more precisely, an LSTM, to predict the sequence of tags associated with a given address, known as address parsing.
Contributing to the blog has never been easier. First of all, it must be said that any submission, whatever its format (Markdown, Microsoft Word, Notepad, you name it!), will be considered and ultimately transcribed into Markdown by our team. We offer the option to submit an article here, and we are already thinking of a way to review non-Markdown documents (possibly Google Docs). That being said, for those who would like to write and submit a post in the conventional way, here is a simple…
Scikit-Learn had its first release in 2007, in the pre-deep-learning era. However, it is one of the best-known and most widely adopted machine learning libraries, and it is still growing.
Guillaume Chevalier
7 minutes read
Scikit-Learn’s “pipe and filter” design pattern is simply beautiful. But how to use it for Deep Learning, AutoML, and complex production-level pipelines?
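To make the “pipe and filter” idea concrete, here is a minimal sketch of a Scikit-Learn `Pipeline`: each step transforms the data and hands it to the next, ending in an estimator. The tiny dataset and step names here are illustrative, not from the article.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy, linearly separable data (illustrative only).
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.0]]
y = [0, 0, 1, 1]

pipe = Pipeline([
    ("scale", StandardScaler()),    # filter: standardize each feature
    ("clf", LogisticRegression()),  # final estimator
])
pipe.fit(X, y)  # fit() flows the data through every step in order
prediction = pipe.predict([[2.5, 0.0]])
print(prediction)
```

Because the whole chain behaves like a single estimator, it can be cross-validated or grid-searched as one object, which is what makes the pattern so composable.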
This is probably going to sound cliché and trivial, but I just realized that I learned how to use a computer before learning how to use a pen. I launched my favourite game from an MS-DOS terminal a few years before writing my name on paper.
A simple explanation of Stein's paradox through the famous baseball example of Efron and Morris
Samuel Perreault
9 minutes read
There is nothing from my first stats course that I remember more clearly than Prof. Asgharian repeating “I have seen what I should have seen” to describe the idea behind maximum likelihood theory. Given a family of models, maximum likelihood estimation consists of finding which values of the parameters maximize the probability of observing the dataset we have observed. This idea, popularized in part by Sir Ronald A. Fisher, profoundly changed the field of statistics at a time when…
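The idea of maximizing the probability of the observed data can be shown in a few lines. The sketch below, with a made-up sample of coin flips, searches a grid of candidate values for the heads probability p and keeps the one with the highest log-likelihood; for Bernoulli data, the maximizer is the sample proportion of heads.

```python
import math

# Hypothetical sample for illustration: 10 coin flips, 1 = heads (7 heads).
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def log_likelihood(p, data):
    """Log-probability of observing `data` if heads occur with probability p."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Grid of candidate parameter values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

print(p_hat)  # matches the sample proportion of heads, 0.7
```

In practice the maximizer is found analytically or with a numerical optimizer rather than a grid, but the principle is exactly the one Prof. Asgharian summarized: pick the parameters under which “I have seen what I should have seen.”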
.Layer (dot-layer) is an open community promoting collaboration and knowledge sharing in data science. In Unix operating systems, files beginning with a dot are invisible to the user. Many users are unaware of their purpose, or even of their existence. In this world where data reign, data scientists form this hidden layer, invisible to the public. It is now time we say: Hello, World!