TIL About BFG Repo-Cleaner

If you ever migrate code from Bitbucket to GitHub, you will discover, unpleasantly, that GitHub does not allow files larger than 100 MB by default (unless you pay extra for Large File Storage). At that point, you will probably realize that GitHub isn't really the right place to store such large…
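A minimal Python sketch for spotting offending files before a push might look like the following. It assumes the 100 MB limit mentioned above, taken here as 100 × 1024 × 1024 bytes. It only scans the current working tree and skips `.git`, so it will not find large blobs buried in history; rewriting history is the job BFG Repo-Cleaner exists for. The function name `find_large_files` is hypothetical.

```python
import os

# Assumed limit: GitHub's 100 MB per-file cap, taken here as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files under `root` larger than `limit` bytes.

    Only the working tree is scanned: the .git directory is skipped, so blobs
    that exist only in history are not reported.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # Git stores the link itself, not the target's contents
            size = os.path.getsize(path)
            if size > limit:
                large.append((path, size))
    # Largest files first.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, `find_large_files(".")` lists the oversized files in the current checkout, largest first.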

Video: Uses of Machine Learning in E-commerce

Here is the video of my talk "Usos del Machine Learning aplicados al E-commerce" (Uses of Machine Learning in E-commerce), which took place at ENAE Business School as part of the "Ecommerce & Big/Small Data" Forum. In this talk I explain several algorithms used in e-commerce today, as well as the libraries that…

Handle missing categoricals with PMML

PMML (Predictive Model Markup Language), developed by the Data Mining Group, is in my opinion a much-needed standard in the Data Science ecosystem. PMML is basically an XML format for defining machine learning pipelines, which allows (sort of) interoperability between different ML platforms. In particular, I have been working…
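To make the idea concrete, here is a toy PMML fragment and a small Python reader, assuming a simplified reading of the PMML 4.3 spec: a categorical `DataField` lists its valid categories as `Value` elements, and a `MiningField` can declare `missingValueReplacement` together with `invalidValueTreatment="asMissing"`, so that an unseen category is treated as missing and then replaced. The field `color`, its values, and the helpers `load_schema` and `prepare` are hypothetical; this is a sketch, not the approach from the full post nor how any particular PMML engine implements it.

```python
import xml.etree.ElementTree as ET

# Toy PMML fragment with a hypothetical categorical field "color".
# It is well-formed XML but not a complete model: the TreeModel body is omitted.
PMML_DOC = """<PMML version="4.3" xmlns="http://www.dmg.org/PMML-4_3">
  <Header/>
  <DataDictionary numberOfFields="1">
    <DataField name="color" optype="categorical" dataType="string">
      <Value value="red"/>
      <Value value="blue"/>
    </DataField>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="color" missingValueReplacement="red" invalidValueTreatment="asMissing"/>
    </MiningSchema>
  </TreeModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_3"}


def load_schema(pmml_text):
    """Read valid categories and missing/invalid rules from a PMML document."""
    root = ET.fromstring(pmml_text)
    valid = {}
    for field in root.findall("p:DataDictionary/p:DataField", NS):
        if field.get("optype") == "categorical":
            valid[field.get("name")] = {
                v.get("value")
                for v in field.findall("p:Value", NS)
                if v.get("property", "valid") == "valid"
            }
    rules = {}
    for mf in root.findall(".//p:MiningSchema/p:MiningField", NS):
        rules[mf.get("name")] = (
            mf.get("missingValueReplacement"),
            mf.get("invalidValueTreatment", "returnInvalid"),  # spec default
        )
    return valid, rules


def prepare(record, valid, rules):
    """Apply the rules to one input record (a dict); returns a new dict."""
    out = dict(record)
    for name, (replacement, invalid_treatment) in rules.items():
        value = out.get(name)
        allowed = valid.get(name)
        if value is not None and allowed and value not in allowed:
            if invalid_treatment == "asMissing":
                value = None  # unseen category downgraded to missing
            elif invalid_treatment == "returnInvalid":
                raise ValueError(f"invalid value {value!r} for field {name!r}")
            # "asIs": keep the value unchanged
        if value is None and replacement is not None:
            value = replacement
        out[name] = value
    return out
```

With these rules, `prepare({"color": "green"}, *load_schema(PMML_DOC))` returns `{"color": "red"}`: the unseen category is first treated as missing and then replaced.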

Video: Data Science Days in Murcia

On April 21, 2017, thanks to the support of Centic and Info de Murcia, about 80 people came along to let me talk their ears off for 3 hours about all things Data Science. Here is the video. You can find the slides on SlideShare.…

This is what a memory leak looks like

Left side of this chart: VSZ (virtual memory size) and RSS (resident set size, i.e. RAM) over time, obtained via ps, for a process using a poor implementation of a KafkaClient in Java that creates a new Kafka client per GET request. This is bad. Right side of the chart: current performance once I fixed the…
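The bug pattern is easy to reproduce in miniature. The sketch below is in Python rather than Java and uses a hypothetical `FakeClient` in place of a real Kafka client; its class-level `live` list simulates the background threads and sockets that keep an unclosed client reachable. The leaky handler builds a new client on every request, while the fixed handler lazily creates one client and shares it.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for an expensive client such as a Kafka client."""

    # Simulates background threads/sockets that keep a client reachable
    # (and its memory allocated) until close() is called.
    live = []

    def __init__(self):
        self.buffer = bytearray(1024 * 1024)  # pretend per-client allocation (1 MiB)
        FakeClient.live.append(self)

    def send(self, msg):
        return len(msg)

    def close(self):
        FakeClient.live.remove(self)
        self.buffer = None


# Leaky version: a new client per GET request, never closed.
def handle_get_leaky(msg):
    client = FakeClient()
    return client.send(msg)


# Fixed version: one lazily created client shared by every request.
_client = None
_client_lock = threading.Lock()


def get_shared_client():
    global _client
    with _client_lock:  # avoid two threads creating a client at once
        if _client is None:
            _client = FakeClient()
    return _client


def handle_get_shared(msg):
    return get_shared_client().send(msg)
```

After five requests, the leaky handler leaves five live clients, while the shared handler never holds more than one. Sharing a single instance is only safe if the client is thread-safe: in Kafka's Java API, `KafkaProducer` is documented as thread-safe while `KafkaConsumer` is not, so check which one you are using.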