ETL | El Sotanillo de Juan Sierra Pons

During the COVID-19 lockdown I invested some of the resulting “free time” in refreshing old topics such as capacity planning and command-line optimization.

In 2011 I got my LPIC-3 certification. While I was studying for the preceding LPIC-2, two of the topics were Capacity Planning and Predict Future Resource Needs. To refresh this knowledge, I recently took Matthew Pearson’s Linux Capacity Planning course from Linux Academy.

My interest in Data Science and Business Intelligence started with a course I took where the main tool was Pentaho, mostly PDI (aka Kettle) for ETL jobs and Report Designer for report automation. I then continued with the University of Waikato’s WEKA courses, and this path led me to Jeroen Janssens’ book Data Science at the Command Line, which I have recently reread. In his book, Jeroen uses Ole Tange’s GNU parallel, a tool I have already written about in my post A Quick and Neat 🙂 Orchestrator using GNU Parallel.

You might wonder how Linux capacity planning, ETL, the command line and job parallelization are related. Let’s dig into it.

Linux, Open Source, Bash, Virtualization, Cloud, Puppet, DevOps, Blog, Travels, etc.

Tag Archives: ETL

Optimizing long batch processes or ETL by using buff/cache properly II (parallelizing network operations)

Optimizing long batch processes or ETL by using buff/cache properly
