Optimizing long batch processes or ETL by using buff/cache properly III (full workflow)

In the two previous posts we saw how disk IO and network IO affect our ETLs. For both use cases we covered several techniques that can drastically improve performance and lead to more efficient resource usage:

  • Avoid disk IO altogether.
  • Use buff/cache properly if disk IO cannot be avoided.
  • Optimize data downloads by choosing the right file format, using Keep-Alive properly and parallelizing network operations.

In this post we are going to put network and processing operations together to see the improvement in a complete workflow.
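
As a rough illustration of the third technique in the list above, here is a minimal sketch that combines Keep-Alive with parallel downloads using only the Python standard library. The names fetch, fetch_all and the workers parameter are made up for this example and do not come from the posts; each worker thread keeps one persistent connection per host and reuses it for every file it downloads.

```python
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage: each worker keeps its own persistent connections.
_local = threading.local()


def _connection(host, port):
    """Return this thread's connection to (host, port), opening it on first use."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    key = (host, port)
    if key not in conns:
        conns[key] = http.client.HTTPConnection(host, port, timeout=30)
    return conns[key]


def fetch(host, port, path):
    """GET one path over the thread's reused (Keep-Alive) connection."""
    conn = _connection(host, port)
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()  # read the body fully so the socket can be reused
    return response.status, body


def fetch_all(host, port, paths, workers=4):
    """Download all paths in parallel; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: fetch(host, port, p), paths))
```

With this layout the number of TCP connections is bounded by the number of workers rather than by the number of files, as long as the server keeps the connections open. The sketch assumes plain HTTP and leaves connection cleanup to the interpreter; a real ETL would likely also need HTTPS, retries and explicit closing.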


Hitachi Vantara Certified Specialist – Pentaho Data Integration Implementation HCE-5920 Exam

This August I got my “Hitachi Vantara Certified Specialist – Pentaho Data Integration Implementation HCE-5920 Exam” certification. The badge can be checked by clicking on the image or following the link below.

Hitachi Vantara Certified Specialist - Pentaho Data Integration implementation

https://www.certmetrics.com/hitachi/public/badge.aspx?i=49&t=c&d=2020-08-22&ci=HDS00240188

In 2016 I finished a project for an EU institution to automate the generation of many reports. The data sources were diverse, e.g. APIs, databases, etc.

I used Pentaho Data Integration (also known as PDI, Spoon or Kettle) to create the ETL jobs that consolidated all the data, and generated the reports using the Report-Designer (now Pentaho Reporting).

The reports were generated automatically in 15 minutes, while the previous process took around 3 weeks because the reports were filled in manually and intermittently: 2 hours today, 3 hours tomorrow, 5 hours next week.

This summer Hitachi Vantara, which bought Pentaho several years ago, offered its Pentaho Data Integration Fundamentals (DI1000W) course for free. I took the course to refresh my knowledge and decided to take the HCE-5920 Exam to get the certification. Finally I got my Badge 🙂


DevOps job interviews with old fashioned check list questions

During the last few weeks I have been interviewed for several DevOps positions. In two of them I had to fill in a skills check-list, and in the other one I was given an exercise to solve and send back by email. I think these check-list interviews are not good for DevOps positions, especially if the check-lists used are not kept up to date. Let’s see why…


Bootstrapping a new VPS on a DigitalOcean droplet with puppet client up and running in 4 mins 15 secs.

I have been working with DigitalOcean for several months. On average, DigitalOcean deploys your VPS server in 55 seconds. After the server is deployed, the whole manual, error-prone and boring configuration process still has to be done.

As I am using puppet to configure all my servers, I have created provisioningDO, a rakefile script (based on John Arundel’s book Puppet 3 Cookbook), to deploy and configure my servers in 4 min 15 secs. This means that after 4 min 15 secs my servers are ready for production.

provisioningDO uses Jack Pearkes’ tugboat CLI tool, so a fully installed and configured tugboat CLI is required. It shouldn’t take you more than 5-10 minutes to have a working, ready-to-go tugboat installation 🙂