Hitachi Vantara Certified Specialist – Pentaho Data Integration Implementation HCE-5920 Exam

This August I earned the “Hitachi Vantara Certified Specialist – Pentaho Data Integration Implementation” (HCE-5920) certification. The badge can be verified at the following link:

https://www.certmetrics.com/hitachi/public/badge.aspx?i=49&t=c&d=2020-08-22&ci=HDS00240188

In 2016 I finished a project for an EU institution to automate the generation of many reports. The data sources were diverse: APIs, databases, etc.

I used Pentaho Data Integration (also known as PDI, Spoon or Kettle) to create the ETL jobs that consolidated all the data, and generated the reports using the Report-Designer (now Pentaho Reporting).

The reports were generated automatically in 15 minutes, whereas the previous process took around 3 weeks because the reports were filled in manually and intermittently: 2 hours today, 3 hours tomorrow, 5 hours next week.

This summer Hitachi Vantara, which bought Pentaho several years ago, offered its Pentaho Data Integration Fundamentals (DI1000W) course for free. I took the course to refresh my knowledge and decided to take the HCE-5920 exam to get the certification. In the end I got my badge 🙂


Optimizing long batch processes or ETL by using buff/cache properly

During the COVID-19 lockdown I have used some of the resulting “free time” to refresh some old topics, such as capacity planning and command-line optimizations.

I got my LPIC-3 in 2011, and while studying for the preceding LPIC-2, two of the topics were Capacity Planning and Predict Future Resource Needs. To refresh that knowledge I recently took Matthew Pearson’s Linux Capacity Planning course from LinuxAcademy.

My interest in Data Science and Business Intelligence started with a course I took where the main tool used was Pentaho, mostly PDI (aka Kettle) for ETL jobs and Report Designer for report automation. Then I continued with the University of Waikato’s WEKA courses, and this path led me to read Jeroen Janssens’ Data Science at the Command Line book, which I have recently re-read too. In this book Jeroen uses GNU parallel, a tool I have already written about in my A Quick and Neat 🙂 Orchestrator using GNU Parallel post.

How are Linux capacity planning, ETL, the command line and the parallelization of jobs related, you might wonder? Let’s dig into it.


Adding headless capabilities to the Tresorit backup software using Xpra / Winswitch in Linux

Recently I changed my backup solution from SpiderOak to Tresorit. I had been very happy with SpiderOak since I started with them around 2009, but last year backups and sync started to fail: backups taking ages or not finishing at all, etc. Also, the support response time was not good enough and they didn’t find a proper fix for my problems, so I finally decided to move my business elsewhere. The chosen one was Tresorit, a Swiss-based company that offered two things important to me: de-duplication and client-side encryption.

Both solutions work on Linux, but Tresorit needs a GUI to work (SpiderOak supports a headless mode). This was a problem, as I wanted to run the Tresorit client on headless VPS servers. To add a kind of pseudo-headless support to the Tresorit client I decided to use Xpra, a multi-platform (Microsoft Windows, Linux, Mac) screen and application forwarding system or, as they say on their web page, “screen for X11”.

DevOps job interviews with old fashioned check list questions

During the last few weeks I have been interviewed for several DevOps positions. In two of them I had to fill in a skills check-list, and in the other one I was given an exercise to solve and send back by email. I think these check-list interviews are not good for DevOps positions, especially if the check-lists used are not kept up to date. Let’s see why…


A Quick and Neat :) Orchestrator using GNU Parallel

Sometimes you have to deal with servers that you don’t know anything about:

  • You are a short-term IT consultant with no previous knowledge of the environment.
  • The CMDB is out of order.
  • You are in a DR situation.
  • Or simply the main administrator is not there.

And you need to:

  • Run commands in parallel
  • Get info from many servers at a time
  • Troubleshoot DNS problems
  • Check how many servers are up and running

On my systems I use two orchestrators, MCollective and SaltStack (configured automatically using puppet), that fulfill my needs. But let’s see a quick way to get an orchestrator up and running.
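
To make the needs listed above concrete, here is a minimal sketch of the same idea written with Python threads instead of GNU Parallel, so it is not the method the post goes on to describe. The function names (`ssh_command`, `run_one`, `run_on_hosts`, `count_up`) are hypothetical, and the sketch assumes key-based ssh logins that work without a password prompt.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(host, remote_cmd):
    """Build the argv that runs remote_cmd on host over ssh.

    Assumption: key-based login; BatchMode makes ssh fail instead of prompting.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote_cmd]


def run_one(host, argv, timeout=30):
    """Run one command and return (host, exit_code, output) without raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return host, proc.returncode, proc.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        # -1 marks "could not run or timed out"; the message goes in the output slot
        return host, -1, str(exc)


def run_on_hosts(hosts, make_argv, max_workers=10):
    """Run make_argv(host) for every host in parallel, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: run_one(h, make_argv(h)), hosts))


def count_up(results):
    """Count the hosts whose command exited with status 0."""
    return sum(1 for _, code, _ in results if code == 0)
```

Calling `run_on_hosts(hosts, lambda h: ssh_command(h, "uptime"))` with your own host list returns one tuple per server, and `count_up` on that result gives a rough count of how many servers are up and reachable.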


Bootstrapping a new VPS on a DigitalOcean droplet with puppet client up and running in 4 mins 15 secs.

I have been working with DigitalOcean for several months; on average DigitalOcean deploys your VPS server in 55 seconds. After the server is deployed, the whole manual, error-prone and boring configuration process still has to be done.

As I am using puppet to configure all my servers, I have created provisioningDO, a rakefile script (based on John Arundel’s book Puppet 3 Cookbook), to deploy and configure my servers in 4 min 15 sec. That means that after 4 min 15 sec my servers are ready for production.

provisioningDO uses Jack Pearkes’ tugboat CLI tool, so a fully installed and configured tugboat CLI is required. It shouldn’t take you more than 5-10 minutes to have a working, ready-to-go tugboat installation 🙂

Creation of the Alicante Puppet Users Group

I had been toying for a while with the idea of setting up a puppet users group in Alicante, though I don’t know whether there will be many of us…

Last week I sent an email to the puppet users mailing list in case anyone was interested, and today I received an email from puppetlabs.com telling me that if I had a meetup group, they would put a link to it on their website. So I decided to create a group on meetup.com.

So today the Alicante Puppet Users Group has officially been created.

If you are interested in Puppet, DevOps, Data Center and Operations Automation, and basically in doing things only once and letting the computers do the rest, this is your group.

I hope you will sign up, and once there are a few of us we will hold the first meetup.

Greetings, Alicante puppeteers


How to configure the Comtrend HG532c ADSL router’s ARP table for Wake On LAN (WOL) from the internet using expect

Several months ago I finally got the Wake On LAN (WOL) feature of my RTL8111/8168B NIC working. The problem was that a new driver (other than the one provided by Debian) and a special PCI configuration were needed.

The other problem I had to deal with was the configuration of the ADSL router (a Comtrend HG532c, the one provided by the Spanish ISP Jazztel):

  • Open the required ports: this was an easy one, just opening ports 7 to 9 and forwarding them to the server we want to wake from the internet.
  • Make the router remember the server’s MAC/IP address pair. That was easy too, but some manual work was needed, because the ARP table is flushed whenever the router is restarted.  🙁
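
For context on why the port forwarding and the ARP entry both matter, here is a minimal sketch of what a WOL sender transmits: a “magic packet” made of six 0xFF bytes followed by the target MAC address repeated 16 times. The function names are hypothetical, and sending over UDP to port 9 is an assumption (the post only says that ports 7 to 9 were opened).

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte WOL magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, host, port=9):
    """Send the magic packet to host:port over UDP.

    Assumption: host is the router's public address and the router forwards
    this port to the sleeping server.
    """
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Allow broadcast destinations too, for use inside the LAN
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (host, port))
```

The router can only deliver this packet to a powered-off machine if it still knows which MAC address belongs to the forwarded IP, which is why the flushed ARP table is a problem.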

In my current job I recently had to change some configuration and restart more than 600 IP phones. To perform such a titanic task I created a quick and dirty script using expect. It worked like a charm and made me think about automating the way I set the ARP table in my Comtrend HG532c ADSL router.
