In the two previous posts we saw how disk IO and network IO affect our ETLs. In both cases we covered several techniques that can drastically improve performance and lead to more efficient resource usage:
Avoid disk IO altogether.
Use buff/cache properly if disk IO cannot be avoided.
Optimize data downloads by choosing the right file format, using Keep-Alive properly and parallelizing network operations.
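As a reminder of what the network techniques look like in practice, here is a minimal sketch using only the Python standard library. It is not the code from the previous posts: the function names `fetch_all` and `fetch_parallel`, the host, the paths and the worker count are all assumptions made for illustration. Each worker opens one connection and reuses it for every path in its group (Keep-Alive), while several groups are downloaded in parallel.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor


def fetch_all(host, paths, port=80):
    """Download several paths from one host, reusing a single connection.

    http.client speaks HTTP/1.1, so the connection is kept alive between
    requests as long as the server allows it.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read completely before the connection
            # can be reused for the next request.
            bodies.append(response.read())
    finally:
        conn.close()
    return bodies


def fetch_parallel(host, path_groups, port=80, workers=4):
    """Download each group of paths on its own keep-alive connection.

    Results come back in the same order as path_groups.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda group: fetch_all(host, group, port), path_groups))
```

A call such as `fetch_parallel("example.com", [["/a.csv", "/b.csv"], ["/c.csv"]])` (hypothetical host and file names) would open two connections and fetch three files. Threads are enough here because the work is IO-bound; the right number of workers depends on the server and the network, so it is worth measuring rather than guessing.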
In this post we are going to put together network and processing operations to see the improvement in a complete workflow.
The reports were generated automatically in 15 minutes, whereas the previous process took around 3 weeks because the reports were filled in manually and intermittently: 2 hours today, 3 hours tomorrow, 5 hours next week.
During the last few weeks I have been interviewed for several DevOps positions. In two of them I had to answer a skills checklist, and in the other one I was given an exercise to solve and send back by email. I think these checklist interviews are not good for DevOps positions, especially if the checklists are not kept up to date. Let’s see why…
I have been working with DigitalOcean for several months. On average, DigitalOcean deploys your VPS in 55 seconds. After the server is deployed, the whole manual, error-prone and boring configuration process still has to be done.