In the previous post I focused on avoiding disk IO as much as possible and, where that was not possible, on making the most of buff/cache by grouping IO operations close together in time. This approach can make our ETL processes run several times faster. In the two examples the numbers were:
Avoiding IO entirely was 11.3 times faster
Using buff/cache was almost 4 times faster
All the examples used a dataset that was already on disk, so no real network operations took place. In this post I am going to focus on network operations, again using GNU parallel.