Optimising the disk footprint of GNU/Linux distributions for the Cloud

Did you know that the standard, off-the-shelf GNU/Linux instances you can get on Amazon are usually much larger than what you actually need? Well, our well-established Mancoosi tools can help (again!).

Quinton, Rouvoy and Duchien remarked, in their recent work presented at CloudPC 2012, that stock virtual machine images preinstalled with a GNU/Linux distribution contain many packages you do not need for standard use, even when the image is advertised specifically for your precise purpose. As a consequence, you end up paying for storage you do not need at all, just to save a copy of your nice instance.

After a presentation on this issue by Clement Quinton, who visited Irill recently, Pietro and I decided to check how straightforward it is to solve it with our Mancoosi tools.

Well, it turns out that this is a nearly trivial task, which boils down to a one-liner.

First, make sure the following tools from the Mancoosi project are installed on your Debian box:

  1. aspcud, version 1.7 or later, from http://www.cs.uni-potsdam.de/wv/aspcud/; make sure the aspcud executable is the aspcud-full variant (the aspcud package in Debian experimental, as of version 2012.10.24-1, is fine);
  2. ceve, one of the applications built by default with the Dose library, from https://gforge.inria.fr/projects/dose/ (the ceve in Debian, as of version 1.4-2, is not recent enough);
  3. cudf_solution_checker, from https://gforge.inria.fr/projects/misc-competitio/.
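
Before running anything, it can help to confirm that all three executables are actually on your PATH. What follows is a minimal sketch assuming a POSIX shell; check_tools is a hypothetical helper, not part of thinnit.sh or of any Mancoosi package, and it only checks presence, not versions or the aspcud-full variant.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the given commands
# are not on PATH. Returns 0 when all are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command is available
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the three tools listed above.
if check_tools aspcud ceve cudf_solution_checker; then
  echo "all tools found"
else
  echo "install the missing tools first"
fi
```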

Then check out the script thinnit.sh from its git repository, and you can start asking interesting questions such as:

  • what is the smallest installation containing all the packages needed to run an apache2 web server on Debian testing? You need 154 packages, totalling 181 MB of disk space once installed, and the list of these packages is left in a CUDF file named thinned.cudf:
./thinnit.sh /var/lib/apt/lists/ftp.fr.debian.org_debian_dists_testing_main_binary-amd64_Packages apache2
 <snip>
-181008,-154
  • and what if I want tomcat and php? Then you need 176 packages, for a total of 258 MB, and again the list of these packages is left in thinned.cudf:
./thinnit.sh /var/lib/apt/lists/ftp.fr.debian.org_debian_dists_testing_main_binary-amd64_Packages "tomcat6,php5"
 <snip>
-258482,-176
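
The last line printed by each run seems to encode the result: 181008 and 154 match the 181 MB and 154 packages quoted above. The sketch below assumes (this is not documented here) that the first field is the negated total installed size in kilobytes and the second the negated package count. summarize_line and list_packages are hypothetical helpers, not part of thinnit.sh; list_packages further assumes that each selected package appears in thinned.cudf as a "package: NAME" line. CUDF may escape some characters of Debian package names, so the names might need decoding before you hand them to apt-get.

```shell
#!/bin/sh
# summarize_line (hypothetical): turn a line such as "-181008,-154" into text.
# Assumption: field 1 is the negated installed size in kB, field 2 the
# negated package count.
summarize_line() {
  size_kb=${1%%,*}      # text before the comma, e.g. "-181008"
  count=${1##*,}        # text after the comma, e.g. "-154"
  size_kb=${size_kb#-}  # drop the leading minus sign
  count=${count#-}
  # Dividing by 1000 matches the MB figures quoted in the post.
  echo "$count packages, $((size_kb / 1000)) MB"
}

# list_packages (hypothetical): print the package names in a CUDF file.
# Assumption: every selected package has a "package: NAME" stanza line.
list_packages() {
  awk '/^package: / { print $2 }' "$1"
}

summarize_line "-181008,-154"   # prints "154 packages, 181 MB"
```

If those assumptions hold, `list_packages thinned.cudf | wc -l` after the apache2 run should print the same count as the summary line.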

Well, I think you get the idea... and by the way, you can modify the script to play with the full, rich user preference language supported by the entrants of the latest MISC competition!

And the good news is... on a stock laptop all this takes less than 10 seconds.

Please let us know if you use this code, and... enjoy!

Acknowledgement: if this is now just one line, it is thanks to all the wonderful people who worked on Mancoosi and participated in MISC over the past years.

There are so many to thank that they won't fit in a blog entry, but you can find them on the Mancoosi website, in the list of participants in the MISC competition, and in my previous posts on Mancoosi tools.

Comments

1. On Friday, February 8 2013, 13:21 by Roberto Di Cosmo

Of course, you also need to add the size of a minimal boot system (which can exceed a dozen megabytes, depending on your kernel).