Design decisions that can ruin your day.

I ran into three really bad design decisions this morning while trying to work with a virtual machine. Together they consumed hours of my day on what should have been a fifteen-minute project. The culprits: 70-persistent-net.rules in Ubuntu, the networking command line in Sun's VirtualBox, and Sun VirtualBox's hard drive "manager".

I had a Windows machine running an Ubuntu install in a virtual machine, basically there to run VTiger (an open source customer relationship manager). To do this, it needs to run Apache and MySQL. I have been running this setup for some time and it works great: the load on VTiger is very low and simply doesn't justify an entire machine.

The problem began when I got a larger hard drive for the machine and added it to Windows. All I really wanted to do was move the file that held the virtual image, update the virtual machine and move on.

Moving the file was easy enough (virtual machines have "hard drives" that are really just files in the host's file system). However, adding that file to VirtualBox proved more problematic, because it notes that the GUID of the hard drive you are adding matches one it has already attached.

Then it refuses to have anything to do with the copied file. If VirtualBox had simply said "do you want to relink to this file" I would have been done there and then. But instead, I had to:

* Merge all the snapshots into the virtual machine (this step at least makes some type of sense).
* Remove the hard drive from the drive manager.
* Remove the virtual machine that references the hard drive.
* Attach the copied drive to the drive manager.
* Recreate the virtual machine that references the hard drive.

This is where the other two design choices came to bite me. By recreating the virtual machine, I also had to reproduce the NAT mappings I had before. Instead of providing a UI, VirtualBox makes you enter such settings with long command lines. Fortunately, I had a batch file from the last time I did this, so I figured it would just be a matter of searching and replacing the virtual machine name.
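For the record, here is a minimal sketch of what that batch file boils down to, written as a small Python generator so the virtual machine name is a parameter instead of something to search and replace. It assumes the `VBoxManage setextradata` key layout documented in the VirtualBox user manual of that era and the default `pcnet` adapter; the VM name, rule name, and port numbers are placeholders, not my actual settings. It only prints the commands, it does not run them.

```python
def nat_forward_commands(vm_name, rule_name, protocol, host_port, guest_port,
                         device="pcnet", instance=0):
    """Build the three VBoxManage calls that define one NAT forwarding rule.

    Assumption: the extradata key layout below matches the installed
    VirtualBox version; "device" must match the emulated adapter type.
    """
    prefix = "VBoxInternal/Devices/{}/{}/LUN#0/Config/{}".format(
        device, instance, rule_name
    )
    settings = [
        ("Protocol", protocol),
        ("GuestPort", str(guest_port)),
        ("HostPort", str(host_port)),
    ]
    return [
        'VBoxManage setextradata "{}" "{}/{}" {}'.format(vm_name, prefix, key, value)
        for key, value in settings
    ]


if __name__ == "__main__":
    # "vtiger-vm", "guesthttp", and the ports are hypothetical placeholders.
    for command in nat_forward_commands("vtiger-vm", "guesthttp", "TCP", 8080, 80):
        print(command)
```

Each forwarded port needs all three keys, which is why the hand-written version gets long and error-prone so quickly.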

After this was done I booted the virtual machine, but couldn't reach it. After some confusion over the Sun VirtualBox command line (perhaps I had made an error in my edits?) I realized that Ubuntu was showing only the lo (loopback) adapter. My eth0 adapter was missing.

After much mucking around I found the problem was this: http://muffinresearch.co.uk/archives/2008/07/13/vmware-siocsifaddr-no-such-device-eth0-after-cloning/

It turns out that someone thought that caching settings about network adapters in /etc/udev/rules.d/70-persistent-net.rules (a file which doesn't exist in non-Ubuntu Linux) would be a good idea. They also apparently decided that a mere cache wasn't good enough: the file had to be the one true source of knowledge about network adapters, and any change that contradicts it should be ignored.

In my case, the change was that the same "hardware" network adapter now had a new MAC address. The mismatch isn't reported; the system simply refuses to bring eth0 up, denying all network connectivity.
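To make the failure less mysterious next time, here is a rough sketch of a check that reads the rules file and lists interface names still pinned to a MAC address the machine no longer has. The rule format is the one udev writes to 70-persistent-net.rules as far as I have seen it; the sample rule and both MAC addresses are made up for illustration, and the function names are my own.

```python
import re

# Matches the MAC address and interface name inside one udev rule line.
RULE_PATTERN = re.compile(
    r'ATTR\{address\}=="(?P<mac>[0-9A-Fa-f:]{17})".*NAME="(?P<name>[^"]+)"'
)


def parse_persistent_rules(text):
    """Return {interface name: MAC address} for each non-comment rule line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE_PATTERN.search(line)
        if match:
            mapping[match.group("name")] = match.group("mac").lower()
    return mapping


def stale_interfaces(rules_text, current_macs):
    """List interface names whose recorded MAC is not present on the machine."""
    present = {mac.lower() for mac in current_macs}
    return sorted(
        name
        for name, mac in parse_persistent_rules(rules_text).items()
        if mac not in present
    )


# Hypothetical sample; a real file lives in /etc/udev/rules.d/.
SAMPLE_RULES = '''
# PCI device 0x1022:0x2000 (pcnet32)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:aa:bb:cc", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
'''

if __name__ == "__main__":
    # The recreated VM has a new MAC, so the eth0 entry no longer matches.
    print(stale_interfaces(SAMPLE_RULES, ["08:00:27:11:22:33"]))
```

Any name this reports is a candidate for the usual fix: delete or edit the stale entry in the rules file and reboot so the adapter gets its name back.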

This kind of design choice is the worst: programs that think they are smarter than the people using them. The worst part of this design is that it will only bite people who are already suffering from either hardware problems or some kind of change like the one I had, which means confounding variables abound.

To add insult to injury, one of the 1GB memory modules had decided it doesn't want to be recognized either, so I have to dismount and reseat the module (at minimum). Grrr.

In summary, the entire adventure would have been avoided if VirtualBox had accepted a change to the location of the virtual drive file. It would have been avoided if making NAT mappings were simple enough not to become a red herring, and it definitely would have been avoided if Ubuntu didn't think itself clever by keeping "persistent rules" that actually hinder repairing network failures.

At least next time I will have this post to remind me of the problem... hopefully it helps one other person some day.