
How not to set up your servers

Firstly, happy holidays, everyone. This will most likely be my last post of the year, and what a year it has been: the one-horse UK economy standing still and the UK consumer being ripped off so that a select group can still get their multi-million-dollar bonuses.

But enough of politics. This last month I've had a couple of interesting experiences with my Linux servers, both caused by unsupervised overnight reboots. What, Linux needs rebooting, you say? Well, yes, if you intend to keep your servers at the latest patch level, which usually means about once a month. If you don't bother, you could run a Linux server indefinitely. They are super reliable.

I have a small number of Linux servers, and this article is about two of them. One is the DHCP, fetchmail and dovecot server for the network: cham01. The other is the NAS and LDAP server for the network: cham02. Maybe a bad combination of services? A series of events suggests that may be the case.

I had scheduled a reboot of cham02 because a new kernel patch had been downloaded. To avoid disrupting the users (cham02 is where all the e-mail is stored, along with a lot of Oracle and MySQL databases), I scheduled the reboot for the night. When I came in the following morning, I discovered the server had not come back up because the file system had some corruption on it. The fsck at boot had failed and the server was sitting in a maintenance shell. So those reports of ext4 having some bugs weren't false at all!

As a result, the LDAP server and the NFS exports were not running. cham01 could not deliver any e-mail because it could neither authenticate users nor match e-mail addresses to accounts. Interestingly, I could not ssh into cham01, nor could I log in at the console with the local admin account. It needed to be rebooted, but only after cham02 had been rebooted, to ensure that the NFS mount for e-mail was available. Bad design number one? Or do I just need better control of the machines and services?

So I've now decided that no more unsupervised reboots will take place. The next reboot was a precautionary one, to make sure the RAID and all services were operating properly. I shut down cham01, then shut down cham02, and brought cham02 up first. Bad move: cham01 is the DHCP server, so cham02 wasn't assigned an IP address and none of its networking services were operating. I had to bring cham02 up, shut down the e-mail server, then restart networking on cham02. Is this really the best way to set up servers? I don't think essential services like e-mail should sit on the DHCP server, where you end up with this kind of dependency loop.

I'm now going to move the e-mail and dovecot to cham02 and keep cham01 as the DHCP, DNS and backup server only. It won't permanently mount NFS drives. Hopefully that removes one of the dependency links. Still, is there any way to have a fallback IP set up when Linux can't find a DHCP service? That would have helped.
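One way to see the dependency loop, and why the planned move breaks it, is to treat each server as a node and each "needs this host up first" relationship as an edge, then ask for a topological order. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The two layouts are a simplified, host-level model of the setup described above, and the names `CURRENT_LAYOUT`, `PLANNED_LAYOUT` and `boot_order` are hypothetical, chosen just for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical host-level model: each server maps to the set of servers
# that must be up before it can work properly.
CURRENT_LAYOUT = {
    # cham01 delivers mail, which needs LDAP and the NFS mail store on cham02
    "cham01": {"cham02"},
    # cham02 gets its IP address from the DHCP server on cham01
    "cham02": {"cham01"},
}

PLANNED_LAYOUT = {
    # cham01 only runs DHCP, DNS and backups and no longer mounts NFS permanently
    "cham01": set(),
    # cham02 (mail, dovecot, LDAP, NFS) still needs DHCP from cham01
    "cham02": {"cham01"},
}


def boot_order(layout):
    """Return a safe start-up order, or None if the dependencies form a loop."""
    try:
        # static_order() yields every node after all of its prerequisites
        return list(TopologicalSorter(layout).static_order())
    except CycleError:
        # A cycle means there is no order in which every host can start cleanly
        return None


if __name__ == "__main__":
    print("current:", boot_order(CURRENT_LAYOUT))
    print("planned:", boot_order(PLANNED_LAYOUT))
```

Run as a script, the current layout gives `None` (the loop), while the planned layout gives cham01 first, then cham02, which is the boot order I'd want. It is a simplification: in reality cham01 itself boots fine and only its mail services depend on cham02.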

That's all for now. I've started a new blog focusing specifically on my work as an Oracle consultant and integrator. I'm now learning the ins and outs of Salesforce as one of the platforms. Enjoy!