Server down
Well, the server went down again yesterday for no apparent reason. I was doing a routine boot on it and, when POST finished, I got a screen filled with “L l9 l9 l9 l9 l9 l9 l9 l9 l9 l9 l9 l9” about 20 lines down. Never fear, though: with the help of LinuxQuestions and Google, I found that this occurs when Lilo is configured incorrectly or corrupted. I booted off the install CD, mounted hda1, and then ran lilo -M /dev/hda mbr to reinstall Lilo to the Master Boot Record. I hoped that it would be this simple.
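For anyone who lands here with the same screen, the recovery steps above look roughly like this as a shell sketch. The mount point /mnt/root is my assumption here (any empty directory on the rescue system would do), and the recover_lilo function is a hypothetical helper that only prints the commands unless you pass it "run", since writing to the MBR isn't something to do by accident.

```shell
#!/bin/sh
# Sketch of the Lilo recovery, meant to be run from the install CD.
# Assumption: /mnt/root is a hypothetical mount point; override with MOUNT_POINT.
MOUNT_POINT="${MOUNT_POINT:-/mnt/root}"

# recover_lilo [run]
#   With no argument, prints the commands (dry run).
#   With "run", executes them and stops at the first failure.
recover_lilo() {
    mode="${1:-print}"
    for cmd in \
        "mount /dev/hda1 $MOUNT_POINT" \
        "lilo -M /dev/hda mbr"
    do
        if [ "$mode" = "run" ]; then
            # Intentionally unquoted so the string splits into command + args.
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

Calling recover_lilo on its own just shows what would happen; recover_lilo run actually does it, and needs root.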
Then, when I booted without the CD in, lo and behold, the Lilo bootloader came up and showed me “Linux”. Problem solved, except for one more issue: the network was now unreachable. This one baffled me, because Lilo shouldn’t have had anything to do with my network settings and I hadn’t updated any drivers or configuration for months. When I ran dmesg | grep eth1 I found nothing out of the ordinary. ifconfig showed the cards with their proper addresses, but that isn’t really conclusive because I don’t have DHCP enabled on the server; both of the cards are configured with static addresses, so the addresses show up whether or not the card can actually talk to anything. When I ran dmesg | grep eth0, though, I was presented with a series of errors headed by “NETDEV WATCHDOG: eth0: transmit timed out”. So, more searching ensued and I found several other users with the same issue (as is almost always the case). One of them fixed the problem by disabling Plug & Play and USB support in his BIOS because of an IRQ conflict with his Realtek card (the same card I have). What the heck, I said. Done and done.
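Instead of grepping one interface at a time, the kernel log can be scanned for every card that hit the watchdog. This is just a sketch: watchdog_ifaces is a hypothetical helper name, and the pattern assumes the message looks exactly like the one quoted above (other kernel versions may word it differently).

```shell
#!/bin/sh
# Print each interface the kernel log reports a transmit timeout for.
# Reads log text on stdin, so it works on live dmesg output or a saved log.
# Assumption: lines look like "NETDEV WATCHDOG: eth0: transmit timed out".
watchdog_ifaces() {
    sed -n 's/.*NETDEV WATCHDOG: \([^:]*\): transmit timed out.*/\1/p' | sort -u
}
```

Run it as dmesg | watchdog_ifaces; no output means no lines in that format were found in the log.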
Server is up and running again after almost 24 hours of downtime.