Saturday, January 14, 2006

I am done troubleshooting the server

I have utterly and completely exhausted every avenue I have trodden down in an effort to get this machine going. I have swapped out (many, many times):
- memory (266MHz DDR SDRAM/ECC/registered/CL2.5)
- mobo (Super Micro P4DL6)
- power supply..twice
- CMOS battery

I have removed CPUs. I have run single CPU configs. I have swapped the position of the existing CPUs. I have applied new thermal paste to a CPU. I have bought new memory. (I have returned new memory). I have connected storage types (IDE or SCSI) individually. I have connected storage types together. I have bought a new power supply. (I have returned a new power supply). I have used a different power outlet. I have removed the board from the case entirely in combination with a second power supply to make sure there was no short with the power or reset connectors from the original case.

I normally start with no storage, with no power connection or IDE/SCSI connection to the board. From there, if I get a clean boot and no freeze ups, I will add my DVD drive w/a Fedora install disc. After booting, the system will either make it past the initial Linux boot process or freeze during the Fedora install process. If the box makes it past the Linux boot process and doesn't hang, I am compelled to add storage. So, I alternately try adding SCSI or IDE storage, whatever suits my displeasure. SCSI freezes up on channel A, so I try channel B. Still freezes. I try the primary IDE controller, then I try the secondary IDE controller. I swap memory positions and brands while I do this, while I grid the combinations on paper, in order that I don't miss one.
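For anyone who would rather not grid the combinations by hand, here is a rough sketch of the same idea in Python. The part labels (two memory sets, three CPU configs, five storage options) are my own placeholders, not an exact inventory of what I tested, so adjust the lists to match the hardware on your bench.

```python
from itertools import product

# Placeholder labels for the parts being swapped (assumptions, not an
# exact record of the real test bench). Edit these lists as needed.
STORAGE = ["none", "SCSI channel A", "SCSI channel B", "IDE primary", "IDE secondary"]
MEMORY_SETS = ["memory set 1", "memory set 2"]
CPU_CONFIGS = ["both CPUs", "CPU 1 only", "CPU 2 only"]


def build_grid(storage=STORAGE, memory=MEMORY_SETS, cpus=CPU_CONFIGS):
    """Return every (storage, memory, cpu) combination exactly once."""
    return list(product(storage, memory, cpus))


def print_checklist(grid):
    """Print one checkbox line per combination so none gets skipped."""
    for n, (storage, memory, cpus) in enumerate(grid, start=1):
        print(f"[ ] {n:2d}. {cpus:<12} | {memory:<13} | {storage}")


if __name__ == "__main__":
    print_checklist(build_grid())
```

Print the checklist, tick off each row as you test it, and note where (and how long after boot) the box froze.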

So far, it has always frozen, either five minutes after booting or later, like fifteen minutes down the line when I almost get my hopes up that I've found the silver bullet and that it will actually work this time. 99% of the time it has frozen in a different place during the boot process or Fedora install.

The only unchanging constant is the CPUs. I am led, inexorably, to the fact that these CPUs MUST be the source of the problem: everything else around them has been swapped out.

I must feel some joy at this discovery. However, it simply doesn't make sense that both CPUs would burn out. This doesn't seem logical. I believe that if I go through the effort of purchasing CPUs from eBay, I will, in the end, be confronted with the same issue, will have wasted $150 or so on new CPUs and be left with final disappointment. But I feel I have to resolve this chapter of my ridiculous life, so I plod on to my destruction.

I am done with troubleshooting the server.

UPDATE: if anyone has encountered and solved a similar problem, please let me know. Maybe I'll dig this mutha out of the trash bin and try once more! Thanks!


paul said...

Hi, I'm having exactly the same problem over here. Did you solve it in the meantime? I have two servers of the same configuration.

Both have :

P4DL6 mobo
2GB ram
2 DUAL core XEON 2.2GHz

One unit starts buzzing 2-30 seconds after power on;
the other takes 5 min. to 1 hour, showing the same symptoms in the end.

The symptoms are the buzzer turning on & the "overheat" LED turning on.

I suspect it has nothing to do with temperature; instead, it could be the power supply. When I connect a spare power supply, which cannot deliver the 400 W required by spec, the unit immediately starts buzzing with the LED on.

However I have no appropriate power supply available.

Cacasodo said...

Hey Paul,
Unfortunately, no. With persistence, I usually can figure out most hardware issues. Not with the Supermicro. After two weeks of daily trial and error, I changed out every part to no avail.

Try to beg, borrow or steal a power supply that outputs greater than 400W. That might do the trick.

Let me know what happens.

Feel free to drop me a line or ask me a question.