Sunday, January 06, 2008

Belkin F5U503 firewire mishap

Though it may seem so, not all computing tasks are an easily digestible banquet for Cacasodo. Yes, there are high points of success, but as with many an intrepid engineer, there are more failures than successes. In this light I will tell you my tale of frustration.

It began with a simple goal: to optimize my video editing by installing a firewire card in a PCIX slot of my Dell SC1430. My one and only PCI slot was used by my Sil680 RAID card, so the firewire card had to go in one of the two PCIX slots in the box. And of course, the card had to work under Fedora. As a precursor to the story, I will say that I have a DLink PCI firewire card, and that card works without error in the box. But my purpose was to keep the RAID card in the PCI slot and put the Belkin in one of the PCIX slots so both could live happily in the same home. Alas, I did not know the hell that awaited me in trying to make this happen.

It started as a hasty, unresearched purchase of a Belkin F5U503 firewire card. After my purchase was made, I was told by a friend to avoid the Belkin cards, as Belkin was a known OEM reseller who used substandard parts. Cold comfort that was after the fact, but I digress.

I brought the card home and installed it in the first PCIX slot of the Dell. The Belkin has three connectors that allow it to fit in either PCIX slot of my box:
http://www.staples.com/sbd/img/cat/std/s0057708_std.jpg

Here's a pic of my server's motherboard:
http://dcse.dell.com/IFR/PowerEdge/PESC1430/images/sysbrd_full.jpg

When I installed the card in the PCIX slot, here's what happened. Dmesg gives me this error:
[cacasodo ogre ~]# dmesg | grep 1394
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[38] MMIO=[fc5ff800-fc5fffff] Max Packet=[2048] IR/IT contexts=[4/8]
ohci1394: fw-host0: Get PHY Reg timeout [0x000004c0/0x00000000/100]
ohci1394: fw-host0: Get PHY Reg timeout [0x000004c0/0x00000000/100]

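For anyone repeating this check across several slots or kernels, here is a minimal sketch of a helper that counts the timeout lines. The function name count_phy_timeouts is my own invention for illustration, not part of any tool. It reads kernel log text on standard input and assumes the error string matches the one shown above.

```shell
# Hypothetical helper: count "Get PHY Reg timeout" lines in kernel log
# text read from stdin. Prints 0 when there are none.
count_phy_timeouts() {
    # grep -c exits non-zero when nothing matches, so mask that status
    # to keep the function usable under "set -e".
    grep -c 'Get PHY Reg timeout' || true
}

# Example use on a live system (not run here):
#   dmesg | count_phy_timeouts
```

A non-zero count after a fresh boot would point to the same PHY problem described here.
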

I get a couple more errors in dmesg once I turn the cam on:
ieee1394: hpsb_update_config_rom() is deprecated
ieee1394: Failed to generate Configuration ROM image for host 0


lsmod reads:
[cacasodo ogre ~]# lsmod | grep 1394
ohci1394 43273 0
ieee1394 109081 1 ohci1394


lspci reads:
[cacasodo ogre ~]# lspci | grep 1394
06:06.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)


After loading the raw1394 module:
[cacasodo ogre ~]# modprobe raw1394
[cacasodo ogre ~]# lsmod | grep 1394
raw1394 37233 0
ohci1394 43273 0
ieee1394 109081 2 raw1394,ohci1394


I am running Fedora 7 and my kernel is the EZPlanet firewire patch kernel. I use this patched kernel because firewire is broken in Fedora 7's standard build:
[cacasodo ogre ~]# uname -r
2.6.22.9-1091.ez.fc7


I do see interrupt activity when I connect and turn on the cam:
[cacasodo ogre ~]# grep 1394 /proc/interrupts
38: 2 0 0 0 1 0 0 0 IO-APIC-fasteoi ohci1394

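To compare interrupt activity before and after switching the cam on, the per-CPU counters can be added up. This is only a sketch: irq_total is a hypothetical helper name of my own, and it assumes the usual /proc/interrupts layout of an IRQ number, one counter per CPU, then the controller type and device name.

```shell
# Hypothetical helper: sum the per-CPU interrupt counters for every
# /proc/interrupts line (read from stdin) that mentions the device
# name given as $1.
irq_total() {
    awk -v dev="$1" '
        index($0, dev) {
            # Field 1 is the IRQ label ("38:"); the purely numeric
            # fields after it are the per-CPU counters.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+$/) sum += $i
        }
        END { print sum + 0 }
    '
}

# Example use on a live system (not run here):
#   irq_total ohci1394 < /proc/interrupts
```

Run once before and once after plugging in the cam; a rising total means the card is at least raising interrupts. The counters shown above add up to 3.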

testlibraw gives me something:
[cacasodo ogre ~]# testlibraw
successfully got handle
current generation number: 1
1 card(s) found
nodes on bus: 0, card name: ohci1394
using first card found: 0 nodes on bus, local ID is 0, IRM is 63


But plugreport doesn't give me anything:
[cacasodo ogre ~]# plugreport
Host Adapter 0
==============


Adding "noacpi", "noapic" and "nolapic" to my grub.conf and rebooting didn't help.

To see if it was kernel related, I tried booting these other distros/kernels, all with the same result: the "Get PHY Reg timeout" error.
Ubuntu 6.10 -> 2.6.17-10 kernel
Knoppix -> 2.6.19 kernel
Fedora 6 -> 2.6.18-1.2798 kernel

OK. I am almost spent for the evening. But does the card even work in the PCI slot? I put the card in the PCI slot and lo and behold, the card DOES work. To reiterate, the card does work in the PCI slot, just not in either of the PCIX slots. At this point I'm thinking that the card is not getting proper power, but I'm too tired to continue on. I sleep uneasily.

The next day, I take fifteen minutes to call Belkin. The gentleman on the line is helpful, but when I tell him I'm running Linux, I get the cold shoulder and he dismisses me with the "we don't support the card under Linux" line. Well, if the card works fine in one slot under Linux, why shouldn't it work in the other slot?! No matter, "we don't support the card under Linux."

Argh. So here is where I dive off the deep end and set out to prove to Belkin that this is a power issue and not an OS issue. I went on Belkin's support site and tried to find the voltage requirements for the card. I could not. I searched on Google and, not so amazingly, found a link to the voltage requirements ON BELKIN'S OWN SITE!
http://web.belkin.com/support/kb/kb.asp?a=3742&langid=

Great search engine there, Belkin! NOT! As well, I look on Dell's site for the voltage of the PCIX slots:
  • 2 64-bit/100MHz PCI-X slots 3.3V

The slots handle 3.3 volts. Referring back to the Belkin KB article, it states clearly:
The F5U503 PCI requires a 5 volt PCI slot.

I knew it! Now, in order to prove Belkin wrong, I decide to get a Windows OS running on the Dell and show them that it is a POWER issue and NOT an OS related issue!

To do this, I'd install an older IDE drive in the Dell and apply an old Ghost image I had of XP. Well, the Dell doesn't have a floppy drive. So, I power down my web server, take the floppy drive out, install it in the Dell and reconfigure the BIOS to allow the floppy to be seen. OK. I try one of my Ghost floppies. It doesn't boot. I try a second and a third. They do not boot either. Argh.

Wouldn't it be great if Ghost worked on a bootable CD? Unfortunately, Norton doesn't provide that option. But an enterprising person has a lovely install doc written up here:
http://nightowl.radified.com/bootcd/bootcdintro.html

Thank you, Nightowl! Though it did take me a number of hours to set up the Ghost bootable disc, it was well worth the time, as I now have reliable boot media for Ghost. The process itself could use a small blog entry. Nightowl did a great job, but he needs better formatting and less verbiage to make the instructions easier to follow.

Next, I used it to boot and apply a Ghost image of my XP system to that older hard drive I mentioned. But when I boot up, Ghost does not recognize the DVD holding the image. I try another disc that has an earlier Ghost image. Ghost does not recognize that disc either. Finally, I try a third DVD. Luckily, Ghost recognizes that disc. Unfortunately, when I go to apply the image, Ghost cannot find the IDE hard drive. What the hell? I see the drive in the BIOS, though the BIOS doesn't report the correct size of the drive. It is the master drive on the IDE controller. What gives?

I tried a few configuration options:
-hard drive being slave
-hard drive cable select
-disabling the SATA drives
-changing boot sequence
-updating the BIOS to the latest version, 1.0.4

None of these worked. Then I thought that if Ghost re-imaging didn't work, I should just install the OS itself. So, I had a bootable copy of the XP disc. I started the system with the disc in the drive, but the install process soon gave me a blue screen with a message regarding pci.sys. Looks like a lot of folks had that one:
http://www.google.com/search?q=xp+professional+install+pci.sys+blue+screen

This problem should be resolved with a disc that integrates XP with SP2. Which, of course, I don't have right now. So for expediency's sake, I take another route: a Windows 2000 Advanced Server install. I put the extra hard drive in a second machine and started the Win2K install process, but for some reason the hard drive was not recognized properly by that machine's BIOS. I brought the disk over to my Dell SC400 machine that runs XP. I disconnected the existing XP drive, installed the disk and got Win2K Advanced Server working without a problem. The box boots to that drive, so I know that the Master Boot Record is not corrupt. But when I move the drive back to the SC1430, it refuses to boot.

Not sure what to do at this point. I'd sure like to prove those Belkin support engineers wrong, but I can't get the "Belkin certified" OS (in this case, Windows 2000 Advanced Server) to boot in the Dell. I assume I will have the same problem with XP SP2, but it might be worth a try.

To sum up:
-card doesn't work in PCIX slot, but I believe it is a simple voltage problem
-I'd like to prove this to Belkin, but I can't get a Windows OS to load via Ghost'ing an existing hard drive or installing a hard drive with a Windows OS on it

Update 1/6/08
Oops..my bad. Looking in the manual for the Dell SC1430, it shows that pre-Win2003 OSs are not supported. So I am out the $52 I paid for the Belkin card. Just goes to show..DO YOUR RESEARCH before making a purchase of this kind.
A Frustrated Cacasodo
