

(?) The Answer Gang (!)


By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and... (meet the Gang) ... the Editors of Linux Gazette... and You!
Send questions (or interesting answers) to The Answer Gang for possible publication (but read the guidelines first)


(?) Hard Disk: BadCRC errors from dma_intr on bootup...

From Karthik Subramanian

Answered By Jay R. Ashworth, Chris Gianakopoulos, Didier Heyden, Johan H

Before I start, many thanks for the good work :-)

(!) [Jay] We try.
Some of us are very trying, but you're expected to not notice. :-)

(?) I have a Samsung SV2042H (20 GB) as my primary master, and an ATAPI CD-ROM of unknown make as my primary slave.

I recently noticed the following messages on bootup: (extract from my /var/log/boot.msg)

<4>Freeing unused kernel memory: 112k freed
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hdb: DMA disabled
<4>ide0: reset: success

1) What do the dma_intr messages mean? Does my HDD go to the junk heap, or is it possible for me to continue working with it? I have had no problems with it so far, despite the error messages.

(!) [Jay] I've been seeing something similar, with the same results, i.e. nothing.
I think the IDE drivers got changed...
(!) [Didier] The DMA interrupt handler in the IDE driver seems to have detected a data transfer failure (BadCRC) four times in a row. All drives present on the corresponding IDE interface are then reset; in such a case (at least if you run a 2.4.x kernel), (U)DMA is disabled on both drives, even though you're told so only for your /dev/hdb CD-ROM. (Don't ask me why the kernel people chose to do so; one would have expected the faulty drive, hda, to be mentioned in a `DMA disabled' message as well :)
The fact that everything works fine (?) after the reset (no more awful messages, and your system obviously does boot) is reassuring: if your hard drive is indeed on its way out, it is not (yet) ready to be sold to your worst enemy ;) Be careful, though...
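As a rough illustration of how the two hex values in those messages map to the names printed in braces, here is a minimal shell sketch. It covers only the bits that actually appear in the log above; the function names decode_status and decode_error are made up for this example, and the bit values are assumed from the standard ATA register layout rather than copied from the kernel source.

```shell
# Sketch only: decodes just the bits seen in the log excerpt.
# status=0x51 is 0x40 + 0x10 + 0x01; error=0x84 is 0x80 + 0x04.

decode_status() {
    names=""
    [ $(( $1 & 0x40 )) -ne 0 ] && names="$names DriveReady"
    [ $(( $1 & 0x10 )) -ne 0 ] && names="$names SeekComplete"
    [ $(( $1 & 0x01 )) -ne 0 ] && names="$names Error"
    echo "${names# }"    # strip the leading space
}

decode_error() {
    names=""
    # 0x04 is the "command aborted" bit, shown as DriveStatusError.
    [ $(( $1 & 0x04 )) -ne 0 ] && names="$names DriveStatusError"
    # 0x80 is labelled BadCRC here, as in the log (assumption: a DMA transfer).
    [ $(( $1 & 0x80 )) -ne 0 ] && names="$names BadCRC"
    echo "${names# }"
}

decode_status 0x51   # prints: DriveReady SeekComplete Error
decode_error 0x84    # prints: DriveStatusError BadCRC
```

Read this way, DriveReady and SeekComplete are ordinary status flags; the Error flag is what sends you to the second line, where BadCRC names the actual complaint.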
(!) [Johan] If DMA is enabled on a controller that is not well supported, these errors can appear. (I had it on a VIA KT266a with kernel 2.2. Upgrading to kernel 2.4 fixed it beautifully.)
If you are sure that the IDE controller is supported, the drive is on its way out. You can run fsck with the bad-block check turned on to mark these blocks as bad... As a rule, once these errors start, we throw the disk away. (This is a high-availability production environment.)
If you don't mind that the disk could crash in the near future, make a backup and continue using it; it might work for a long time to come.
If the disk is under guarantee... take it back. It is not worth risking data loss if the drive can be replaced for free.
This is how you hunt for and fix bad blocks:
# e2fsck -c /dev/hda1
Make sure that you have a backup: bad-block scans can destroy data when run with certain switches.
# man badblocks && man e2fsck
(And read them carefully.)
To turn off DMA per drive:
# hdparm -d0 /dev/hd[a-d]
To list DMA settings:
# hdparm -d /dev/hd[a-d]
To turn DMA on:
# hdparm -d1 /dev/hd[a-d]
Where hd[a-d] is hda, hdb, hdc, hdd.
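Before and after changing DMA settings with the commands above, it may help to count how many BadCRC errors each drive has logged, to see whether the problem persists. The following is a minimal sketch, not a polished tool: count_badcrc is a hypothetical helper name, and it assumes log lines in the same format as the boot.msg excerpt quoted in the question.

```shell
# Sketch: count "dma_intr ... BadCRC" error lines per drive.
# Reads kernel log lines on stdin, prints "<drive> <count>" for each drive.
count_badcrc() {
    grep 'dma_intr: error=.*BadCRC' |
        sed 's/^<[0-9]*>//; s/:.*//' |    # drop the <N> prefix, keep the drive name
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Sample input taken from the log excerpt in the question.
count_badcrc <<'EOF'
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
<4>hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
<4>hdb: DMA disabled
EOF
# prints: hda 2
```

On a real system you would feed it the log instead, for example `count_badcrc < /var/log/boot.msg`.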

(?) 2) I didn't see any options in my BIOS to turn DMA off for the peripherals, so why/how is DMA being disabled for hdb? (I put an 'hdparm -d1 /dev/hdb' in my /etc/boot.local to enable DMA for hdb.)

(!) [Didier] You can pass an `ide=nodma' option to the boot loader to achieve this. Unfortunately, I don't think it can be done on a per-drive basis (nor even on a per-interface basis): (U)DMA can only be disabled globally at kernel startup. There doesn't seem to be any `hdx=nodma' (x = 'a', 'b', 'c' or 'd') or `idex=nodma' (x = '0' or '1') kernel option available at present. You will then have to fiddle with the hdparm utility to change the setting for a given drive (at your own risk).
Note that in the present case you had better remove the `hdparm' line from your bootup script (-d1 forces DMA on).
Apart from this, you could try setting your CD-ROM drive as master on the IDE1 interface.
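If you do try the `ide=nodma' option, one way to confirm that it actually reached the kernel is to look for it on the kernel command line. This is only a sketch: has_boot_option is a hypothetical helper name, and the sample command line below is invented for illustration.

```shell
# Sketch: check whether an option appears as a whole word on a kernel
# command line. $1 = option to look for, $2 = command line text.
has_boot_option() {
    for word in $2; do    # unquoted on purpose: split on whitespace
        [ "$word" = "$1" ] && return 0
    done
    return 1
}

# Invented sample command line, for illustration only.
sample='auto BOOT_IMAGE=linux ro root=301 ide=nodma'

if has_boot_option ide=nodma "$sample"; then
    echo "ide=nodma is present"
fi
# prints: ide=nodma is present
```

On a running system the real command line can be read from /proc/cmdline, e.g. `has_boot_option ide=nodma "$(cat /proc/cmdline)"`.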

(?) 3) What does the number 4 prepended to the messages in /var/log/boot.msg (there are other numbers for the other messages) mean?

(!) [Jay] You're running Mandrake, aren't you? :-)
It's got something to do with the "debug level" that produces that particular line of printk output, I believe.
(!) [Didier] This number is most probably the log level associated with the given kernel message (<4> is usually the default value and corresponds to the KERN_WARNING level). A log level of <0> is for emergency conditions (system unusable) and <7> is for debug messages.
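To make the numbering concrete, here is a small shell sketch that pulls the <N> prefix off a log line and prints the matching kernel log level name. The function name loglevel_name is made up for this example; the eight KERN_* names are the kernel's standard ones.

```shell
# Sketch: map a kernel log level number (0-7) to its KERN_* name.
loglevel_name() {
    case "$1" in
        0) echo KERN_EMERG ;;     # system is unusable
        1) echo KERN_ALERT ;;     # action must be taken immediately
        2) echo KERN_CRIT ;;      # critical conditions
        3) echo KERN_ERR ;;       # error conditions
        4) echo KERN_WARNING ;;   # warning conditions
        5) echo KERN_NOTICE ;;    # normal but significant
        6) echo KERN_INFO ;;      # informational
        7) echo KERN_DEBUG ;;     # debug-level messages
        *) echo unknown ;;
    esac
}

# Extract the level from a line taken from the log excerpt in the question.
line='<4>hdb: DMA disabled'
level=$(printf '%s\n' "$line" | sed -n 's/^<\([0-7]\)>.*/\1/p')
loglevel_name "$level"   # prints: KERN_WARNING
```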
(!) [Chris] Hi there, I believe that dma_intr implies that a DMA interrupt occurred that is associated with your hard disk controller. The status line says the drive is ready and the seek completed but an error was flagged, and the error line gives the cause: a bad CRC. In other words, either your media (the actual sectors of your hard disk platter) might be corrupt, or you might have a problem with your cabling.
Before I trashed the drive, I would unplug and replug the IDE cable from your disk controller AND your hard drive. Your disk controller might reside on your motherboard, and in that case, you would unplug the cable from the motherboard. You might also try a different IDE cable (the 40-pin ribbon cable) between your disk and the disk controller.
I start to worry when I see the BadCRC error messages, because when that happened to me, the hard disk eventually became useless. Make sure that you back up any data that you want to keep.
I saw those error messages on my son's computer when I gave him one of my hard drives that happened to be lying around. It was a 2 GB hard drive. At first, the messages were an annoyance during boot up. As time passed, we could not even get the system to boot up without running through fsck. Finally, things got so bad that fsck couldn't fix the filesystems. The drive is now on display, in parts, so that my son can show off the disk platters to his friends.
Good luck, and don't forget to back up your data.


This page edited and maintained by the Editors of Linux Gazette Copyright © 2002
Published in issue 76 of Linux Gazette March 2002
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/

