I'm using a SCSI 72 GB tape drive on a Fedora Core 3 system which was updated to the latest kernel about 2 weeks ago. The tape backup (using tar -cf /dev/st0 ...) had been working fine for about 5 months, and was used to restore data at that time.
I have just discovered that the backup has been failing recently. When I run tar -cvf /dev/st0 ... only 2048 bytes are backed up before this message is displayed:
tar: /dev/st0: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
If I then run tar -tvf /dev/st0, I get all the files up to the point at which the error message was generated.
Any ideas what may have gone wrong? Are there any tools to run diagnostics on the tape unit?
Many thanks
Stuart
Any ideas what may have gone wrong? Are there any tools to run diagnostics on the tape unit?
I assume we are talking about a DAT 36-72 drive here ?
I know of some diagnostic tools that run on Linux for DLT, but I have never used anything on a DAT drive. DLT is easy, as only about two companies actually build the drives (Quantum and Tandberg), but with DAT I think you may need device-specific tools.
Some drives are very fussy about block sizes; check your drive specifications and use the mt command to set this up. Why this would only now become a problem, though, I am not sure.
I have heard that the mt command can sometimes report useful information when asked to show drive status; you may see a count of read/write errors, etc.
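As a rough sketch of the mt suggestions above, assuming the mt-st flavour of mt (its status, setblk and rewind operations) and the device name /dev/st0 from the original post: the tape helper and the DRY_RUN switch are hypothetical names added for illustration, and by default the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch only: wraps mt so the commands can be previewed before touching the drive.
TAPE=${TAPE:-/dev/st0}     # device name taken from the original post
DRY_RUN=${DRY_RUN:-1}      # assumption: default to printing commands only

tape() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mt -f $TAPE $*"
    else
        mt -f "$TAPE" "$@"
    fi
}

tape status      # ask the drive for its status
tape setblk 0    # in mt-st, block size 0 selects variable-block mode
tape rewind      # return to the beginning of the tape
```

Run it with DRY_RUN=0 to actually issue the commands. Whether variable-block mode is the right setting for a particular drive is something to confirm against the drive's documentation.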
If we are talking something like DAT or AIT then is the tape rewound enough to fit some data on ? Sorry a bit of a newbie question, but I thought I'd check.
I have seen grey beard Unix admins use the cpio command in mind bending ways to test tape devices before, but please don't ask me to recite any of it :-)
Also remember that the cleaning light is only a guide. Although it's not good for the drive to run a cleaning tape through it too often, in this case it may be worth a try.
Dud tape ?
I assume we are talking about a DAT 36-72 drive here ?
HP StorageWorks DAT 72i
I know of some diagnostic tools that run on Linux for DLT, but I have never used anything on a DAT drive. DLT is easy, as only about two companies actually build the drives (Quantum and Tandberg), but with DAT I think you may need device-specific tools.
Some drives are very fussy about block sizes; check your drive specifications and use the mt command to set this up. Why this would only now become a problem, though, I am not sure.
mt doesn't seem to like the drive (but Fedora uses mt-st rather than GNU mt)
# mt -f /dev/st0 status
/dev/st0: Inappropriate ioctl for device
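"Inappropriate ioctl for device" is the error an ioctl gives when the target does not support it, so one thing worth ruling out is that /dev/st0 is something other than a tape character device (for example, a regular file sitting at that path). Whether that applies here is not known from the thread; the sketch below only shows how one might check. node_type is a hypothetical helper, and the lsmod and /proc/scsi/scsi checks assume the st driver is built as a module and that the kernel exposes that proc file.

```shell
#!/bin/sh
# Report what kind of filesystem node a path is.
# A working SCSI tape device such as /dev/st0 should be a character device.
node_type() {
    if [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -f "$1" ]; then echo "regular file"
    elif [ -e "$1" ]; then echo "other"
    else echo "missing"
    fi
}

TAPE=${TAPE:-/dev/st0}
echo "$TAPE: $(node_type "$TAPE")"
ls -l "$TAPE" 2>/dev/null

# Is the SCSI tape driver loaded (if modular), and does the kernel list the drive?
lsmod 2>/dev/null | grep '^st ' || echo "st module not listed (may be built in or not loaded)"
cat /proc/scsi/scsi 2>/dev/null
```

If the path turns out to be a regular file, tar would have been writing into that file rather than to the tape.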
I have heard that the mt command can sometimes report useful information when asked to show drive status; you may see a count of read/write errors, etc.
If we are talking something like DAT or AIT then is the tape rewound enough to fit some data on ? Sorry a bit of a newbie question, but I thought I'd check.
Yes, we are using both old backup tapes and brand new tapes. These are rewound.
I have seen grey beard Unix admins use the cpio command in mind bending ways to test tape devices before, but please don't ask me to recite any of it :-)
I had forgotten about cpio - I will try and figure out how to use it ;-)
Also remember that the cleaning light is only a guide. Although it's not good for the drive to run a cleaning tape through it too often, in this case it may be worth a try.
Cleaning the drive actually made no difference - and the cleaning light was not lit.
Thanks,
Stuart
On 03-Aug-05 Stuart Bailey wrote:
I'm using a SCSI 72Gb tape drive on a Fedora Core3 system which has recently been updated with the latest kernel (about 2 weeks ago). The tape backup (using tar -cf /dev/st0 ...) had been working fine - for about 5 months, and was used to restore data at that time.
I have just discovered that the backup has been failing recently. When I run tar -cvf /dev/st0 ... only 2048 bytes are backed up before this message is displayed:
tar: /dev/st0: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
If I then run tar -tvf /dev/st0, I get all the files up to the point at which the error message was generated.
Any ideas what may have gone wrong? Are there any tools to run diagnostics on the tape unit?
A few questions/suggestions.
1. Did this trouble start concurrently with the kernel upgrade? If so, possibly the cause is there. Can you re-instate the previous kernel and see if it still gives trouble? If it's the kernel upgrade then I don't have useful ideas.
2. Do you get the same problem regardless of which tape you put in the drive? If it's just one tape, then there may be a defect on the tape itself. But if it's independent of the tape, and it's not the kernel, then this points to the tape drive itself.
3. If it's not the tape, then try putting a spare (i.e. potentially disposable) tape in the drive and raw-writing to it:
a) Set up a test file with decipherable structure:
awk 'BEGIN{for(i=1;i<=1000000;i++){printf("%07d\n",i)}}' > testfile
which will give you a test file with 8000000 bytes (7 for each integer plus a newline, so 8 bytes per integer).
b) raw-write this to the tape in various ways, e.g.:
dd if=testfile of=/dev/st0 bs=512 count=8
which will write 4096 bytes to the device in 8 blocks of 512 bytes.
c) raw-read it back (you will need to re-wind the tape first), e.g.:
dd if=/dev/st0 bs=512 count=8
[or e.g. bs=4096 count=1]
and see how far it gets. If only 2048 bytes of the 4096 got written to the tape, then the last line to be printed to the console would be "0000256".
d) Vary the above with different values for "bs" and "count".
The fact that your tape error says that only 2048 bytes were written suggests that the mechanism may be using a block-size of 2048 bytes and only one block got written. Where this failure to move to the next block arises, however, is not clear. It may be a hardware failure in the drive (internal buffer of 2048, not recycled); failure to communicate with the drive (e.g. the "handshake" from the drive would announce that the "write" had been cleared and it was ready for the next block, but the handshake was not being read and acted on); the kernel was using a 2048-byte block of RAM as a buffer but not re-cycling this; etc.
Using "bs" greater than 2048 as well as less than or equal to (e.g. "bs=4096" or "bs=8192") may discriminate.
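Steps (a) to (d) above could be strung together roughly as follows. This is a sketch under assumptions, not a tested tape procedure: TARGET defaults to a scratch file so the script is harmless to try, and would need to be set to /dev/st0 (with a disposable tape loaded) for the real test; roundtrip, TARGET, TESTFILE and WORK are hypothetical names, and the rewind via mt is only attempted when TARGET is a character device.

```shell
#!/bin/sh
# Write the first COUNT blocks of a numbered test file with dd, read them
# back, and compare. Safe by default: TARGET is a scratch file, not the tape.
WORK=${WORK:-$(mktemp -d)}
TESTFILE=${TESTFILE:-$WORK/testfile}
TARGET=${TARGET:-$WORK/tape-standin.img}

# 1,000,000 lines of 7 digits plus a newline = 8,000,000 bytes.
awk 'BEGIN { for (i = 1; i <= 1000000; i++) printf("%07d\n", i) }' > "$TESTFILE"

roundtrip() {
    bs=$1
    count=$2
    want=$((bs * count))
    : > "$WORK/readback"                       # start from an empty read-back file
    [ -c "$TARGET" ] && mt -f "$TARGET" rewind # only for a real tape device
    dd if="$TESTFILE" of="$TARGET" bs="$bs" count="$count" 2>/dev/null
    [ -c "$TARGET" ] && mt -f "$TARGET" rewind
    dd if="$TARGET" of="$WORK/readback" bs="$bs" count="$count" 2>/dev/null
    got=$(wc -c < "$WORK/readback")
    echo "bs=$bs count=$count: expected $want bytes, read back $((got)), last line: $(tail -n 1 "$WORK/readback")"
    # Succeed only if the data read back matches what was meant to be written.
    head -c "$want" "$TESTFILE" | cmp -s - "$WORK/readback"
}

for bs in 512 1024 2048 4096 8192; do
    if roundtrip "$bs" 8; then echo "  OK"; else echo "  MISMATCH"; fi
done
```

Since each line of the test file is 8 bytes, the last line read back tells you how much arrived: 0000512 for a full 4096 bytes, 0000256 if only 2048 bytes made it.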
Hoping this provides a useful pointer or two!
Best wishes, Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) Ted.Harding@nessie.mcc.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 04-Aug-05 Time: 10:22:08
------------------------------ XFMail ------------------------------