Hi!
My backup system has recently started causing me some grief.
I'm running backuppc backing up to an IDE drive in a Netgear SC101 SAN storage box. There are two drives in this device, and the second one is being talked to by Simple Backup, and is working fine.
The main/bigger drive is /dev/nbd1 and backuppc has started hanging whilst using it. It doesn't happen straight away - it's several hours after the backup has been started. If I power everything down, wait and switch it back on again, all the drives spin up and seem to work ok again.
The SC101 is attached via the SC101-nbd project driver. http://code.google.com/p/sc101-nbd/
It has all been working fine until recently. Nothing's changed, apart from installing routine security updates (inc kernel updates) I'm running ubuntu 10.04.
I'm currently doing a deep check on the disk using a tool from Ultimate boot Cd appropriate for the drive manufacturer, but a quick check said the disk's OK.
Any ideas why it's hanging like this? Once hung, I can't even ls the disk - it's completely blocked. I tried reformatting it yesterday, and erasing and reinstalling backuppc, but it crashed again.
Some of Syslog output below - there's many more after this one, but this is the first. Possible candidates: Disk SAN controller Main PC faulty Disk sleeping and failing to wake again.
Any ideas guys? Cheers Steve
Jan 11 07:00:28 myserver ntpd[1950]: kernel time sync status change 6001 Jan 11 07:02:01 myserver CRON[11758]: (logcheck) CMD ( if [ -x /usr/sbin/logcheck ]; then nice -n10 /usr/sbin/logcheck; fi) Jan 11 07:04:11 myserver kernel: [45960.212092] INFO: task jbd2/nbd1-8:12704 blocked for more than 120 seconds. Jan 11 07:04:11 myserver kernel: [45960.212100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 11 07:04:11 myserver kernel: [45960.212104] jbd2/nbd1-8 D 00005645 0 12704 2 0x00000000 Jan 11 07:04:11 myserver kernel: [45960.212113] c6ce9de4 00000046 f7490000 00005645 00000000 c08964e0 c47b9c24 c08964e0 Jan 11 07:04:11 myserver kernel: [45960.212123] 3c1ac4f3 0000299c c08964e0 c08964e0 c47b9c24 c08964e0 c08964e0 d7e7d800 Jan 11 07:04:11 myserver kernel: [45960.212131] 00000000 0000299c c47b9980 c47b9980 c30094e0 c6ce9e2c c6ce9df4 c05b71a1 Jan 11 07:04:11 myserver kernel: [45960.212140] Call Trace: Jan 11 07:04:11 myserver kernel: [45960.212157] [<c05b71a1>] io_schedule+0x61/0xa0 Jan 11 07:04:11 myserver kernel: [45960.212165] [<c01d4b9d>] sync_page+0x3d/0x50 Jan 11 07:04:11 myserver kernel: [45960.212171] [<c05b795d>] __wait_on_bit+0x4d/0x70 Jan 11 07:04:11 myserver kernel: [45960.212176] [<c01d4b60>] ? sync_page+0x0/0x50 Jan 11 07:04:11 myserver kernel: [45960.212181] [<c01d4dc1>] wait_on_page_bit+0x91/0xa0 Jan 11 07:04:11 myserver kernel: [45960.212187] [<c0170e90>] ? wake_bit_function+0x0/0x50 Jan 11 07:04:11 myserver kernel: [45960.212193] [<c01d5051>] wait_on_page_writeback_range+0xa1/0x110 Jan 11 07:04:11 myserver kernel: [45960.212201] [<c023ca2e>] ? bio_alloc_bioset+0x2e/0xf0 Jan 11 07:04:11 myserver kernel: [45960.212209] [<c02da977>] ? jbd2_journal_write_metadata_buffer+0x277/0x370 Jan 11 07:04:11 myserver kernel: [45960.212215] [<c01d511a>] filemap_fdatawait+0x5a/0x70 Jan 11 07:04:11 myserver kernel: [45960.212221] [<c02d2bae>] jbd2_journal_commit_transaction+0x51e/0x1030 Jan 11 07:04:11 myserver kernel: [45960.212230] [<c01311b8>] ? default_spin_lock_flags+0x8/0x10 Jan 11 07:04:11 myserver kernel: [45960.212236] [<c0164c78>] ? try_to_del_timer_sync+0x68/0xb0 Jan 11 07:04:11 myserver kernel: [45960.212241] [<c02d93d5>] kjournald2+0x95/0x1c0 Jan 11 07:04:11 myserver kernel: [45960.212247] [<c0170e40>] ? autoremove_wake_function+0x0/0x50 Jan 11 07:04:11 myserver kernel: [45960.212252] [<c02d9340>] ? kjournald2+0x0/0x1c0 Jan 11 07:04:11 myserver kernel: [45960.212256] [<c0170bb4>] kthread+0x74/0x80 Jan 11 07:04:11 myserver kernel: [45960.212261] [<c0170b40>] ? kthread+0x0/0x80 Jan 11 07:04:11 myserver kernel: [45960.212267] [<c010a467>] kernel_thread_helper+0x7/0x10 Jan 11 07:04:11 myserver kernel: [45960.212278] INFO: task flush-43:16:11670 blocked for more than 120 seconds. Jan 11 07:04:11 myserver kernel: [45960.212281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 11 07:04:11 myserver kernel: [45960.212285] flush-43:16 D 00000484 0 11670 2 0x00000000 Jan 11 07:04:11 myserver kernel: [45960.212291] c46d3b34 00000046 c07eaa20 00000484 00000000 c08964e0 ef5735a4 c08964e0 Jan 11 07:04:11 myserver kernel: [45960.212300] f5f3884b 0000299a c08964e0 c08964e0 ef5735a4 c08964e0 c08964e0 e8566a00 Jan 11 07:04:11 myserver kernel: [45960.212308] f5f2b230 0000299a ef573300 c2e0afe0 eb6accc0 c46d3b74 c46d3b94 c02d1ec5 Jan 11 07:04:11 myserver kernel: [45960.212316] Call Trace: Jan 11 07:04:11 myserver kernel: [45960.212324] [<c02d1ec5>] do_get_write_access+0x245/0x440 Jan 11 07:04:11 myserver kernel: [45960.212330] [<c0170e90>] ? wake_bit_function+0x0/0x50 Jan 11 07:04:11 myserver kernel: [45960.212336] [<c02d2238>] jbd2_journal_get_write_access+0x28/0x40 Jan 11 07:04:11 myserver kernel: [45960.212343] [<c02bb3fe>] __ext4_journal_get_write_access+0x2e/0x60 Jan 11 07:04:11 myserver kernel: [45960.212349] [<c02bd611>] ext4_mb_mark_diskspace_used+0x61/0x2f0 Jan 11 07:04:11 myserver kernel: [45960.212354] [<c02c38a6>] ext4_mb_new_blocks+0x306/0x590 Jan 11 07:04:11 myserver kernel: [45960.212360] [<c02b8613>] ? ext4_ext_find_extent+0x253/0x280 Jan 11 07:04:11 myserver kernel: [45960.212365] [<c02baf12>] ext4_ext_get_blocks+0x472/0x700 Jan 11 07:04:11 myserver kernel: [45960.212373] [<c0298f98>] ext4_get_blocks+0xf8/0x2c0 Jan 11 07:04:11 myserver kernel: [45960.212379] [<c02997c9>] mpage_da_map_blocks+0x99/0x410 Jan 11 07:04:11 myserver kernel: [45960.212385] [<c02b0d90>] ? ext4_journal_start_sb+0xc0/0xf0 Jan 11 07:04:11 myserver kernel: [45960.212390] [<c0296917>] ? ext4_meta_trans_blocks+0xb7/0xd0 Jan 11 07:04:11 myserver kernel: [45960.212396] [<c0299dc9>] ext4_da_writepages+0x289/0x580 Jan 11 07:04:11 myserver kernel: [45960.212403] [<c01489d9>] ? update_curr+0x169/0x2a0 Jan 11 07:04:11 myserver kernel: [45960.212410] [<c01dc66c>] do_writepages+0x1c/0x40 Jan 11 07:04:11 myserver kernel: [45960.212417] [<c0232a15>] writeback_single_inode+0xd5/0x380 Jan 11 07:04:11 myserver kernel: [45960.212422] [<c02331f9>] writeback_sb_inodes+0x149/0x230 Jan 11 07:04:11 myserver kernel: [45960.212427] [<c023364a>] writeback_inodes_wb+0x8a/0x160 Jan 11 07:04:11 myserver kernel: [45960.212432] [<c0233903>] wb_writeback+0x1e3/0x230 Jan 11 07:04:11 myserver kernel: [45960.212438] [<c0233a81>] wb_do_writeback+0x131/0x140 Jan 11 07:04:11 myserver kernel: [45960.212443] [<c0233acb>] bdi_writeback_task+0x3b/0xb0 Jan 11 07:04:11 myserver kernel: [45960.212448] [<c01eb9e0>] ? bdi_start_fn+0x0/0xc0 Jan 11 07:04:11 myserver kernel: [45960.212453] [<c01eba3d>] bdi_start_fn+0x5d/0xc0 Jan 11 07:04:11 myserver kernel: [45960.212457] [<c01eb9e0>] ? bdi_start_fn+0x0/0xc0 Jan 11 07:04:11 myserver kernel: [45960.212461] [<c0170bb4>] kthread+0x74/0x80 Jan 11 07:04:11 myserver kernel: [45960.212466] [<c0170b40>] ? kthread+0x0/0x80 Jan 11 07:04:11 myserver kernel: [45960.212471] [<c010a467>] kernel_thread_helper+0x7/0x10 Jan 11 07:05:01 myserver CRON[11897]: (root) CMD (if [ -x /usr/bin/vnstat ] && [ `ls /var/lib/vnstat/ | wc -l` -ge 1 ]; then /usr/bin/vnstat -u; fi) Jan 11 07:06:11 myserver kernel: [46080.212093] INFO: task jbd2/nbd1-8:12704 blocked for more than 120 seconds. Jan 11 07:06:11 myserver kernel: [46080.212101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 11 07:06:11 myserver kernel: [46080.212106] jbd2/nbd1-8 D 00005645 0 12704 2 0x00000000 Jan 11 07:06:11 myserver kernel: [46080.212114] c6ce9de4 00000046 f7490000 00005645 00000000 c08964e0 c47b9c24 c08964e0 Jan 11 07:06:11 myserver kernel: [46080.212124] 3c1ac4f3 0000299c c08964e0 c08964e0 c47b9c24 c08964e0 c08964e0 d7e7d800 Jan 11 07:06:11 myserver kernel: [46080.212132] 00000000 0000299c c47b9980 c47b9980 c30094e0 c6ce9e2c c6ce9df4 c05b71a1 Jan 11 07:06:11 myserver kernel: [46080.212140] Call Trace: Jan 11 07:06:11 myserver kernel: [46080.212156] [<c05b71a1>] io_schedule+0x61/0xa0 Jan 11 07:06:11 myserver kernel: [46080.212165] [<c01d4b9d>] sync_page+0x3d/0x50 Jan 11 07:06:11 myserver kernel: [46080.212171] [<c05b795d>] __wait_on_bit+0x4d/0x70 Jan 11 07:06:11 myserver kernel: [46080.212175] [<c01d4b60>] ? sync_page+0x0/0x50 Jan 11 07:06:11 myserver kernel: [46080.212180] [<c01d4dc1>] wait_on_page_bit+0x91/0xa0 Jan 11 07:06:11 myserver kernel: [46080.212187] [<c0170e90>] ? wake_bit_function+0x0/0x50 Jan 11 07:06:11 myserver kernel: [46080.212192] [<c01d5051>] wait_on_page_writeback_range+0xa1/0x110 Jan 11 07:06:11 myserver kernel: [46080.212200] [<c023ca2e>] ? bio_alloc_bioset+0x2e/0xf0 Jan 11 07:06:11 myserver kernel: [46080.212207] [<c02da977>] ? jbd2_journal_write_metadata_buffer+0x277/0x370 Jan 11 07:06:11 myserver kernel: [46080.212213] [<c01d511a>] filemap_fdatawait+0x5a/0x70 Jan 11 07:06:11 myserver kernel: [46080.212219] [<c02d2bae>] jbd2_journal_commit_transaction+0x51e/0x1030 Jan 11 07:06:11 myserver kernel: [46080.212228] [<c01311b8>] ? default_spin_lock_flags+0x8/0x10 Jan 11 07:06:11 myserver kernel: [46080.212234] [<c0164c78>] ? try_to_del_timer_sync+0x68/0xb0 Jan 11 07:06:11 myserver kernel: [46080.212239] [<c02d93d5>] kjournald2+0x95/0x1c0 Jan 11 07:06:11 myserver kernel: [46080.212245] [<c0170e40>] ? autoremove_wake_function+0x0/0x50 Jan 11 07:06:11 myserver kernel: [46080.212250] [<c02d9340>] ? kjournald2+0x0/0x1c0 Jan 11 07:06:11 myserver kernel: [46080.212254] [<c0170bb4>] kthread+0x74/0x80 Jan 11 07:06:11 myserver kernel: [46080.212259] [<c0170b40>] ? kthread+0x0/0x80 Jan 11 07:06:11 myserver kernel: [46080.212265] [<c010a467>] kernel_thread_helper+0x7/0x10 Jan 11 07:06:11 myserver kernel: [46080.212277] INFO: task flush-43:16:11670 blocked for more than 120 seconds. Jan 11 07:06:11 myserver kernel: [46080.212282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 11 07:06:11 myserver kernel: [46080.212285] flush-43:16 D 00000484 0 11670 2 0x00000000 Jan 11 07:06:11 myserver kernel: [46080.212292] c46d3b34 00000046 c07eaa20 00000484 00000000 c08964e0 ef5735a4 c08964e0 Jan 11 07:06:11 myserver kernel: [46080.212300] f5f3884b 0000299a c08964e0 c08964e0 ef5735a4 c08964e0 c08964e0 e8566a00 Jan 11 07:06:11 myserver kernel: [46080.212309] f5f2b230 0000299a ef573300 c2e0afe0 eb6accc0 c46d3b74 c46d3b94 c02d1ec5 Jan 11 07:06:11 myserver kernel: [46080.212317] Call Trace: Jan 11 07:06:11 myserver kernel: [46080.212325] [<c02d1ec5>] do_get_write_access+0x245/0x440 Jan 11 07:06:11 myserver kernel: [46080.212331] [<c0170e90>] ? wake_bit_function+0x0/0x50 Jan 11 07:06:11 myserver kernel: [46080.212337] [<c02d2238>] jbd2_journal_get_write_access+0x28/0x40 Jan 11 07:06:11 myserver kernel: [46080.212344] [<c02bb3fe>] __ext4_journal_get_write_access+0x2e/0x60 Jan 11 07:06:11 myserver kernel: [46080.212349] [<c02bd611>] ext4_mb_mark_diskspace_used+0x61/0x2f0 Jan 11 07:06:11 myserver kernel: [46080.212354] [<c02c38a6>] ext4_mb_new_blocks+0x306/0x590 Jan 11 07:06:11 myserver kernel: [46080.212359] [<c02b8613>] ? ext4_ext_find_extent+0x253/0x280 Jan 11 07:06:11 myserver kernel: [46080.212364] [<c02baf12>] ext4_ext_get_blocks+0x472/0x700 Jan 11 07:06:11 myserver kernel: [46080.212372] [<c0298f98>] ext4_get_blocks+0xf8/0x2c0 Jan 11 07:06:11 myserver kernel: [46080.212377] [<c02997c9>] mpage_da_map_blocks+0x99/0x410 Jan 11 07:06:11 myserver kernel: [46080.212383] [<c02b0d90>] ? ext4_journal_start_sb+0xc0/0xf0 Jan 11 07:06:11 myserver kernel: [46080.212388] [<c0296917>] ? ext4_meta_trans_blocks+0xb7/0xd0 Jan 11 07:06:11 myserver kernel: [46080.212393] [<c0299dc9>] ext4_da_writepages+0x289/0x580 Jan 11 07:06:11 myserver kernel: [46080.212400] [<c01489d9>] ? update_curr+0x169/0x2a0 Jan 11 07:06:11 myserver kernel: [46080.212407] [<c01dc66c>] do_writepages+0x1c/0x40 Jan 11 07:06:11 myserver kernel: [46080.212413] [<c0232a15>] writeback_single_inode+0xd5/0x380 Jan 11 07:06:11 myserver kernel: [46080.212418] [<c02331f9>] writeback_sb_inodes+0x149/0x230 Jan 11 07:06:11 myserver kernel: [46080.212422] [<c023364a>] writeback_inodes_wb+0x8a/0x160 Jan 11 07:06:11 myserver kernel: [46080.212427] [<c0233903>] wb_writeback+0x1e3/0x230 Jan 11 07:06:11 myserver kernel: [46080.212433] [<c0233a81>] wb_do_writeback+0x131/0x140 Jan 11 07:06:11 myserver kernel: [46080.212438] [<c0233acb>] bdi_writeback_task+0x3b/0xb0 Jan 11 07:06:11 myserver kernel: [46080.212443] [<c01eb9e0>] ? bdi_start_fn+0x0/0xc0 Jan 11 07:06:11 myserver kernel: [46080.212447] [<c01eba3d>] bdi_start_fn+0x5d/0xc0 Jan 11 07:06:11 myserver kernel: [46080.212451] [<c01eb9e0>] ? bdi_start_fn+0x0/0xc0 Jan 11 07:06:11 myserver kernel: [46080.212455] [<c0170bb4>] kthread+0x74/0x80 Jan 11 07:06:11 myserver kernel: [46080.212459] [<c0170b40>] ? kthread+0x0/0x80 Jan 11 07:06:11 myserver kernel: [46080.212464] [<c010a467>] kernel_thread_helper+0x7/0x10 Jan 11 07:08:11 myserver kernel: [46200.212124] INFO: task jbd2/nbd1-8:12704 blocked for more than 120 seconds.