Welcome to Soft32 Linux Forums!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

[PATCH 1/2] Disk hot removal causing oopses and fixes

 
   Soft32 Home -> Linux -> Kernel RSS
Next:  Webmail Account Upgrading  
Author Message
Jarkko Lavinen

External


Since: Dec 01, 2006
Posts: 7



(Msg. 1) Posted: Wed Oct 21, 2009 11:20 am
Post subject: [PATCH 1/2] Disk hot removal causing oopses and fixes
Archived from groups: linux>kernel (more info?)

I am proposing two patches to protect the request queue and io
elevator from inadvertent releasing.

I've been testing mmc driver robustness against rapid card removal
reinsert cycles and it has been rather easy to get a kernel oopses
(on 2.6.2Cool because of insufficient locking.

When MMC driver notices a card has been removed, it removes the
card and the block device. The call proceeds to
blk_cleanup_queue() which marks the request queue dead, calls
elevator exit to release the elevator and puts request queue.
This releases elevator always and may also release the request
queue.

If file system is still mounted and using the disk after
blk_cleanup_queue() has been called, io requests will access
already free'ed structures and oopses and BUG_ON()s jump in.

When the proposed patches are applied, I am not able reproduce
the oopses in io scheduler or elevator anymore. There is still
another oops is sysfs truing to add duplicate file, but i think
that is not related.

Cheers
Jarkko Lavinen

From 9559377f3166345649c3406427f410cd51472944 Mon Sep 17 00:00:00 2001
From: Jarkko Lavinen <jarkko.lavinen DeleteThis @nokia.com>
Date: Wed, 21 Oct 2009 18:48:18 +0300
Subject: [PATCH 1/2] block: Avoid dead request queue too early removal

If disk is removed while file system is still mounted, the disk
removal can release the dead request queue too early while
file system is still trying to sumit requests.

Signed-off-by: Jarkko Lavinen <jarkko.lavinen DeleteThis @nokia.com>
---
fs/block_dev.c | 24 ++++++++++++++++++++----
1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9cf4b92..91f2fc3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1165,6 +1165,8 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
int ret;
int partno;
int perm = 0;
+ int q_ref = 0;
+ struct module *owner;

if (mode & FMODE_READ)
perm |= MAY_READ;
@@ -1187,6 +1189,11 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
if (!disk)
goto out_unlock_kernel;

+ if (blk_get_queue(disk->queue))
+ goto out_unlock_kernel;
+ else
+ q_ref = 1;
+
mutex_lock_nested(&bdev->bd_mutex, for_part);
if (!bdev->bd_openers) {
bdev->bd_disk = disk;
@@ -1248,8 +1255,10 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
}
} else {
+ owner = disk->fops->owner;
+ blk_put_queue(disk->queue);
put_disk(disk);
- module_put(disk->fops->owner);
+ module_put(owner);
disk = NULL;
if (bdev->bd_contains == bdev) {
if (bdev->bd_disk->fops->open) {
@@ -1281,9 +1290,15 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
out_unlock_kernel:
unlock_kernel();

- if (disk)
- module_put(disk->fops->owner);
- put_disk(disk);
+ if (disk) {
+ owner = disk->fops->owner;
+ if (q_ref)
+ blk_put_queue(disk->queue);
+
+ put_disk(disk);
+ module_put(owner);
+ }
+
bdput(bdev);

return ret;
@@ -1360,6 +1375,7 @@ static int __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
if (!bdev->bd_openers) {
struct module *owner = disk->fops->owner;

+ blk_put_queue(disk->queue);
put_disk(disk);
module_put(owner);
disk_put_part(bdev->bd_part);
--
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Jarkko Lavinen

External


Since: Dec 01, 2006
Posts: 7



(Msg. 2) Posted: Wed Oct 21, 2009 1:20 pm
Post subject: [PATCH 2/2] Disk hot removal causing oopses and fixes [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

From ff822af94f86dbe559b7649b7f71c464260ef353 Mon Sep 17 00:00:00 2001
From: Jarkko Lavinen <jarkko.lavinen.TakeThisOut@nokia.com>
Date: Wed, 21 Oct 2009 18:53:20 +0300
Subject: [PATCH 2/2] block: Protect elevator from too early release

When a disk is being removed, blk_put_queue calls elevator exit function
which will release the elevator while the elevator may still be in use.

Signed-off-by: Jarkko Lavinen <jarkko.lavinen.TakeThisOut@nokia.com>
---
block/blk-core.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ac0fa10..da56586 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1173,6 +1173,13 @@ static int __make_request(struct request_queue *q, struct bio *bio)
*/
blk_queue_bounce(q, &bio);

+ mutex_lock(&q->elevator->sysfs_lock);
+ if (test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)) {
+ mutex_unlock(&q->elevator->sysfs_lock);
+ bio_endio(bio, -EIO);
+ return 0;
+ }
+
spin_lock_irq(q->queue_lock);

if (unlikely(bio_rw_flagged(bio, BIO_RW_BARRIER)) || elv_queue_empty(q))
@@ -1275,6 +1282,8 @@ out:
if (unplug || !queue_should_plug(q))
__generic_unplug_device(q);
spin_unlock_irq(q->queue_lock);
+ mutex_unlock(&q->elevator->sysfs_lock);
+
return 0;
}

--
1.6.3.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Stefan Richter

External


Since: Jul 26, 2007
Posts: 530



(Msg. 3) Posted: Wed Oct 21, 2009 5:20 pm
Post subject: Re: [PATCH 1/2] Disk hot removal causing oopses and fixes [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Jarkko Lavinen wrote:
> When MMC driver notices a card has been removed, it removes the
> card and the block device. The call proceeds to
> blk_cleanup_queue() which marks the request queue dead, calls
> elevator exit to release the elevator and puts request queue.
> This releases elevator always and may also release the request
> queue.
>
> If file system is still mounted and using the disk after
> blk_cleanup_queue() has been called, io requests will access
> already free'ed structures and oopses and BUG_ON()s jump in.
[...]
> fs/block_dev.c | 24 ++++++++++++++++++++----
> 1 files changed, 20 insertions(+), 4 deletions(-)
[...]

And in patch 2/2:
> block/blk-core.c | 9 +++++++++
> 1 files changed, 9 insertions(+), 0 deletions(-)

This doesn't look right. I suspect you need to fix the MMC subsystem.
It has to reference-count its objects so that they are not freed as long
as they are used by upper layers, notably by the block subsystem.

OK, actually I am not 100% sure whether the issue is in the lower layers
alone --- I'm only (reasonably) familiar with drivers below the SCSI
subsystem, not with those directly below the block subsystem.

In case of SCSI, low level requests the removal of scsi_device or
Scsi_Host instances when appropriate, and the respective removal
functions in SCSI core make sure that upper layers will not use a
scsi_device after scsi_remove_device (or other flavors of it) anymore;
likewise it guarantees that a Scsi_Host will not be used by upper layers
anymore when scsi_remove_host returns to the low level driver.

So, while I don't know what the block subsystem offers in this regard,
the fix cannot be that the block layer somehow notices that lower layers
went away and goes quiet then; instead the way to go is that lower
layers make sure that the block request queue is destroyed before MMC
device objects are destroyed (or in other words: that the MMC device
objects stay around at least as long as the block request queue exists).

[Quoting in different order than posted:]
> I've been testing mmc driver robustness against rapid card removal
> reinsert cycles and it has been rather easy to get a kernel oopses
> (on 2.6.2Cool because of insufficient locking.

I dare say without knowing the MMC and block system: It is not a
locking issue, it is a reference counting issue.

Make sure to always "get" an object before you hand a pointer to it over
to some store or to another context (and of course "put" it when that
reference is removed from that store or, respectively, not longer used
by that other context). And make sure that an object is destroyed only
when the reference count goes to zero. The kref API does this, and so
do the APIs which are wrapped around it (like get_device, put_device).

So, make sure that the reference to the MMC device which the block layer
needs is counted for as long as there is the request queue.

PS: The kref API is based on an atomic counter so that there is
fundamentally never a need to use locking (mutual exclusion) for proper
reference counting and proper deferral of object destruction. The whole
secret is not to forget to count _all_ references.
--
Stefan Richter
-=====-==--= =-=- =-=-=
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Stefan Richter

External


Since: Jul 26, 2007
Posts: 530



(Msg. 4) Posted: Wed Oct 21, 2009 5:20 pm
Post subject: Re: [PATCH 1/2] Disk hot removal causing oopses and fixes [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Stefan Richter wrote:
> I dare say without knowing the MMC and block system: It is not a
> locking issue, it is a reference counting issue.
[...]
> The whole secret is not to forget to count _all_ references.

PS, in case that it might be needed: Sometimes a low level wants to
wait for the moment until its upper level users went away. Reference
counting is then coupled with constructs like wait_for_completion then.
--
Stefan Richter
-=====-==--= =-=- =-=-=
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Stefan Richter

External


Since: Jul 26, 2007
Posts: 530



(Msg. 5) Posted: Tue Nov 03, 2009 3:20 pm
Post subject: Re: [PATCH 1/2] Disk hot removal causing oopses and fixes [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Jarkko Lavinen wrote:
> Hi Steven
>
> Sorry for late reply.
>
>> It has to reference-count its objects so that they are not freed as long
>> as they are used by upper layers,
>
> The block layer and device removal seems to be designed from
> top-down approach. Althouh disc is referenced from
> __blkdev_get(), disc's request queue is not. Also
> blk_cleanup_queue() calls elevator_exit() without caring if
> anyone still uses the elevator.
[...]

I still don't understand how there can be a problem here. Shouldn't the
sequence be:

1. low-level determines that a device went away
2. low-level takes note that from now on no new requests must be
enqueued anymore
3. low-level calls blk_cleanup_queue
4. blk_cleanup_queue waits until remaining requests are done
(it calls blk_sync_queue)
5. blk_cleanup_queue cleans up block layer data
6. low-level can now clean up/ free its own data

Does the MMC layer miss step 2? Because without this, step 4 would be
in vain.
--
Stefan Richter
-=====-==--= =-== ---==
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Display posts from previous:   
Related Topics:
[PATCH] Documentation: How to use GDB to decode OOPSes - From: Pekka Enberg <penberg@cs.helsinki.fi> Adds instructions how to use GDB to figure out the exact location of...

[PATCH] Print number of oopses in Sysrq-P output - From: Pavel Emelianov <xemul@openvz.org> Useful in deciding whether said output should be ignored in absence of ...

[PATCH] Invalid return value of execve() resulting in oopses - From: Alexey Kuznetsov <alexey@openvz.org> When elf loader fails to map executable (due to memory shortage or be...

[PATCH] MTRR: Fix race causing set_mtrr to go into infinit.. - Processors synchronization in set_mtrr requires the .gate field to be set after .count field is properly initialized...

Nbd problem now oopses. - Hi, After turning on the debugging for allocations and locks, I now get a kernel ooops. [ 5628.608000] BUG: unabl...

[BUG] 2.6.20 Oopses in xfrm_audit_log - Hi All, i upgraded to vanilla kernel 2.6.20 and while i was using strongswan 2.8.2 to setup an IPSEC VPN i got the..
       Soft32 Home -> Linux -> Kernel All times are: Pacific Time (US & Canada) (change)
Page 1 of 1

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Categories:
 Windows
  Linux
 Mac
 PDA


[ Contact us | Terms of Service/Privacy Policy ]