Welcome to Soft32 Linux Forums!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

[PATCH] futex: add FUTEX_SET_WAIT operation

 
Goto page 1, 2, 3
   Soft32 Home -> Linux -> Kernel RSS
Next:  [PATCH] ARM: Add spi controller driver support fo..  
Author Message
Michel Lespinasse

External


Since: Nov 17, 2009
Posts: 6



(Msg. 1) Posted: Tue Nov 17, 2009 4:49 am
Post subject: [PATCH] futex: add FUTEX_SET_WAIT operation
Archived from groups: linux>kernel (more info?)

Hi,

I would like to propose adding a new FUTEX_SET_WAIT operation to the futex
code, as per the following patch against 2.6.32-rc7

The change adds a new FUTEX_SET_WAIT operation, which is similar to
FUTEX_WAIT_BITSET except for the addition of an additional 'val2'
parameter, which is an integer and is passed in place of the 'uaddr2'
parameter to the futex syscall.

When val2==val, the behavior is identical to FUTEX_WAIT_BITSET: The
kernel verifies that *uval == val, and waits if that condition is
satisfied. The typical use case is that userspace inspects the futex
value, finds out that it needs to block based on that value, changes the
value to something that indicates it wants to be woken up, and then goes
to execute the futex syscall with a FUTEX_WAIT or FUTEX_WAIT_BITSET
operation.

With the new FUTEX_SET_WAIT operation, the change of the futex value to
indicate a wakeup is desired is done atomically with the kernel's
inspection of the futex value to figure out if it is still necessary to
wait. That is, the kernel inspects the futex value and if 'val' is
found, atomically changes it to 'val2'. Then if the new futex value is
'val2' (either because that was the original value when the kernel first
inspected it, or because the atomic update from val to val2 succeeded),
the thread goes to wait on the futex.

By doing the futex value update atomically with the kernel's inspection
of it to decide to wait, we avoid the time window where the futex has
been set to the 'please wake me up' state, but the thread has not been
queued onto the hash bucket yet. This has two effects:
- Avoids a futex syscall with the FUTEX_WAKE operation if there is no
thread to be woken yet
- In the heavily contended case, avoids waking an extra thread that's
only likely to make the contention problem worse.

Sample test results, on a sun x4600 M2, using the test program included
after the diff (dumb lock/unlock stress test, comparing FUTEX_SET_WAIT
with FUTEX_WAIT):

FUTEX_SET_WAIT test
1 threads: 45662 Kiter/s (2.19s user 0.00s system 2.19s wall 1.00 cores)
2 threads: 11834 Kiter/s (11.07s user 4.70s system 8.45s wall 1.87 cores)
3 threads: 9425 Kiter/s (11.10s user 14.73s system 10.61s wall 2.43 cores)
4 threads: 20790 Kiter/s (5.73s user 10.53s system 4.81s wall 3.38 cores)
5 threads: 21505 Kiter/s (5.05s user 14.02s system 4.65s wall 4.10 cores)
6 threads: 18904 Kiter/s (5.64s user 19.07s system 5.29s wall 4.67 cores)
8 threads: 17212 Kiter/s (6.10s user 28.39s system 5.81s wall 5.94 cores)
10 threads: 19493 Kiter/s (5.20s user 35.82s system 5.13s wall 8.00 cores)
12 threads: 20325 Kiter/s (4.92s user 42.52s system 4.92s wall 9.64 cores)
16 threads: 22026 Kiter/s (4.64s user 56.58s system 4.54s wall 13.48 cores)
24 threads: 23041 Kiter/s (4.33s user 84.76s system 4.34s wall 20.53 cores)
32 threads: 23585 Kiter/s (4.11s user 112.75s system 4.24s wall 27.56 cores)
64 threads: 25907 Kiter/s (3.93s user 115.99s system 3.86s wall 31.07 cores)
128 threads: 26455 Kiter/s (4.02s user 114.50s system 3.78s wall 31.35 cores)
256 threads: 26596 Kiter/s (3.93s user 114.55s system 3.76s wall 31.51 cores)
512 threads: 26596 Kiter/s (3.92s user 114.74s system 3.76s wall 31.56 cores)
1024 threads: 26525 Kiter/s (3.95s user 115.96s system 3.77s wall 31.81 cores)

FUTEX_WAIT test
1 threads: 46083 Kiter/s (2.17s user 0.00s system 2.17s wall 1.00 cores)
2 threads: 10811 Kiter/s (12.39s user 4.71s system 9.25s wall 1.85 cores)
3 threads: 5353 Kiter/s (21.02s user 25.85s system 18.68s wall 2.51 cores)
4 threads: 4277 Kiter/s (27.12s user 54.89s system 23.38s wall 3.51 cores)
5 threads: 3861 Kiter/s (24.51s user 85.21s system 25.90s wall 4.24 cores)
6 threads: 3540 Kiter/s (20.37s user 125.47s system 28.25s wall 5.16 cores)
8 threads: 7257 Kiter/s (12.11s user 81.09s system 13.78s wall 6.76 cores)
10 threads: 8271 Kiter/s (10.87s user 90.33s system 12.09s wall 8.37 cores)
12 threads: 10965 Kiter/s (9.16s user 88.66s system 9.12s wall 10.73 cores)
16 threads: 14472 Kiter/s (7.24s user 95.69s system 6.91s wall 14.90 cores)
24 threads: 17331 Kiter/s (6.01s user 123.90s system 5.77s wall 22.51 cores)
32 threads: 18939 Kiter/s (5.60s user 155.93s system 5.28s wall 30.59 cores)
64 threads: 18727 Kiter/s (5.66s user 162.57s system 5.34s wall 31.50 cores)
128 threads: 18349 Kiter/s (5.56s user 167.46s system 5.45s wall 31.75 cores)
256 threads: 17271 Kiter/s (5.41s user 178.54s system 5.79s wall 31.77 cores)
512 threads: 16207 Kiter/s (5.15s user 191.55s system 6.17s wall 31.88 cores)
1024 threads: 14948 Kiter/s (4.72s user 208.38s system 6.69s wall 31.85 cores)


Signed-off-by: Michel Lespinasse <walken RemoveThis @google.com>

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..c5e887d 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,7 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_SET_WAIT 13

#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -39,6 +40,7 @@
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)

/*
* Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index a8cc4e1..a199606 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -25,6 +25,7 @@ struct restart_block {
struct {
u32 *uaddr;
u32 val;
+ u32 val2;
u32 flags;
u32 bitset;
u64 time;
diff --git a/kernel/futex.c b/kernel/futex.c
index fb65e82..be9de2b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1694,6 +1694,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* futex_wait_setup() - Prepare to wait on a futex
* @uaddr: the futex userspace address
* @val: the expected value
+ * @val2: the value we want to set, in replacement of val
* @fshared: whether the futex is shared (1) or not (0)
* @q: the associated futex_q
* @hb: storage for hash_bucket pointer to be returned to caller
@@ -1704,10 +1705,10 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* with no q.key reference on failure.
*
* Returns:
- * 0 - uaddr contains val and hb has been locked
+ * 0 - uaddr contains val2 and hb has been locked
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
*/
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
+static int futex_wait_setup(u32 __user *uaddr, u32 val, u32 val2, int fshared,
struct futex_q *q, struct futex_hash_bucket **hb)
{
u32 uval;
@@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
*
* The basic logical guarantee of a futex is that it blocks ONLY
* if cond(var) is known to be true at the time of blocking, for
- * any cond. If we queued after testing *uaddr, that would open
- * a race condition where we could block indefinitely with
+ * any cond. If we locked the hash-bucket after testing *uaddr, that
+ * would open a race condition where we could block indefinitely with
* cond(var) false, which would violate the guarantee.
*
- * A consequence is that futex_wait() can return zero and absorb
- * a wakeup when *uaddr != val on entry to the syscall. This is
- * rare, but normal.
+ * On the other hand, we insert q and release the hash-bucket only
+ * after testing *uaddr. This guarantees that futex_wait() will NOT
+ * absorb a wakeup if *uaddr does not match the desired values
+ * while the syscall executes.
*/
retry:
q->key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &q->key,
+ (val == val2) ? VERIFY_READ : VERIFY_WRITE);
if (unlikely(ret != 0))
return ret;

retry_private:
*hb = queue_lock(q);

- ret = get_futex_value_locked(&uval, uaddr);
-
- if (ret) {
+ pagefault_disable();
+ if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
+ pagefault_enable();
queue_unlock(q, *hb);
-
ret = get_user(uval, uaddr);
+ fault_common:
if (ret)
goto out;
-
if (!fshared)
goto retry_private;
-
put_futex_key(fshared, &q->key);
goto retry;
}
-
- if (uval != val) {
- queue_unlock(q, *hb);
- ret = -EWOULDBLOCK;
+ if (val != val2 && uval == val) {
+ uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
+ if (unlikely(uval == -EFAULT)) {
+ pagefault_enable();
+ queue_unlock(q, *hb);
+ ret = fault_in_user_writeable(uaddr);
+ goto fault_common;
+ }
}
+ pagefault_enable();
+
+ if (uval == val || uval == val2)
+ return 0; /* success */

+ queue_unlock(q, *hb);
+ ret = -EWOULDBLOCK;
out:
- if (ret)
- put_futex_key(fshared, &q->key);
+ put_futex_key(fshared, &q->key);
return ret;
}

-static int futex_wait(u32 __user *uaddr, int fshared,
- u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int fshared, u32 val, u32 val2,
+ ktime_t *abs_time, u32 bitset, int clockrt)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
@@ -1795,7 +1805,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,

retry:
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val2, fshared, &q, &hb);
if (ret)
goto out;

@@ -1827,6 +1837,7 @@ retry:
restart->fn = futex_wait_restart;
restart->futex.uaddr = (u32 *)uaddr;
restart->futex.val = val;
+ restart->futex.val2 = val2;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
restart->futex.flags = FLAGS_HAS_TIMEOUT;
@@ -1862,8 +1873,8 @@ static long futex_wait_restart(struct restart_block *restart)
restart->fn = do_no_restart_syscall;
if (restart->futex.flags & FLAGS_SHARED)
fshared = 1;
- return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
- restart->futex.bitset,
+ return (long)futex_wait(uaddr, fshared, restart->futex.val,
+ restart->futex.val2, tp, restart->futex.bitset,
restart->futex.flags & FLAGS_CLOCKRT);
}

@@ -2219,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = &key2;

/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val, fshared, &q, &hb);
if (ret)
goto out_key2;

@@ -2532,14 +2543,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
fshared = 1;

clockrt = op & FUTEX_CLOCK_REALTIME;
- if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ if (clockrt && cmd != FUTEX_WAIT_BITSET &&
+ cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_SET_WAIT)
return -ENOSYS;

switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ val2 = val;
+ case FUTEX_SET_WAIT:
+ ret = futex_wait(uaddr, fshared, val, val2, timeout, val3,
+ clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
@@ -2595,7 +2610,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,

if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_SET_WAIT)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
@@ -2609,10 +2624,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
/*
* requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
* number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
+ * new uval value in 'uaddr2' if cmd == FUTEX_SET_WAIT.
*/
if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
val2 = (u32) (unsigned long) utime;
+ else if (cmd == FUTEX_SET_WAIT) {
+ if (!futex_cmpxchg_enabled)
+ return -ENOSYS;
+ val2 = (u32) (unsigned long) uaddr2;
+ }

return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}




test code:

#include <stdio.h>
#include <errno.h>
#include <linux/unistd.h>
#include <linux/futex.h>
#include <limits.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/times.h>
#include <sys/syscall.h>

#define FUTEX_SET_WAIT 13
#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)

static inline int sys_futex(int * uaddr, int op, int val,
struct timespec * timeout, int * uaddr2, int val3)
{
return syscall(__NR_futex, uaddr, op, val, timeout, uaddr2, val3);
}

static inline int cmpxchg(volatile int *ptr, int old, int new)
{
int prev;
asm volatile("lock; cmpxchgl %1,%2"
: "=a" (prev)
: "r"(new), "m"(*ptr), "0"(old)
: "memory");
return prev;
}



/***** Sample futex based lock/unlock operations *****/

/* States: 0 = unlocked; 1 = locked ; 2 = locked with possible waiters */

/* Classic lock, using FUTEX_WAIT */
static inline void futex_wait_lock(volatile int *ptr)
{
int status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, 1);
while (status != 0) {
if (status == 1)
status = cmpxchg(ptr, 1, 2);
if (status != 0) {
sys_futex((int *)ptr, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
status = *ptr;
}
if (status == 0)
status = cmpxchg(ptr, 0, 2);
}
}

/* Optimized lock, using the proposed FUTEX_SET_WAIT operation */
static inline void futex_setwait_lock(volatile int *ptr)
{
int status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, 1);
if (status != 0) {
int desired_status = 1;
do {
if (sys_futex((int *)ptr, FUTEX_SET_WAIT_PRIVATE, 1, NULL,
(int *)2, ~0) == 0) {
/* We absorbed a wakeup; so make sure to unblock next thread */
desired_status = 2;
}
status = *ptr;
if (status == 0)
status = cmpxchg(ptr, 0, desired_status);
} while (status != 0);
}
}

/* Unlock. cmpxchg is not optimal for this sample 3-state locking protocol;
but it is often required when implementing more complex locking protocols */
static inline void futex_cmpxchg_unlock(volatile int *ptr)
{
int status = *ptr;
if (status == 1)
status = cmpxchg(ptr, 1, 0);
if (status == 2) {
cmpxchg(ptr, 2, 0); /* Guaranteed to succeed in 3-state protocol */
sys_futex((int *)ptr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
}



/***** Lock/unlock perf test *****/

struct thread_barrier {
int threads;
int unblock;
};

static void barrier_sync(struct thread_barrier *barrier);

struct locktest_shared {
struct thread_barrier barrier_before;
struct thread_barrier barrier_after;
int loops;
int test_futex;
};

static void * futex_wait_test(void * dummy)
{
struct locktest_shared * shared = dummy;
int i = shared->loops;
barrier_sync(&shared->barrier_before);
while (i--) {
futex_wait_lock(&shared->test_futex);
futex_cmpxchg_unlock(&shared->test_futex);
}
barrier_sync(&shared->barrier_after);
return NULL;
}

static void * futex_setwait_test(void * dummy)
{
struct locktest_shared * shared = dummy;
int i = shared->loops;
barrier_sync(&shared->barrier_before);
while (i--) {
futex_setwait_lock(&shared->test_futex);
futex_cmpxchg_unlock(&shared->test_futex);
}
barrier_sync(&shared->barrier_after);
return NULL;
}



/***** Test harness & support functions *****/


static inline void decrement(int *ptr)
{
asm volatile("lock; decl %0"
: "=m" (*ptr)
: "m" (*ptr));
}

/* Called by main thread to initialize barrier */
static void barrier_init(struct thread_barrier *barrier, int threads)
{
barrier->threads = threads;
barrier->unblock = 0;
}

/* Called by worker threads to synchronize with main thread */
static void barrier_sync(struct thread_barrier *barrier)
{
decrement(&barrier->threads);
if (barrier->threads == 0)
sys_futex(&barrier->threads, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
while (barrier->unblock == 0)
sys_futex(&barrier->unblock, FUTEX_WAIT_PRIVATE, 0, NULL, NULL, 0);
}

/* Called by main thread to wait for all workers to reach sync point */
static void barrier_wait(struct thread_barrier *barrier)
{
int threads;
while ((threads = barrier->threads) > 0)
sys_futex(&barrier->threads, FUTEX_WAIT_PRIVATE, threads, NULL, NULL, 0);
}

/* Called by main thread to unblock worker threads from their sync point */
static void barrier_unblock(struct thread_barrier *barrier)
{
barrier->unblock = 1;
sys_futex(&barrier->unblock, FUTEX_WAKE_PRIVATE, INT_MAX, NULL, NULL, 0);
}


static void locktest(void * thread_function(void *), int iterations,
int num_threads)
{
struct locktest_shared shared;
pthread_t thread[num_threads];
int i;
clock_t before, after;
struct tms tms_before, tms_after;
int wall, user, system;
double tick;

barrier_init(&shared.barrier_before, num_threads);
barrier_init(&shared.barrier_after, num_threads);
shared.loops = iterations / num_threads;
shared.test_futex = 0;

for (i = 0; i < num_threads; i++)
pthread_create(thread + i, NULL, thread_function, &shared);
barrier_wait(&shared.barrier_before);
before = times(&tms_before);
barrier_unblock(&shared.barrier_before);
barrier_wait(&shared.barrier_after);
after = times(&tms_after);
wall = after - before;
user = tms_after.tms_utime - tms_before.tms_utime;
system = tms_after.tms_stime - tms_before.tms_stime;
tick = 1.0 / sysconf(_SC_CLK_TCK);
printf("%d threads: %.0f Kiter/s "
"(%.2fs user %.2fs system %.2fs wall %.2f cores)\n",
num_threads, (num_threads * shared.loops) / (wall * tick * 1000),
user * tick, system * tick, wall * tick,
wall ? (user + system) * 1. / wall : 1.);
barrier_unblock(&shared.barrier_after);
for (i = 0; i < num_threads; i++)
pthread_join(thread[i], NULL);
}

int futex_setwait_supported(void)
{
int futex = 0;
sys_futex(&futex, FUTEX_SET_WAIT_PRIVATE, 1, NULL, (int *)2, ~0);
return errno == EWOULDBLOCK;
}

int main (void)
{
int threads[] = { 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 24, 32,
64, 128, 256, 512, 1024, 0 };
int i;
if (futex_setwait_supported()) {
printf("\nFUTEX_SET_WAIT test\n");
for (i = 0; threads[i]; i++)
locktest(futex_setwait_test, 100000000, threads[i]);
}
printf("\nFUTEX_WAIT test\n");
for (i = 0; threads[i]; i++)
locktest(futex_wait_test, 100000000, threads[i]);
printf("\n");
return 0;
}


--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Ingo Molnar

External


Since: Nov 05, 2003
Posts: 3016



(Msg. 2) Posted: Tue Nov 17, 2009 4:49 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Michel Lespinasse <walken.DeleteThis@google.com> wrote:

> Sample test results, on a sun x4600 M2, using the test program
> included after the diff (dumb lock/unlock stress test, comparing
> FUTEX_SET_WAIT with FUTEX_WAIT):
>
> FUTEX_SET_WAIT test
> 1 threads: 45662 Kiter/s (2.19s user 0.00s system 2.19s wall 1.00 cores)
> 2 threads: 11834 Kiter/s (11.07s user 4.70s system 8.45s wall 1.87 cores)
> 3 threads: 9425 Kiter/s (11.10s user 14.73s system 10.61s wall 2.43 cores)
> 4 threads: 20790 Kiter/s (5.73s user 10.53s system 4.81s wall 3.38 cores)
> 5 threads: 21505 Kiter/s (5.05s user 14.02s system 4.65s wall 4.10 cores)
> 6 threads: 18904 Kiter/s (5.64s user 19.07s system 5.29s wall 4.67 cores)
> 8 threads: 17212 Kiter/s (6.10s user 28.39s system 5.81s wall 5.94 cores)
> 10 threads: 19493 Kiter/s (5.20s user 35.82s system 5.13s wall 8.00 cores)
> 12 threads: 20325 Kiter/s (4.92s user 42.52s system 4.92s wall 9.64 cores)
> 16 threads: 22026 Kiter/s (4.64s user 56.58s system 4.54s wall 13.48 cores)
> 24 threads: 23041 Kiter/s (4.33s user 84.76s system 4.34s wall 20.53 cores)
> 32 threads: 23585 Kiter/s (4.11s user 112.75s system 4.24s wall 27.56 cores)
> 64 threads: 25907 Kiter/s (3.93s user 115.99s system 3.86s wall 31.07 cores)
> 128 threads: 26455 Kiter/s (4.02s user 114.50s system 3.78s wall 31.35 cores)
> 256 threads: 26596 Kiter/s (3.93s user 114.55s system 3.76s wall 31.51 cores)
> 512 threads: 26596 Kiter/s (3.92s user 114.74s system 3.76s wall 31.56 cores)
> 1024 threads: 26525 Kiter/s (3.95s user 115.96s system 3.77s wall 31.81 cores)
>
> FUTEX_WAIT test
> 1 threads: 46083 Kiter/s (2.17s user 0.00s system 2.17s wall 1.00 cores)
> 2 threads: 10811 Kiter/s (12.39s user 4.71s system 9.25s wall 1.85 cores)
> 3 threads: 5353 Kiter/s (21.02s user 25.85s system 18.68s wall 2.51 cores)
> 4 threads: 4277 Kiter/s (27.12s user 54.89s system 23.38s wall 3.51 cores)
> 5 threads: 3861 Kiter/s (24.51s user 85.21s system 25.90s wall 4.24 cores)
> 6 threads: 3540 Kiter/s (20.37s user 125.47s system 28.25s wall 5.16 cores)
> 8 threads: 7257 Kiter/s (12.11s user 81.09s system 13.78s wall 6.76 cores)
> 10 threads: 8271 Kiter/s (10.87s user 90.33s system 12.09s wall 8.37 cores)
> 12 threads: 10965 Kiter/s (9.16s user 88.66s system 9.12s wall 10.73 cores)
> 16 threads: 14472 Kiter/s (7.24s user 95.69s system 6.91s wall 14.90 cores)
> 24 threads: 17331 Kiter/s (6.01s user 123.90s system 5.77s wall 22.51 cores)
> 32 threads: 18939 Kiter/s (5.60s user 155.93s system 5.28s wall 30.59 cores)
> 64 threads: 18727 Kiter/s (5.66s user 162.57s system 5.34s wall 31.50 cores)
> 128 threads: 18349 Kiter/s (5.56s user 167.46s system 5.45s wall 31.75 cores)
> 256 threads: 17271 Kiter/s (5.41s user 178.54s system 5.79s wall 31.77 cores)
> 512 threads: 16207 Kiter/s (5.15s user 191.55s system 6.17s wall 31.88 cores)
> 1024 threads: 14948 Kiter/s (4.72s user 208.38s system 6.69s wall 31.85 cores)

This test program looks really useful.

Would you be interested in adding it as a 'perf bench futex' testcase?
That way kernel developers could monitor futex performance in the future
as well.

See 'perf bench' in the latest perf events tree:

http://people.redhat.com/mingo/tip.git/README

Do 'cd tools/perf; make -j install' to install perf.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Peter Zijlstra

External


Since: Jun 06, 2007
Posts: 394



(Msg. 3) Posted: Tue Nov 17, 2009 4:49 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>
> Would you be interested in adding it as a 'perf bench futex' testcase?
> That way kernel developers could monitor futex performance in the future
> as well.
>
> See 'perf bench' in the latest perf events tree:
>
> http://people.redhat.com/mingo/tip.git/README
>
> Do 'cd tools/perf; make -j install' to install perf.

Darren has recently been putting a lot of effort in making a futex test
suite:

http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git

Its not yet complete, but hopefully it will soon be Smile

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Linus Torvalds

External


Since: Jan 22, 2007
Posts: 910



(Msg. 4) Posted: Tue Nov 17, 2009 11:20 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, 17 Nov 2009, Peter Zijlstra wrote:
>
> (long quote for Linus' benefit, added an old patch to the tail)

Both look conceptually sane to me.

The FUTEX_SET_WAIT concept seems well-defined, although it sounds more
like a FUTEX_CMPXCHG_WAIT to me than a "SET" operation. I'm not entirely
sure that we really want to do the CMPXCHG in the kernel rather than in
user space, since lock stealing generally isn't a problem, but I don't
think it's _wrong_ to add this concept.

In fact, CMPXCHG is generally seen to be the "fundamental" base for
implementing locking, so in that sense it makes perfect sense to have it
as a FUTEX model.

That said, I personally think the adaptive wait model is (a) more likely
to fix many performance issues and (b) a bit more high-level concept, so I
like Peter's patch too, but I don't see that the patches would really be
mutually exclusive.

Of course, it's possible that Michel's performance problem is fixed by the
adaptive approach too, in which case the FUTEX_SET_WAIT (or _CMPXCHG_WAIT)
patch is just fundamentally less interesting. But some people do need
fairness - even when it's bad for performance - so...

One thing that does strike me is that _if_ we want to do both interfaces,
then I would assume that we quite likely also want to have an adaptive
version of the FUTEX_SET|CMPXCHG_WAIT thing. Which perhaps implies that
the "ADAPTIVE" part should be a bitflag in the command value?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 5) Posted: Tue Nov 17, 2009 11:20 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Peter Zijlstra wrote:
> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>> Would you be interested in adding it as a 'perf bench futex' testcase?
>> That way kernel developers could monitor futex performance in the future
>> as well.
>>
>> See 'perf bench' in the latest perf events tree:
>>
>> http://people.redhat.com/mingo/tip.git/README
>>
>> Do 'cd tools/perf; make -j install' to install perf.
>
> Darren has recently been putting a lot of effort in making a futex test
> suite:
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Its not yet complete, but hopefully it will soon be Smile
>

Michael, would you be willing to include a version of this test in the
above test suite? If so, then in keeping with the rest of the test
suite, I would recommend splitting into two tests, one of each opcode
being tested, and add argument to define thread count. The run.sh
script would then run each thread count as a separate test run.

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Ingo Molnar

External


Since: Nov 05, 2003
Posts: 3016



(Msg. 6) Posted: Tue Nov 17, 2009 1:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

* Peter Zijlstra <peterz DeleteThis @infradead.org> wrote:

> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
> >
> > Would you be interested in adding it as a 'perf bench futex' testcase?
> > That way kernel developers could monitor futex performance in the future
> > as well.
> >
> > See 'perf bench' in the latest perf events tree:
> >
> > http://people.redhat.com/mingo/tip.git/README
> >
> > Do 'cd tools/perf; make -j install' to install perf.
>
> Darren has recently been putting a lot of effort in making a futex test
> suite:
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>
> Its not yet complete, but hopefully it will soon be Smile

Looks nice! Any futex performance measurements within it could be added
to perf bench - with standardized options and output, etc.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 7) Posted: Tue Nov 17, 2009 1:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Michel Lespinasse wrote:
> Hi,
>

Hi Michel,

Thanks for the excellent writeup. The concept looks reasonable, and
useful, to me. Just a few thoughts, comments below.

> ...

>
> By doing the futex value update atomically with the kernel's inspection
> of it to decide to wait, we avoid the time window where the futex has
> been set to the 'please wake me up' state, but the thread has not been
> queued onto the hash bucket yet. This has two effects:
> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
> thread to be woken yet

This also reduces lock contention on the hash-bucket locks, another plus.

> - In the heavily contended case, avoids waking an extra thread that's
> only likely to make the contention problem worse.

I'm not seeing this. What is the extra thread that would be woken which
isn't with FUTEX_SET_WAIT?

> ...

>
> Signed-off-by: Michel Lespinasse <walken.TakeThisOut@google.com>
>
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 1e5a26d..c5e887d 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -20,6 +20,7 @@
> #define FUTEX_WAKE_BITSET 10
> #define FUTEX_WAIT_REQUEUE_PI 11
> #define FUTEX_CMP_REQUEUE_PI 12
> +#define FUTEX_SET_WAIT 13
>
> #define FUTEX_PRIVATE_FLAG 128
> #define FUTEX_CLOCK_REALTIME 256
> @@ -39,6 +40,7 @@
> FUTEX_PRIVATE_FLAG)
> #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
> FUTEX_PRIVATE_FLAG)
> +#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
>
> /*
> * Support for robust futexes: the kernel cleans up held futexes at
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index a8cc4e1..a199606 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -25,6 +25,7 @@ struct restart_block {
> struct {
> u32 *uaddr;
> u32 val;
> + u32 val2;

It's a nitpic, but val2 is used in the futex syscall arguments already
and another variable of the same name that is actually initially derived
from uaddr2... is more likely to confuse than not. Perhaps "setval"?
Throughout the patch.


> @@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
> *
> * The basic logical guarantee of a futex is that it blocks ONLY
> * if cond(var) is known to be true at the time of blocking, for
> - * any cond. If we queued after testing *uaddr, that would open
> - * a race condition where we could block indefinitely with
> + * any cond. If we locked the hash-bucket after testing *uaddr, that
> + * would open a race condition where we could block indefinitely with
> * cond(var) false, which would violate the guarantee.
> *
> - * A consequence is that futex_wait() can return zero and absorb
> - * a wakeup when *uaddr != val on entry to the syscall. This is
> - * rare, but normal.
> + * On the other hand, we insert q and release the hash-bucket only
> + * after testing *uaddr. This guarantees that futex_wait() will NOT
> + * absorb a wakeup if *uaddr does not match the desired values
> + * while the syscall executes.
> */
> retry:
> q->key = FUTEX_KEY_INIT;
> - ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
> + ret = get_futex_key(uaddr, fshared, &q->key,
> + (val == val2) ? VERIFY_READ : VERIFY_WRITE);

Have you compared the performance of FUTEX_WAIT before and after the
application of this patch? I'd be interested to see your test results on
a prepatched kernel (with the FUTEX_SET_WAIT side commented out of course).

> if (unlikely(ret != 0))
> return ret;
>
> retry_private:
> *hb = queue_lock(q);
>
> - ret = get_futex_value_locked(&uval, uaddr);
> -
> - if (ret) {
> + pagefault_disable();
> + if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
> + pagefault_enable();

What about the addition of val2 makes it so we have to expand
get_futex_value_locked() here with the nested fault handling?

> queue_unlock(q, *hb);
> -

Superfluous whitespace change

> ret = get_user(uval, uaddr);
> + fault_common:

Inconsistent label indentation with the rest of the file.

> if (ret)
> goto out;
> -
> if (!fshared)
> goto retry_private;
> -
> put_futex_key(fshared, &q->key);
> goto retry;
> }
> -

Several of superfluous whitespace changes

> - if (uval != val) {
> - queue_unlock(q, *hb);
> - ret = -EWOULDBLOCK;
> + if (val != val2 && uval == val) {
> + uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
> + if (unlikely(uval == -EFAULT)) {
> + pagefault_enable();
> + queue_unlock(q, *hb);
> + ret = fault_in_user_writeable(uaddr);
> + goto fault_common;
> + }
> }
> + pagefault_enable();
> +
> + if (uval == val || uval == val2)
> + return 0; /* success */

If the comment is necessary, please give it its own line (I've done a
lot of futex commentary cleanup recently and am a little sensitive to
maintaining that Smile.

I'd rather not add another return point to the code, even if it saves us
an if statement. You've already tested for uval == val above, perhaps
this test can be integrated in the above blocks and then use the common
out: label?

Now I need to review Peter's Adaptive bits and think on how these two
relate...

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 8) Posted: Tue Nov 17, 2009 1:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Ingo Molnar wrote:
> * Peter Zijlstra <peterz DeleteThis @infradead.org> wrote:
>
>> On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
>>> Would you be interested in adding it as a 'perf bench futex' testcase?
>>> That way kernel developers could monitor futex performance in the future
>>> as well.
>>>
>>> See 'perf bench' in the latest perf events tree:
>>>
>>> http://people.redhat.com/mingo/tip.git/README
>>>
>>> Do 'cd tools/perf; make -j install' to install perf.
>> Darren has recently been putting a lot of effort in making a futex test
>> suite:
>>
>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>
>> Its not yet complete, but hopefully it will soon be Smile
>
> Looks nice! Any futex performance measurements within it could be added
> to perf bench - with standardized options and output, etc.
>
> Ingo

I'll take a look at perf this week and see what would be required to
share (?) tests between perf and futextest.

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 9) Posted: Tue Nov 17, 2009 7:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Michel Lespinasse wrote:
> Hi,
>
> I would like to propose adding a new FUTEX_SET_WAIT operation to the futex
> code, as per the following patch against 2.6.32-rc7

If we go ahead with this, I'd like to adapt FUTEX_WAIT_REQUEUE_PI to use
this as well, as it faces all the same races related to the calling task
getting to sleep in the kernel before a wake call (FUTEX_CMP_REQUEUE_PI
in this case) as futex_wait does. FUTEX_SET_WAIT_REQUEUE_PI is
ridiculously long... but it is what it is I guess Smile

--
Darren "not-as-brave-as-peterz (short quote for Linus)" Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Hitoshi Mitake

External


Since: Oct 05, 2009
Posts: 41



(Msg. 10) Posted: Tue Nov 17, 2009 9:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

From: Ingo Molnar <mingo DeleteThis @elte.hu>
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation
Date: Tue, 17 Nov 2009 18:24:56 +0100

>
> * Peter Zijlstra <peterz DeleteThis @infradead.org> wrote:
>
> > On Tue, 2009-11-17 at 09:18 +0100, Ingo Molnar wrote:
> > >
> > > Would you be interested in adding it as a 'perf bench futex' testcase?
> > > That way kernel developers could monitor futex performance in the future
> > > as well.
> > >
> > > See 'perf bench' in the latest perf events tree:
> > >
> > > http://people.redhat.com/mingo/tip.git/README
> > >
> > > Do 'cd tools/perf; make -j install' to install perf.
> >
> > Darren has recently been putting a lot of effort in making a futex test
> > suite:
> >
> > http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
> >
> > Its not yet complete, but hopefully it will soon be Smile
>
> Looks nice! Any futex performance measurements within it could be added
> to perf bench - with standardized options and output, etc.

It is interesting for me, too.
Can I add your test programs to perf bench, Michel and Darren?
Of course, I'll do every odd job if you permit.

Hitoshi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Michel Lespinasse

External


Since: Nov 17, 2009
Posts: 6



(Msg. 11) Posted: Tue Nov 17, 2009 11:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, Nov 17, 2009 at 09:22:18AM -0800, Darren Hart wrote:
>> By doing the futex value update atomically with the kernel's inspection
>> of it to decide to wait, we avoid the time window where the futex has
>> been set to the 'please wake me up' state, but the thread has not been
>> queued onto the hash bucket yet. This has two effects:
>> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
>> thread to be woken yet
>
> This also reduces lock contention on the hash-bucket locks, another plus.

Yes Smile

>> - In the heavily contended case, avoids waking an extra thread that's
>> only likely to make the contention problem worse.
>
> I'm not seeing this. What is the extra thread that would be woken which
> isn't with FUTEX_SET_WAIT?

I meant that when several threads get queued on the futex, one can
choose to wake only one of them when releasing the futex - the remaining
ones stay queued and will ideally be woken only by a thread that already
blocked, or when a new thread blocks.

> It's a nitpic, but val2 is used in the futex syscall arguments already
> and another variable of the same name that is actually initially derived
> from uaddr2... is more likely to confuse than not. Perhaps "setval"?
> Throughout the patch.

OK, I renamed val2 into setval in futex_wait and futex_wait_setup.
I did not do this in do_futex - I followed the example of bitset vs val3
(val2 is just a made up name for one of the sys_futex pointer arguments
casted to integer, I did not want to add a second such variable).

> Have you compared the performance of FUTEX_WAIT before and after the
> application of this patch? I'd be interested to see your test results on
> a prepatched kernel (with the FUTEX_SET_WAIT side commented out of
> course).

Performance seems to be similar as far as I can tell. It's hard to gauge it
for the 2-24 threads case, as there is quite a bit of run-to-run noise there
depending on where your threads land up. In the >=32 threads case, it looks
like the patched kernel is actually slightly faster. It's not clear to me if
this is pure luck or if it's due to getting the fault patch out of the way
with the unlikely() if statement.

FUTEX_WAIT test (with unpatched 2.6.32-rc7):
1 threads: 45045 Kiter/s (2.21s user 0.00s system 2.22s wall 1.00 cores)
2 threads: 17483 Kiter/s (3.98s user 3.80s system 5.72s wall 1.36 cores)
3 threads: 5984 Kiter/s (16.78s user 24.06s system 16.71s wall 2.44 cores)
4 threads: 2910 Kiter/s (21.61s user 86.48s system 34.36s wall 3.15 cores)
5 threads: 4619 Kiter/s (17.53s user 78.59s system 21.65s wall 4.44 cores)
6 threads: 6549 Kiter/s (14.01s user 68.49s system 15.27s wall 5.40 cores)
8 threads: 7663 Kiter/s (12.21s user 72.55s system 13.05s wall 6.50 cores)
10 threads: 7831 Kiter/s (11.54s user 91.41s system 12.77s wall 8.06 cores)
12 threads: 10941 Kiter/s (8.81s user 79.86s system 9.14s wall 9.70 cores)
16 threads: 13661 Kiter/s (7.51s user 92.31s system 7.32s wall 13.64 cores)
24 threads: 17483 Kiter/s (6.13s user 118.48s system 5.72s wall 21.78 cores)
32 threads: 18215 Kiter/s (5.91s user 148.16s system 5.49s wall 28.06 cores)
64 threads: 18349 Kiter/s (5.72s user 160.06s system 5.45s wall 30.42 cores)
128 threads: 18382 Kiter/s (5.68s user 166.96s system 5.44s wall 31.74 cores)
256 threads: 17007 Kiter/s (5.30s user 181.25s system 5.88s wall 31.73 cores)
512 threads: 16026 Kiter/s (4.79s user 193.34s system 6.24s wall 31.75 cores)
1024 threads: 15432 Kiter/s (4.38s user 202.08s system 6.48s wall 31.86 cores)

>> - ret = get_futex_value_locked(&uval, uaddr);
>> -
>> - if (ret) {
>> + pagefault_disable();
>> + if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
>> + pagefault_enable();
>
> What about the addition of val2 makes it so we have to expand
> get_futex_value_locked() here with the nested fault handling?

This is because I did not want to re-enable page faults right afterwards
if we end up going to the futex_atomic_cmpxchg_inatomic path...

I applied your other feedback into the updated change down below.


Signed-off-by: Michel Lespinasse <walken.RemoveThis@google.com>

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..c5e887d 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,7 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_SET_WAIT 13

#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -39,6 +40,7 @@
FUTEX_PRIVATE_FLAG)
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define FUTEX_SET_WAIT_PRIVATE (FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)

/*
* Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index a8cc4e1..b08c3ec 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -25,6 +25,7 @@ struct restart_block {
struct {
u32 *uaddr;
u32 val;
+ u32 setval;
u32 flags;
u32 bitset;
u64 time;
diff --git a/kernel/futex.c b/kernel/futex.c
index fb65e82..7ed0869 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1694,6 +1694,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* futex_wait_setup() - Prepare to wait on a futex
* @uaddr: the futex userspace address
* @val: the expected value
+ * @setval: the value we want to set, in replacement of val
* @fshared: whether the futex is shared (1) or not (0)
* @q: the associated futex_q
* @hb: storage for hash_bucket pointer to be returned to caller
@@ -1704,11 +1705,12 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
* with no q.key reference on failure.
*
* Returns:
- * 0 - uaddr contains val and hb has been locked
+ * 0 - uaddr contains setval and hb has been locked
* <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
*/
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
- struct futex_q *q, struct futex_hash_bucket **hb)
+static int futex_wait_setup(u32 __user *uaddr, u32 val, u32 setval,
+ int fshared, struct futex_q *q,
+ struct futex_hash_bucket **hb)
{
u32 uval;
int ret;
@@ -1722,29 +1724,33 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
*
* The basic logical guarantee of a futex is that it blocks ONLY
* if cond(var) is known to be true at the time of blocking, for
- * any cond. If we queued after testing *uaddr, that would open
- * a race condition where we could block indefinitely with
+ * any cond. If we locked the hash-bucket after testing *uaddr, that
+ * would open a race condition where we could block indefinitely with
* cond(var) false, which would violate the guarantee.
*
- * A consequence is that futex_wait() can return zero and absorb
- * a wakeup when *uaddr != val on entry to the syscall. This is
- * rare, but normal.
+ * On the other hand, we insert q and release the hash-bucket only
+ * after testing *uaddr. This guarantees that futex_wait() will NOT
+ * absorb a wakeup if *uaddr does not match the desired values
+ * while the syscall executes.
*/
retry:
q->key = FUTEX_KEY_INIT;
- ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
+ ret = get_futex_key(uaddr, fshared, &q->key,
+ (val == setval) ? VERIFY_READ : VERIFY_WRITE);
if (unlikely(ret != 0))
return ret;

retry_private:
*hb = queue_lock(q);

- ret = get_futex_value_locked(&uval, uaddr);
+ pagefault_disable();
+ if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
+ pagefault_enable();

- if (ret) {
queue_unlock(q, *hb);

ret = get_user(uval, uaddr);
+fault_common:
if (ret)
goto out;

@@ -1755,7 +1761,21 @@ retry_private:
goto retry;
}

- if (uval != val) {
+ if (val != setval && uval == val) {
+ uval = futex_atomic_cmpxchg_inatomic(uaddr, val, setval);
+ if (unlikely(uval == -EFAULT)) {
+ pagefault_enable();
+
+ queue_unlock(q, *hb);
+
+ ret = fault_in_user_writeable(uaddr);
+ goto fault_common;
+ }
+ }
+
+ pagefault_enable();
+
+ if (uval != val && uval != setval) {
queue_unlock(q, *hb);
ret = -EWOULDBLOCK;
}
@@ -1766,8 +1786,8 @@ out:
return ret;
}

-static int futex_wait(u32 __user *uaddr, int fshared,
- u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int fshared, u32 val, u32 setval,
+ ktime_t *abs_time, u32 bitset, int clockrt)
{
struct hrtimer_sleeper timeout, *to = NULL;
struct restart_block *restart;
@@ -1795,7 +1815,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,

retry:
/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, setval, fshared, &q, &hb);
if (ret)
goto out;

@@ -1827,6 +1847,7 @@ retry:
restart->fn = futex_wait_restart;
restart->futex.uaddr = (u32 *)uaddr;
restart->futex.val = val;
+ restart->futex.setval = setval;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
restart->futex.flags = FLAGS_HAS_TIMEOUT;
@@ -1862,7 +1883,8 @@ static long futex_wait_restart(struct restart_block *restart)
restart->fn = do_no_restart_syscall;
if (restart->futex.flags & FLAGS_SHARED)
fshared = 1;
- return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
+ return (long)futex_wait(uaddr, fshared, restart->futex.val,
+ restart->futex.setval, tp,
restart->futex.bitset,
restart->futex.flags & FLAGS_CLOCKRT);
}
@@ -2219,7 +2241,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
q.requeue_pi_key = &key2;

/* Prepare to wait on uaddr. */
- ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+ ret = futex_wait_setup(uaddr, val, val, fshared, &q, &hb);
if (ret)
goto out_key2;

@@ -2532,14 +2554,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
fshared = 1;

clockrt = op & FUTEX_CLOCK_REALTIME;
- if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+ if (clockrt && cmd != FUTEX_WAIT_BITSET &&
+ cmd != FUTEX_WAIT_REQUEUE_PI && cmd != FUTEX_SET_WAIT)
return -ENOSYS;

switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
case FUTEX_WAIT_BITSET:
- ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+ val2 = val;
+ case FUTEX_SET_WAIT:
+ ret = futex_wait(uaddr, fshared, val, val2, timeout, val3,
+ clockrt);
break;
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
@@ -2595,7 +2621,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,

if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI || cmd == FUTEX_SET_WAIT)) {
if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
return -EFAULT;
if (!timespec_valid(&ts))
@@ -2609,10 +2635,16 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
/*
* requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
* number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
+ * new uval value in 'uaddr2' if cmd == FUTEX_SET_WAIT.
*/
if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
cmd == FUTEX_CMP_REQUEUE_PI || cmd == FUTEX_WAKE_OP)
val2 = (u32) (unsigned long) utime;
+ else if (cmd == FUTEX_SET_WAIT) {
+ if (!futex_cmpxchg_enabled)
+ return -ENOSYS;
+ val2 = (u32) (unsigned long) uaddr2;
+ }

return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}


--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Michel Lespinasse

External


Since: Nov 17, 2009
Posts: 6



(Msg. 12) Posted: Tue Nov 17, 2009 11:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>> Darren has recently been putting a lot of effort in making a futex test
>> suite:
>>
>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>
>> Its not yet complete, but hopefully it will soon be Smile
>
> Michael, would you be willing to include a version of this test in the
> above test suite? If so, then in keeping with the rest of the test suite,
> I would recommend splitting into two tests, one of each opcode being
> tested, and add argument to define thread count. The run.sh script would
> then run each thread count as a separate test run.

I had a quick look at the tree (using the http interface). I sure would not
mind adding the tests, and the futextest.h file would make them look
quite cleaner than my previous version. Just to be sure, would you put them
under performance/ or stress/ ? This would be the first test to be added
in these directories in any case, right ? (just asking in case I missed
something obvious).

And if Hitoshi wants to do it first, I sure won't mind the help either Smile


--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Michel Lespinasse

External


Since: Nov 17, 2009
Posts: 6



(Msg. 13) Posted: Tue Nov 17, 2009 11:20 pm
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Tue, Nov 17, 2009 at 07:24:09AM -0800, Linus Torvalds wrote:
> The FUTEX_SET_WAIT concept seems well-defined, although it sounds more
> like a FUTEX_CMPXCHG_WAIT to me than a "SET" operation. I'm not entirely
> sure that we really want to do the CMPXCHG in the kernel rather than in
> user space, since lock stealing generally isn't a problem, but I don't
> think it's _wrong_ to add this concept.
>
> In fact, CMPXCHG is generally seen to be the "fundamental" base for
> implementing locking, so in that sense it makes perfect sense to have it
> as a FUTEX model.

My first version called the operation that way, but it did *NOT* block if
val2 (now renamed setval) was already set in the futex. Turned out it helps
my use case if I do block in that situation, so I changed the operation
accordingly and renamed it into FUTEX_SET_WAIT (with a CAS model in mind,
though it's still also similar to cmpxchg in that it just returns if
the uval is not 'val' or 'setval').

> That said, I personally think the adaptive wait model is (a) more likely
> to fix many performance issues and (b) a bit more high-level concept, so I
> like Peter's patch too, but I don't see that the patches would really be
> mutually exclusive.
>
> Of course, it's possible that Michel's performance problem is fixed by the
> adaptive approach too, in which case the FUTEX_SET_WAIT (or _CMPXCHG_WAIT)
> patch is just fundamentally less interesting. But some people do need
> fairness - even when it's bad for performance - so...
>
> One thing that does strike me is that _if_ we want to do both interfaces,
> then I would assume that we quite likely also want to have an adaptive
> version of the FUTEX_SET|CMPXCHG_WAIT thing. Which perhaps implies that
> the "ADAPTIVE" part should be a bitflag in the command value?

I like the adaptive approach as well, though I'm not sure yet if it'd work
for us. I can try it but it'll take a bit of time.


One difficulty with adaptive spinning is that we want to avoid deadlocks.
If two threads end up spinning in-kernel waiting for each other, we better
have preemption enabled... or detect and deal with the situation somehow.


Also one aspect I dislike is that this would impose a given format on the
futex for storing the TID. I would prefer if there were several bits available
in the futex for userspace to do whatever they want. 8 bits would likely
be enough, which leaves 24 for the TID - enough for us, but I have no idea
if that's good enough for upstream inclusion. It that's not possible,
one possible compromise could be:

- userspace passes a TID (which it extracted from the futex value; but kernel
does not necessarily know how)
- kernel spins until that TID goes to sleep, or the futex value is not equal
to val or setval anymore
- if val != setval and the futex value is val, set it to setval
- if the futex valus is setval, block, otherwise -EWOULDBLOCK.

If the lock got stolen from a different thread, userspace can decide to
retry with or without adaptive spinning.

That would be the most generic interface I can think of, though it's
starting to be a LOT of parameters - actually, too many to pass through
the _syscall6 interface.


I also like Darren's suggestion to do a FUTEX_SET_WAIT_REQUEUE_PI,
but it's hitting the same 'too many parameters' limitation as well :/

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 14) Posted: Wed Nov 18, 2009 3:20 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Michel Lespinasse wrote:
> On Tue, Nov 17, 2009 at 08:16:06AM -0800, Darren Hart wrote:
>>> Darren has recently been putting a lot of effort in making a futex test
>>> suite:
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git
>>>
>>> Its not yet complete, but hopefully it will soon be Smile
>> Michael, would you be willing to include a version of this test in the
>> above test suite? If so, then in keeping with the rest of the test suite,
>> I would recommend splitting into two tests, one of each opcode being
>> tested, and add argument to define thread count. The run.sh script would
>> then run each thread count as a separate test run.
>
> I had a quick look at the tree (using the http interface). I sure would not
> mind adding the tests, and the futextest.h file would make them look
> quite cleaner than my previous version. Just to be sure, would you put them
> under performance/ or stress/ ? This would be the first test to be added
> in these directories in any case, right ? (just asking in case I missed
> something obvious).

Michel, excellent!

I think performance sounds like the right place to me. Performance is
where I plan to put measurement tests where you report some score.
Stress tests are considered to have passed if the run to completion
without locking up or crashing the kernel Smile

We should take a look at the perf tool to see what requirements it may
impose on tests we want to share between perf and futextest.

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Darren Hart

External


Since: May 19, 2007
Posts: 57



(Msg. 15) Posted: Wed Nov 18, 2009 3:20 am
Post subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Michel Lespinasse wrote:

> One difficulty with adaptive spinning is that we want to avoid deadlocks.
> If two threads end up spinning in-kernel waiting for each other, we better
> have preemption enabled... or detect and deal with the situation somehow.

This is really only a problem for SCHED_FIFO tasks right? (SCHED_OTHER
should get scheduled() out when CFS deems they've exhausted their fair
share). Real-Time tasks typically should be using PI anyway as adaptive
locking is non-deterministic and doesn't provide for PI. So I'm not sure
how critical this problem is in practice.

> Also one aspect I dislike is that this would impose a given format on the
> futex for storing the TID.

We do have a precedent for this with robust as well as PI futexes.

I would prefer if there were several bits available
> in the futex for userspace to do whatever they want. 8 bits would likely
> be enough, which leaves 24 for the TID - enough for us, but I have no idea
> if that's good enough for upstream inclusion. It that's not possible,
> one possible compromise could be:

And we already use two of those bits for OWNER_DIED and FUTEX_WAITERS.
Perhaps you just have to choose between your own value scheme and
adaptive spinning (sounds horribly limiting as I'm typing this...).

>
> - userspace passes a TID (which it extracted from the futex value; but kernel
> does not necessarily know how)
> - kernel spins until that TID goes to sleep, or the futex value is not equal
> to val or setval anymore
> - if val != setval and the futex value is val, set it to setval
> - if the futex valus is setval, block, otherwise -EWOULDBLOCK.
>
> If the lock got stolen from a different thread, userspace can decide to
> retry with or without adaptive spinning.

I'll think on this a bit more...

>
> That would be the most generic interface I can think of, though it's
> starting to be a LOT of parameters - actually, too many to pass through
> the _syscall6 interface.
>
>
> I also like Darren's suggestion to do a FUTEX_SET_WAIT_REQUEUE_PI,
> but it's hitting the same 'too many parameters' limitation as well :/

We don't use val2 for FUTEX_WAIT_REQUEUE_PI, so we should be able to use
that for setval.


--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo DeleteThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Display posts from previous:   
Related Topics:
[patch] futex: restartable futex_wait - LTP test sigaction_16_24 fails, because it expects sem_wait to be restarted if SA_RESTART is set. sem_wait is..

[PATCH] futex: init error ckeck - This patch checks register_filesystem() and kern_mount() return values. Cc: Ingo Molnar <mingo@redhat.com>..

[PATCH] futex null pointer timeout - This fix is mostly from Thomas .. The problem was that a futex can be called with a zero timeout (0 seconds, 0..

[rfc][patch] futex: restartable futex_wait? - Hi Ingo, I'm seeing an LTP test fail for ltp test sigaction_16_24. Basically, it tests whether the SA_RESTART flag..

[patch 0/4] PI-Futex bugfixes and cleanups - Andrew, the following patch series solves the issues with pi-futexes which were reported from Alexey. While the first...

[PATCH] Futex: Revert the non-functional REQUEUE_PI - Patch d0aa7a70bf03b9de9e995ab272293be1f7937822 titled "futex_requeue_pi optimization" introduced user spac...
       Soft32 Home -> Linux -> Kernel All times are: Pacific Time (US & Canada) (change)
Goto page 1, 2, 3
Page 1 of 3

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Categories:
 Windows
  Linux
 Mac
 PDA


[ Contact us | Terms of Service/Privacy Policy ]