Welcome to Soft32 Linux Forums!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

sched_yield proposals/rationale

 
   Soft32 Home -> Linux -> Kernel RSS
Next:  [PATCH] deflate stack usage in lib/inflate.c  
Author Message
Buytaert_Steven

External


Since: Apr 12, 2007
Posts: 5



(Msg. 1) Posted: Thu Apr 12, 2007 5:40 am
Post subject: sched_yield proposals/rationale
Archived from groups: linux>kernel (more info?)

Since the new 2.6.x O(1) scheduler I'm having latency problems. Probably due
to excessive use of sched_yield in code in components I don't have control
over. This 'problem'/behavioral change has been reported also by other
applications (e.g. OpenLDAP, Gnome netmeeting, Postgress, e.google...)

I have analysed the sched_yield code in kernel/sched.c (2.6.16 SLES10) and have 3 questions/proposals:

1) Would it be beneficial to give each task one try to be enqueued on the
end of the active queue (by means of a boolean flag, reset each time the
time slices reach 0 and it is put in the expired list by scheduler_tick)?

2) When a task is eventually put in the expired list in sched_yield, give it
back the full time slices round (as done in scheduler_tick), not with the
remaining slices as is done now?

3) Put the task in the expired list at a random position, not at the end as
is done now?

Rationales:

For 1) suppose the resource that the task wants to acquire (if any) and for
which it yields, since it is not available, is kept by another task that is
in the active list, behind this task. Putting the task on the expired list
would induce some kind of 'tail chasing'. This would be broken by 1.

For 2) When a task is put on the expired list with e.g. 1 time quantum left,
next time it gets the resource and is put on the expired list at the next
scheduler_tick occurence, before it has the chance to release the resource
again, keeping it for a longer time than necessary.

For 3) when the resource (if any) is requested by many tasks
regularly, always putting the yielding task at the end of the expired list
could result in 'unfairness' since the resource is handed out to the tasks
as they get on the cpu. The only 'fairness' that can be applied with yield
is than to insert in a random position.

Any insight on these proposed changes? Thanks

Steven Buytaert
--
La perfection est atteinte non quand il ne reste rien ajouter, mais quand il ne reste rien à enlever. (Antoine de Saint-Exupéry)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Sven-Thorsten Dietrich

External


Since: Apr 12, 2007
Posts: 18



(Msg. 2) Posted: Thu Apr 12, 2007 7:00 am
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Thu, 2007-04-12 at 04:31 -0400, Buytaert_Steven.RemoveThis@emc.com wrote:
> Since the new 2.6.x O(1) scheduler I'm having latency problems.

1. Have you elevated the process priority?
2. Have you tried running SCHED_FIFO, or SCHED_RR?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Andi Kleen

External


Since: Feb 12, 2007
Posts: 663



(Msg. 3) Posted: Thu Apr 12, 2007 9:40 am
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Buytaert_Steven.RemoveThis@emc.com writes:

> Since the new 2.6.x O(1) scheduler I'm having latency problems. Probably due
> to excessive use of sched_yield in code in components I don't have control
> over. This 'problem'/behavioral change has been reported also by other
> applications (e.g. OpenLDAP, Gnome netmeeting, Postgress, e.google...)

On the other hand when they fix their code to not rely on sched_yield
but use some more directed wakeup method they will end up with a much
more robust configuration. sched_yield always assumes the rest of the system
is idle, which is just wrong.

But yes the new sched_yield semantics seem to be definitely unexpected
for a lot of people.

The only way I could think of to make sched_yield work the way they
expect would be to define some way of gang scheduling and give
sched_yield semantics that it preferably yields to other members
of the gang.

But it would be still hard to get these semantics (how to define
the gangs) into your uncontrollable broken applications and also
it has the risk of either unfairness or not full utilization of the
machine. Getting it to scale well on MP systems would be also likely
a challenge.

> I have analysed the sched_yield code in kernel/sched.c (2.6.16 SLES10) and have 3 questions/proposals:
>
> 1) Would it be beneficial to give each task one try to be enqueued on the
> end of the active queue (by means of a boolean flag, reset each time the
> time slices reach 0 and it is put in the expired list by scheduler_tick)?

That would still not unbreak most applications I would suspect -- they
will likely try to yield multiple times before using up a full time slice
unless their event handlers are very compute intensive.

In general other subsystems (e.g. VM) had had quite bad experiences
with similar "one more try" hacks -- they tend to be not robust and
actually penalize some loads.

> 2) When a task is eventually put in the expired list in sched_yield, give it
> back the full time slices round (as done in scheduler_tick), not with the
> remaining slices as is done now?

That would likely be unfair and exploitable.

> 3) Put the task in the expired list at a random position, not at the end as
> is done now?

Sounds like an interesting approach, but to do it in O(1) you would
need a new data structure with possibly much larger constant overhead.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.RemoveThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Buytaert_Steven

External


Since: Apr 12, 2007
Posts: 5



(Msg. 4) Posted: Thu Apr 12, 2007 10:10 am
Post subject: RE: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

> -----Original Message-----
> From: Andi Kleen
> [ ... about use of sched_yield ...]
> On the other hand when they fix their code to not rely on sched_yield
> but use [...]

Agreed, but $ find . -name "*.[ch]" | xargs grep -E "yield[ ]*\(" | wc over
the 2.6.16 kernel yields 105 hits, note including comments... An interesting spot is e.g. fs/buffer.c free_more_memory()

> The only way I could think of to make sched_yield work the way they
> expect would be to define some way of gang scheduling and give
> sched_yield semantics that it preferably yields to other members
> of the gang. [...]

That would indeed make things too complicated.

> [about question 1]
> That would still not unbreak most applications I would suspect -- they
> will likely try to yield multiple times before using up a full time slice
> unless their event handlers are very compute intensive.

No, they only get 1 go again on the active queue, until they pass the proper exit to the expired queue via scheduler_tick. And they are enqueued at the end of the active queue with whatever slices they have left, thus fair.

> In general other subsystems (e.g. VM) had had quite bad experiences
> with similar "one more try" hacks -- they tend to be not robust and
> actually penalize some loads.
>
> > 2) When a task is eventually put in the expired list in sched_yield,
> > give it back the full time slices round (as done in scheduler_tick), not > > with the remaining slices as is done now?
>
> That would likely be unfair and exploitable.

I don't understand; how more unfair would it be than passing via scheduler_tick? Grabbing a resource with a single time slice left would be more unfair towards other tasks IMHO when you get moved to the expired list with the resource in still in your possession.

>
> > 3) Put the task in the expired list at a random position, not at the end
> > is done now?
>
> Sounds like an interesting approach, but to do it in O(1) you would
> need a new data structure with possibly much larger constant overhead.

Agreed, but not dramatic. Suppose you need to insert at position X, you would do, on the linked list after proper setup:

while (X--) { prev = current; current = current->next }

You could have a small duffs device to reduce the X-- checking overhead.

Thanks for the insight Andi!

Steven Buytaert
--
La perfection est atteinte non quand il ne reste rien ajouter, mais quand il ne reste rien à enlever. (Antoine de Saint-Exupéry)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Nick Piggin

External


Since: Nov 27, 2006
Posts: 373



(Msg. 5) Posted: Thu Apr 12, 2007 10:30 am
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Buytaert_Steven.TakeThisOut@emc.com wrote:
>>-----Original Message-----
>>From: Andi Kleen
>>[ ... about use of sched_yield ...]
>>On the other hand when they fix their code to not rely on sched_yield
>>but use [...]
>
>
> Agreed, but $ find . -name "*.[ch]" | xargs grep -E "yield[ ]*\(" | wc over
> the 2.6.16 kernel yields 105 hits, note including comments...

Most of these (in core code, anyway) seem to use yield when they really don't
care about running for a while.

> An interesting spot is e.g. fs/buffer.c free_more_memory()

This one should be pretty rare (actually I think it is dead code in practice,
due to the way the page allocator works).

Avoiding sched_yield is a really good idea outside realtime scheduling. Since
we have gone this far with the current semantics, I think it would be sad to
back down now.

It would be nice if you could pressure those other components to adapt Smile

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Andi Kleen

External


Since: Feb 12, 2007
Posts: 663



(Msg. 6) Posted: Thu Apr 12, 2007 10:40 am
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Thu, Apr 12, 2007 at 09:05:25AM -0400, Buytaert_Steven RemoveThis @emc.com wrote:
> > -----Original Message-----
> > From: Andi Kleen
> > [ ... about use of sched_yield ...]
> > On the other hand when they fix their code to not rely on sched_yield
> > but use [...]
>
> Agreed, but $ find . -name "*.[ch]" | xargs grep -E "yield[ ]*\(" | wc over
> the 2.6.16 kernel yields 105 hits, note including comments... An interesting spot is e.g. fs/buffer.c free_more_memory()

A lot of those are probably broken in some way agreed.

> >
> > > 2) When a task is eventually put in the expired list in sched_yield,
> > > give it back the full time slices round (as done in scheduler_tick), not > > with the remaining slices as is done now?
> >
> > That would likely be unfair and exploitable.
>
> I don't understand; how more unfair would it be than passing via scheduler_tick? Grabbing a resource with a single time slice left would be more unfair towards other tasks IMHO when you get moved to the expired list with the resource in still in your possession.

With a particular sleep pattern it could get more CPU time.

> > > 3) Put the task in the expired list at a random position, not at the end
> > > is done now?
> >
> > Sounds like an interesting approach, but to do it in O(1) you would
> > need a new data structure with possibly much larger constant overhead.
>
> Agreed, but not dramatic. Suppose you need to insert at position X, you would do, on the linked list after proper setup:
>
> while (X--) { prev = current; current = current->next }
>
> You could have a small duffs device to reduce the X-- checking overhead.

You would need to rename the scheduler to "sometimes O(1)" first @)

Besides - but I guess you're aware of it - any randomized algorithms tend
to drive benchmarkers and performance analysts crazy because their performance
cannot be repeated. So it's usually better to avoid them unless there is
really no alternative.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Buytaert_Steven

External


Since: Apr 12, 2007
Posts: 5



(Msg. 7) Posted: Thu Apr 12, 2007 11:21 am
Post subject: RE: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

> -----Original Message-----
> > Agreed, but $ find . -name "*.[ch]" | xargs grep -E "yield[ ]*\(" | wc
> over
> > the 2.6.16 kernel yields 105 hits, note including comments... An
> interesting spot is e.g. fs/buffer.c free_more_memory()
>
> A lot of those are probably broken in some way agreed.

OK, so there are 2 options: 1) fix 100 spots or 2) ... Wink

> [ about giving a thread its full slice quota after yielding ]
> With a particular sleep pattern it could get more CPU time.

I'm not a linux scheduler expert, but I don't understand this. Suppose 2 extreme situations:

1) I have full time slices and call yield, going to the expired list with again full times slices; this is IMHO a NOOP, i.e. fair.

2) I have only a single slice left, NOT yielding would get me to the expired list via scheduler_tick the next timer interrupt and I would get my full time slice quota. DO YIELD at that moment, will give me full time slice quota again (with my proposal).

>
> 3) Put the task in the expired list at a random position [...]
> > while (X--) { prev = current; current = current->next }
>
> You would need to rename the scheduler to "sometimes O(1)" first @)

Again, agreed, but this tight loop would run in code cache completely. We could call it O(1.1) Wink With a duffs device it would mean a few clock cycles per position.

> Besides - but I guess you're aware of it - any randomized algorithms tend
> to drive benchmarkers and performance analysts crazy because their
> performance
> cannot be repeated. So it's usually better to avoid them unless there is
> really no alternative.

That could already solve your concern from above. Statistically speaking, it will give them (benchmarkers) the smoothest curve they've ever seen.

Please be aware that I'm just exploring options/insight here. It is not something I intend to push inside the mainline kernel. I just want to find reasonable and logic criticism as you and some others have provided already. Thanks for that!

Steven Buytaert
--
La perfection est atteinte non quand il ne reste rien ajouter, mais quand il ne reste rien à enlever. (Antoine de Saint-Exupéry)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Thomas Gleixner

External


Since: Nov 10, 2006
Posts: 705



(Msg. 8) Posted: Thu Apr 12, 2007 7:04 pm
Post subject: RE: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Thu, 2007-04-12 at 10:15 -0400, Buytaert_Steven.TakeThisOut@emc.com wrote:
> > -----Original Message-----
> > > Agreed, but $ find . -name "*.[ch]" | xargs grep -E "yield[ ]*\(" | wc
> > over
> > > the 2.6.16 kernel yields 105 hits, note including comments... An
> > interesting spot is e.g. fs/buffer.c free_more_memory()
> >
> > A lot of those are probably broken in some way agreed.
>
> OK, so there are 2 options: 1) fix 100 spots or 2) ... Wink

100, simply because at least 95 of them are broken.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
William Lee Irwin III

External


Since: Jul 03, 2005
Posts: 282



(Msg. 9) Posted: Thu Apr 12, 2007 7:04 pm
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Thu, Apr 12, 2007 at 03:31:31PM +0200, Andi Kleen wrote:
> The only way I could think of to make sched_yield work the way they
> expect would be to define some way of gang scheduling and give
> sched_yield semantics that it preferably yields to other members
> of the gang.
> But it would be still hard to get these semantics (how to define
> the gangs) into your uncontrollable broken applications and also
> it has the risk of either unfairness or not full utilization of the
> machine. Getting it to scale well on MP systems would be also likely
> a challenge.

Gang scheduling isn't a squishy concept whose name can be arbitrarily
repurposed. Perhaps "group scheduling" or similar would be appropriate
if the standard gang scheduling semantics are not what you have in mind.

Standard gang scheduling would not be appropriate for applications that
don't know what they're doing. All threads of a gang falling asleep
when one sleeps (or more properly, the gang is considered either
runnable or unrunnable as a unit) is not to be taken lightly.

I'd call this something like a "directed yield with a group as a target,"
but I wouldn't actually try to do this. I'd try to provide ways for a
directed yield to donate remaining timeslice and dynamic priority if
possible to a particular task associated with a resource, for instance,
a futex or SysV semaphore owner. The priority inversion one desperately
wants to avoid is the resource owner running out of timeslice or
otherwise losing priority to where it falls behind busywaiters such as
callers of sched_yield for the purposes of multitier locking.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
William Lee Irwin III

External


Since: Jul 03, 2005
Posts: 282



(Msg. 10) Posted: Thu Apr 12, 2007 7:05 pm
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Thu, Apr 12, 2007 at 11:27:22PM +1000, Nick Piggin wrote:
> This one should be pretty rare (actually I think it is dead code in
> practice, due to the way the page allocator works).
> Avoiding sched_yield is a really good idea outside realtime scheduling.
> Since we have gone this far with the current semantics, I think it
> would be sad to back down now.
> It would be nice if you could pressure those other components to adapt Smile

Outside of realtime scheduling there appear to be two desired behaviors:
(1) busywait: demote as aggressively as possible
(2) CPU burn: give other apps a chance to run but demote lightly at most

There is no way for the scheduler to distinguish which of the two
behaviors is desired. A fresh system call taking an argument to describe
which is the desired behavior is my recommended solution. Most unaware
apps should be able to be dealt with via LD_PRELOAD.

Busywaiters able to be modified could be given more specific scheduling
primitives, in particular "directed yields," which donate timeslice and
possibly dynamic priority to their targets. They would look something
like:
int yield_to(pid_t);
int yield_to_futex(int *);
int yield_to_sem(int);
/* etc. */
as userspace library functions where yielding to a resource is intended
to donate timeslice to its owner or one of its owners, where those
owner(s) are to be determined by the kernel. Directed yields are a more
direct attack on the priority inversion one most desperately wants to
avoid in the case of sched_yield() -based busywaiters on a resource,
namely the resource owner falling behind the busywaiters in priority or
running out of timeslice. They furthermore reduce the competition for
CPU between resource owners and busywaiters on that resource.

A less direct alternative suggested by Andi Kleen is to have coprocess
groups and an alternative to sched_yield() that directs yielding toward
a member of the same coprocess group as the yielder, possibly using the
standard system call by making that the default behavior when a process
is a member of such a coprocess group.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Bill Davidsen

External


Since: Jul 27, 2004
Posts: 210



(Msg. 11) Posted: Thu Apr 12, 2007 7:40 pm
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Buytaert_Steven.TakeThisOut@emc.com wrote:
>> -----Original Message-----

>> Besides - but I guess you're aware of it - any randomized
>> algorithms tend to drive benchmarkers and performance analysts
>> crazy because their performance cannot be repeated. So it's usually
>> better to avoid them unless there is really no alternative.
>
> That could already solve your concern from above. Statistically
speaking, it will give them (benchmarkers) the smoothest curve they've
ever seen.
>
> Please be aware that I'm just exploring options/insight here. It is
not something I intend to push inside the mainline kernel. I just want
to find reasonable and logic criticism as you and some others have
provided already. Thanks for that!

And having gotten same, are you going to code up what appears to be a
solution, based on this feedback?

I'm curious how well it would run poorly written programs, having
recently worked with a company which seemed to have a whole part of
purchasing dedicated to buying same. Sad

--
Bill Davidsen <davidsen.TakeThisOut@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.TakeThisOut@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Buytaert_Steven

External


Since: Apr 12, 2007
Posts: 5



(Msg. 12) Posted: Fri Apr 13, 2007 2:40 am
Post subject: RE: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

> From: Bill Davidsen
>
> And having gotten same, are you going to code up what appears to be a
> solution, based on this feedback?

The feedback was helpful in verifying whether there are any arguments against my approach. The real proof is in the pudding.

I'm running a kernel with these changes, as we speak. Overall system throughput is about up 20%. With 'system throughput' I mean measured performance of a rather large (experimental) system. The patch isn't even 24h old... Also the application latency has improved.

Additional settings: my patch is running *also* with a kernel modified to have only 8 default time slices at 250Hz setting. And no, the overall number of context switches per second hasn't blown up. The kernel was compiled with low latency and in-kernel preemption enabled, BKL preemption enabled. I haven't checked the patch stand alone yet.

> I'm curious how well it would run poorly written programs, having
> recently worked with a company which seemed to have a whole part of
> purchasing dedicated to buying same. Sad

So first signs are positive; note that it requires much more run time and a slew of other tests/scrutiny before we can be really sure.

W.r.t. the remarks; I am most interested in possibilities of DOS attacks that could exploit this change in sched_yield. Therefore the comments of Andi were interesting, but I haven't heard back from him yet. I'm still not sure how a task could juggle more slices from the system because of these changes.

Last remark on the O(1)'ness being violated. I think it's a mooth point. The sched_yield is executed on the CPU time of the yielder. Being O(1) is most important for the scheduler proper at each timer tick (interrupt). That being O(1) is crucial.

Steven Buytaert
--
La perfection est atteinte non quand il ne reste rien ajouter, mais quand il ne reste rien à enlever. (Antoine de Saint-Exupéry)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Mark Lord

External


Since: Nov 09, 2006
Posts: 195



(Msg. 13) Posted: Fri Apr 13, 2007 4:50 pm
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Buytaert_Steven.DeleteThis@emc.com wrote:
>> From: Bill Davidsen
>>
>> And having gotten same, are you going to code up what appears to be a
>> solution, based on this feedback?
>
> The feedback was helpful in verifying whether there are any arguments against my approach. The real proof is in the pudding.
>
> I'm running a kernel with these changes, as we speak. Overall system throughput is about up 20%. With 'system throughput' I mean measured performance of a rather large (experimental) system. The patch isn't even 24h old... Also the application latency has improved.

Cool. You *do know* that there is a brand new CPU scheduler
scheduled to replace the current one for the 2.6.22 Kernel, right?

-ml
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Bill Davidsen

External


Since: Jul 27, 2004
Posts: 210



(Msg. 14) Posted: Tue Apr 17, 2007 4:50 pm
Post subject: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Mark Lord wrote:
> Buytaert_Steven.DeleteThis@emc.com wrote:
>>> From: Bill Davidsen
>>>
>>> And having gotten same, are you going to code up what appears to be a
>>> solution, based on this feedback?
>>
>> The feedback was helpful in verifying whether there are any arguments
>> against my approach. The real proof is in the pudding.
>>
>> I'm running a kernel with these changes, as we speak. Overall system
>> throughput is about up 20%. With 'system throughput' I mean measured
>> performance of a rather large (experimental) system. The patch isn't
>> even 24h old... Also the application latency has improved.
>
> Cool. You *do know* that there is a brand new CPU scheduler
> scheduled to replace the current one for the 2.6.22 Kernel, right?
>
Having tried both nicksched and Con's fair sched on some normal loads,
as opposed to benchmarks, I sure hope Linus changes his mind about
having several schedulers in the kernel. The "one perfect and
self-adjusting scheduler" isn't here yet.

--
Bill Davidsen <davidsen.DeleteThis@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo.DeleteThis@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Buytaert_Steven

External


Since: Apr 12, 2007
Posts: 5



(Msg. 15) Posted: Wed Apr 18, 2007 2:50 am
Post subject: RE: Re: sched_yield proposals/rationale [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

> -----Original Message-----
> From: linux-kernel-owner RemoveThis @vger.kernel.org [mailto:linux-kernel-
> owner RemoveThis @vger.kernel.org] On Behalf Of Bill Davidsen
> Sent: dinsdag 17 april 2007 21:38
> To: linux-kernel RemoveThis @vger.kernel.org
> Cc: Buytaert, Steven; andi RemoveThis @firstfloor.org; linux-kernel RemoveThis @vger.kernel.org
> Subject: Re: sched_yield proposals/rationale
>
> Mark Lord wrote:
> >
> > Cool. You *do know* that there is a brand new CPU scheduler
> > scheduled to replace the current one for the 2.6.22 Kernel, right?
> >
> Having tried both nicksched and Con's fair sched on some normal loads,
> as opposed to benchmarks, I sure hope Linus changes his mind about
> having several schedulers in the kernel. The "one perfect and
> self-adjusting scheduler" isn't here yet.

I have the same opinion, and it is still a long time out I'm afraid. Probably people only read my suggestion for a 'fix' diagonally, let alone they read my footer. Too bad the archives only go back to 95, I would love to retrieve my posts from 93. Anybody still have these?

Now a bit more on topic:

1) My problem is/was solved by making the default time slice much smaller than the default 100 in a 250Hz system. But that's only masking it away.

2) I made a schedstat program sort of like vmstat that samples and prints deltas each second (from /proc/schedstat). Just printing the jiffies delta and the # times schedule is called per CPU, is already thought provoking; a small 12 second example:

2.6.16 vanilla scheduler

Jiffies CPU0 CPU1
Delta
250 | 1756 | 7301 |
252 | 1730 | 2638 |
254 | 1963 | 1663 |
385 | 868 | 658 | <--- stall starts
325 | 138 | 112 |
330 | 184 | 130 |
339 | 682 | 122 |
335 | 367 | 159 |
334 | 653 | 127 |
345 | 467 | 137 |
335 | 673 | 128 |
337 | 471 | 131 |
334 | 673 | 127 |
332 | 321 | 144 |
333 | 523 | 129 |
332 | 98 | 123 |
356 | 496 | 124 |
270 | 96 | 87 |
277 | 5878 | 26228 | <-- yes 26K
252 |18263 | 19130 | ...
255 | 2024 | 5747 | <-- back to normal

Let's talk about fairness when we get the basics right. And yes, the real physical elapsed time per line IS 1 second, so the jiffies jumping up from the normal expected value of 250 to up to 356 is not normal; it's in fact very very bad. This box was unusable for 13 seconds in a row.

But this doesn't look serious enough to most people, it seems.

Steven Buytaert
--
La perfection est atteinte non quand il ne reste rien ajouter, mais quand il ne reste rien à enlever. (Antoine de Saint-Exupéry)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo RemoveThis @vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Back to top
Login to vote
Display posts from previous:   
Related Topics:
linux.conf.au 2008 - call for proposals - I hope this is of some interest to some of you. :-) We can't run LCA without your participation and we hope you'll hel...

[PATCH 5/8] resend cciss: disable DMA prefetch on P600 - PATCH 5 of 8 resend This patch unconditionally disables DMA prefetch on the P600 controller. A bug in the ASIC may..

[PATCH 2.6.19-rc4 1/3] ehea: Nullpointer dereferencation fix - Fix: Must check for nullpointer before dereferencing it - not afterwards. Signed-off-by: Thomas Klein..

[PATCH 2.6.19-rc4 3/3] ehea: 64K page support fix - This patch fixes 64k page support by using PAGE_MASK and appropriate pagesize defines in several places. Signed-off-by...

[PATCH 2.6.19-rc4 2/3] ehea: Removed redundant define - Removed define H_CB_ALIGNMENT which is already defined in include/asm-powerpc/hvcall.h Signed-off-by: Thomas Klein..

[PATCH] lockdep: fix delayacct locking bug - ====================================================== [ INFO: soft-safe -> soft-unsafe lock order detected ] 2.6.19...
       Soft32 Home -> Linux -> Kernel All times are: Pacific Time (US & Canada) (change)
Page 1 of 1

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Categories:
 Windows
  Linux
 Mac
 PDA


[ Contact us | Terms of Service/Privacy Policy ]