Welcome to Soft32 Linux Forums!
FAQFAQ    SearchSearch      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

thread memory size

 
Goto page Previous  1, 2, 3, 4, 5
   Soft32 Home -> Linux -> System Development RSS
Next:  sh sell  
Author Message
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 61) Posted: Mon Oct 12, 2009 5:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: comp>os>linux>development>system (more info?)

David Schwartz wrote:

> If you meant the stuff about lock-free algorithms, you are (in
> general) wrong. Lock-free algorithms typically perform significantly
> worse than comparable lock-based algorithms expect in very special
> circumstances (the ones where clueful people use them).
>
> Contention is bad, it results in CPUs wasting their time synchronizing
> with each other. Lock-free algorithms actually tend to maximize
> contention, as it encourages contending threads to run in parallel.

I think your information is quite outdated. You should have a look at
Cliff Clicks blog article about a non-blocking hash table (for java)
He has been testing it on a 750 cpu machine and it scales linearly. The
algorithm is based on CAS and state machines. He has tested it on other
machines with other architectures as well, with the results.

http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html

With the current interest in massively parallel systems, a lot of the
best people in the world on concurrency are working on improving
performance of concurrent algorithms.

It is with these advances in technology, I base my arguments. Since the
people working on these algorithms are acknowledged experts on the
matter of concurrency, I trust them a bit more than your unsupported
claims. Since you have not been able to convince me with your arguments,
nor have you been able to produce any references that supports your
claim, I have little incentive to agree with you.

regards

thomas
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 62) Posted: Mon Oct 12, 2009 5:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:
> On Oct 12, 2:27 am, Jan Helgesen <s....DeleteThis@nospam.com> wrote:
>
>> It is with these advances in technology, I base my arguments. Since the
>> people working on these algorithms are acknowledged experts on the
>> matter of concurrency, I trust them a bit more than your unsupported
>> claims. Since you have not been able to convince me with your arguments,
>> nor have you been able to produce any references that supports your
>> claim, I have little incentive to agree with you.
>
> I cannot convince you with my arguments because you are beyond
> rational persuasion. I notice that you don't point out any errors or
> flaws of any kind, you simply point to something completely irrelevant
> and claim that it somehow refutes my argument when, ironically, it
> actually supports it.

I am not doing anything different from what you are doing. I am removing
your arguments that that are irrelevant to my point.

you have seen one of my references, maybe you could explain why you
think that is irrelevant.

regards

Jan

>
> In any event, continue doing what you're doing having no understanding
> of any of the concepts involved. I have exhausted my patience trying
> to educate you.
>
> DS
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 63) Posted: Mon Oct 12, 2009 5:37 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 12, 5:17 am, Jan Helgesen <s....TakeThisOut@nospam.com> wrote:
> David Schwartz wrote:
> > On Oct 12, 3:09 am, Jan Helgesen <s....TakeThisOut@nospam.com> wrote:

> >> you have seen one of my references, maybe you could explain why you
> >> think that is irrelevant.

> > That it is possible to do something right does not refute an argument
> > that it is possible to do something wrong. If you look at my last
> > paragraph of the post you didn't respond to, I explained that.

> So your argument is: just because it is possible that something can go
> wrong with anything, nothing is worth looking at? That's the most silly
> rationale I have ever heard. The reference I showed you, shows that it
> is possible to do it with 750 cpus. I have never claimed it works for 1
> million cores yet, I have said that the future might be bright.

I don't understand how you got that from what I said.

> > But this is a waste of time. You actively refuse to think and I'm
> > through trying. (I see no evidence you even attempted to understand my
> > argument.)

> Arrgh, wtf is wrong with you. I have answered all your questions or
> arguments that I find relevant. I have tried as hard as I can understand
> how you get to your conclusions and how it is related to the case. But
> every time I ask you to explain, you just change the examples you use or
> the focus of your arguments, it is impossible to keep track of how all
> your arguments are related and how it is relevant.
>
> Your only response is to say generic and silly statements like "you show
> no evidence... " or "you do not understand what I say.." and so on
> without explaining explicitly what it is in my arguments you think are
> are wrong.

You don't make arguments. You just say that I'm wrong and then cite
things that are either irrelevant or that actually agree with me Can
you show some sign, any sign, that you understood my example about the
lock-free collections, the four threads, and the two cores?

DS
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 64) Posted: Mon Oct 12, 2009 5:46 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 12, 5:37 am, David Schwartz <dav... RemoveThis @webmaster.com> wrote:

> You don't make arguments. You just say that I'm wrong and then cite
> things that are either irrelevant or that actually agree with me. Can
> you show some sign, any sign, that you understood my example about the
> lock-free collections, the four threads, and the two cores?

From my end, it looked like you just thought "he said lock-free
doesn't work or doesn't scale, but here's a reference that shows that
it does". Is that what you were thinking? If so, you *completely*
missed my argument.

DS
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 65) Posted: Mon Oct 12, 2009 7:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:
> On Oct 12, 3:09 am, Jan Helgesen <s... RemoveThis @nospam.com> wrote:
>
>> you have seen one of my references, maybe you could explain why you
>> think that is irrelevant.
>
> That it is possible to do something right does not refute an argument
> that it is possible to do something wrong. If you look at my last
> paragraph of the post you didn't respond to, I explained that.

So your argument is: just because it is possible that something can go
wrong with anything, nothing is worth looking at? That's the most silly
rationale I have ever heard. The reference I showed you, shows that it
is possible to do it with 750 cpus. I have never claimed it works for 1
million cores yet, I have said that the future might be bright.

> But this is a waste of time. You actively refuse to think and I'm
> through trying. (I see no evidence you even attempted to understand my
> argument.)

Arrgh, wtf is wrong with you. I have answered all your questions or
arguments that I find relevant. I have tried as hard as I can understand
how you get to your conclusions and how it is related to the case. But
every time I ask you to explain, you just change the examples you use or
the focus of your arguments, it is impossible to keep track of how all
your arguments are related and how it is relevant.

Your only response is to say generic and silly statements like "you show
no evidence... " or "you do not understand what I say.." and so on
without explaining explicitly what it is in my arguments you think are
are wrong.

regards

Jan
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 66) Posted: Mon Oct 12, 2009 9:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:
> On Oct 11, 9:22 pm, David Schwartz <dav....DeleteThis@webmaster.com> wrote:
>
>> Now imagine we switch collections A and B to a lock-free algorithm.
>> Now, there's a 50 percent chance that one thread will be A1 or A2 and
>> the other will be B1 or B2. In this case, there will be frequent
>> contention and the shared collection will constantly ping-pong from
>> cache to cache. FOR NO REASON.
>
> Grr. Of course, this should read "that one thread will be A1 and the
> other will be A2, or one thread will be B1 and the other will be B2".
> In either case, for no reason at all, two threads are horribly
> contending and saturating the FSB. Worse, if you have more than two
> cores running threads from other processes, those threads can't make
> decent forward progress because the FSB is saturated.
>
> (My terminology assumes there *is* an FSB. But the issues are similar,
> though not always precisely the same or as bad, if you have some other
> technology.)

You mentioned T2, but it has only 8 cores in one chip, and uses a
traditional chip layout. The Tilera Tile64 chip uses a tile
architecture, which means that all cores are connected to 4 other cores
by way of 4 super fast switches (one per core). So there are 64 switches
and 64 cores, which are each connected to a number of ram controllers.
This will give different concurrency possibilities than an ordinary FSB. See

http://en.wikipedia.org/wiki/File:Tile64.svg


regards

Jan
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 67) Posted: Mon Oct 12, 2009 9:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:

> You don't make arguments. You just say that I'm wrong and then cite
> things that are either irrelevant or that actually agree with me Can
> you show some sign, any sign, that you understood my example about the
> lock-free collections, the four threads, and the two cores?

Ok, here is my game plan. I will refrain from arguing anything until we
both agree that I about what the example

understand the example you gave, ok?

If you are talking about the example about threads A1, A2, B1, B2, I
will continue from that thread, ok?

regards

Jan
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 68) Posted: Mon Oct 12, 2009 11:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:
> For example, consider two cores running four threads, A1, A2, B1, and
> B2. Assume threads A1 and A2 both frequently access structure A and
> threads B1 and B2 both frequently access structure B. If structures A
> and B are locked, threads A1 and A2 will not run concurrently. If
> thread A1 is running, A2 will be blocked and B1 or B2 will run. There
> will be minimal contention.
>
> Now imagine we switch collections A and B to a lock-free algorithm.
> Now, there's a 50 percent chance that one thread will be A1 or A2 and
> the other will be B1 or B2. In this case, there will be frequent
> contention and the shared collection will constantly ping-pong from
> cache to cache. FOR NO REASON.

First of all, I am assuming that when you say "lock" for threads you
actually mean a CPU lock produced by a CAS instruction.
And I am also assuming that when you say "structure" you are in fact
only talking about something that hold only 1 memory place, i.e.
something that can be wholly changed by one CAS instruction i.e. a 32
bit memory location on a 32 bit machine.

Which of the following are the most correct summaries of you statement:

1) that because thread A1 is running on core1 and thread A2 is running
on core2, the resource will be sent from cache1(core1's cache) to
cache2(core2's cache), and that it will be sent back and forth between
cache 1 and 2 as the threads are scheduled for different CPUs

2) or are you saying that when thread A1 has structure A in cpu1's
chache1, if thread A2 in cpu2 asks for access to A, it will cause A to
be sent to cache2, and then be rejected because A1 in cache1 already has
a lock on the resource

regards

Jan
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 69) Posted: Mon Oct 12, 2009 1:52 pm
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 12, 8:21 am, Jan Helgesen <s... RemoveThis @nospam.com> wrote:
> David Schwartz wrote:
> > For example, consider two cores running four threads, A1, A2, B1, and
> > B2. Assume threads A1 and A2 both frequently access structure A and
> > threads B1 and B2 both frequently access structure B. If structures A
> > and B are locked, threads A1 and A2 will not run concurrently. If
> > thread A1 is running, A2 will be blocked and B1 or B2 will run. There
> > will be minimal contention.
>
> > Now imagine we switch collections A and B to a lock-free algorithm.
> > Now, there's a 50 percent chance that one thread will be A1 or A2 and
> > the other will be B1 or B2. In this case, there will be frequent
> > contention and the shared collection will constantly ping-pong from
> > cache to cache. FOR NO REASON.

> First of all, I am assuming that when you say "lock" for threads you
> actually mean a CPU lock produced by a CAS instruction.

No, I mean "lock-based" as opposed to "lock-free".

> And I am also assuming that when you say "structure" you are in fact
> only talking about something that hold only 1 memory place, i.e.
> something that can be wholly changed by one CAS instruction i.e. a 32
> bit memory location on a 32 bit machine.

No, I mean a complex collection.

> Which of the following are the most correct summaries of you statement:
>
> 1) that because thread A1 is running on core1 and thread A2 is running
> on core2, the resource will be sent from cache1(core1's cache) to
> cache2(core2's cache), and that it will be sent back and forth between
> cache 1 and 2 as the threads are scheduled for different CPUs

Yes, exactly. If threads A1 and A2 run at the same time, there will be
expensive contention and both threads will make less forward progress
per unit time.

> 2) or are you saying that when thread A1 has structure A in cpu1's
> chache1, if thread A2 in cpu2 asks for access to A, it will cause A to
> be sent to cache2, and then be rejected because A1 in cache1 already has
> a lock on the resource

That will only happen if you use a lock. And that's good, because then
once thread A2s access fails, thread B1 or B2 will be scheduled and
everything will run at full speed.

But inappropriate use of a lock-free algorithm simply allows the
threads to continue contending when that contention is avoidable.

In other words, lock-free algorithms are great for dealing with
unavoidable contention. But if you use them when your contention is
avoidable, the net will result will be the contention will *not* be
avoidable.

DS
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 70) Posted: Tue Oct 13, 2009 7:20 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

David Schwartz wrote:

You are throwing a lot into the discussion in one go, so I will ask
questions, one by one, and make sure we are in agreement.

> No, I mean "lock-based" as opposed to "lock-free".

Ok, "locks" can occur at two levels 1) cpu locks 2) software locks.
"lock-free" can only occur at level 2, i.e. software level.

- CPU locks exists to be able to keep threads from accessing the same
memory region without destroying the data for another thread.
- Locks at software level often encompasses critical sections, i.e. a
group of machine or language instructions in one. But it can also map to
an atomic operations in it self. I.e. in java one has objects called
AtomicReference or AtomicInteger (contains an implicit lock)

Agreed so far?

the lock it self (not the region it possibly protects), whether on cpu
level or on software level, has to be something that can be changed in
one CAS operation, right? For example a mutex, that sets a flag to
signal that this lock is in use and other threads just have to wait, right?
Such a lock is then either a bit flag or a single memory location, that
can be changed in one memory operation.

When the critical region is complete, another operation resets the flag
or memory location and then allows other threads to acquire the lock.

Agreed?

>> And I am also assuming that when you say "structure" you are in fact
>> only talking about something that hold only 1 memory place, i.e.
>> something that can be wholly changed by one CAS instruction i.e. a 32
>> bit memory location on a 32 bit machine.
>
> No, I mean a complex collection.
>
>> Which of the following are the most correct summaries of you statement:
>>
>> 1) that because thread A1 is running on core1 and thread A2 is running
>> on core2, the resource will be sent from cache1(core1's cache) to
>> cache2(core2's cache), and that it will be sent back and forth between
>> cache 1 and 2 as the threads are scheduled for different CPUs
>
> Yes, exactly. If threads A1 and A2 run at the same time, there will be
> expensive contention and both threads will make less forward progress
> per unit time.

This will occur no matter whether the operation is on a CPU level or
software level. Since a critical section lock will still require a CAS
operation to acquire the lock, and for a CPU level lock, it is a CAS
operation, so in both cases the mutex will have to be sent between
caches if the threads are running on different cores, right?


>> 2) or are you saying that when thread A1 has structure A in cpu1's
>> chache1, if thread A2 in cpu2 asks for access to A, it will cause A to
>> be sent to cache2, and then be rejected because A1 in cache1 already has
>> a lock on the resource
>
> That will only happen if you use a lock. And that's good, because then
> once thread A2s access fails, thread B1 or B2 will be scheduled and
> everything will run at full speed.

So what you are also saying here is that with a lock, when the A1 is
running and A2 is blocked, A2 will not be scheduled and that frees up
the other core so that B1 or B2 can run instead, right?
In other words all those threads that can't really do any real work,
will waste cpu cycles when based on a lock-free algorithm, because the
only thing they will do is spin-wait on different cores until the lock
is released and the try to acquire the lock, agreed?

A question would then be, if there are 100 threads waiting for that
resource, even if all threads are running on the same core, they will
have to wait one after another to acquire the lock right?
So as there are more threads the threads will have to wait longer to get
access, right?

>
> But inappropriate use of a lock-free algorithm simply allows the
> threads to continue contending when that contention is avoidable.

You mean its avoidable by using a software based lock instead, right?
(Since we are talking about complex operations)

> In other words, lock-free algorithms are great for dealing with
> unavoidable contention. But if you use them when your contention is
> avoidable, the net will result will be the contention will *not* be
> avoidable.

I dont have comment to this right now.

regards

Jan
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 71) Posted: Tue Oct 13, 2009 1:25 pm
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 13, 3:43 am, Jan Helgesen <s....TakeThisOut@nospam.com> wrote:

> So what you are also saying here is that with a lock, when the A1 is
> running and A2 is blocked, A2 will not be scheduled and that frees up
> the other core so that B1 or B2 can run instead, right?

Exactly.

> In other words all those threads that can't really do any real work,
> will waste cpu cycles when based on a lock-free algorithm, because the
> only thing they will do is spin-wait on different cores until the lock
> is released and the try to acquire the lock, agreed?

No, lock-free algorithms don't spin-wait on locks. The problem is that
accesses to shared resources drop to FSB speed (or its equivalent,
depending on CPU architecture) when there is contention.

> A question would then be, if there are 100 threads waiting for that
> resource, even if all threads are running on the same core, they will
> have to wait one after another to acquire the lock right?
> So as there are more threads the threads will have to wait longer to get
> access, right?

It depends on the lock architecture and what else the system as a
whole can do.

> > But inappropriate use of a lock-free algorithm simply allows the
> > threads to continue contending when that contention is avoidable.

> You mean its avoidable by using a software based lock instead, right?
> (Since we are talking about complex operations)

I'm not sure what you mean by a software-based lock. It's avoidable by
blocking threads that contend so that threads that don't contend can
run instead. Lock-free algorithms do not block threads that contend,
which is good if there's no other thread that can run instead but bad
if there is.

DS
Back to top
Login to vote
Jan Helgesen

External


Since: Jan 24, 2009
Posts: 41



(Msg. 72) Posted: Tue Oct 13, 2009 5:20 pm
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

I will make some comment questions and then at the bottom I will make my
reply to you.

David Schwartz wrote:
>
> No, lock-free algorithms don't spin-wait on locks. The problem is that

I know it does not automatically spin-lock, but a lock-free algorithm
has to do something, so what does it do...?

> accesses to shared resources drop to FSB speed (or its equivalent,
> depending on CPU architecture) when there is contention.

It always does, because any shared resource has to be synchronised
between caches and RAM memory before it can be accessed by a thread at a
cpu cache.

>> A question would then be, if there are 100 threads waiting for that
>> resource, even if all threads are running on the same core, they will
>> have to wait one after another to acquire the lock right?
>> So as there are more threads the threads will have to wait longer to get
>> access, right?
>
> It depends on the lock architecture and what else the system as a
> whole can do.

Since you think I dont understand these things, you are free to
elaborate on your thoughts so that we can get to a conclusion in this
duscussion...

>> You mean its avoidable by using a software based lock instead, right?
>> (Since we are talking about complex operations)
>
> I'm not sure what you mean by a software-based lock. It's avoidable by
> blocking threads that contend so that threads that don't contend can
> run instead.

That's what I am saying, a software lock is typically a lock that
protects critical regions, hence a thread that does not have the lock
has to block (so calling it blocking lock might be more appropriate I
suppose). Unless we are talking about atomic software operations. But
since you are not....


So to my answers:

For the problem of cache thrashing, or cache ping-pong as you called it:

It depends on the architecture of the CPU. But most decent CPU
architectures (AMD64, POWER, Sparc etc) that supports many cores will
reduce that by locality, i.e. the cpu and os scheduler will try to
schecdule threads that contend for a reasource on the same core or
similar. That way, that kind of ping-pong effect on the cache will be
severely reduced.
As side note, massively parallel CPUs such as Tilera's Tile64, but also
future AMD64, POWER etc, will have to improve on that so as to avoid the
problem even further, otherwise it will be a waste of money to increase
the number of CPUs. The manufacturers are not that stupid.


For the problem of "runaway" threads when using lock-free algorithms:

Any decent algorithm that is lock-free, will seek to utilise the threads
computing power somehow instead of busy-waiting. A proper "optimistic
locking" algorithm will do exactly that. It will use only CAS operations
on its data structure, but will not use critical regions. It will
arrange the structure and operations so that it has states. That means
that any thread that uses the structure can atomically see the state of
the structure, add data and help the structure get into its new
consistent state, while not destroying its state or data consistency.
I.e. it will do some good while it can not complete the job it intended.
A thread that was supposed to complete the operation, sees from the
state that the structure has been set to a consistent state and will
jump on to its next task. This is an incomplete description, but I am
painting you a picture. Which is, a lock-free algorithm can use the
non-blocking period to do something else instead of just contending for
a resource and wasting cpu cycles.


So, this brings us back to message passing:

With such an algorithm, or similar, it is possible to create message
passing that can scale and perform without wasting much computing power
or other resource (of course depending on a decent CPU). This is what I
tried to show you with the article about the non-blocking hash-table.
And it also scales, as it states up to 750 cpus.

regards

Jan
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 73) Posted: Wed Oct 14, 2009 2:59 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 13, 3:11 pm, Jan Helgesen <s... RemoveThis @nospam.com> wrote:

> > No, lock-free algorithms don't spin-wait on locks. The problem is that

> I know it does not automatically spin-lock, but a lock-free algorithm
> has to do something, so what does it do...?

I could have sworn I already answered exactly this question. It either
uses atomic compare-and-swap type operations or atomic increment/
decrement operations. These operations act typically by internally
locking cache lines. Ideally, lock-free algorithms are designed so
that if one thread fails to make forward progress, say because a
compare-and-swap failed in the compare, it can only be because another
thread *did* make forward progress.

But none of that is the issue here. The issue here is contention.

> > accesses to shared resources drop to FSB speed (or its equivalent,
> > depending on CPU architecture) when there is contention.

> It always does, because any shared resource has to be synchronised
> between caches and RAM memory before it can be accessed by a thread at a
> cpu cache.

Right. That's why it's bad to run threads that contend at the same
time. But the only purpose for lock-free algorithms is that you *can*
run threads that contend at the same time. In other words, the sole
purpose of lock-free algorithms is to allow you to do something that
is usually *bad*.

So lock-free algorithms are only useful when the benefits of running
contending threads concurrently outweigh the disadvantages of doing
so. One case where the benefits may outweigh the disadvantages -- when
the threads don't really contend all that much and there are no non-
contending threads that could be run anyway.

However, in most of those cases, lock-based algorithms would typically
behave almost exactly the same as lock-free algorithms. For example,
consider a thread that gets a task by some kind of mechanism and then
works that task. Assuming working the task takes much longer than
getting the task, the locks will almost never conflict anyway, since
threads will hold the lock for only a very tiny fraction of the time.

> >> You mean its avoidable by using a software based lock instead, right?
> >> (Since we are talking about complex operations)

> > I'm not sure what you mean by a software-based lock. It's avoidable by
> > blocking threads that contend so that threads that don't contend can
> > run instead.

> That's what I am saying, a software lock is typically a lock that
> protects critical regions, hence a thread that does not have the lock
> has to block (so calling it blocking lock might be more appropriate I
> suppose). Unless we are talking about atomic software operations. But
> since you are not....
>
> So to my answers:

> For the problem of cache thrashing, or cache ping-pong as you called it:
>
> It depends on the architecture of the CPU. But most decent CPU
> architectures (AMD64, POWER, Sparc etc) that supports many cores will
> reduce that by locality, i.e. the cpu and os scheduler will try to
> schecdule threads that contend for a reasource on the same core or
> similar. That way, that kind of ping-pong effect on the cache will be
> severely reduced.

I don't know how you think they do this. There is no "contention
indicator" that is propagated to the scheduler. Schedulers may try to
run related threads (those from the same process) in intelligent ways,
but there's no way in general they can detect contention between
unrelated threads. Perhaps in the future they'll do this, and as
systems have more and more non-uniform memory access they may have to.

But at least for the near future, the trend seems to be going in the
other direction. Systems are being designed and built with memory
access that looks more and more like uniform, and NUMA optimizations
are proving to be more trouble than they're worth.

> As side note, massively parallel CPUs such as Tilera's Tile64, but also
> future AMD64, POWER etc, will have to improve on that so as to avoid the
> problem even further, otherwise it will be a waste of money to increase
> the number of CPUs. The manufacturers are not that stupid.

How we'll use massively multi-core CPUs is still an open question.
There are people who think they'll stay in the same SMP domain for the
forseeable future, people who think they won't, and people who think
we'll need systems that allow us to change that flexibly. There are
people who see non-uniform memory access and people who don't.

> For the problem of "runaway" threads when using lock-free algorithms:
>
> Any decent algorithm that is lock-free, will seek to utilise the threads
> computing power somehow instead of busy-waiting. A proper "optimistic
> locking" algorithm will do exactly that. It will use only CAS operations
> on its data structure, but will not use critical regions. It will
> arrange the structure and operations so that it has states. That means
> that any thread that uses the structure can atomically see the state of
> the structure, add data and help the structure get into its new
> consistent state, while not destroying its state or data consistency.
> I.e. it will do some good while it can not complete the job it intended.
> A thread that was supposed to complete the operation, sees from the
> state that the structure has been set to a consistent state and will
> jump on to its next task. This is an incomplete description, but I am
> painting you a picture. Which is, a lock-free algorithm can use the
> non-blocking period to do something else instead of just contending for
> a resource and wasting cpu cycles.

All of those things you describe are simply forms of contention. You
can't even *read* the structure after another thread has written to it
without forcing a cache ping-pong. If you think you'll get performance
from increased complexity in synchronization objects, I think you're
wrong. I'm not 100% sure, but that's not the sense I get.

> So, this brings us back to message passing:

> With such an algorithm, or similar, it is possible to create message
> passing that can scale and perform without wasting much computing power
> or other resource (of course depending on a decent CPU). This is what I
> tried to show you with the article about the non-blocking hash-table.
> And it also scales, as it states up to 750 cpus.

Right, but the whole point of that is that you *don't* use threads.

DS
Back to top
Login to vote
David Schwartz

External


Since: Apr 25, 2007
Posts: 134



(Msg. 74) Posted: Thu Oct 15, 2009 3:13 am
Post subject: Re: thread memory size [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

On Oct 13, 3:11 pm, Jan Helgesen <s....RemoveThis@nospam.com> wrote:

> It always does, because any shared resource has to be synchronised
> between caches and RAM memory before it can be accessed by a thread at a
> cpu cache.

Oh, just to clarify one detail: Most modern SMB cache coherency
protocols do *not* synchronize caches to RAM for any reason other than
to make more room in the caches (or to prepare data for I/O or the
like). They synchronize cache-to-cache without having to synchronize
to or through RAM.

If you think about it, this makes perfect sense. RAM is going to be
much slower than the caches are. And in many cases the FSB (or
whatever CPU-to-CPU interconnect you have) will also be much faster
than RAM.

DS
Back to top
Login to vote
Display posts from previous:   
Related Topics:
Access shared memory from kernel module - Hi All, I wanted to know if shared memory created in user space can be accessed from a loadable kernel module. Have no...

Size 8 bit, 16 bit, 32 bit and 64 bit systems. - I need to find out what is the size of following data structures in 8 bit, 16 bit, 32 bit, and 64 bit systems. struct....

Size 8 bit, 16 bit, 32 bit and 64 bit systems. - I need to find out what is the size of following data structures in 8 bit, 16 bit, 32 bit, and 64 bit systems. struct....

RFC3971 - Does anyone know if RFC3971 support is being developed for Linux? It does not seem to be implemented in the mainline..

Controlling UART transmission of bytes - I'm programming an ARM's UART that comes with a library implementing the standard unix termios interface. Regarding..

Max memory size per thread? - What is max RAM limitations in single thread with Red Hat 9? Need to run large monolithic process (~2.5GB) on system..
       Soft32 Home -> Linux -> System Development All times are: Pacific Time (US & Canada) (change)
Goto page Previous  1, 2, 3, 4, 5
Page 5 of 5

 
You can post new topics in this forum
You can reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Categories:
 Windows
  Linux
 Mac
 PDA


[ Contact us | Terms of Service/Privacy Policy ]