| Author |
Message |
External

Since: Jun 16, 2009 Posts: 2
|
(Msg. 1) Posted: Tue Jun 16, 2009 9:20 am
Post subject: System time freezes?? Archived from groups: comp.os.linux.hardware
Hi,

On one particular server, I've been experiencing strange system lockups
for months now, across all kinds of different Linux versions. Applications
lock up one by one, until after maybe an hour or two (depending on
usage) the entire system is deadlocked. This happens about once a
month, and I have found no way to trigger it.

Now after activating network logging, I found that, before the latest
lockup, the system clock simply stopped after netdate set it 2 seconds back.
The system then kept on running for a while, but by the time users wanted
to access data it was completely locked up.

Now what can this be? Can it be hardware? How does the kernel maintain
the system time? How would you go about debugging this? I'm out of ideas.
--
Victor Mataré
Server- & Netzwerk-Administration
Lehrstuhl für Ingenieurgeologie und Hydrogeologie der RWTH-Aachen
Lochnerstraße 4-20, 52064 Aachen
Tel: 0241 80 96778
Fax: 0241 80 92280

External
Since: May 08, 2009 Posts: 19
(Msg. 2) Posted: Tue Jun 16, 2009 10:45 am
Post subject: Re: System time freezes??
Victor Matare wrote:
> Hi,
>
> On one particular server, I've been experiencing strange system lockups
> for months now, with all kinds of different linux versions. Applications
> lock up one by one, until after maybe an hour or two (depending on
> usage) the entire system is deadlocked. This happens maybe once in a
> month, and I found no way to trigger it.
> Now after activating network logging, I found that, before the latest
> lockup, the system clock simply stopped after netdate set it 2 seconds
> back. The system then kept on running for a while but by the time users
> wanted to access data it was completely locked up.
>
> Now what can this be? Can it be hardware? How does the kernel maintain
> the system time? How would you go about debugging this? I'm out of ideas.
>
I recommend that you run a memtest program on the machine for about 4 or 5
hours, if you can spare that time.

It sounds as though you will need a kernel debugger if the machine passes a
long memory test. There have been 2 major kernel debuggers, and I believe
that one was merged with the official Linux kernel tree. From what I
understand you may need a null modem cable, or an ethernet connection
between the computer that is having problems and another computer that you
can use the debugger with. You will probably also need to build a custom
kernel. I don't have experience with the Linux kernel debuggers, because my
usage of kernel debuggers was primarily with OpenBSD and NetBSD (which
include ddb by default), but I am aware of kgdb and kdb for Linux from
articles I have read.

If the problem is in the kernel software, it's most likely a driver. The
driver may have incorrect locking, which can lead to a deadlock, or it may
have a race condition. You may find that the dmesg output shows something
going wrong.

Lock usage in a kernel can be difficult to manage. There may be a
particular pattern of locks to take when entering a locked area or critical
region, and there are often return or error paths between acquiring a lock
and releasing it. This is further complicated by some kernel functions
potentially needing to acquire some of the same locks themselves, which can
lead to deadlock. It gets more complex over time: a driver written to hold
a lock while calling code path X may work fine one day, but X may later
change its own lock usage so that it must now be called with that lock
released. This means that to fully understand the proper locking behavior,
you may need to understand every piece of code that you call, or
potentially call. This raises the cost of maintaining and changing code.
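
The error-path problem described above can be shown in user space. This is only a rough sketch using Python's threading.Lock as a stand-in for a kernel lock; it is not kernel code, and the names update_buggy, update_fixed and lock_is_free are made up for the illustration.

```python
import threading

# A plain, non-reentrant lock (a user-space stand-in for a kernel lock).
lock = threading.Lock()


def update_buggy(value):
    """Take the lock, but forget to release it on the error path."""
    lock.acquire()
    if value < 0:
        return False  # BUG: early return leaves the lock held
    lock.release()
    return True


def update_fixed(value):
    """Same logic; the with-block releases the lock on every exit path."""
    with lock:
        if value < 0:
            return False
        return True


def lock_is_free():
    """Probe the lock without blocking; give it back if we got it."""
    if lock.acquire(blocking=False):
        lock.release()
        return True
    return False
```

After update_buggy(-1) returns, the lock is still held, so the next caller that needs it would block forever. A bug of this shape stays invisible until the rare error path is actually taken, which may be one reason such lockups are hard to trigger on demand.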
-George

External
Since: Apr 11, 2007 Posts: 203
(Msg. 3) Posted: Wed Jun 17, 2009 7:20 pm
Post subject: Re: System time freezes??
On 2009-06-16, Victor Matare <v.matare DeleteThis @lih.rwth-aachen.de> wrote:
> Hi,
>
> On one particular server, I've been experiencing strange system lockups
> for months now, with all kinds of different linux versions. Applications
> lock up one by one, until after maybe an hour or two (depending on
> usage) the entire system is deadlocked. This happens maybe once in a
> month, and I found no way to trigger it.
> Now after activating network logging, I found that, before the latest
> lockup, the system clock simply stopped after netdate set it 2 seconds back.
> The system then kept on running for a while but by the time users wanted
> to access data it was completely locked up.
>
> Now what can this be? Can it be hardware? How does the kernel maintain
> the system time? How would you go about debugging this? I'm out of ideas.
Do you run ntp on your system? NTP will wind back your clock using small
decrements. 2 seconds can be a big time change under some circumstances,
especially VPN connections.
--
Regards,
Gregory.
Gentoo Linux - Penguin Power

External
Since: May 29, 2009 Posts: 3
(Msg. 4) Posted: Sat Jun 20, 2009 11:19 pm
Post subject: Re: System time freezes??
GPS wrote:
> Victor Matare wrote:
>
>> Hi,
>>
>> On one particular server, I've been experiencing strange system lockups
>> for months now, with all kinds of different linux versions. Applications
>> lock up one by one, until after maybe an hour or two (depending on
>> usage) the entire system is deadlocked. This happens maybe once in a
>> month, and I found no way to trigger it.
>> Now after activating network logging, I found that, before the latest
>> lockup, the system clock simply stopped after netdate set it 2 seconds
>> back. The system then kept on running for a while but by the time users
>> wanted to access data it was completely locked up.
>>
>> Now what can this be? Can it be hardware? How does the kernel maintain
>> the system time? How would you go about debugging this? I'm out of ideas.
>>
>
> I recommend that you run a memtest program on the machine for about 4 or 5
> hours, if you can spare that time.
>
[snip]
Download the latest version of memtest86+ and run at least 2 full passes. On
a big server with a lot of RAM this may take a day or more. I have lost
count of the number of times I have seen people (including myself) chase
weird system problems only to eventually find out they had a bad DIMM (or 2
or 3).

As an example: I was trying to install Mandriva on one system, and when it
got to formatting partitions the format would fail part way through. It
wouldn't crash, but it would return an error message saying the format
failed. Hmmm, I was sure it was a bad disk, but another disk did the same
thing. I tried several OSes and each failed to install in its own way. I ran
memtest86+ and discovered a bad DIMM, which I pulled and replaced. I retried
the original install and that went perfectly, with no more trouble.
Eric

External
Since: Jun 16, 2009 Posts: 2
(Msg. 5) Posted: Thu Jun 25, 2009 11:20 am
Post subject: Re: System time freezes??
Gregory Shearman wrote:
> Do you run ntp on your system? NTP will wind back your clock using small
> decrements. 2 seconds can be a big time change under some circumstances,
> especially VPN connections.
>
ATM I'm just using netdate. Its manpage makes no statements about how
the time is decremented. That's an interesting pointer for sure. I
didn't realize rewinding the clock might be a critical operation.

However, I just swapped mainboard, RAM and CPU with another server to
rule out the hardware. Guess I'll have to wait another 30 days now to see
what happens. Or maybe do some crazy clock-setting-back-and-forth. We'll
see...

Thanks for the help so far.
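
A clock-setting experiment like the one mentioned above could be watched from user space. The following is a minimal sketch, assuming Python 3 is available on the server; classify and watch are hypothetical helper names, and the 0.5 second tolerance is an arbitrary choice. It compares the wall clock (time.time) with the monotonic clock (time.monotonic), which is not moved when the wall clock is set.

```python
import time


def classify(wall_delta, mono_delta, tolerance=0.5):
    """Compare how far the wall clock and the monotonic clock advanced.

    Returns "behind" if the wall clock advanced less than the monotonic
    clock (a stall or a backwards step), "ahead" if it advanced more
    (a forward step), and "ok" otherwise.
    """
    drift = wall_delta - mono_delta
    if drift < -tolerance:
        return "behind"
    if drift > tolerance:
        return "ahead"
    return "ok"


def watch(samples, interval=1.0, tolerance=0.5):
    """Sample both clocks `samples` times; return (wall time, verdict) pairs."""
    events = []
    last_wall, last_mono = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        events.append((wall, classify(wall - last_wall, mono - last_mono, tolerance)))
        last_wall, last_mono = wall, mono
    return events
```

Something like watch(3600, interval=1.0) would cover an hour, and any "behind" entry marks a second in which the wall clock stalled or was set back. One caveat: if the kernel's timekeeping as a whole stops, both clocks may freeze together and this script would notice nothing, so comparing against a second machine could still be necessary.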
--
Victor Mataré
Server- & Netzwerk-Administration
Lehrstuhl für Ingenieurgeologie und Hydrogeologie der RWTH-Aachen
Lochnerstraße 4-20, 52064 Aachen
Tel: 0241 80 96778
Fax: 0241 80 92280

External
Since: Apr 11, 2007 Posts: 203
(Msg. 6) Posted: Fri Jun 26, 2009 9:20 am
Post subject: Re: System time freezes??
On 2009-06-25, Victor Matare <v.matare RemoveThis @lih.rwth-aachen.de> wrote:
> Gregory Shearman wrote:
>> Do you run ntp on your system? NTP will wind back your clock using small
>> decrements. 2 seconds can be a big time change under some circumstances,
>> especially VPN connections.
>>
>
> ATM I'm just using netdate. Its manpage makes no statements about how
> the time is decremented. That's an interesting pointer for sure. I
> didn't realize rewinding the clock might be a critical operation.
Winding back the clock can be *disastrous* for some operations. If the
difference is more than a few seconds, the ntp daemon will refuse to
wind back (or wind forward) your system clock and it will exit.
--
Regards,
Gregory.
Gentoo Linux - Penguin Power

External
Since: May 03, 2006 Posts: 66
(Msg. 7) Posted: Fri Jun 26, 2009 1:20 pm
Post subject: Re: System time freezes??
Gregory Shearman <ZekeGregory.RemoveThis@netscape.net> wrote:
> Do you run ntp on your system? NTP will wind back your clock using small
> decrements. 2 seconds can be a big time change under some circumstances,
> especially VPN connections.
I've *never* seen NTP jump the clock backwards unless you force it to do
so at startup. What it does do in this situation, however, is to slow
down the clock until reality has caught up. There is a vast difference
between the two approaches.
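
The difference between the two approaches can be seen in a toy simulation. This is a simplified model and not how any real NTP daemon is implemented; the function names, the one-second tick and the 500 ppm slew rate are all assumptions chosen for illustration.

```python
def simulate(offset_ahead, seconds, slew_ppm=None, step_at=None):
    """Return local clock readings, one per second of true time.

    The local clock starts `offset_ahead` seconds ahead of true time.
    With step_at set, the whole offset is removed at that second (a step).
    With slew_ppm set, the clock runs slow by that rate until the
    offset has been used up (a slew).
    """
    readings = []
    local = float(offset_ahead)
    remaining = float(offset_ahead)
    for t in range(seconds):
        readings.append(local)
        advance = 1.0
        if slew_ppm is not None and remaining > 0:
            trim = min(remaining, slew_ppm / 1_000_000)
            advance -= trim
            remaining -= trim
        local += advance
        if step_at is not None and t == step_at:
            local -= remaining  # jump back by the whole offset at once
            remaining = 0.0
    return readings


def is_monotonic(readings):
    """True if no reading is smaller than the one before it."""
    return all(b >= a for a, b in zip(readings, readings[1:]))
```

With a 2 second offset, simulate(2.0, 10, step_at=4) produces a reading that is lower than the previous one, while simulate(2.0, 10, slew_ppm=500) only ever increases, just slightly slower than real time. Software that assumes timestamps never decrease is affected by the first case only.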
Chris

External
Since: Dec 26, 2004 Posts: 371
(Msg. 8) Posted: Fri Jun 26, 2009 2:55 pm
Post subject: Re: System time freezes??
On 26 Jun 2009, in the Usenet newsgroup comp.os.linux.hardware, in article
<slrnh49goo.bvu.ZekeGregory.RemoveThis@netscape.net>, Gregory Shearman wrote:
>Victor Matare <v.matare.RemoveThis@lih.rwth-aachen.de> wrote:
>> Gregory Shearman wrote:
>>> Do you run ntp on your system? NTP will wind back your clock using
>>> small decrements.
Not exactly true. See Appendix G of RFC1305. The clock is _slowed_
rather than stepped backwards. Time should NEVER go backwards.
>> That's an interesting pointer for sure. I didn't realize rewinding
>> the clock might be a critical operation.
>Winding back the clock can be *disastrous* for some operations.
It may actually be a violation of some laws - those relating to the
banking or (financial) securities industries for example.
>If the difference is more than a few seconds, the ntp daemon will
>refuse to wind back (or wind forward) your system clock and it will
>exit.
There are two sanity checks - NTP won't touch a clock that is out by
900 seconds (15 minutes), and it won't adjust the time RATE if the
error is more than 500 parts/million (about 43 seconds a day).
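
The rate figure above is easy to check. A small sketch of the arithmetic follows; the function names are made up here, and the 2 second case is idealized arithmetic at a constant maximum rate, not a claim about what any particular NTP daemon does with an offset that large.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_slew_per_day(ppm):
    """Seconds of correction per day at a constant rate of `ppm`."""
    return ppm / 1_000_000 * SECONDS_PER_DAY


def seconds_to_absorb(offset_s, ppm):
    """Time needed to remove `offset_s` seconds at a constant rate of `ppm`."""
    return abs(offset_s) / (ppm / 1_000_000)
```

At 500 ppm this gives about 43.2 seconds per day, matching the figure quoted. Removing a 2 second offset at that rate would take roughly 4000 seconds, a little over an hour.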
Old guy