Cron oddity

List overview All Threads
Download

newer

older

Re: [ALUG] Cron oddity

Re: [ALUG] Inserting env vars or...

Mark Rogers

11 May 2012 11 May '12

9:57 a.m.

I have a script which runs from cron every minute. I've added some logging to the script and I can see it starting reliably every minute, usually around 1s past the minute. So far so good.

Occasionally (about 30 times yesterday) the script will also start within the minute, and looking at the logs there's some consistency to it - most (but not all) started at 47s past the minute. So I end up with:

21:31:01 Start 21:32:01 Start 21:32:47 Start 21:33:01 Start

What could cause this? The only thing kicking the script off is cron.

(It's running Debian Wheezy on ARM.)

Mark

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

Show replies by date

Mark Rogers

11 May 11 May

11:01 a.m.

On 11/05/12 09:57, Mark Rogers wrote:

...

What could cause this? The only thing kicking the script off is cron.

(It's running Debian Wheezy on ARM.)

OK, I'm closer to an idea of what's going on: the system clock is jumping around all over the place.

What's the best way to install an ntp client on Debian? I did "apt-get install ntp" a couple of days ago because my clock was wrong, but haven't done more than that in terms of configuration. But it seems to be ntp that is causing the clock to jump.

Here's a snippet from syslog; note the time jumps forward just at the end of process ID 12633, and then jumps back again just at the end of process 12720. Beyond that I can't make much sense of this log:

May 11 08:42:38 debian ntpd[12632]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 08:42:38 debian ntpd[12633]: proto: precision = 1.445 usec May 11 08:42:38 debian ntpd[12633]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen and drop on 1 v6wildcard :: UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen normally on 2 lo 127.0.0.1 UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen normally on 3 eth1 192.168.251.59 UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen normally on 4 uap0 192.168.1.1 UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen normally on 5 eth1 fe80::f2ad:4eff:fe00:b303 UDP 123 May 11 08:42:38 debian ntpd[12633]: Listen normally on 6 lo ::1 UDP 123 May 11 08:42:38 debian ntpd[12633]: peers refreshed May 11 08:42:38 debian ntpd[12633]: Listening on routing socket on fd #23 for interface updates May 11 09:03:11 debian ntpd[12633]: ntpd exiting on signal 15 May 11 09:03:13 debian ntpd[12719]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 09:03:13 debian ntpd[12720]: proto: precision = 1.455 usec May 11 09:03:13 debian ntpd[12720]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen and drop on 1 v6wildcard :: UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen normally on 2 lo 127.0.0.1 UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen normally on 3 eth1 192.168.251.59 UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen normally on 4 uap0 192.168.1.1 UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen normally on 5 eth1 fe80::f2ad:4eff:fe00:b303 UDP 123 May 11 09:03:13 debian ntpd[12720]: Listen normally on 6 lo ::1 UDP 123 May 11 09:03:13 debian ntpd[12720]: peers refreshed May 11 09:03:13 debian ntpd[12720]: Listening on routing socket on fd #23 for interface updates May 11 08:56:03 debian ntpd[12720]: ntpd exiting on signal 15

Mark

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

steve-ALUG＠hst.me.uk

11:39 a.m.

On 11/05/12 11:01, Mark Rogers wrote:

...

On 11/05/12 09:57, Mark Rogers wrote:

...
What could cause this? The only thing kicking the script off is cron.

(It's running Debian Wheezy on ARM.)

OK, I'm closer to an idea of what's going on: the system clock is jumping around all over the place.

What's the best way to install an ntp client on Debian? I did "apt-get install ntp" a couple of days ago because my clock was wrong, but haven't done more than that in terms of configuration. But it seems to be ntp that is causing the clock to jump.

[snip log]

"This is Ground control to Major Cron. Can you hear me now? Is there something wrong?"

Not an expert, but have you looked at /etc/ntp.conf?

I have several time servers listed in mine. Perhaps you don't?

I have lines saying: server ntp.ubuntu.com server ntp2a.mcc.ac.uk server ntp2b.mcc.ac.uk server ntp.maths.tcd.ie server ntp2.ja.net

There may be sufficient comments in your conf file to show how to configure it, or find out where to configure it.

HTH Steve

Mark Rogers

11:56 a.m.

On 11/05/12 11:39, steve-ALUG@hst.me.uk wrote:

...

"This is Ground control to Major Cron. Can you hear me now? Is there something wrong?"

Thanks for the smile that gave me!

...

Not an expert, but have you looked at /etc/ntp.conf?

Yes. Normally it works with the default servers, which are 0.debian.pool.ntp.org through 3.debian.pool.ntp.org. My "preferred" NTP server setting is uk.pool.ntp.org so I've just put that in instead. All of the servers have "iburst" appended which should make them more reliable not less, so I haven't tried removing that yet.

I've also just rebooted to see if that helps.

One thing I did notice was that the hardware clock was about 15 mins out, so I also updated that (hwclock --systohc --utc). If something was randomly alternating between system and hardware clock that would explain things but I don't see why that would happen.

Having looked further into this, I'm not sure that ntp is the culprit: it looks like ntp is running for "a while" then dying (signal 15) to be replaced by another ntp process, where "a while" is anything from a minute to several minutes. That explains why the clock "jumps" forward during the runtime of each ntp process (it's not jumping forward 15 mins, it's just running for 15 mins). But it doesn't explain the jumps backwards.

I note that there's nothing in the log to suggest that ntp is actually setting the clock either.

So I'm even more confused than I was!

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

steve-ALUG＠hst.me.uk

1:24 p.m.

On 11/05/12 11:56, Mark Rogers wrote:

...

[] Having looked further into this, I'm not sure that ntp is the culprit: it looks like ntp is running for "a while" then dying (signal 15) to be replaced by another ntp process, where "a while" is anything from a minute to several minutes. That explains why the clock "jumps" forward during the runtime of each ntp process (it's not jumping forward 15 mins, it's just running for 15 mins). But it doesn't explain the jumps backwards.

I note that there's nothing in the log to suggest that ntp is actually setting the clock either.

Is there anything else in the log that's getting a signal 15 at the same time as ntp dying? I can understand the log times jumping by 15 minutes as you say, that's just when separate ntp processes start.

The time jumping backwards, I would guess is because NTP *is* adjusting the clock, but, if NTP is running correctly, I'd have thought it would adjust the clock gradually, having noted the drift between the h/w clock and the internet time servers, rather than jumping.

Ideas: log ntp to a separate log file. Increase the debug-log level. Investigate ntpq -p as per http://perdues.com/doc/ntp.html

Any of this help? http://linux.die.net/man/8/ntpd

AFAICS, the iburst setting just speeds up the time in which the clock is initially set, I dunno if it makes things more reliable.

Hope this helps in some way!

Steve

Mark Rogers

2:24 p.m.

On 11/05/12 13:24, steve-ALUG@hst.me.uk wrote:

...

Is there anything else in the log that's getting a signal 15 at the same time as ntp dying?

Nothing else in the logs around that time, other than cron (which you'll recall was where I started).

Looking at them now (at about 2:02pm): - hwclock reports the time as 13:02:35 UTC (correct) - date reports the time as 13:16:46 UTC (~15 mins out).

syslog reports ("Listen" lines stripped for clarity) the following: note clock goes from 12:46 to 12:43 (and that was presumably after 15 minutes), 12:43 to 13:05 (seems reasonable after ~15 minutes), 13:05 to 12:57 (backwards again).

May 11 12:46:57 debian ntpd[3728]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 12:46:57 debian ntpd[3729]: proto: precision = 1.440 usec May 11 12:46:57 debian ntpd[3729]: peers refreshed May 11 12:46:57 debian ntpd[3729]: Listening on routing socket on fd #25 for interface updates May 11 12:43:05 debian ntpd[3729]: ntpd exiting on signal 15

May 11 12:43:07 debian ntpd[3872]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 12:43:07 debian ntpd[3873]: proto: precision = 1.445 usec May 11 12:43:08 debian ntpd[3873]: peers refreshed May 11 12:43:08 debian ntpd[3873]: Listening on routing socket on fd #25 for interface updates May 11 13:05:26 debian ntpd[3873]: ntpd exiting on signal 15

May 11 13:05:29 debian ntpd[3986]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 13:05:29 debian ntpd[3987]: proto: precision = 1.445 usec May 11 13:05:29 debian ntpd[3987]: peers refreshed May 11 13:05:29 debian ntpd[3987]: Listening on routing socket on fd #25 for interface updates May 11 12:57:39 debian ntpd[3987]: ntpd exiting on signal 15

May 11 12:57:41 debian ntpd[4093]: ntpd 4.2.6p5@1.2349 Mon Feb 27 12:39:16 UTC 2012 (1) May 11 12:57:41 debian ntpd[4094]: proto: precision = 1.440 usec May 11 12:57:41 debian ntpd[4094]: peers refreshed May 11 12:57:41 debian ntpd[4094]: Listening on routing socket on fd #25 for interface updates

...

The time jumping backwards, I would guess is because NTP *is* adjusting the clock, but, if NTP is running correctly, I'd have thought it would adjust the clock gradually, having noted the drift between the h/w clock and the internet time servers, rather than jumping.

But also, shouldn't it be adding something to the log to say it's changed the clock?

...

Ideas: log ntp to a separate log file. Increase the debug-log level.

I couldn't see how to do this?

...

Investigate ntpq -p as per http://perdues.com/doc/ntp.html

$ ntpq -p remote refid st t when poll reach delay offset jitter ======================================== +46-38-164-94.st 83.138.151.80 4 u 22 64 37 29.882 8.223 30.137 *85.12.35.12 193.79.237.14 2 u 15 64 77 36.732 7.545 63.424

The figures don't look great from my reading of the above page, but not bad either.

Notable is the comment on that page that regarding syslog that: "If your local server steps the time ahead or back suddenly, it will report that here also with a message such as: time reset -6.394626 s." I am not seeing anything like this logged.

...

Any of this help? http://linux.die.net/man/8/ntpd

Not really, or at least if there's something helpful there I'm missing it.

I have a nagging feeling that this nothing to do with ntp. I've just killed ntp to see what happens to the clock in its absence.

Thanks for the help!

Mark

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

steve-ALUG＠hst.me.uk

2:41 p.m.

On 11/05/12 14:24, Mark Rogers wrote:

...

On 11/05/12 13:24, steve-ALUG@hst.me.uk wrote:

...
Is there anything else in the log that's getting a signal 15 at the same time as ntp dying?

Nothing else in the logs around that time, other than cron (which you'll recall was where I started).

Looking at them now (at about 2:02pm):

hwclock reports the time as 13:02:35 UTC (correct)

date reports the time as 13:16:46 UTC (~15 mins out).

ISTR something about NTP having trouble if the H/W clock and system time were vastly different, but I'd have thought it could cope with 15 mins, but more to the point, why are they different if NTP is working? Can you manually set both to more or less the correct time, and then see what happens?

...

...
The time jumping backwards, I would guess is because NTP *is* adjusting the clock, but, if NTP is running correctly, I'd have thought it would adjust the clock gradually, having noted the drift between the h/w clock and the internet time servers, rather than jumping.

But also, shouldn't it be adding something to the log to say it's changed the clock?

Ah, yes, probably.

...

...
Ideas: log ntp to a separate log file. Increase the debug-log level.

I couldn't see how to do this?

http://linux.die.net/man/8/ntpd if you kill the deamon, run it from a prompt, and use -d or -dd or -ddd etc for more levels of debug logging. As to logging to a separate file, I misread the config file - you can log the statistics to a file if you want by having statsdir /var/log/ntpstats/ in the conf file. Dunno if that would help though.

...

Notable is the comment on that page that regarding syslog that: "If your local server steps the time ahead or back suddenly, it will report that here also with a message such as: time reset -6.394626 s." I am not seeing anything like this logged.

Indeed.

...

I have a nagging feeling that this nothing to do with ntp. I've just killed ntp to see what happens to the clock in its absence.

Thanks for the help!

Dunno how much help I'm actually being! Don't really know what else to suggest - sorry! :-(

Steve

Mark Rogers

2:52 p.m.

On 11/05/12 14:41, steve-ALUG@hst.me.uk wrote:

...

ISTR something about NTP having trouble if the H/W clock and system time were vastly different, but I'd have thought it could cope with 15 mins, but more to the point, why are they different if NTP is working? Can you manually set both to more or less the correct time, and then see what happens?

I've just been reading about how the two are synchronised. As I understand it the kernel has an option to sync the hwclock to system clock every 11 minutes, which is disabled by default but enabled by things like ntp. So that should keep the system clock up to date.

However when I posted, it was hwclock (not system clock) that was right.

What I am sure about is that apart from at startup, hardware clock plays in part in the system operation, so it could do whatever it likes and make no difference other than the clock being wrong on startup (until ntp corrects it). So that should rule out a hardware clock issue (such as battery).

...

http://linux.die.net/man/8/ntpd if you kill the deamon, run it from a prompt, and use -d or -dd or -ddd etc for more levels of debug logging.

OK, that'll be the next thing I try after turning ntp off to see if the problem goes away (it's been OK for about 20 mins but that's not long enough to test).

...

As to logging to a separate file, I misread the config file - you can log the statistics to a file if you want by having statsdir /var/log/ntpstats/ in the conf file. Dunno if that would help though.

Tesco would say every little helps so I'll try it, not that they're much of a role model!

...

Dunno how much help I'm actually being! Don't really know what else to suggest - sorry! :-(

This is helping, believe me.

Mark

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

Mark Rogers

3:07 p.m.

On 11/05/12 14:41, steve-ALUG@hst.me.uk wrote:

...

if you kill the deamon, run it from a prompt, and use -d or -dd or -ddd etc for more levels of debug logging.

Unless when you try it you get: The ``debug-level'' option has been disabled -- this package was built using 'configure --disable--debug'

The clock does seem[*] very stable now that I have disabled ntp. Are there any obvious alternatives to ntp?

[*] Too early to tell if I'm honest but I'd like to have something resembling a "fix" by the end of the day so i can leave it running over the weekend as a decent test.

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

Mark Rogers

3:53 p.m.

On 11/05/12 15:07, Mark Rogers wrote:

...

Are there any obvious alternatives to ntp?

apt-get install chrony Seems to be working better but too early to tell for sure.

-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER

4792

Age (days ago)

4792

Last active (days ago)

main@lists.alug.org.uk

9 comments

2 participants

tags (0)

participants (2)

Mark Rogers
steve-ALUG＠hst.me.uk