jpsollie Guru

Joined: 17 Aug 2013 Posts: 324
|
Posted: Wed Apr 16, 2025 7:02 am Post subject: malfunctioning standbyscript: any ideas? |
|
|
To make my system more energy-efficient, I wrote a standby script that suspends my Gentoo-based NAS.
The idea is "send a WoL packet when you need the NAS, otherwise simply keep it suspended".
Before suspending, the script is meant to unmount all filesystems on software RAID and stop the RAID devices;
this avoids any "device kicked out" at resume, which is pretty annoying (and requires rebuilding the whole array).
After resume, the script is meant to:
"carefully wait for all SCSI devices to be up and running again (20 seconds), then carefully try to reassemble the arrays, and remount the filesystems on these md arrays if they were mounted previously"
So, here's the script:
Code: |
#!/bin/bash
#umount filesystems having a /dev/md device or in /mnt
mp=$(cat /etc/fstab | grep -E "(/mnt|/dev/md)" | cut -d$'\t' -f2)
wasmounted=()
for i in $mp; do if [[ "$(grep $i /proc/mounts)" ]]; then umount $i; wasmounted+=("$i"); fi; done
#stop md raid arrays
for i in /dev/md[0-9]; do mdadm --stop $i; done
# print date for debugging,
# sleep and wait 20s after wakeup, so all sd* devices on the scsi bus have been reattached properly
date
echo mem > /sys/power/state
sleep 20
#reassemble raid arrays
for i in $(cat /etc/mdadm.conf | grep '^ARRAY' | cut -d' ' -f2 ); do mdadm --assemble --scan --no-degraded $i; done
#remount devices
for i in ${wasmounted[@]}; do mount $i; done
|
When I launch it from an SSH session, it works perfectly: if I send a WoL packet to the NAS, it wakes up and everything is the way it should be.
But ... from crontab, it does not!
I attach a dmesg covering three attempts by this script to suspend the system:
https://pastebin.com/E8Ndwr2x
The first (and failed) suspend script execution is at line 2402, ending resume at line :
Code: |
[26419.594009] [ T9148] bcachefs (b1fe4470-e6b8-4aab-a557-998404619502): clean shutdown complete, journal seq 1853520
...
[26559.452664] [ T9703] bcachefs (fde6c4aa-7e4c-4429-b7e0-98a10224feb4): delete_dead_inodes... done
|
The second one (where I had to wake up the system via a WoL packet) (line 3171 -> 3925):
Code: |
[55650.925550] [T14940] bcachefs (b1fe4470-e6b8-4aab-a557-998404619502): clean shutdown complete, journal seq 1853525
...
[55818.422715] [T15559] bcachefs (fde6c4aa-7e4c-4429-b7e0-98a10224feb4): delete_dead_inodes... done
|
and the third one (where I left > 5m between shutdown and resume, just to be sure)(line 3936 -> 4662):
Code: |
[56105.726554] [T15679] bcachefs (b1fe4470-e6b8-4aab-a557-998404619502): clean shutdown complete, journal seq 1853526
...
[56169.457662] [T16289] bcachefs (fde6c4aa-7e4c-4429-b7e0-98a10224feb4): delete_dead_inodes... done
|
As I thought "hey, this works properly", I put it in /etc/crontab:
Code: |
# Global variables
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# For details see man 5 crontab
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
*/5 * * * * root anacron -s
30 23 * * * root /opt/standbyscript.sh
...
|
... and in the morning, I saw that the suspend operation (1st attempt in my dmesg log) had failed: the system was completely up and running.
Does anybody have an idea why the command doesn't work from crontab, but works when executed via SSH?
any idea would be welcome, even the weirdest one!
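One way to narrow this down might be to log what the suspend request itself returns. The following is a minimal sketch, not part of the original script: the function name `suspend_logged` and the default log path are hypothetical. It relies on the fact that a write to /sys/power/state returns only after the system has resumed, and fails if the kernel refuses or aborts the suspend.

```shell
#!/bin/bash
# Sketch: record when the suspend request was issued, whether the write to
# the state file succeeded, and when execution continued afterwards.
# The function name and the default log path are hypothetical.
suspend_logged() {
    local state_file=${1:-/sys/power/state}
    local log_file=${2:-/var/log/standbyscript.log}
    local rc=0

    printf '%s suspend requested\n' "$(date '+%F %T')" >>"$log_file"

    # On the real state file this write blocks until the system resumes.
    # A non-zero status means the request was refused or the suspend aborted.
    echo mem >"$state_file" || rc=$?

    printf '%s back from suspend request, status %d\n' "$(date '+%F %T')" "$rc" >>"$log_file"
    return "$rc"
}
```

In the script, `suspend_logged || exit` could then take the place of the bare `echo mem > /sys/power/state`, so that a failed suspend from cron leaves a timestamped record.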
*EDIT* To check how long it was suspended, I checked my dhcpd log (its log level is not a useful one, but that is how it is currently set):
Code: |
Apr 15 23:23:10 linuxserver dhcpd[4032]: DHCPACK on 192.168.177.102 to 10:92:66:88:ed:d6 via eth0
Apr 15 23:29:11 linuxserver dhcpd[4032]: DHCPDISCOVER from 00:68:eb:37:78:0f (portablejp) via eth0
Apr 15 23:29:11 linuxserver dhcpd[4032]: DHCPOFFER on 192.168.177.84 to 00:68:eb:37:78:0f (portablejp) via eth0
Apr 15 23:29:11 linuxserver dhcpd[4032]: DHCPREQUEST for 192.168.177.84 (192.168.177.29) from 00:68:eb:37:78:0f (portablejp) via eth0
Apr 15 23:29:11 linuxserver dhcpd[4032]: DHCPACK on 192.168.177.84 to 00:68:eb:37:78:0f (portablejp) via eth0
Apr 15 23:29:11 linuxserver dhcpd[4032]: bind update on 192.168.177.84 from costadelsollie rejected: incoming update is less critical than outgoing update
Apr 15 23:29:11 linuxserver dhcpd[4032]: Added new forward map from portablejp.costadelsollie.home.arpa to 192.168.177.84
Apr 15 23:29:11 linuxserver dhcpd[4032]: Added reverse map from 84.177.168.192.in-addr.arpa. to portablejp.costadelsollie.home.arpa
Apr 15 23:29:13 linuxserver dhcpd[4094]: Information-request message from fe80::4ac:53e7:ebcc:d185 port 546, transaction ID 0x5A9C1100
Apr 15 23:29:13 linuxserver dhcpd[4094]: Sending Reply to fe80::4ac:53e7:ebcc:d185 port 546
Apr 15 23:59:50 linuxserver dhcpd[4032]: timeout waiting for failover peer costadelsollie
Apr 15 23:59:50 linuxserver dhcpd[4032]: peer costadelsollie: disconnected
Apr 15 23:59:51 linuxserver dhcpd[4032]: failover peer costadelsollie: I move from normal to communications-interrupted
Apr 15 23:59:51 linuxserver dhcpd[4032]: DHCPDISCOVER from dc:e5:5b:6a:06:5a via eth0
Apr 15 23:59:51 linuxserver dhcpd[4032]: DHCPOFFER on 192.168.177.176 to dc:e5:5b:6a:06:5a via eth0
Apr 15 23:59:51 linuxserver dhcpd[4032]: DHCPREQUEST for 192.168.177.176 (192.168.177.30) from dc:e5:5b:6a:06:5a via eth0
Apr 15 23:59:51 linuxserver dhcpd[4032]: DHCPACK on 192.168.177.176 to dc:e5:5b:6a:06:5a via eth0
Apr 15 23:59:51 linuxserver dhcpd[4032]: failover peer costadelsollie: peer moves from normal to communications-interrupted
Apr 15 23:59:51 linuxserver dhcpd[4032]: failover peer costadelsollie: I move from communications-interrupted to normal
Apr 15 23:59:51 linuxserver dhcpd[4032]: balancing pool 556faf3570c0 192.168.177.0/24 total 121 free 45 backup 67 lts 11 max-own (+/-)11
Apr 15 23:59:51 linuxserver dhcpd[4032]: balanced pool 556faf3570c0 192.168.177.0/24 total 121 free 45 backup 67 lts 11 max-misbal 17
Apr 15 23:59:51 linuxserver dhcpd[4032]: Sending updates to costadelsollie.
Apr 15 23:59:51 linuxserver dhcpd[4032]: failover peer costadelsollie: peer moves from communications-interrupted to normal
Apr 15 23:59:51 linuxserver dhcpd[4032]: failover peer costadelsollie: Both servers normal
|
so it looks like the system went to sleep for half an hour ... and then woke up.
So, what happened here?
*EDIT2*:
Thinking "it may be an event triggered after half an hour / every hour (0:00, 1:00, ...)", I put the system to sleep at 9:25 and woke it up via WoL at 10:10, so it was probably not a timeout issue.
Running inside screen:
Code: |
linuxserver /var/log # sleep 30; echo "beginning standby"; date; /opt/standbyscript.sh; echo "ending standby"; date;
beginning standby
wo 16 apr 2025 09:25:34 CEST
mdadm: stopped /dev/md0
mdadm: stopped /dev/md1
mdadm: stopped /dev/md2
mdadm: stopped /dev/md3
wo 16 apr 2025 09:27:03 CEST
mdadm: /dev/md0 has been started with 4 drives and 1 spare and 1 journal.
mdadm: /dev/md2 has been started with 5 drives.
mdadm: /dev/md3 has been started with 6 drives and 5 spares and 1 journal.
mdadm: /dev/md1 has been started with 10 drives and 2 spares and 1 journal.
ending standby
wo 16 apr 2025 10:10:29 CEST
|
_________________ The power of Gentoo optimization (not overclocked): [img]https://www.passmark.com/baselines/V10/images/503714802842.png[/img] |
|
RumpletonBongworth Tux's lil' helper


Joined: 17 Jun 2024 Posts: 104
|
Posted: Thu May 08, 2025 2:35 pm Post subject: |
|
|
I can discern no obvious reason for the script to behave differently where executed by crond. That being said, I have a few suggestions. Perhaps one of them will help you to get to the bottom of the matter.
Firstly, you can trace your script's execution by enabling xtrace.
Code: | #!/bin/bash
PS4='+$BASH_SOURCE:$LINENO:$FUNCNAME: '
set -x |
The resulting diagnostic messages will be conveyed to STDERR. Owing to the fact that your crontab defines MAILTO=root, crond will attempt to deliver this output to the root user's mailbox by invoking sendmail(1). However, if you do not have a working sendmail implementation, or if it has not been configured correctly, this output may be lost, or will perhaps end up as a "dead.letter" in the home directory of the applicable user. Still, you are free to have your script direct its STDERR to wherever you please. For example:
Code: | exec 2>>"$HOME/standbyscript.log" |
Alternatively, both STDOUT and STDERR:
Code: | exec >>"$HOME/standbyscript.log" 2>&1 |
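If the trace should not be mixed with the script's other error output, bash 4.1 and later can also write xtrace output to a separate descriptor through BASH_XTRACEFD. A minimal sketch, assuming a hypothetical log path under /tmp that can be overridden through a TRACE_LOG variable (both names are placeholders):

```shell
#!/bin/bash
# Sketch: send xtrace output to its own log file, leaving STDERR untouched.
# The default path and the TRACE_LOG variable are hypothetical placeholders.
trace_log=${TRACE_LOG:-/tmp/standbyscript.trace}

# Open the log for appending on a new descriptor chosen by bash (4.1 or later).
exec {trace_fd}>>"$trace_log"
BASH_XTRACEFD=$trace_fd
PS4='+$BASH_SOURCE:$LINENO:$FUNCNAME: '
set -x

date    # every command from here on is recorded in the trace log
```

That way the mail or log file that receives STDERR carries only genuine error messages, while the trace log shows which command the script reached last.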
Secondly, it might be interesting to determine whether there is any difference in behaviour in the case that your script is backgrounded and disowned by the initial SHELL that crond spawns:
Code: | 30 23 * * * root /opt/standbyscript.sh & disown |
Thirdly, your script has several defects. You may determine what most of them are by evaluating it at shellcheck.net. Here is a rewrite that should be a little more robust, and which also introduces (some) error checking. In turn, that may help you to debug more effectively.
Code: | #!/bin/bash
declare -a md_devs mounted
# Collect the md devices in use and the mount points to unmount.
while read -r source target; do
    # findmnt -r escapes special characters; decode them again.
    printf -v target %b "$target"
    if [[ $source == /dev/md+([0-9]) ]]; then
        md_devs+=("$source")
    elif [[ $target != /mnt/* ]]; then
        continue
    fi
    mounted+=("$target")
done < <(findmnt --real -rno source,target)
# Unmount the filesystems, then stop the arrays; bail out on any failure.
for i in "${mounted[@]}"; do
    umount "$i" || exit
done
for i in "${md_devs[@]}"; do
    mdadm --stop "$i" || exit
done
date
echo mem > /sys/power/state
# Give the disks time to reappear after resume.
sleep 20
for i in "${md_devs[@]}"; do
    mdadm --assemble --scan --no-degraded "$i" || exit
done
for i in "${mounted[@]}"; do
    mount "$i" || exit
done |
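Both versions of the script rely on a fixed `sleep 20` for the disks to reappear after resume. As a possible alternative, the wait could poll for the device nodes and give up after a timeout. This is only a sketch: `wait_for_paths` is a hypothetical helper, and the paths passed to it would have to be the real member devices of the arrays.

```shell
#!/bin/bash
# Sketch: wait until every listed path exists, or give up after a timeout.
# wait_for_paths is a hypothetical helper, not part of either script above.
wait_for_paths() {
    local timeout=$1 path
    shift
    # SECONDS is bash's counter of seconds since the shell started.
    local deadline=$((SECONDS + timeout))

    for path in "$@"; do
        until [[ -e $path ]]; do
            # Stop waiting once the overall deadline has passed.
            (( SECONDS < deadline )) || return 1
            sleep 1
        done
    done
}
```

Called with a timeout in seconds followed by the member devices' paths (for example, entries under /dev/disk/by-id), and followed by `|| exit`, it would replace the fixed sleep and also report when a disk never comes back.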
|
|
szatox Advocate

Joined: 27 Aug 2013 Posts: 3650
|
Posted: Thu May 08, 2025 4:18 pm Post subject: |
|
|
Quote: | I can discern no obvious reason for the script to behave differently where executed by crond. | Cron runs its jobs with a very minimal env. This is a pretty damn good reason for things to break. Notably, PATH doesn't cover all the directories you might expect.
jpsollie, if your script prints anything at all to stdout or stderr, cron should collect it and email it to the job owner's account (or whatever address you defined). Check that mail, or modify your script (or crontab) to log it to a file to find out what's wrong.
You might also try adding -l at the end of your shebang. If it's about env, going through a proper setup might be all you need to correct it. _________________ Make Computing Fun Again |
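To test the environment theory without waiting for the nightly job, a command can be run from the SSH session with roughly what crond would provide. This is a sketch under assumptions: the variables are taken from the crontab quoted in this thread, LOGNAME=root is a guess at what crond adds, and the function name `run_like_cron` is made up.

```shell
#!/bin/bash
# Sketch: run a command with roughly the environment crond would provide,
# using the variables from the crontab quoted in this thread.
# run_like_cron is a hypothetical name; LOGNAME=root is an assumption.
run_like_cron() {
    # env -i starts from an empty environment; stdin is detached as well.
    env -i \
        SHELL=/bin/bash \
        PATH=/sbin:/bin:/usr/sbin:/usr/bin \
        HOME=/ \
        LOGNAME=root \
        /bin/bash -c "$1" </dev/null
}

# Show what the job would see; compare this with `env | sort` in the SSH session.
run_like_cron 'env | sort'
```

Running `run_like_cron /opt/standbyscript.sh` would then show whether the leaner environment alone is enough to reproduce the failure.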
|
RumpletonBongworth Tux's lil' helper


Joined: 17 Jun 2024 Posts: 104
|
Posted: Thu May 08, 2025 4:25 pm Post subject: |
|
|
szatox wrote: | Quote: | I can discern no obvious reason for the script to behave differently where executed by crond. | Cron runs its jobs with a very minimal env. This is a pretty damn good reason for things to break. Notably, PATH doesn't cover all the directories you might expect. |
The provided crontab(5) defines a reasonable default PATH.
Code: | PATH=/sbin:/bin:/usr/sbin:/usr/bin |
None of the utilities executed by the script should be especially sensitive to the leaner environment created by crond. |
|