quite right, so I will be revisiting. However it may well be suitable for other
people so I'll write it up.
What is it suitable for?
A single NAS, with a single UPS and one or more client computers.
What it does.
Following a power cut, shuts down the NAS, then shuts down the client computers.
What I want/need it to do (version2, coming later)
- Right after a power cut, shut down the (power hungry) Qnap
- When power becomes "critical", shut down the computer (server)
- Once the server has shut down, turn off the UPS
- As power comes back, Qnap and server computer both reboot (i.e. boot at power on)
- If power is restored before the computer & UPS have shut down, bring the Qnap back up (e.g. using WoL)
Step 1: Wiring it up
- APC BX700U USB based UPS
- Qnap TS-451+ (draws 33 W when disks are spun down, about 44 W when spun up) on protected power out
- Zotac Intel based box (draws 9 W) running Buster on protected power out
- Netgear 16 way Gigabit Ethernet switch (draws 6 W) on protected power out
- Various network hardware (e.g. switch, WiFi AP) on protected power out
to B type cable].
In general you need to ensure the network switch is protected (by the UPS) in
order for the notifications to be sent. However, in this case it will work even
if the switch is unprotected, as the shutdown is triggered by loss of access to
the UPS (this is a bad way to do it).
How I imagine the UPS software (NUT) works on the Qnap
I suspect there are some more subtle interactions, but this is my simplified
perspective:
Three parts
- client
- server
- monitor
(SNMP, which can be confusing as this is used later, as we shall see). The
client(s) can be used to talk to the server (on any machine). The Monitor is a
specialised client which checks the status of the server and takes actions,
e.g. mains lost: schedule a shutdown.
The software I run on the Zotac is NUT; this is available in binary form in the usual repositories,
along with guides. The software running on the Qnap appears to be a modified copy
of this. One place where it's been modified is "access" (which machines can connect).
The monitor needs, in part at least, to actively poll the server; because if the
server vanishes (e.g. shutdown, unplugged etc) it can't simply wait to be told.
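To make the polling idea concrete, here is a rough sketch of what "ask the server, and treat silence as a state of its own" could look like from a client. It is not how upsmon is implemented; it only assumes that upsc (from nut-client) is installed and that the UPS is reachable as qnapups@nas, the name used later in this write-up. The function names are my own.

```shell
#!/bin/sh
# Sketch only: a crude stand-in for a monitor's polling, NOT upsmon's real logic.
# Assumption: the UPS is published as qnapups@nas and upsc is installed.
UPS="${UPS:-qnapups@nas}"

# Turn a ups.status string into a coarse state.
# An empty string stands for "the server did not answer".
classify_status() {
    case "$1" in
        "")   echo "comm-lost" ;;
        *LB*) echo "critical" ;;
        *OB*) echo "on-battery" ;;
        *OL*) echo "online" ;;
        *)    echo "unknown" ;;
    esac
}

# One poll: ask the server, fall back to an empty answer if it has vanished.
poll_once() {
    status=$(upsc "$UPS" ups.status 2>/dev/null) || status=""
    classify_status "$status"
}
```

Calling poll_once from a loop with a sleep between calls would give a very basic monitor; the point is simply that a vanished server has to be detected by the client, since nothing is left to send a notification.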
In version1 (this one) the NAS runs the NUT server. It is configured via the QNAP web interface.
In Qnap web interface, either go via "external devices" -> "UPS" or simply select the external devices
at the top of the page.
- Select a short (1 minute) period. The NAS will begin shutdown 1 minute after power loss (shutdown can take quite a while, so don't delay too long, but the outage might just be a glitch)
- Enable it as a UPS Master. The Master is physically connected to the UPS (e.g. via USB) and is generally shut down last, but I am abusing this in version1.
- Add the IP address of machines you want to allow to access this "server" **
controlled (shutdown) machines. So I need to add the Zotac, which I want to
shut down, and also my client machine, where I want to run clients for things
such as checking the status, level of charge etc. This is a Qnap-added
feature; it's not used in the original NUT configuration files, which use
certificates to control access.
The above settings get saved (on the Qnap) in:
/etc/config/ups/upsd.conf
Code: Select all
...
ACL all 0.0.0.0/0
ACL localhost 127.0.0.1/32
ACL client_1 aa.bb.cc.dd/32
ACL client_2 ee.ff.gg.hh/32
ACCEPT localhost
ACCEPT client_1
ACCEPT client_2
REJECT all
MAXAGE 20
# =======================================================================
# MAXAGE <seconds>
# MAXAGE 15
#
# This defaults to 15 seconds. After a UPS driver has stopped updating
# the data for this many seconds, upsd marks it stale and stops making
# that information available to clients. After all, the only thing worse
# than no data is bad data.
#
# You should only use this if your driver has difficulties keeping
# the data fresh within the normal 15 second interval. Watch the syslog
# for notifications from upsd about staleness.
LISTEN 0.0.0.0
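As a quick sanity check of what the web interface wrote, something like the following could list which ACL names actually have a matching ACCEPT line. This is only a sketch for the Qnap-style file shown above; the function name list_accepted is my own, and it does nothing clever with REJECT rules or rule ordering.

```shell
#!/bin/sh
# Sketch: print "name address" for every ACL entry that has an ACCEPT line.
# Assumption: the file uses the "ACL name address" / "ACCEPT name" layout shown above.
list_accepted() {
    awk '
        $1 == "ACL"    { addr[$2] = $3 }      # remember the address for each ACL name
        $1 == "ACCEPT" { order[++n] = $2 }    # remember accepted names in file order
        END { for (i = 1; i <= n; i++) print order[i], addr[order[i]] }
    ' "$1"
}
```

On the Qnap this would be run as list_accepted /etc/config/ups/upsd.conf; any machine you added in the GUI should appear in the output.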
Code: Select all
[admin]
password = 123456
allowfrom = localhost
actions = SET
instcmds = ALL
upsmon master # or upsmon slave
"admin" not the system admin user (AKA root) however, as we see later it really
is admin(root).
The file (on Qnap) /etc/config/ups/upsmon.conf, contains:
Code: Select all
# This user should not have write access to upsmon.conf.
#
# RUN_AS_USER nutmon
RUN_AS_USER admin
it runs as "RUN_AS_USER" (typically called nut or nutmon) which as per the
comment has limited access. The Qnap config uses "admin", so pretty much wide
open and only protected by a fixed password...sigh.
The same file (on Qnap), /etc/config/ups/upsmon.conf, also contains:
Code: Select all
# "master" means this system will shutdown last, allowing the slaves
# time to shutdown first.
#
# "slave" means this system shuts down immediately when power goes critical.
#
# Examples:
#
# MONITOR myups@bigserver 1 monmaster blah master
# MONITOR su700@server.example.com 1 upsmon secretpass slave
MONITOR qnapups@localhost 1 admin 123456 master
on the same machine). This is the key: to monitor this UPS from another
system we will need a version of the above line, suitably modified to use the IP address
of the Qnap.
Code: Select all
MONITOR qnapups@nas 1 admin 123456 master
name or IP address [more later]).
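Since the MONITOR line packs several fields into one row, a small helper that splits it up can make it easier to see what needs changing for a remote system. This is a sketch only; describe_monitor is a made-up name, and it assumes the six-field layout used in the examples above (keyword, ups@host, power value, user, password, type).

```shell
#!/bin/sh
# Sketch: split a MONITOR line into its fields.
# Assumption: exactly six whitespace-separated fields, as in the examples above.
# Note: relies on plain word splitting, so a password containing shell glob
# characters would need extra care.
describe_monitor() {
    set -- $1
    [ "$1" = "MONITOR" ] && [ $# -eq 6 ] || return 1
    system=$2
    # The password ($5) is deliberately not printed.
    echo "ups=${system%%@*} host=${system#*@} powervalue=$3 user=$4 type=$6"
}
```

For example, describe_monitor "MONITOR qnapups@nas 1 admin 123456 master" shows that only the host part after the @ differs from the localhost version.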
Finally the file /etc/config/ups/ups.conf (on Qnap):
Code: Select all
[qnapups]
driver = usbhid-ups
port = /dev/ttyS1
desc = "Workstation"
pollinterval=1
identifies itself as a "Workstation" rather than say "NAS" is indicative of some
laziness on the part of Qnap development.
So a pretty straightforward setup:
- daemon interacts with /dev/ttyS1 (USB port) using the driver usbhid-ups
- monitor process, running on Qnap, listens/talks to the above daemon and takes action
- two client systems are allowed to talk to the daemon, e.g. query status
does not appear to be documented anywhere, including in nut. However it
looks like SNMPTRAPD.CONF(5)
So looking @ /etc/config/ups_snmptrapd.conf (on Qnap):
#
# use fully qualified prefix... just to be safe
#
traphandle .1.3.6.1.4.1.318.0.5 /sbin/ups_snmptrap_handler POWER_LOST
traphandle .1.3.6.1.4.1.318.0.9 /sbin/ups_snmptrap_handler POWER_RESTORED
traphandle .1.3.6.1.4.1.318.0.7 /sbin/ups_snmptrap_handler BATTERY_LOW
traphandle .1.3.6.1.4.1.318.0.1 /sbin/ups_snmptrap_handler COMM_LOST
traphandle .1.3.6.1.4.1.318.0.8 /sbin/ups_snmptrap_handler COMM_RESTORED
So looking up those OIDs in https://github.com/humphries40/APC-UPS- ... /traps.csv:
upsOnBattery,"WARNING: The UPS has switched to battery backup power.",...,WARNING,OPERATIONAL,1.3.6.1.4.1.318.0.5
powerRestored,"INFORMATIONAL: Utility power has been restored.",...,INFORMATIONAL,OPERATIONAL,1.3.6.1.4.1.318.0.9
lowBattery,"SEVERE: The UPS batteries are low and will soon be exhausted....",...,SEVERE,DEGRADED,1.3.6.1.4.1.318.0.7
communicationLost,"SEVERE: Communication to the UPS has been lost. ...",...,SEVERE,DEGRADED,1.3.6.1.4.1.318.0.1
communicationEstablished,"INFORMATIONAL: Communication with the UPS has been established.",...,INFORMATIONAL,OPERATIONAL,1.3.6.1.4.1.318.0.8
NB, upsOnBattery also occurs during battery calibration, so don't really trust it on its own.
So, the above are the events relating to UPS status changes.
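The mapping from trap OID to handler argument can be written down as a small lookup, which I find easier to read than the traphandle lines. This is just a restatement of the table from the Qnap config file; the function name event_for_oid is my own and nothing on the Qnap uses it.

```shell
#!/bin/sh
# Sketch: map an APC trap OID to the argument passed to ups_snmptrap_handler,
# following the traphandle lines in the Qnap config.
event_for_oid() {
    case "${1#.}" in    # drop the leading dot used in the traphandle lines
        1.3.6.1.4.1.318.0.5) echo POWER_LOST ;;
        1.3.6.1.4.1.318.0.9) echo POWER_RESTORED ;;
        1.3.6.1.4.1.318.0.7) echo BATTERY_LOW ;;
        1.3.6.1.4.1.318.0.1) echo COMM_LOST ;;
        1.3.6.1.4.1.318.0.8) echo COMM_RESTORED ;;
        *) return 1 ;;  # not one of the five traps handled here
    esac
}
```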
The usage of ups_snmptrap_handler is:
Code: Select all
# /sbin/ups_snmptrap_handler -help
Unrecognized Option: -help
Usage: /sbin/ups_snmptrap_handler BATTERY_LOW | POWER_LOST | POWER_RESTORED | COMM_LOST | COMM_RESTORED
Using /sbin/ups_snmptrap_handler COMM_RESTORED generates an error:
Code: Select all
[/etc/config/ups] # /sbin/ups_snmptrap_handler COMM_RESTORED
sh: /etc/init.d/genpowerfail.sh: Permission denied
Code: Select all
-rw-r--r-- 1 admin administrators 9802 2021-04-05 18:41 /etc/init.d/genpowerfail.sh
execute bit not being set. However before running it blind, I replace it with a
"trace" script:
Modified /etc/init.d/genpowerfail.sh:
Code: Select all
[/etc/config/ups] # cat /etc/init.d/genpowerfail.sh
#!/bin/sh
echo "Called at $(date) with args $0 $@ " >> /tmp/genpowerfail.log
exit 0
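If you want to repeat this, it may be worth keeping a copy of the real script so it can be put back afterwards. A minimal sketch, assuming a ".orig" suffix for the backup (my choice, nothing Qnap-specific) and the same log file as above; install_trace and restore_original are made-up names.

```shell
#!/bin/sh
# Sketch: swap in the trace script while keeping the original for later.
# Assumption: "<script>.orig" is free to use as the backup name.
install_trace() {
    target=$1
    # Keep the original once; never overwrite an existing backup.
    [ -e "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    cat > "$target" <<'EOF'
#!/bin/sh
echo "Called at $(date) with args $0 $@ " >> /tmp/genpowerfail.log
exit 0
EOF
    chmod +x "$target"   # the trace script needs the execute bit to be called
}

# Put the saved original (with its original permissions) back in place.
restore_original() {
    target=$1
    [ -e "$target.orig" ] && mv "$target.orig" "$target"
}
```

Usage would be install_trace /etc/init.d/genpowerfail.sh before testing and restore_original /etc/init.d/genpowerfail.sh afterwards.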
$ /sbin/ups_snmptrap_handler COMM_RESTORED
$ /sbin/ups_snmptrap_handler POWER_RESTORED
$ tail /tmp/genpowerfail.log
# tail /tmp/genpowerfail.log
Called at Tue Apr 27 11:46:19 GMT 2021 with args /etc/init.d/genpowerfail.sh stop
Called at Tue Apr 27 11:48:52 GMT 2021 with args /etc/init.d/genpowerfail.sh stop
So, it can be seen that for a "power/UPS restore type trap" it ends up passing "stop", slightly
counter-intuitively, but would have ended up doing:
Code: Select all
/bin/kill `/bin/pidof poweroff` 2>/dev/null
the Qnap will continue to shut down. This needs some testing, but it looks like if the mains
is interrupted and restored, the Qnap may continue to shut down.
Ready to rock & Roll
So, I've changed no files (permanently), so this is just what happens when you use the
Qnap web interface.
To test it, I switch off the mains, the UPS beeps a lot, one minute later the Qnap beeps
and starts its shutdown. It takes a few minutes and the UPS battery is at 93% by the time
it's done. The UPS itself remains powered up and the Zotac is left running.
So, next step, bring it back up and go to the Zotac. Install some packages:
- nut
- nut-client
- nut-doc
- nut-monitor
- nut-server
- nut-snmp
Code: Select all
[dummy]
driver = dummy-ups
#port = dummyups.seq
port = dummyups.dev
desc = "dummy-ups in dummy mode"
Zotac (not the Qnap); this is helpful for debugging. It uses two more files
in /etc/nut:
dummyups.seq
Code: Select all
ups.status: OL
TIMER 60
ups.status: OB
TIMER 60
ups.status: OB LB
TIMER 60
Code: Select all
upsc qnapups@nas > /etc/nut/dummyups.dev
dummyups.seq cycles the dummy UPS between the three states, whereas
dummyups.dev keeps the same status unless you edit the file, so you can simulate
many different changes.
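Editing dummyups.dev by hand gets tedious, so a small helper to flip the status line can help. This is a sketch under the assumption that the file has the "ups.status: ..." line produced by the upsc dump above; set_dummy_status is a made-up name.

```shell
#!/bin/sh
# Sketch: rewrite the ups.status line of a dummy-ups definition file.
# Assumption: the file contains a line starting with "ups.status:".
set_dummy_status() {
    file=$1
    new=$2
    tmp="$file.tmp.$$"
    # Write to a temp file, then copy the content back so that the
    # ownership and permissions of the original file are preserved.
    sed "s/^ups\.status:.*/ups.status: $new/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

For example, set_dummy_status /etc/nut/dummyups.dev "OB LB" would simulate "on battery, low battery", and set_dummy_status /etc/nut/dummyups.dev OL would put it back on line power.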
Edit the file upsd.users and add the stanza:
Code: Select all
[admin]
password = 123456
allowfrom = localhost
actions = SET
instcmds = ALL
upsmon master
password 123456 on the Zotac. The idea is that this is "just like" the user on the
Qnap (but we won't be using root!)
Finally, the critical change, the file /etc/nut/upsmon.conf:
Add these two lines (only the first is strictly necessary):
Code: Select all
MONITOR qnapups@nas.mydomain 1 admin 123456 slave
MONITOR dummy@localhost 1 admin 123456 master
Here we monitor 2 different UPS. qnapups@nas.mydomain is the real one on the
Qnap (this is the "hardwired" name+password we saw above), whereas
dummy@localhost is, as it sounds, a dummy UPS. You MAY want to remove the dummy
once you are happy. Probably commenting it out is better as you can switch it
back on for debugging without needing to keep "pulling the plug".
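Commenting the dummy line in and out can be scripted. A minimal sketch, assuming the line starts exactly with "MONITOR dummy@localhost " as in the example above; disable_dummy and enable_dummy are made-up names.

```shell
#!/bin/sh
# Sketch: comment out / restore the dummy MONITOR line in an upsmon.conf.
# Assumption: the line begins with "MONITOR dummy@localhost " (no leading spaces).
disable_dummy() {
    tmp="$1.tmp.$$"
    sed 's/^MONITOR dummy@localhost /#&/' "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

enable_dummy() {
    tmp="$1.tmp.$$"
    sed 's/^#\(MONITOR dummy@localhost \)/\1/' "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

After disable_dummy /etc/nut/upsmon.conf (or enable_dummy), upsmon has to re-read its configuration, for example with upsmon -c reload or by restarting the service.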
Now, that's enough, because what happens now is: the power is cut, the Qnap shuts down,
this makes the ups daemon (and so the UPS) uncontactable, and the Zotac then shuts down
because it senses the loss of the UPS.
One question is why does the Zotac shut down just because it lost contact with
the UPS? The man page for upsmon(8) says:
"When upsmon runs as a slave, it is relying on the distant system to tell it about the state of
the UPS. When that UPS goes critical (on battery and low battery), it immediately invokes the
local shutdown command. This needs to happen quickly. Once it disconnects from the distant
upsd(8) server, the master upsmon will start its own shutdown process. Your slaves must all shut
down before the master turns off the power or filesystem damage may result."
Since the upsmon.conf on the Zotac is marked as "slave", my guess is that loss of
contact with the UPS is treated as "Critical" (since it can no longer detect low
battery). So we are rather abusing the system. The "master" just shuts down
after 1 minute (due to the GUI setting on the Qnap); the Zotac shuts down because it
believes itself to be a slave and is trying to shut down before that master.
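The rule I believe is at work here can be modelled in a few lines. This is a simplified sketch, not upsmon's code: it assumes the behaviour described for DEADTIME in upsmon.conf(5), where a UPS that was last seen on battery and then stays silent for too long is treated as if it had gone to low battery. should_shutdown and its arguments are my own naming.

```shell
#!/bin/sh
# Simplified model (an assumption, not upsmon's implementation):
#   should_shutdown LAST_STATUS SECONDS_UNREACHABLE DEADTIME
# Returns 0 (true) when a slave would start its shutdown.
should_shutdown() {
    last=$1
    gone=$2
    deadtime=$3
    case "$last" in
        *OB*LB*|*LB*OB*) return 0 ;;   # on battery and low battery: critical
    esac
    case "$last" in
        # last seen on battery, then silent for DEADTIME or longer: assume the worst
        *OB*) [ "$gone" -ge "$deadtime" ] && return 0 ;;
    esac
    return 1
}
```

In the test above the Qnap was on battery for a minute before it shut down, so the last status the Zotac saw would have been OB; once the server had been gone long enough, the model says "shut down", which matches what happened.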
This is a little basic (and the wrong way around); version 2 of this document
will address doing it the right way round. However, there are a few other things
to look at configuring, because you might want a more nuanced configuration.
In /etc/nut/upssched.conf, some edits:
Code: Select all
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock
AT COMMBAD * START-TIMER upsgone 10
AT ONBATT * START-TIMER upsbattery 10
AT FSD * START-TIMER upsforced 10
AT NOCOMM * START-TIMER upsnocomm 10
AT ONLINE qnapups@nas.mydomain CANCEL-TIMER upsgone
/bin/upssched-cmd based on various SNMP events (e.g. running on battery).
Take care: the system can run on battery during calibration.
Using this allows a lot of flexibility taking different actions based on
particular events. This will be revisited more in version 2. Additionally the
script /usr/lib/systemd/system-shutdown/nutshutdown (in certain circumstances)
uses UPSDRVCTL(8) to power down the UPS; at least I think that's what it
will do.
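For completeness, here is a rough idea of what a /bin/upssched-cmd for the timers above might look like. upssched runs the CMDSCRIPT with the timer name as its first argument; everything else here (the messages, the describe_timer name, and the decision to only log) is my own assumption for illustration.

```shell
#!/bin/sh
# Sketch of a /bin/upssched-cmd: upssched passes the timer name as $1.
# The timer names match the START-TIMER lines in upssched.conf above.
describe_timer() {
    case "$1" in
        upsgone)    echo "lost contact with the UPS server" ;;
        upsbattery) echo "UPS has been on battery for 10 seconds" ;;
        upsforced)  echo "forced shutdown (FSD) in progress" ;;
        upsnocomm)  echo "UPS still unreachable" ;;
        *)          echo "unrecognised timer: $1"; return 1 ;;
    esac
}

# When called by upssched there is one argument; log it rather than act on it.
if [ $# -gt 0 ]; then
    msg=$(describe_timer "$1")
    logger -t upssched-cmd "$msg"
fi
```

Only logging at first seems the safer choice: once the log shows the timers firing when expected (and not during a battery calibration), the branches can be extended to take real actions.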