After being home for weeks, I went away on business. The first night away there was a brief power cut, and the firewall (on a UPS) seemed to get stuck.
So, that’s no DNS, DHCP, or connectivity between wifi and LAN… All due to an (admittedly aging) hardware issue.
Since then my entire home system has had issues whilst it all settles down.
It made me think about getting some redundancy into the system to handle a single failure.
So, can you give me any insights into High Availability like CARP (for pfSense), VM failover (on Incus?), mesh wifi, Home Assistant, etc?
Of course there are going to be single points of failure, like the ISP line, etc, but it seems like something to test out.


There’s a lot of layers here, so let me work backwards from the edge, inward:
1. You lost power, so you probably lost internet if your endpoint hardware was not also on a UPS. Nothing is going to stop that unless you get a multi-WAN router and an LTE backup on standby. Probably not worth the cost.
2. You shouldn’t have lost DNS or DHCP for your local network just because of a reboot. Something is wrong with your setup, and we’d need more info about said setup to say more, but generally these services are stateful for the most part, and shouldn’t lose state on reboot IF you have them configured properly for your local domains, like a DNS forwarder, and static reservations on DHCP for local devices.
3. You don’t need HA for all your services. You need to fix the issues with your services not running properly after interruptions. The specific services you mentioned don’t behave poorly if they die and come back in properly configured environments.
4. If you have a UPS in your home, all devices connected to the UPS should be getting information about its status and shut down cleanly when thresholds are met. Install NUT somewhere, and upsmon on all your hosts, to properly issue shutdown signals when you lose power and the UPS starts discharging. The thresholds you set for this are up to you.
5. In general, you don’t need to overthink HA; you need to focus instead on your services recovering gracefully in these situations. Spending insane amounts of time and money to make highly available services for your media and home automation will only leave you having spent resources and realizing there is no way to ever get to 100% uptime without flaws somewhere.
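To make the UPS threshold point a bit more concrete, here is a rough Python sketch of the kind of decision upsmon makes for you. It is not how NUT is implemented and you would not normally run this yourself. It assumes the `key: value` text format that `upsc` prints and the usual NUT status flags (`OL` on line power, `OB` on battery, `LB` low battery). The function names and the 30% `min_charge` default are made up for illustration.

```python
# Sketch only: upsmon already handles this. It just illustrates the
# threshold logic using the "key: value" text format printed by `upsc`.

def parse_upsc(output: str) -> dict[str, str]:
    """Turn 'key: value' lines (as printed by `upsc`) into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values: dict[str, str], min_charge: float = 30.0) -> bool:
    """Return True when the UPS is on battery and charge is below min_charge.

    min_charge is a hypothetical threshold; pick your own.
    """
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:      # not on battery, nothing to do
        return False
    if "LB" in flags:          # the UPS itself reports low battery
        return True
    try:
        charge = float(values["battery.charge"])
    except (KeyError, ValueError):
        return False           # no usable reading: leave it to upsmon
    return charge < min_charge


# Made-up sample reading, not real output from any UPS.
sample = """battery.charge: 25
ups.status: OB DISCHRG
"""
print(should_shut_down(parse_upsc(sample)))  # True
```

A lower threshold keeps hosts up longer through short outages. A higher one leaves more margin for a clean shutdown on an aging battery.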
Good points there.
For 1. The ISP router is a Fritz one set to bridge mode running over a PoE adapter from the same UPS the firewall is using. It stayed up all the time (looking back at the logs)
Not sure what happened here, but the firewall is the DNS resolver and when everything else powered back up, nothing got an IP address. Now, whether the service failed or the WAPs took longer to start than the devices could wait, I’m not sure, but as Scotty said: it’s dead Jim.
Good point. I don’t need it ALL to be redundant.
Also good. The UPS is directly connected to the firewall (which has NUT installed), but it doesn’t inform anything else… I’ll look into that too.
Nice mental reset for me about overthinking it… thanks
My hunch here is that you MIGHT be using a named host as your DNS resolver instead of an IP address in your network, OR, for some reason your DNS resolver doesn’t have a static address. Never use named hosts to point to network services, and all network services need a static IP, so go and check all of that.
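If it helps with the checking, here is a small Python sketch that flags any setting given as a hostname rather than a literal IP. It only assumes you have collected the relevant values (DNS server, gateway, NTP, whatever) into a dict by hand. The setting names and addresses below are hypothetical, and the function name is made up.

```python
import ipaddress


def entries_using_hostnames(settings: dict[str, str]) -> dict[str, str]:
    """Return the settings whose value is not a literal IPv4/IPv6 address."""
    flagged = {}
    for name, value in settings.items():
        try:
            ipaddress.ip_address(value.strip())
        except ValueError:
            # Not an IP literal, so it depends on name resolution to work.
            flagged[name] = value
    return flagged


# Hypothetical values copied from a client's network config.
settings = {
    "dns_server": "firewall.lan",
    "gateway": "192.168.1.1",
}
print(entries_using_hostnames(settings))  # {'dns_server': 'firewall.lan'}
```

Anything it flags needs working DNS before it can be reached. For the DNS server itself that is a chicken-and-egg problem after a power cut.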
I have a multi-WAN SMB router. 945 Mbit/s throughput. $60 new.
TP-Link Omada or Ubiquiti tier stuff is all you really need for small business.
Well…no, and this is what I’m saying.
Every downstream issue you try to solve with redundancy has a doubled, duplicated cost upstream of it: internet links, load balancers for web services, and in this specific situation, UPSes.
Throwing more servers at a homelab with no power is just wasting money without more UPS power in the mix. If you have 4 servers and want HA for everything on your network, expect to have two of everything, including UPS units.
This is the n* sunken cost of redundancy at its core, and in your example, you’re assuming this person even had a generator or whatever, but even if they did, they’d need an even BIGGER generator to run all this stuff.
That’s why my points deal with solving for what they have and making it work better, instead of immediately jumping to adding more and more and more to the stack. It’s just not necessary when all they want is a graceful recovery from power loss.