After being home for weeks, I went away on business. On the first night away there was a brief power cut and the firewall (on a UPS) seemed to get stuck.

So, that’s no DNS, DHCP, or connectivity between wifi and LAN… All due to an (admittedly aging) hardware issue.

Since then my entire home system has had issues whilst it all settles down.

It made me think about getting some redundancy into the system to handle a single failure.

So, can you give me any insights into High Availability like CARP (for pfSense), VM failover (on Incus?), mesh wifi, Home Assistant, etc?

Of course there are going to be single points of failure, like the ISP line, etc, but it seems like something to test out.

  • neidu3@sh.itjust.works
    English
    arrow-up
    1
    ·
    5 hours ago

    Both dhcpd and bind support failover.

    If you want failover storage, you might want to look into BeeGFS, as storage targets can be mirrored across hosts.

  • just_another_person@lemmy.world
    English
    arrow-up
    6
    arrow-down
    1
    ·
    13 hours ago

    There’s a lot of layers here, so let me work backwards from the edge, inward:

    1. You lost power, so you probably lost internet if your endpoint hardware was not also on a UPS. Nothing is going to stop that unless you get a multi-WAN router, and an LTE backup on standby. Probably not worth the cost.

    2. You shouldn’t have lost DNS or DHCP for your local network just because of a reboot. Something is wrong with your setup, and we’d need more info about said setup to say more, but generally these services are stateful for the most part, and shouldn’t lose state on reboot IF you have them configured properly for your local domains, like a DNS forwarder, and static reservations on DHCP for local devices.

    3. You don’t need HA for all your services. You need to fix the issues with your services not running properly with interruptions. The specific services you mentioned don’t behave poorly if they die and come back in properly configured environments.

    4. If you have a UPS in your home, all devices connected to the UPS should be getting information about the status of said UPS and shut down cleanly when thresholds are met. Install NUT somewhere, and upsmon on all your hosts to properly issue shutdown signals when you lose power and the UPS starts discharging. The thresholds you set for this are up to you.
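
To make the threshold idea in point 4 a bit more concrete, here is a rough Python sketch of the decision a host would make. It parses the `key: value` text that NUT's `upsc` client prints and checks the status flags (`OL` on line power, `OB` on battery, `LB` low battery). The 40% `min_charge` value and the sample text are assumptions for illustration only; in a real setup upsmon does the shutdown signalling for you, so treat this as a picture of the logic rather than a replacement.

```python
def parse_upsc(text):
    """Parse upsc-style output ("key: value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shutdown(values, min_charge=40):
    """Return True when the UPS is on battery and running low.

    min_charge is a hypothetical threshold; pick your own.
    """
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:   # OB = on battery, OL = on line power
        return False
    if "LB" in flags:       # LB = the UPS itself reports low battery
        return True
    charge = float(values.get("battery.charge", "100"))
    return charge <= min_charge


# Made-up sample resembling upsc output during an outage.
sample = "battery.charge: 35\nups.status: OB DISCHRG\n"
print(should_shutdown(parse_upsc(sample)))
```

A lower threshold keeps services up longer during short blinks; a higher one leaves more margin for a clean shutdown on an aging battery.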

    In general, you don’t need to overthink HA; you need to focus instead on your services recovering gracefully in these situations. Spending insane amounts of time and money to make highly available services for your media and home automation will only leave you having spent resources and realizing there is no way to ever get to 100% uptime without flaws somewhere.

    • SayCyberOnceMore@feddit.ukOP
      English
      arrow-up
      2
      ·
      12 hours ago

      Good points there.

      1. The ISP router is a Fritz one set to bridge mode, running over a PoE adapter from the same UPS the firewall is using. It stayed up all the time (looking back at the logs).

      2. Not sure what happened here, but the firewall is the DNS resolver and when everything else powered back up, nothing got an IP address. Now, whether the service failed or the WAPs took longer to start than the devices could wait, I’m not sure, but as Scotty said: it’s dead Jim.

      3. Good point. I don’t need it ALL to be redundant.

      4. Also good. The UPS is directly connected to the firewall (which has NUT in), but it doesn’t inform anything else… I’ll look into that too.

      Nice mental reset for me about overthinking it… thanks

      • just_another_person@lemmy.world
        English
        arrow-up
        1
        ·
        10 hours ago
        1. Okay, so no issues there
        2. DHCP handles the address assignments in your network, not DNS. DNS resolves queries for named hosts. If no devices got IP addresses, that’s one problem. If you couldn’t resolve public hosts like www.news.com, that’s a DNS problem. If you couldn’t resolve INTERNAL named hosts you refer to around your network, then that’s also DNS, but a different problem.

        My hunch here is that you MIGHT be using a named host as your DNS resolver instead of an IP address in your network, OR, for some reason your DNS resolver doesn’t have a static address. Never use named hosts to point to network services, and all network services need a static IP, so go and check all of that.
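
The three cases above (no address, public names failing, internal names failing) can be told apart with a couple of checks. This is a minimal sketch using only Python's standard `socket` module; the function names are made up for illustration, and the check for whether the device has an IP address is left to you since it depends on the OS.

```python
import socket


def can_resolve(hostname):
    """Return True if the system resolver can look up hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(has_ip_address, public_ok, internal_ok):
    """Map the three symptoms to the service that most likely failed."""
    if not has_ip_address:
        return "DHCP: the device never got an address"
    if not public_ok:
        return "DNS: public names do not resolve"
    if not internal_ok:
        return "DNS: local names do not resolve"
    return "DHCP and DNS both look fine"


# Fixed inputs so the example runs offline. In real use, pass
# can_resolve("www.news.com") and can_resolve("<your internal host>").
print(diagnose(True, True, False))
```

Running it from a client on each network segment after a power cut would show whether the fault sits with address assignment or with name resolution.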

    • Prove_your_argument@piefed.social
      English
      arrow-up
      2
      arrow-down
      1
      ·
      13 hours ago

      I have a multi-WAN SMB router. 945 Mbit/s throughput. $60 new.

      TP-Link Omada or Ubiquiti tier stuff is all you really need for small business.

      • just_another_person@lemmy.world
        English
        arrow-up
        0
        arrow-down
        1
        ·
        12 hours ago

        Well…no, and this is what I’m saying.

        Every downstream issue you try to solve with redundancy has a doubled, duplicate cost to its upstream. Internet links, load balancers for web services, and in this specific situation, UPSes.

        Throwing more servers at a homelab with no power is just wasting money without more UPS power in the mix. If you have 4 servers and want HA for everything on your network, expect to have two of everything, including UPS units.

        This is the n* sunken cost of redundancy at its core, and in your example, you’re assuming this person even had a generator or whatever, but even if they did, they’d need an even BIGGER generator to run all this stuff.

        That’s why my points deal with solving for what they have and making it work better, instead of immediately jumping to adding more and more and more to the stack. It’s just not necessary when all they want is a graceful recovery from power loss.

  • spaghettiwestern@sh.itjust.works
    English
    arrow-up
    2
    ·
    11 hours ago

    I had a similar failure while I was out of the country for a month. My Raspberry Pi didn’t come back after a power blink. Home Assistant, Wireguard tunnels, security cameras, Jellyfin, Syncthing backup and DNS all failed until I returned. After looking at possible solutions I ruled out buying redundant hardware because of the cost, and more importantly the time and complexity of implementing and maintaining everything.

    Instead I bought a small, relatively inexpensive laptop and a router with plenty of processing power and memory. I moved my Wireguard endpoints, DHCP and DNS server to the router and everything else to the laptop and disconnected my UPS completely.

    If the router is up, WG connectivity, DNS, DHCP and wifi are up. The router does reset on power failure, but my ISP has no local power backup so Internet is out until power is restored anyway.

    This laptop loafs along at 10 watts and costs about $2 per month to operate despite our high electric rates. My old UPS drew 75 watts most of the time even when there was nothing plugged in and cost more than $16/month to run. The laptop’s battery is firmware limited to a 70% charge, which should let the battery last years without degrading and makes other battery issues unlikely. It provides 7 hours of operation if power fails compared to an optimistic 20 minutes for the UPS. Power blinks (and there have been plenty) have no effect on the laptop at all.

    I’ve been happy with this configuration. It has worked flawlessly for almost 2 years.
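
For anyone wanting to redo the running-cost comparison above with their own numbers, the arithmetic is just watts to kWh to money. A small sketch, assuming a 30-day month and an electricity rate of $0.30/kWh (the post does not state the rate; this value is an assumption that happens to land near the quoted figures):

```python
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def monthly_cost(watts, rate_per_kwh):
    """Cost of running a constant load for one month."""
    kwh = watts * HOURS_PER_MONTH / 1000
    return kwh * rate_per_kwh


RATE = 0.30  # assumed $/kWh, not stated in the post

for name, watts in [("laptop", 10), ("idle UPS", 75)]:
    print(f"{name}: ${monthly_cost(watts, RATE):.2f}/month")
```

At that assumed rate the 10 W laptop comes to roughly $2.16 a month and the 75 W UPS to roughly $16.20, in line with the "about $2" and "more than $16" mentioned above.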

  • CompactFlax@discuss.tchncs.de
    English
    arrow-up
    5
    ·
    14 hours ago

    Low tech options: a smart plug that power cycles if it can’t ping e.g. Google, with your edge devices plugged in there, or a timer that reboots the firewall at 0200 daily.

    I have decided dual firewalls are silly without dual internet and dual power, as both those things go down more often than my FW.

    I have two instances of pihole on two hosts, because I block dns outbound to the best of my ability.
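
The ping-watchdog plug from the first suggestion boils down to counting consecutive failures so that a single dropped ping does not reboot the firewall. A minimal sketch of that logic; `watchdog_step` and the limit of 3 misses are hypothetical, and the actual ping and the plug control are deliberately left out because they depend on the hardware:

```python
def watchdog_step(ping_ok, failures, max_failures=3):
    """One polling step of a ping watchdog.

    Returns (new_failure_count, should_power_cycle).
    """
    if ping_ok:
        return 0, False
    failures += 1
    if failures >= max_failures:
        return 0, True   # reset the counter after asking for a power cycle
    return failures, False


# Simulated ping results: three misses in a row trigger one power cycle.
failures = 0
for ok in [True, False, False, False, True]:
    failures, cycle = watchdog_step(ok, failures)
    print(ok, failures, cycle)
```

Resetting the counter after a power cycle gives the device a full `max_failures` polling intervals to boot before it can be cycled again.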

  • plateee@piefed.social
    English
    arrow-up
    1
    ·
    12 hours ago

    For me, I have three Proxmox nodes that are configured to restart VMs and LXC containers if a host goes offline. There’s a Palo Alto PA-440 for my fw/router and a Brocade switch (they were something work gave me for practicing for a network exam).

    The nodes, Palo, Brocade, and AT&T modem are all on two 1500 VA UPS systems along with my wifi AP. Run time in case of power loss is around an hour.

    I’m this close to getting a comprehensive shutdown script working from a Raspberry Pi that is triggered if there’s power loss (most UPS systems have some capability to trigger scripts on a host that’s connected to the UPS’s console port).

    If I can get that script working, the battery backup will run a Pi for several days.

    Back on the redundancy side, I host two PowerDNS systems in the Proxmox cluster along with a 3 node/LXC container Vault.

  • JovialSodium@lemmy.sdf.org
    English
    arrow-up
    1
    ·
    13 hours ago

    Not a redundancy option, but I also had my routing setup go into a bad state while traveling, which was a hassle.

    I solved this by setting up nightly reboots while away. Both my routing PC and modem are rebooted by smart switches (Z-Wave/Zigbee) controlled by Home Assistant, which means they’ll operate without a working network. The routing PC is set to shut down one minute before the smart switch turns off, and set to boot automatically when power is restored (smart switch turns back on). This avoids any issues with hanging on a reboot.
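
The timing described above is three events hanging off one clock time. A small sketch of how they relate; the 03:00 cut-off and the two-minute off period are assumptions, since the post only gives the one-minute lead for the PC shutdown:

```python
from datetime import datetime, timedelta


def reboot_schedule(switch_off, off_duration=timedelta(minutes=2)):
    """Derive the three event times from the moment the plug cuts power.

    off_duration is an assumed value; the post does not say how long
    the smart switch stays off.
    """
    return {
        "pc_shutdown": switch_off - timedelta(minutes=1),
        "switch_off": switch_off,
        "switch_on": switch_off + off_duration,
    }


# Hypothetical 03:00 cut-off on an arbitrary date.
plan = reboot_schedule(datetime(2026, 2, 1, 3, 0))
for event, when in plan.items():
    print(event, when.strftime("%H:%M"))
```

The PC needs its firmware set to power on when AC returns for the last step to work, as the comment notes.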

  • Onomatopoeia@lemmy.cafe
    English
    arrow-up
    1
    ·
    14 hours ago

    There are so many ways to skin this cat, you may want to start with identifying the most crucial single failure point that concerns you.

    Is it the router? Best you can really do is have good hardware and make sure it and your modem are on a UPS.

    If it’s an ISP-provided modem, some enable remote management via a phone app, which can be done from anywhere by signing in to your ISP account.

  • frongt@lemmy.zip
    English
    arrow-up
    1
    arrow-down
    1
    ·
    10 hours ago

    Two firewalls in HA.

    You can also get a little Cradlepoint or something with a SIM card as a backup Internet connection if you need uplink redundancy.

    • yardratianSoma@lemmy.ca
      English
      arrow-up
      0
      arrow-down
      1
      ·
      9 hours ago

      Thanks for this... I keep worrying about security, hardening the system, etc., but forget about the essentials: power and networking. #1 priority for me is to get a UPS this year, once I find a job, that is.

  • Decronym@lemmy.decronym.xyzB
    English
    arrow-up
    1
    arrow-down
    1
    ·
    13 hours ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    DHCP            Dynamic Host Configuration Protocol, automates assignment of IPs when connecting to a network
    DNS             Domain Name Service/System
    HA              Home Assistant automation software
    ~               High Availability
    SMB             Server Message Block protocol for file and printer sharing; Windows-native

    [Thread #49 for this comm, first seen 31st Jan 2026, 18:40]