System Redundancy

SayCyberOnceMore@feddit.uk · 23 days ago


just_another_person@lemmy.world · 23 days ago

There are a lot of layers here, so let me work backwards from the edge, inward:

1. You lost power, so you probably lost internet if your endpoint hardware was not also on a UPS. Nothing is going to stop that unless you get a multi-WAN router and an LTE backup on standby. Probably not worth the cost.
2. You shouldn’t have lost DNS or DHCP for your local network just because of a reboot. Something is wrong with your setup, and we’d need more info about said setup to say more, but generally these services are stateful for the most part and shouldn’t lose state on reboot IF you have them configured properly for your local domains, e.g. a DNS forwarder, and static reservations on DHCP for local devices.
3. You don’t need HA for all your services. You need to fix the issues with your services not running properly after interruptions. The specific services you mentioned don’t behave poorly if they die and come back in properly configured environments.
4. If you have a UPS in your home, all devices connected to the UPS should be getting information about its status and shut down cleanly when thresholds are met. Install NUT somewhere, and upsmon on all your hosts, to properly issue shutdown signals when you lose power and the UPS starts discharging. The thresholds you set for this are up to you.
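To make the threshold idea in the last point concrete, here is a minimal Python sketch of the kind of decision upsmon makes for you. It is not a replacement for upsmon, just an illustration. It assumes the `upsc` client from NUT is installed; the UPS name `myups@localhost` and the 30% charge threshold are made-up placeholders.

```python
import subprocess


def parse_upsc(output: str) -> dict[str, str]:
    """Turn `upsc` output (one "key: value" pair per line) into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values: dict[str, str], min_charge: float = 30.0) -> bool:
    """Decide whether a host should shut down cleanly.

    min_charge is a hypothetical threshold; pick whatever suits your setup.
    """
    status = values.get("ups.status", "").split()
    on_battery = "OB" in status   # OB = running on battery
    low_battery = "LB" in status  # LB = UPS itself reports low battery
    try:
        charge = float(values.get("battery.charge", "100"))
    except ValueError:
        charge = 100.0
    return low_battery or (on_battery and charge <= min_charge)


def read_ups(ups: str = "myups@localhost") -> dict[str, str]:
    """Query a UPS through the NUT `upsc` client (placeholder UPS name)."""
    result = subprocess.run(
        ["upsc", ups], capture_output=True, text=True, check=True
    )
    return parse_upsc(result.stdout)
```

In practice you would let upsmon handle this on every host and only use something like the above to sanity-check what the UPS is reporting.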

In general, you don’t need to overthink HA, you need to focus instead on your services recovering gracefully in these situations. Spending insane amounts of time and money to make highly available services for your media and home automation will only leave you having spent resources and realizing there is no way to ever get to 100% uptime without flaws somewhere.

SayCyberOnceMore@feddit.uk · 23 days ago

Good points there.

For 1. The ISP router is a Fritz one set to bridge mode running over a PoE adapter from the same UPS the firewall is using. It stayed up all the time (looking back at the logs)

2. Not sure what happened here, but the firewall is the DNS resolver and when everything else powered back up, nothing got an IP address. Now, whether the service failed or the WAPs took longer to start than the devices could wait, I’m not sure, but as Scotty said: it’s dead Jim.
3. Good point. I don’t need it ALL to be redundant.
4. Also good. The UPS is directly connected to the firewall (which has NUT in), but it doesn’t inform anything else… I’ll look into that too.

Nice mental reset for me about over thinking it… thanks

just_another_person@lemmy.world · 23 days ago

1. Okay, so no issues there.
2. DHCP handles the address assignments in your network, not DNS. DNS resolves queries for named hosts. If no devices got IP addresses, that’s one problem. If you couldn’t resolve public hosts like www.news.com, that’s a DNS problem. If you couldn’t resolve INTERNAL named hosts you refer to around your network, then that’s also DNS, but a different problem.

My hunch here is that you MIGHT be using a named host as your DNS resolver instead of an IP address in your network, OR, for some reason, your DNS resolver doesn’t have a static address. Never use named hosts to point to network services, and all network services need a static IP, so go and check all of that.
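The distinction above can be written down as a small triage helper. This is only a sketch in Python: the function names and the messages are invented for illustration, and `can_resolve` simply asks whatever resolver the host is currently configured to use.

```python
import ipaddress
import socket


def is_ip_literal(value: str) -> bool:
    """True if a configured resolver entry is an IP address, not a hostname."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def can_resolve(name: str) -> bool:
    """Ask the system's configured resolver for a name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def triage(has_ip: bool, public_ok: bool, internal_ok: bool) -> str:
    """Map the three symptoms onto the likely culprit."""
    if not has_ip:
        return "DHCP: no address was leased"
    if not public_ok:
        return "DNS: upstream/forwarder resolution is failing"
    if not internal_ok:
        return "DNS: local zone or host records are missing"
    return "OK"
```

For example, `triage(True, can_resolve("www.news.com"), can_resolve("nas.lan"))` would separate the public and internal cases, where `nas.lan` stands in for whatever internal name you actually use.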

SayCyberOnceMore@feddit.uk · 23 days ago

Yep, all good with DHCP vs DNS… just my grammar was terrible.

Nothing was getting an IP from the DHCP, when the wifi returned…and… DNS was also not working for the few devices that still had an IP.

Sry bout the confusion there.

just_another_person@lemmy.world · 23 days ago

So then as a next step, I’d set Wireguard up on one of your regular hosts, set it to filter for DHCP traffic, confirm you’re seeing regular advertisements first, then reboot the device that’s responsible for DHCP and make sure it resumes sending those advertisements when it comes back.

If it’s the same device handling DNS, use dig or nslookup to make sure it’s also returning responses immediately after the reboot.
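If you’d rather script the DNS check than sit there re-running dig, something like the Python sketch below could work. It sends a plain A-record query over UDP straight to the resolver’s IP and reports how long it took before anything answered. The retry count and delay are arbitrary assumptions, and it only checks that a reply arrives, not that the answer is correct.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def resolver_answers(server_ip: str, name: str, timeout: float = 2.0) -> bool:
    """True if the resolver sends back any response matching our query ID."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    # Same ID as the query, and the QR bit marks it as a response.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def wait_for_resolver(server_ip: str, name: str,
                      attempts: int = 30, delay: float = 2.0):
    """Poll until the resolver answers; return seconds waited, or None."""
    start = time.monotonic()
    for _ in range(attempts):
        if resolver_answers(server_ip, name):
            return time.monotonic() - start
        time.sleep(delay)
    return None
```

Start `wait_for_resolver("192.168.1.1", "example.com")` from another host right as you reboot the firewall (the IP is a placeholder for your resolver’s static address) and you get a rough figure for how long DNS is actually gone.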

towerful@programming.dev · 22 days ago

Wireshark*

just_another_person@lemmy.world · 22 days ago

Oops, yup

Prove_your_argument@piefed.social · 23 days ago

I have a multi wan SMB router. 945mbit throughput. $60 new.

TP-Link Omada or Ubiquiti tier stuff is all you really need for small business.

just_another_person@lemmy.world · 23 days ago

Well…no, and this is what I’m saying.

Every downstream issue you try to solve with redundancy has a doubled, duplicate cost to its upstream: internet links, load balancers for web services, and in this specific situation, UPSes.

Throwing more servers at a homelab with no power is just wasting money without more UPS power in the mix. If you have 4 servers and want HA for everything on your network, expect to have two of everything, including UPS units.

This is the n* sunk cost of redundancy at its core. In your example, you’re assuming this person even has a generator or whatever, but even if they did, they’d need an even BIGGER generator to run all this stuff.

That’s why my points deal with solving for what they have and making it work better, instead of immediately jumping to adding more and more and more to the stack. It’s just not necessary when all they want is a graceful recovery from power loss.