So I’m working on a server from home.
I do a cat /sys/class/net/eth0/operstate and it says unknown, despite the interface being obviously up, since I’m SSH’ing into the box.
I try to explicitly set the interface up with ip link set eth0 up to force the status to say up. No joy, still unknown.
Hmm… maybe I should bring it down and back up.
So I do ip link set eth0 down and… I drive 15 miles to work to do the corresponding ip link set eth0 up.
50 years using Unix and I’m still doing this… 😥
Remember what Bruce Lee said:
I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.
Did this once on a router in a datacenter that was a flight away. Have remembered to set the reboot in future command since. As I typed the fatal command I remember part of my brain screaming not to hit enter as my finger approached the keyboard. 🤦‍♂️
Have remembered to set the reboot in future command since
That’s not a bad idea actually. I’ll have to reuse that one. Thanks!
This.
Do it. This saved my life on more than one occasion.
You’ll think “nah, it’ll be fine” and then at 11pm when your brain’s fried on vending machine coffee you’ll be glad that you did it… 3 times over…
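For anyone who wants to try the scheduled-reboot safety net described above, here is a minimal sketch. It assumes the risky change is not written to boot-time config (so a reboot really does restore the old state) and that you are root. The function names and the SHUTDOWN_CMD override are made up for illustration; the only real commands involved are shutdown -r +N and shutdown -c.

```shell
# Sketch only: arm a delayed reboot before a risky change, cancel it if you survive.
# SHUTDOWN_CMD is a hypothetical override so the functions can be dry-run;
# by default the real shutdown command is used.

# arm_reboot [MINUTES]
# Schedules a reboot MINUTES from now (default 10).
arm_reboot() {
    "${SHUTDOWN_CMD:-shutdown}" -r "+${1:-10}" "safety net before risky change"
}

# disarm_reboot
# Cancels the pending reboot once you have confirmed you can still get in.
disarm_reboot() {
    "${SHUTDOWN_CMD:-shutdown}" -c
}

# Example (commented out so nothing reboots by accident):
#   arm_reboot 10
#   ip link set eth0 down    # the fatal command from the original post
#   disarm_reboot            # only reachable if you still have a session
```

The point of arming first is that the reboot does not depend on your SSH session staying alive.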
A few months ago I accidentally dd’d ~3GiB to the beginning of one of the drives in a 4 drive array… That was fun to rebuild.
Like 3 weeks ago on my (testing) server I accidentally DD’d a Linux ISO to the first drive in my storage array (I had some kind of jank manual “LVM” bullshit I set up with odd mountpoints to act as a NAS, do not recommend), no Timeshift, no Btrfs snapshot. It gave me the kick in the pants I needed to stop trying to use a macbook air with 6 external hard drives as a server though. Also gave me the kick in the pants I needed to stop using volatile naming conventions in my fstab.
Your 4 drive raid5 array, right?
Right?!
I wish.
It was a bcachefs array with data replicas being a mix of 1,2 & 4 depending on what was most important, but thankfully I had the foresight to set metadata to be mirrored for all 4 drives.
I didn’t get the good fortune of only having to do a resilver, but all I really had to do was fsck to remove references to non-existent nodes until the system would mount read-only, then back it up and rebuild it.
NixOS did save my bacon re: being able to get back to work on the same system by morning.
not RAID10 I hope…
Lol I’ve locked myself out of so many random cloud and remote instances like this that now I always make a sleep chain or a kill timer with tmux/screen.
Usually like:
./risky_dumb_script.sh ; sleep 30 ; ./undo.sh
Or
./risky_dumb_script.sh
Which starts with a 30 second sleep, and:
(tmux) sleep 300 ; kill PID
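A variation on the sleep chain above, as a rough sketch: start the undo timer first, in the background, and cancel it only if the change turned out fine. The names arm_undo, disarm_undo and UNDO_PID are invented for this example. It also assumes you run it inside tmux or screen (as in the comment above), since a background job started from a plain SSH shell may be killed when the session hangs up.

```shell
# arm_undo SECONDS UNDO_COMMAND [ARGS...]
# Starts a background timer that runs the undo command after SECONDS.
# The PID of the timer subshell is stored in UNDO_PID.
arm_undo() {
    delay=$1
    shift
    ( sleep "$delay" && "$@" ) &
    UNDO_PID=$!
}

# disarm_undo
# Kills the timer subshell so the undo command never runs.
# The orphaned sleep just finishes on its own and does nothing.
disarm_undo() {
    kill "$UNDO_PID" 2>/dev/null
}

# Example (commented out; eth0 is simply the interface from the thread):
#   arm_undo 60 ip link set eth0 up
#   ip link set eth0 down
#   disarm_undo    # only if you can still reach the box and want to keep the change
```

Unlike the plain risky ; sleep 30 ; undo chain, the undo here is already scheduled before the risky command runs, so it still fires if that command hangs.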
Lol; I’ve done this too. Thankfully not to anything important.
I’ve done this kind of thing remotely in screen with
ifdown eth0 ; sleep 10 ; ifup eth0 ;
There, but for the grace of god…
@ExtremeDullard@lemmy.sdf.org You’re doing it wrong. Just set up a KVM behind your server. So then you never need to leave home again.
At $DAYJOB, we’re currently setting up basically a way to bridge an interface over the internet, so it transports everything that enters on an interface across the aether. Well, and you already guessed it, I accidentally configured it for eth0 and couldn’t SSH in anymore.
Where it becomes fun is that I actually was at work. I was setting it up on two raspis, which were connected to a router, everything placed right next to me. So I figured I’d just hook up another Ethernet cable, pick out the IP from the router’s management interface and SSH in that way.
Except I couldn’t reach the management interface anymore. Nothing in that network would respond.
Eventually, I saw that the router’s activity lights were blinking like Christmas decorations. I’m guessing I had built a loop and therefore something akin to a broadcast storm was overloading the router. Thankfully, the solution was then relatively straightforward, in that I had to unplug one of the raspis, SSH in via the second port, nuke our configuration and then repeat for the other raspi.
I formatted an OS drive by mistake last night, thought it was my flash drive…
Almost did the same last night on a device that has its internal drive (flash) mounted as mmc and the USB drive was sda
That entire scenario scares me lol
I started to DBAN (wipe) my internal drive once instead of an attached drive. That was the last time I ran DBAN on a machine with any drives of value plugged in.
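One cheap sanity check before pointing dd or a wipe tool at a disk is to ask the kernel whether it considers the device removable. The sketch below reads the removable flag under /sys/block. The function name and the optional second argument (an alternative sysfs root, handy for dry runs) are made up here. Treat it as a hint only: many USB hard drives and enclosures report 0, so a "no" does not prove the disk is internal, and a "yes" does not prove it is the stick you meant.

```shell
# is_removable DEVICE_NAME [SYSFS_ROOT]
# Succeeds if the kernel flags the block device (e.g. sdb) as removable media.
# SYSFS_ROOT defaults to /sys/block; the override exists only for testing.
is_removable() {
    flag_file="${2:-/sys/block}/$1/removable"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = 1 ]
}

# Example (commented out; image.iso and sdb are placeholders):
#   if is_removable sdb; then
#       dd if=image.iso of=/dev/sdb bs=4M status=progress
#   else
#       echo "sdb is not flagged removable, refusing to write" >&2
#   fi
```

It is still worth looking at lsblk output by eye before hitting enter.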
I knew a guy who did this and had to fly to Germany to fix it because he didn’t want to admit what he’d done.
This hits…
I have a failsafe service for one of my servers, it pings the router and if it hasn’t reached it once for an entire hour then it will reboot the server.
This won’t save me from all mistakes but it will prevent firewall, link state, routing and a few other issues when I’m not present.
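A failsafe like this could look roughly like the sketch below. It is not the actual service described above, just one way to do it: probe once a minute and reboot after 60 consecutive failures, which is roughly an hour. MAX_FAILS, INTERVAL, REBOOT_CMD and both function names are invented for the example, the gateway address is a placeholder, and the ping flags assume Linux iputils ping.

```shell
# next_fail_count COUNT PROBE_COMMAND [ARGS...]
# Prints 0 if the probe succeeds, otherwise COUNT + 1.
next_fail_count() {
    count=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo 0
    else
        echo $((count + 1))
    fi
}

# watchdog_loop PROBE_COMMAND [ARGS...]
# Runs the probe every INTERVAL seconds (default 60) and calls REBOOT_CMD
# (default: reboot) after MAX_FAILS consecutive failures (default 60).
watchdog_loop() {
    fails=0
    while :; do
        fails=$(next_fail_count "$fails" "$@")
        if [ "$fails" -ge "${MAX_FAILS:-60}" ]; then
            "${REBOOT_CMD:-reboot}"
            return
        fi
        sleep "${INTERVAL:-60}"
    done
}

# Example (commented out; 192.168.1.1 is a placeholder gateway address):
#   watchdog_loop ping -c 1 -W 2 192.168.1.1
```

A single successful probe resets the counter, so only a full hour without any reply triggers the reboot.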
Until you block ICMP one day and then wonder why the server keeps rebooting…
(Been there. Done it)
Every network engineer must lock themselves out of a node at some point, it is a rite of passage.
It’s not Unix, it’s you.
For clarity, I have done it myself - plenty, but not just on Unix boxes.
This is why IPMI is so important.
Can also use Pi KVM to add a similar capability to non server grade hardware that doesn’t have it. I did that for a workstation once.
Yup, I use PiKVM, too. Fun fact, PiKVM’s first content commit is a clone of my DIY IPMI repo 😉
Look, it’s’a me: https://github.com/pikvm/pikvm/commit/70eebd5c59da26dc3f6ad56730adbb616055f4e5#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R4
Awesome job!
Yea, it was a really fun project to make back before there were any real options. And I’m glad the PiKVM team could expand upon it.
Somewhere along the way I lost the “based on” credit, likely whenever they fully modernized the stack. I wasn’t really keeping track, but did find it humorous when LTT said the creator complained someone based another project on them. I was like “Hmmmmmmm…” but just laughed because I didn’t make it for it to stagnate like it had been with me.