Oops, something went wrong!

perishthethought@lemm.ee · edited 4 months ago

Oops, something went wrong!

unhrpetby@sh.itjust.works · edited 4 months ago

The error is unnecessarily vague.

If the message is supposed to mean “There is an internal error that is of little use to you, so you can only wait while we fix it. Try again in 10 minutes,” then say that. That tells me a developer made a conscious decision to classify the failure mode as one which I cannot fix. They are explaining to me what type of error they perceive it to be.

Instead we have “Something went wrong. Try again later.” which doesn’t say that directly. This could just be them designing their systems as though every user is incompetent, and denying you the information to fix the issue yourself.

You wouldn’t know, because it doesn’t just tell you directly.

hperrin@lemmy.ca · 4 months ago

It is intentionally and, I would argue, necessarily vague.

First, there is no time frame for these kinds of errors. If it’s just a cache host that’s down, you could retry right now and the load balancer would probably have taken that host out of rotation already. If it’s a primary db that’s down, that may take 5 minutes. If there’s no replica to promote, it might take 30 minutes. If the whole db layer is down, it might take an hour or two. If an entire release needs to be rolled back, it might take a couple hours. There are just too many scenarios and too many variables to give a useful time frame.
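To put rough numbers on that spread, here is a minimal sketch. The scenario names and the dictionary are hypothetical, and the minute values are only one reading of the estimates above ("an hour or two" taken as 60 to 120, "a couple hours" as 120).

```python
# Rough recovery estimates in minutes, as (low, high) pairs.
# Illustrative values only, read off the scenarios described above.
RECOVERY_MINUTES = {
    "cache host down": (0, 0),
    "primary db down, replica available": (5, 5),
    "primary db down, no replica": (30, 30),
    "whole db layer down": (60, 120),
    "release rollback": (120, 120),
}


def eta_range(scenarios):
    """Return the widest (low, high) window across all scenarios."""
    lows = [low for low, _ in scenarios.values()]
    highs = [high for _, high in scenarios.values()]
    return min(lows), max(highs)


low, high = eta_range(RECOVERY_MINUTES)
print(f"Without knowing the cause, the estimate is {low} to {high} minutes.")
```

If the message has to be written before the cause is known, the only estimate that covers every case is the full window, which is not much more informative than "try again later".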

Second, you might appreciate an error message like that, but these error messages aren’t written for you and they’re usually not even written by developers. They’re written by designers and translated into many languages. They need to be concise, easily understood, and not easily construed as derogatory or malicious in any language. They are written for the broadest audience. You are not the broadest audience.

Third, we have to design systems as if every user is incompetent and/or malicious, because many of them are. Let me give you an example. I once got an email from another engineer using an internal system my team wrote. He said, “hey I’m getting this error, can you help?” He attached a screenshot showing an error message that read, “Your auth token has expired. Please refresh the page.” He was a senior engineer.

Fourth, and I cannot stress this enough, there is almost always nothing you can do when you hit an error like this. Any information given to you for the vast majority of these kinds of errors would be entirely useless to you. You cannot promote a db shard yourself. You cannot bring up a cache host yourself. You cannot take a host out of load balancer rotation yourself. The only reason this information could possibly benefit you is to satisfy your curiosity.

unhrpetby@sh.itjust.works · edited 4 months ago

There is no time frame for these kinds of errors

If I were able to isolate the issue to, for example, expired certs, I could absolutely give you a ballpark answer on how long it should take or when it might be back up. It doesn’t need to be very precise, but I have accessed websites only to be shown an error with zero idea whether this is a multi-day event or something that will be fixed if I wait five minutes.

…they are written by designers…

Cooperation with a developer would help here.

They are written for the broadest audience

If you write only for a child, your usefulness ceiling is that of what a child could understand. You could have your obvious boilerplate message, and then under that provide more information.

…not easily construed as derogatory or malicious in any language.

I feel as if this is a simple problem to avoid.

We have to design systems as if every user is incompetent…

See the bottom of this post

there is almost nothing you can do when you hit an error like this.

If the company believes so, then write that part in. Otherwise, it isn’t stated that such is the case. It would be one more sentence on the boilerplate section.

Overall this has to do with what you are optimizing for. It’s clear to me that many businesses believe useless boilerplate error messages are the most cost-effective. If you want to be most cost-effective, then cutting corners on the error messages likely saves time with few financial downsides. But it doesn’t have to be this way.

Designing systems for the lowest person on the totem pole isn’t without downsides. I have used Linux systems that made the bootup hide all log messages. This means that people who can actually fix a broken system using the logs are going to have a harder time, as you just hid away all the moving parts and complexity from the end user. Some machines I wouldn’t have been able to fix were it not for the detailed logs.

Or we could talk about privacy. Nearly everyone can use a computer. Great, right!? But how many people actually understand the privacy implications of using a machine that is controlled by a closed-source corporation? Of entering loads of data into that machine? Very few.

You can design a system for idiots. But you don’t have to. There are things in life that have prerequisites. If someone comes over to my computer and asks “What’s that?” about a kernel log output, I’ll ask them, “Do you know what a kernel is?” If they don’t, then I will tell them not to worry about it. My explanations are not for everyone. Neither is my software.

hperrin@lemmy.ca · edited 4 months ago

An expired cert means the browser would show an error message. I can’t send you any message if my cert is expired, because your browser won’t trust the connection.

UX designers have completely different skill sets than software engineers. At a small company, someone might do both roles, but at a company like Google or Microsoft, those are two different job titles. They do work together. In my experience, there’s a general consensus between both high level designers and high level engineers that giving the user useless information in an error message is a bad idea. There’s a reason these messages are similar across lots of companies. It’s because they are the best option for the business. If we need extra details from the user, we’ll have it printed in the console and tell them to open the console. That is incredibly rare, and basically only ever used for a network failure scenario in a service worker.

You can design your software for tech gurus, but you shouldn’t expect Microsoft Teams to be designed for tech gurus. Their customers are the general public (not super tech savvy), so they design for the general public.

You wrote “useless boilerplate error messages” in your comment, and I’m telling you that the useless part cannot be changed. You want useless detailed error messages. Good for you. Write software that gives you useless detailed error messages. Tell everyone about it and see how the general public reacts. I’ve been working in big tech for 17 years, and I am telling you from all of my experience that the general public will react poorly.

You’re upset that the information needed to fix the issue is not given to you, but you aren’t the one who needs that information. You’re not going to fix the issue. That information absolutely is provided to the people who need it, the engineers. In your metaphor of the Linux user not seeing the boot logs, you are not the Linux user. You don’t have access to the systems that need fixing, so what good would showing you the error log do? Again, the only benefit you would get from that is satisfying your curiosity. Tell me, how are you going to remove a downed host from a load balancer rotation at Google? Even if you had the ability to do that, you still don’t have permission.
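As a rough illustration of that split, where the user gets the generic message and the engineers get the detail, here is a minimal sketch. The function name, the logger name, and the short reference ID are assumptions made for the example, not anything a particular company is known to use.

```python
import logging
import uuid

# Hypothetical logger name; in practice this would feed whatever the
# engineers actually read (log aggregation, alerting, and so on).
logger = logging.getLogger("app.errors")

GENERIC_MESSAGE = "Something went wrong. Try again later."


def handle_failure(exc: Exception) -> str:
    """Log the full detail for engineers and return only a generic message."""
    # Assumed extra: a short ID so a user report can be matched to a log line.
    reference = uuid.uuid4().hex[:8]
    logger.error("failure %s: %r", reference, exc)
    return f"{GENERIC_MESSAGE} (reference: {reference})"


message = handle_failure(RuntimeError("primary db unreachable"))
print(message)
```

The exception text never reaches the returned string, so nothing about the backend is shown to the user, while the log line keeps everything an engineer would need.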

Software devs need to make a choice. When we include details, people complain and post useless bug reports and forum posts. When we don’t include details, a much smaller number of people complain, and generally we don’t get useless bug reports and forum posts about it. Which one would you choose?

PS: the reason you feel that avoiding derogatory or abusive/malicious language in many different languages is easy is that you’re not a high level UX engineer. Fun fact: ChatGPT, when pronounced in French, sounds incredibly similar to “chat, j’ai pété”, which translates to “cat, I farted”. Or, how about sending a “fatal error” message to a nurse?

unhrpetby@sh.itjust.works · 4 months ago

expired cert…

Yes. Bad example. Pick any number of other examples. You can probably put a useful time range on them.

Best option for the business

Already commented on that. They believe it to be so, I don’t agree with that choice.

You can design your software for tech gurus…

It doesn’t have to be either or. Error messages can have a baseline of mild computer knowledge, and stretch up to people who know what they are doing. You can cater to both.

Useless boiler plate error message

It doesn’t have to be utterly useless. Just because you can’t fix anything from where you are doesn’t mean you can’t benefit. If the error is deemed unfixable for customers, give a timeframe of when it should be fixed and the intended course of action (what they should do if it’s not back up soon and they need it to be up). Useless is a choice, but it’s also subjective. You may find “Something went wrong. Try again later” not useless. I deem it so.

you are not going to fix the issue

Unfounded assertion. I have fixed server-client issues before as the client. Let me repeat it: I have fixed server-client issues as the client. There are of course issues I can’t fix.

I think our disconnect partly comes from the fact that I am discussing this from the point of view of server operators being fallible. If in theory they always know what is fixable only on the server and never make a mistake in that regard, then we fall back to making a useless error message more useful. But they do make mistakes (or are purposefully hiding information so you don’t know how to get around the error). Take the Linux example. Hiding the logs would be very easy to justify, in the same way that companies could justify a useless error message for something which could actually be fixed. How many people are going to look at the initramfs logs and know how to chroot in, edit the initramfs init script, and then rebuild the cpio and shove it in boot? Probably fewer than those that don’t.

You could use this as a justification to hide it completely, but that also harms those that could fix it, and it harms error reporting too, as the users’ machines just don’t boot the distro. I disagree with this decision.

PS

If that affected ChatGPT’s popularity, I couldn’t tell.

So I’ll round it all off with this: improve the error messages as a whole. Add contact information, time till likely fix, course of action (“try again later” is vague crap). The messages feel like an unhelpful wall, the error equivalent of a chatbot responding to your pleas for support. Also, you might not always be correct about whether something is fixable or not. You could add the detailed error information near the bottom. If people don’t need it, then no harm. If people do, then it’s useful. Not adding it and then it being of use could be worse than adding it and it just never being necessary.
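A minimal sketch of what such a layered message could look like, assuming a hypothetical `ErrorReport` structure; the field names, the contact address, and the sample wording are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorReport:
    """Hypothetical container for a layered error message."""
    summary: str                    # plain-language line for everyone
    action: str                     # what the user should do next
    eta: Optional[str] = None       # rough time until a likely fix, if known
    contact: Optional[str] = None   # where to ask for help
    details: Optional[str] = None   # technical detail, shown last


def render(report: ErrorReport) -> str:
    """Render the boilerplate first and the technical detail at the bottom."""
    lines = [report.summary, report.action]
    if report.eta:
        lines.append(f"Expected fix: {report.eta}")
    if report.contact:
        lines.append(f"Contact: {report.contact}")
    if report.details:
        lines.extend(["", "Technical details:", report.details])
    return "\n".join(lines)


report = ErrorReport(
    summary="Something went wrong on our side.",
    action="There is nothing you need to change. Please try again later.",
    eta="about 10 minutes",
    contact="support@example.com",
    details="upstream timeout while reading user profile",
)
print(render(report))
```

Every field past the first two is optional, so the same structure can still render the short boilerplate alone when no estimate or detail is available.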

I think this topic is wrung out dry.

hperrin@lemmy.ca · 4 months ago

I think you’re trying very hard to ignore all the negative things I’ve told you users do when you include too much information. Maybe just go get a job at one of these big companies and submit a diff adding this information, then read why your diff gets rejected. I’m literally telling you the reasons big companies do this, and you just refuse to believe me. Maybe you’ll believe them.