This is a read-through and point-by-point response to one Rust developer’s rant about C/POSIX locales, which was inlined into a GitHub commit to a streaming tool project in November 2017. The rant is of interest because it shows the dark side of present-day software development: how a human being tasked with controlling a computer responds to it being quite literally out of their control. The full, original text can be found here.
One of the recurring themes throughout this rant is a perception that is also quite common among non-angry programmers: evidence-free assertions that approach X is bad, or that approach Y is obviously the way to go and everything else is terrible. We see the first instance of this as early as the fourth paragraph:
Everything uses UTF-8 for "char*", and what doesn't is broken and terrible anyway.
Besides being factually false, the claim doesn’t even hold true in spirit. Without knowing how a given piece of software needs its text to be encoded, it is outright ideological to make such a claim. Enforced despite evidence that some 8-bit encoding would serve better than UTF-8 (for instance, where constant-time indexing by character matters), it can demonstrably harm real software projects.
There are other gross misperceptions about how software is supposed to work. For example, here they exhibit an irrational hatred of the existence of global state, an unavoidable reality about computers that is, in truth, both deliberate and necessary.
The locale (via setlocale()) is global state, and global state is not a reasonable way to do anything.
The truth about state is that, at some point, code must interact with the outside world. It has to deal with state that it did not create, to some extent or other, because that is what Turing machines do. If the computer were a closed system, it would either be too trivial to be useful, or it wouldn’t be a Turing machine at all. This may also be related to the equally ideological notion of “safety” that Rust developers have internalised, a concept once again misplaced onto a machine that will never be haltable. These are simply facts of life. Nonetheless, a whole host of fallacious conclusions are drawn from them.
But the badness doesn't stop here. At some point, they invented threads.
And they put absolutely no thought into how threads should interact with
locales. So they kept locales as global state. Because obviously, you
want to be able to change the semantics of basic string processing
functions while they're running, right? (Any thread can call
setlocale() at any time, and it's supposed to change the locale of all
other threads.)
Applications that want to change the user’s locale are the ones that need to call setlocale(). The rest of the system should then update its priors and start using the new locale at its earliest convenience.
This part barely qualifies as a question, but does the person who wrote this actually understand how a system is supposed to work? “Because obviously, you want to be able to change the semantics of basic string processing functions while they're running, right?” Yes! Yes I do. Is there some difficulty with that which I am missing?
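As a minimal sketch of that division of labour, assuming a POSIX-style C library: the application opts into the user’s locale once, at startup and before it spawns any threads, by passing an empty string to setlocale(); library code only queries the locale by passing NULL, which never changes it. The names app_adopt_user_locale and lib_report_locale are hypothetical, chosen for illustration.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper, called once by the application near the start of
 * main() and before any other threads exist. The empty string asks the
 * C library to take the locale from the environment (on POSIX systems:
 * LC_ALL, then the individual LC_* variables, then LANG).
 * Returns 0 on success, or -1 if that locale is not available. */
int app_adopt_user_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        /* A failed call leaves the current locale unchanged, so the
         * program simply keeps running in the default "C" locale. */
        return -1;
    }
    return 0;
}

/* Hypothetical library-side helper: it never changes the locale, it only
 * asks which one is in effect. A NULL argument queries without modifying. */
void lib_report_locale(FILE *out)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    fprintf(out, "LC_CTYPE is \"%s\"\n", name != NULL ? name : "(unknown)");
}
```

An application would call app_adopt_user_locale() near the top of main() and carry on in the "C" locale if it returns -1; everything else in the program simply picks up whatever locale is in effect.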
You can't even temporarily switch the locale with setlocale(), because it would asynchronously fuckup the other threads.
No, it wouldn’t, unless you created those threads improperly. But in that case, locales are the least of your concerns; your whole system is on fire. Ignorance about threading is not the fault of the people who created POSIX.
If the author had bothered to honestly ask any of the questions they chose instead to screed about and never investigate (confirming their priors in a way that feels better in the moment, but dooms them to keep suffering under this forever), they could learn a great deal about how the systems they work with actually function, and solve many of their problems. Unfortunately, this is a common occurrence.
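As for temporarily switching the locale with setlocale(): in a single-threaded program, or one that guarantees no other thread touches the locale in the meantime, the classic save-and-restore idiom looks roughly like the sketch below. It holds only under that stated assumption; format_double_c_locale is a hypothetical helper that forces a '.' decimal separator regardless of the user’s LC_NUMERIC.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format a double with '.' as the decimal separator
 * whatever LC_NUMERIC the user selected, by switching LC_NUMERIC to "C"
 * for the duration of the call and then restoring it.
 * ASSUMPTION: no other thread calls setlocale() or any locale-dependent
 * function while this runs. Returns snprintf()'s result, or -1 on error. */
int format_double_c_locale(char *buf, size_t len, double value)
{
    const char *current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;

    /* setlocale() may overwrite the string it returned on its next call,
     * so keep a private copy of the name for the restore step. */
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    setlocale(LC_NUMERIC, "C");
    int n = snprintf(buf, len, "%.2f", value);
    setlocale(LC_NUMERIC, saved); /* restore the caller's choice */

    free(saved);
    return n;
}
```

The private copy is the detail most often gotten wrong: restoring from the pointer setlocale() originally returned can silently restore the wrong locale.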
Why would you make locales process global? Who even wanted it to work this way?
This is a cruel trick to play on the mind. When you ask a question like this, you trick your brain into thinking that you want to know something, that you are actually asking, when really you’re asking rhetorically and immediately answering it yourself with your own preconceived idea. Their ‘guess’, “Was it just a fucked up psychopath?”, is obviously not the correct answer to these questions. The users of the operating systems wanted it to be this way, and they wanted it to be global, because users don’t interact with their PCs in multiple locales at once.
libarchive intentionally uses the locale API and all the broken crap around it to "convert" UTF-8 or UTF-16 (as contained in reasonably sane archive formats) to "char*". This is a good start!
Here we get to see the tone shift. The author finds that they themselves have all of the correct answers to their problems, and are merely kept by POSIX or threading or what-have-you from walking the path of righteousness and showing the whole world, and all those incompetent standards committees, How It’s Really Done. Many people will surely disagree with them, and many more will have no opinion whatsoever and simply do whatever works best. This person will stand in the way of every single one of them, and arrogantly argue until the end of time that their way is best, even though they don’t know what they’re talking about. This is worse for everyone than a standards body making decisions that others don’t even have to follow, because it is a decision made by a person who actually implements software and has quite clearly made up their mind about how things should be done. If someone is so averse to evidence to begin with, why would they want to hear it after they’ve made up their mind?
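As for the conversion the quoted libarchive passage refers to, here is a minimal sketch of what turning wide characters into a locale-encoded "char*" involves, using the standard wcsrtombs() function. It is not libarchive’s actual code, and the helper name to_locale_multibyte is hypothetical. The point it illustrates is that the bytes produced depend entirely on the LC_CTYPE category of the current locale, which is exactly the global state under discussion.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not libarchive's code): convert a wide-character
 * string into a newly allocated multibyte string encoded according to the
 * LC_CTYPE category of the current locale. Returns NULL if some character
 * cannot be represented in that encoding, or if allocation fails.
 * The caller must free() the result. */
char *to_locale_multibyte(const wchar_t *ws)
{
    mbstate_t state;
    const wchar_t *src = ws;

    /* First pass: a NULL destination only measures the output length
     * in bytes, excluding the terminating NUL. */
    memset(&state, 0, sizeof state);
    size_t need = wcsrtombs(NULL, &src, 0, &state);
    if (need == (size_t)-1)
        return NULL; /* unrepresentable character (EILSEQ) */

    char *out = malloc(need + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: convert for real, with room for the NUL terminator. */
    memset(&state, 0, sizeof state);
    src = ws;
    wcsrtombs(out, &src, need + 1, &state);
    return out;
}
```

Under a UTF-8 locale the result is UTF-8; under an 8-bit locale it is single bytes, and on glibc (where wchar_t holds Unicode code points) characters outside that locale’s repertoire make the call return NULL.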
It has become more important than ever for people to channel their frustrations into healthier outlets. Software has become more complex than ever, and the last thing anybody needs is angry developers with ideological convictions about code, striving for some utopia that they can’t even comprehend, let alone implement. Beliefs will solve absolutely nothing about software; they only provide an easy way to blame others when software doesn’t work as you expect. Now is the time for patience and understanding, because that is the only path through which simpler systems, and thus better code, can be realised.
Until next time,
Άλέξανδερ Νιχολί