asveikau 3 hours ago

Sorting is language-specific even if you're restricted to languages using Latin characters. E.g., how do you sort N relative to Ñ? How do you treat the Turkish variations on the letter I?

Doing a dumb sort by character or byte values is obviously the wrong call for any diacritics, but the right call may also depend on the language.
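
To make the N vs. Ñ point concrete, here is a minimal Python sketch comparing plain code-point order with a hand-rolled Spanish alphabet order. The names `SPANISH_ALPHABET`, `SPANISH_RANK` and `spanish_key` are made up for illustration, and the table ignores accents and everything else a real collation library handles.

```python
# Hypothetical illustration: code-point order vs. a tiny hand-written
# Spanish alphabet order in which "ñ" sits between "n" and "o".
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
SPANISH_RANK = {ch: i for i, ch in enumerate(SPANISH_ALPHABET)}

def spanish_key(word):
    # Letters missing from the table (e.g. accented vowels) sort last.
    return [SPANISH_RANK.get(ch, len(SPANISH_RANK)) for ch in word.lower()]

words = ["nube", "ñandú", "oso", "nota"]

# Plain sort compares code points, so "ñ" (U+00F1) lands after "o".
by_code_point = sorted(words)

# The custom key puts "ñ" words between the "n" and "o" words.
by_spanish = sorted(words, key=spanish_key)

print(by_code_point)
print(by_spanish)
```

Neither order is "the" correct one: which is right depends on the language the reader expects.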

  • dmurray 2 hours ago

    And that's why there are a hundred different possible values for LC_COLLATE, and it's completely normal that two popular Unix distributions picked different default values for that setting...right?

    It would have been reasonable to conclude the article a third of the way through, and say "sorting is locale-dependent, if what you value is consistent behaviour between different OSs (instead of sorting based on the user's preferences) you need to implement the sorting yourself."

  • encom 44 minutes ago

    Before the Danish language adopted the letter "å" (in 1948), the vowel was written as "aa". In the Danish alphabet, "å" is the last letter. Therefore a list of three Danish city names would be correctly sorted as:

      * Albertslund
      * Odense
      * Aarhus
    
    This feels like material for another Tom Scott video.
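
    A rough Python sketch of that ordering, assuming the simplification that every "aa" is folded to "å" before comparing (real Danish collation rules have more exceptions than this). `DANISH_ALPHABET`, `DANISH_RANK` and `danish_key` are hypothetical names, not part of any library.

```python
# Hypothetical illustration: Danish alphabet order with "aa" treated as "å",
# the last letter. This is a simplification, not a full collation.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
DANISH_RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}

def danish_key(name):
    # Fold the old "aa" spelling into "å" before ranking each letter.
    folded = name.lower().replace("aa", "å")
    # Characters outside the table sort after every known letter.
    return [DANISH_RANK.get(ch, len(DANISH_RANK)) for ch in folded]

cities = ["Aarhus", "Odense", "Albertslund"]
danish_order = sorted(cities, key=danish_key)
print(danish_order)
```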
o11c 43 minutes ago

Minor note: on Debian (and possibly other distros), you don't have to use `locale-gen` to dynamically build things into `$complocaledir/locale-archive` (which, incidentally, can cause random breakage for programs that happen to start during system upgrades).

The `locales-all` package works more like macOS. It's only a ~10MB download but unpacks to take ~250MB of disk space (these numbers will vary based on your libc version and packaging format).

There are a lot of sparse arrays and UTF-32 character data in compiled locales.

Incidentally, the command to dump a locale's data is:

  LC_ALL=whatever locale -ck `locale | sed 's/=.*//; /LANG\|LC_ALL/d'`
skopje 3 hours ago

So the ISO way is the right way, right?

  • dataflow 2 hours ago

    I wondered the same. What's the right ordering?

greesil an hour ago

It's not a stable sort?

loeg 3 hours ago

(2020)