A few words about collation
I’d like to introduce newcomers to the topic: what collation is and why we need it, e.g. why we can't just use strcmp.
In the simplest case, comparing English words, we don’t need collation at all (case differences aside): the numeric character codes (called code points in Unicode) are all we need. However, once we have to deal with accents, for example, the task gets harder: in most languages, accent differences should be ignored whenever the strings differ in their base letters, even if those base letters come _later_ in the string. Next come differences in case, which should matter even less than differences in accent, and last come differences in punctuation.
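To see why plain code point comparison is not enough, here is a small sketch in Python, whose default string comparison works code point by code point, much as strcmp compares bytes. The word list is made up for illustration.

```python
# Python compares str values code point by code point, so sorted()
# without a key shows the order we get with no collation at all.
words = [
    "zebra",
    "Apple",
    "apple",
    "\u00e9cole",  # "école", with é as the single code point U+00E9
    "eclair",
]

code_point_order = sorted(words)
print(code_point_order)
```

The capitalized word lands first because 'A' (65) is below every lowercase letter, and "école" lands after "zebra" because é (233) is above 'z' (122). Neither result is what a human reader would expect from a sorted list.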
This way we end up with 3 or 4 comparison levels. The first is always performed; each later level is consulted only if all earlier levels found no difference between the strings. Add to this contractions (two characters that have to be treated as one, such as "ch" in Czech) and expansions (one character that should sort as if it were two, such as German "ß" sorting as "ss"), and you have a basic idea of what collation is.
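The level idea can be sketched with a sort key that is a tuple of (base letters, accents, case). This is only an illustration under simplifying assumptions, not how a real collation library works: the function name `collation_key` is hypothetical, accents are detected through Unicode NFD decomposition, lowercase is arbitrarily placed before uppercase, and the punctuation level, contractions and expansions are left out entirely.

```python
import unicodedata


def collation_key(text):
    """Build a simplified three-level sort key (hypothetical helper)."""
    base = []     # level 1: base letters, case folded away
    accents = []  # level 2: combining marks attached to each base letter
    case = []     # level 3: True where the base letter is uppercase
    # NFD splits an accented letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if accents:  # attach the mark to the preceding base letter
                accents[-1] += (ord(ch),)
            continue
        base.append(ch.lower())
        accents.append(())
        case.append(ch.isupper())
    # Tuples compare element by element, so a later level is looked at
    # only when every earlier level is equal.
    return ("".join(base), tuple(accents), tuple(case))


words = ["resumes", "résumé", "Resume", "resumé", "resume"]
ordered = sorted(words, key=collation_key)
print(ordered)
```

Note that "résumé" sorts before "resumes" here even though its second character is accented: the strings differ in a base letter later on (the trailing "s"), so the accent difference is never consulted. With plain code point comparison the order of these two words is reversed.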
On top of this, each language of course has its own rules, so we need to tailor the collation to the current locale; in practice this means having data files for all supported languages.
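As a rough sketch of what such tailoring data might look like, the example below keeps a tiny per-locale table of letters that sort after "z". The table `EXTRA_LETTERS` and the function `primary_key` are hypothetical names made up for this illustration, and only the first (base letter) level is modelled. It relies on two well-known rules: Swedish treats å, ä and ö as separate letters at the end of the alphabet, while German dictionary order sorts ä together with a.

```python
import unicodedata

# Hypothetical tailoring data: letters placed after "z", per locale.
EXTRA_LETTERS = {
    "sv": "\u00e5\u00e4\u00f6",  # å ä ö come after z in Swedish
    "de": "",                    # German: ä sorts with its base letter a
}


def primary_key(word, locale):
    """First-level key only: one number per letter (illustrative)."""
    extra = EXTRA_LETTERS.get(locale, "")
    key = []
    # NFC makes sure an accented letter is a single character here.
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in extra:
            # Tailored letter: place it after "z", in table order.
            key.append(ord("z") + 1 + extra.index(ch))
        else:
            # Untailored letter: strip accents, keep the base letter.
            key.append(ord(unicodedata.normalize("NFD", ch)[0]))
    return key


words = ["zon", "äpple", "apa"]
german = sorted(words, key=lambda w: primary_key(w, "de"))
swedish = sorted(words, key=lambda w: primary_key(w, "sv"))
print(german)
print(swedish)
```

The same three words come out in a different order for each locale, which is why the comparison code alone is not enough and the per-language data has to come from somewhere.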