Temporarily making \u escapes use a different encoding on the command line
For debugging purposes, I made a little utility that shows the exact bytes (as a hex dump) of each command-line argument fed to it:

```
$ cat cmdline
#!/bin/bash
mapfile -d '' array < /proc/$$/cmdline
for arg in "${array[@]:2}"; do
    printf '%s' "$arg" | xxd -
    echo
done
```

Now I can create complex test data and everything seems to work as expected. For example:

```
$ ./cmdline $'\u00ff' $'\xff'
00000000: c3bf  ..
00000000: ff  .
```

As can be seen, the `$'\u00ff'` sequence produces the bytes corresponding to the UTF-8 encoding of Unicode code point 255, and the `$'\xff'` sequence produces the single byte with the value 255.

I expect the UTF-8 encoded result given my locale settings. But I would like to be able to get different bytes instead, corresponding to different text encodings of Unicode code point 255. Conceptually, to my mind, `$'\u00ff'` represents the *Unicode character* `ÿ` (lowercase y with diaeresis), not any particular sequence of bytes. For example, I would like to be able to specify a UTF-16 encoding and get either `00ff` or `ff00` in my resulting hex dump (according to the chosen endianness), or use the `cp1252` encoding and get `ff`, or use the `cp437` encoding and get `98`.

I tried setting locale environment variables, but they seem to have no effect:

```
$ echo $LANG
en_CA.UTF-8
$ locale -m | grep 1252  # verify that the locale is supported
CP1252
$ LANG="en_CA.CP1252" ./cmdline $'\u00ff'  # try to use it
00000000: c3bf  ..
```

Similarly, setting `LC_ALL` gives a warning from `setlocale` that the locale cannot be changed, and there is otherwise no effect.

How can I change the behaviour of the `$'\uNNNN'`-style escape sequences at the command line?
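For reference, the specific byte sequences I'm hoping for can be produced by re-encoding the UTF-8 output after the fact (a workaround sketch assuming GNU `iconv` is available, not a change to bash's own `\u` handling):

```shell
#!/bin/bash
# Take the UTF-8 bytes that bash's $'\u00ff' produces and re-encode
# them into each target encoding, dumping the resulting bytes as hex.
for enc in UTF-16LE UTF-16BE CP1252 CP437; do
    printf '%-9s ' "$enc"
    printf '%s' $'\u00ff' | iconv -f UTF-8 -t "$enc" | xxd -p
done
# UTF-16LE  ff00
# UTF-16BE  00ff
# CP1252    ff
# CP437     98
```

But piping every argument through `iconv` is exactly the extra step I'd like to avoid; I'd prefer the escape sequence itself to honour a chosen encoding.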