Temporarily making \u escapes use a different encoding on the command line
For debugging purposes, I made a little utility that shows the exact bytes (as a hex dump) of each command-line argument fed to it:

```
$ cat cmdline
#!/bin/bash
mapfile -d '' array < /proc/$$/cmdline
for arg in "${array[@]:2}"; do
    printf '%s' "$arg" | xxd -
    echo
done
```

Now I can create complex test data and everything seems to work as expected. For example:

```
$ ./cmdline $'\u00ff' $'\xff'
00000000: c3bf  ..
00000000: ff  .
```

As can be seen, the `$'\u00ff'` sequence produces the bytes corresponding to the UTF-8 encoding of Unicode code point 255, and the `$'\xff'` sequence produces the single byte with the value 255.

I expect the UTF-8 encoded result given my locale settings. But I would like to be able to get different bytes instead, corresponding to different text encodings of Unicode code point 255. Conceptually, to my mind, `$'\u00ff'` represents the *Unicode character* `ÿ` (lowercase y with diaeresis), not any particular sequence of bytes. For example, I would like to be able to specify a UTF-16 encoding and get either `00ff` or `ff00` in my resulting hex dump (according to the chosen endianness), or use the `cp1252` encoding and get `ff`, or use the `cp437` encoding and get `98`.

I tried setting locale environment variables, but they seem to have no effect:

```
$ echo $LANG
en_CA.UTF-8
$ locale -m | grep 1252  # verify that the locale is supported
CP1252
$ LANG="en_CA.CP1252" ./cmdline $'\u00ff'  # try to use it
00000000: c3bf  ..
```

Similarly, setting `LC_ALL` gives a warning from `setlocale` that the locale cannot be changed, and there is otherwise no effect.

How can I change the behaviour of the `$'\uNNNN'`-style escape sequences at the command line?
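For reference, the specific byte sequences I'm hoping for can be produced by re-encoding the UTF-8 output after the fact (a workaround sketch assuming GNU `iconv` is available, not a change to bash's own `\u` handling):

```shell
#!/bin/bash
# Take the UTF-8 bytes that bash's $'\u00ff' produces and re-encode
# them into each target encoding, dumping the resulting bytes as hex.
for enc in UTF-16LE UTF-16BE CP1252 CP437; do
    printf '%-9s ' "$enc"
    printf '%s' $'\u00ff' | iconv -f UTF-8 -t "$enc" | xxd -p
done
# UTF-16LE  ff00
# UTF-16BE  00ff
# CP1252    ff
# CP437     98
```

But piping every argument through `iconv` is exactly the extra step I'd like to avoid; I'd prefer the escape sequence itself to honour a chosen encoding.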