Post History

50%

+0 −0

Q&A Temporarily making \u escapes use a different encoding on the command line

I don't think you can persuade Bash to use non-UTF-8 characters internally—I expect much of its code assumes that strings are ASCII-compatible. So $'\u00FF' will always translate, inside Bash, to t...

posted 4d ago by r~~‭ · edited 4d ago by r~~‭

Answer

#2: Post edited by

r~~‭ · 2025-05-31T17:19:04Z (4 days ago)

Copy Link

Raw

Markdown

I don't think you can persuade Bash to use non-UTF-8 characters internally—I expect much of its code assumes that strings are ASCII-compatible. So `$'\u00FF'` will always translate, inside Bash, to the UTF-8 byte sequence `c3 bf`, and that's what
If you want to take a UTF-8 byte sequence that Bash (or another utility) has produced and translate it to UTF-16, use iconv:
```
$ printf %s $'\u1234\u00FF' | iconv -t utf-16 | xxd -g2 -e
00000000: feff 1234 00ff ..4...
```
As you can see, you'll have to contend with the `\uFEFF` BOM if you do this.
Note that this will barf if you give it `$'\xFF'`, which isn't valid UTF-8. Mixing the two encodings is a recipe for sadness.

I don't think you can persuade Bash to use non-UTF-8 characters internally—I expect much of its code assumes that strings are ASCII-compatible. So `$'\u00FF'` will always translate, inside Bash, to the UTF-8 byte sequence `c3 bf`.
If you want to take a UTF-8 byte sequence that Bash (or another utility) has produced and translate it to UTF-16, use iconv:
```
$ printf %s $'\u1234\u00FF' | iconv -t utf-16 | xxd -g2 -e
00000000: feff 1234 00ff ..4...
```
As you can see, you'll have to contend with the `\uFEFF` BOM if you do this.
Note that this will barf if you give it `$'\xFF'`, which isn't valid UTF-8. Mixing the two encodings is a recipe for sadness.

#1: Initial revision by

r~~‭ · 2025-05-31T17:18:41Z (4 days ago)

Copy Link

Raw

Markdown

I don't think you can persuade Bash to use non-UTF-8 characters internally—I expect much of its code assumes that strings are ASCII-compatible. So `$'\u00FF'` will always translate, inside Bash, to the UTF-8 byte sequence `c3 bf`, and that's what

If you want to take a UTF-8 byte sequence that Bash (or another utility) has produced and translate it to UTF-16, use iconv:

```
$ printf %s $'\u1234\u00FF' | iconv -t utf-16 | xxd -g2 -e
00000000: feff 1234 00ff                            ..4...
```

As you can see, you'll have to contend with the `\uFEFF` BOM if you do this.

Note that this will barf if you give it `$'\xFF'`, which isn't valid UTF-8. Mixing the two encodings is a recipe for sadness.

Communities

Post History