Can I enter raw strings in fish to avoid escaping regexes for sed?
When running sed through fish, I often encounter a problem with regexes. Many commonly used regex control characters like []{}().+ need to be escaped, even if I type the regex in a single-quoted string. For example:
$ echo abc | sed 's/b+/X/'
abc
$ echo abc | sed 's/b\+/X/'
aXc
This constant \ in front of many characters makes my regexes absolutely unreadable. It gets even worse if the regex itself needs to escape characters: the whole thing becomes a jumble of backslashes.
Surely there's a better way to enter complex strings in the shell? I don't know if this is a fish problem, or a generic shell problem, or a sed problem, but I would love to know how to avoid it.
As an example of what I expect: Python has an elegant solution to this, called "raw strings". r"b+" is therefore interpreted without any formatting or templating at all. Is there a raw string option in fish, or other shells? Does sed have an alternate input mode to get around this annoying problem?
2 answers
What do you want to find though? Is the + a quantifier, meaning you are looking for one or more b? Or are you looking for the literal string b+? If the latter, you don't need to escape at all, and if the former, you can just use sed -E 's/b+/X/' directly.
The reason you need to escape in single-quoted strings has nothing to do with the shell and is all about the regular expression language used. By default, sed uses BRE (basic regular expressions). In this regex flavor, + is just a literal + sign, and you need \+ (a GNU extension) for "one or more". However, most sed implementations, including GNU sed, the default on Linux, have an -E switch which enables ERE (extended regular expressions). In this regex flavor, + has a special meaning and so only needs to be escaped as \+ if you want to search for a literal +.
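To make the difference concrete, here is a minimal sketch in POSIX sh (the sed invocations are identical when typed in fish). It assumes a sed that supports -E, such as GNU or BSD sed; the variable names are only for illustration.

```shell
# Single quotes pass the backslash through untouched; sed decides what it means.
printf '%s\n' 's/b\+/X/'

# BRE (default): a bare + is an ordinary character, so this matches the text "b+".
bre_literal=$(printf '%s\n' 'ab+c' | sed 's/b+/X/')

# ERE (-E): a bare + is the "one or more" quantifier.
ere_quant=$(printf '%s\n' 'abbbc' | sed -E 's/b+/X/')

# ERE (-E): \+ is a literal plus sign.
ere_literal=$(printf '%s\n' 'ab+c' | sed -E 's/b\+/X/')

printf '%s\n' "$bre_literal" "$ere_quant" "$ere_literal"
```

All three substitutions produce aXc; only the meaning of + and \+ changes between the two flavors.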
All this to say that there won't be a magic bullet here because sometimes you will want to escape special characters and other times you'll want to use them in all their special glory. So if you're coming from more powerful regex flavors such as ERE or PCRE (versions of which are the default in Perl and Python and many other places), you can just use -E to make the regex behave as you likely expect it to:
$ echo "foo" | sed -E 's/o+/A/'
fA
$ echo abcdefg | sed -E 's/[b-f]+/A/'
aAg
$ echo foooooo | sed -E 's/o{3,5}/A/'
fAo
0 comment threads
It is not the most elegant solution, but you may be able to use the string escape function of fish, as in:
echo abc | sed -E (string escape 's/b+/X/')
This would still escape the special characters, but in a "hidden" way - the user-visible RegEx is not cluttered with backslashes and the sed command sees the result of the command substitution, which is properly escaped.
So far, I don't know of any shell that has a feature to turn off interpretation of special characters (if anyone knows, it would likely be Stéphane Chazelas over at U&L SE).
Note that the + metacharacter is extended regular expression syntax, which sed doesn't understand unless invoked with the -E option. When called "bare", it uses basic regular expressions, where the "one or more" would need to be expressed explicitly using range indicators, so you would need to write
echo abc | sed (string escape 's/b\{1,\}/X/')
where the \ are now an actual part of the RegEx and therefore indispensable.
Update: A comment by @Quasímodo notified me that the GNU extension of BRE as used by GNU sed actually does implement the + quantifier, but - similar to { and } - it must be given its special meaning by preceding it with a \. So in this case the \ is an integral part of the RegEx and must be literally passed to sed in the form b\+ if you don't want to invoke it with -E to make use of ERE.
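As a rough side-by-side of the spellings of "one or more b" discussed here, the following POSIX sh sketch runs each against the same input. The variable names are illustrative, and the \+ form assumes GNU sed; other implementations may treat it differently.

```shell
input='abbbc'

# POSIX BRE: interval expression; the backslashes are part of the regex.
bre_interval=$(printf '%s\n' "$input" | sed 's/b\{1,\}/X/')

# GNU extension to BRE: \+ as "one or more" (assumes GNU sed, not portable).
bre_gnu=$(printf '%s\n' "$input" | sed 's/b\+/X/')

# ERE via -E: + is a quantifier without any backslash.
ere_plus=$(printf '%s\n' "$input" | sed -E 's/b+/X/')

printf '%s\n' "$bre_interval" "$bre_gnu" "$ere_plus"
```

With GNU sed, all three print aXc. The interval and -E forms are the ones to reach for if the command also has to run on non-GNU systems.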