
Can I enter raw strings in fish to avoid escaping regexes for sed?

+4
−0

When running sed through fish, I often encounter a problem with regexes. Many commonly used regex control characters like []{}().+ need to be escaped, even if I type the regex in a single quoted string. For example:

$ echo abc | sed 's/b+/X/'
abc

$ echo abc | sed 's/b\+/X/'
aXc

This constant \ in front of so many characters makes my regexes absolutely unreadable. It gets even worse when the regex itself needs to escape characters: the whole thing becomes a jumble of backslashes.

Surely there's a better way to enter complex strings in the shell? I don't know if this is a fish problem, or a generic shell problem, or a sed problem, but I would love to know how to avoid it.

As an example of what I expect: Python has an elegant solution to this, called "raw strings". r"b+" is therefore interpreted without any formatting or templating at all. Is there a raw string option in fish, or other shells? Does sed have an alternate input mode to get around this annoying problem?



2 answers

+3
−0

What do you want to find though? Is the + a quantifier, meaning you are looking for one or more b? Or are you looking for the literal string b+? If the latter, you don't need to escape at all and if the former, you can just use sed -E 's/b+/X/' directly.

The reason you need to escape in single-quoted strings has nothing to do with the shell and is all about the regular expression language used. By default, sed uses BRE (basic regular expressions). In this flavor, + is just a literal + sign, and you need \+ for "one or more" (the \+ form is a GNU extension to BRE). However, most sed implementations, including GNU sed (the default on Linux), have an -E switch that enables ERE (extended regular expressions). In that flavor, + is a quantifier, so it only needs to be escaped as \+ when you want to match a literal +.
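To make the "nothing to do with the shell" point concrete, here is a small sketch. It assumes a POSIX-style shell such as bash (not fish syntax, although fish also leaves these single-quoted strings untouched) and GNU sed; the variable names are only for illustration.

```shell
# Single quotes hand the text to the program unchanged, backslash included.
script='s/b\+/X/'
printf '%s\n' "$script"    # prints: s/b\+/X/

# sed receives the same bytes both times; only the regex flavor changes
# what \+ means. (Assumes GNU sed, where BRE accepts \+ as an extension.)
bre_result=$(echo 'abc' | sed "$script")        # BRE: \+ is "one or more"
ere_result=$(echo 'ab+c' | sed -E "$script")    # ERE: \+ is a literal plus
printf '%s\n' "$bre_result" "$ere_result"
```

Both sed calls print aXc, but for different reasons: the first replaces the run of b characters, the second replaces the literal text b+.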

All this to say that there won't be a magic bullet here because sometimes you will want to escape special characters and other times you'll want to use them in all their special glory. So if you're coming from more powerful regex flavors such as ERE or PCRE (versions of which are the default in perl and python and many other places), you can just use -E to make the regex behave as you likely expect it to:

$ echo "foo" | sed -E 's/o+/A/'
fA
$ echo abcdefg | sed -E 's/[b-f]+/A/'
aAg
$ echo foooooo | sed -E 's/o{3,5}/A/'
fAo
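For the other case mentioned at the start, matching a literal +, a rough side-by-side might look like the following. This is only a sketch in POSIX shell syntax with made-up inputs and variable names; it assumes a sed that supports -E.

```shell
# BRE (default): a bare + is an ordinary character.
bre_literal=$(echo 'a+b' | sed 's/a+/X/')       # matches the text "a+"

# ERE (-E): a bare + is a quantifier, so escape it to match the character.
ere_quant=$(echo 'aaab' | sed -E 's/a+/X/')     # matches the run "aaa"
ere_literal=$(echo 'a+b' | sed -E 's/a\+/X/')   # matches the text "a+"

printf '%s\n' "$bre_literal" "$ere_quant" "$ere_literal"
```

All three commands produce Xb, so the rule of thumb is: pick the flavor first, then escape only the characters that are special in that flavor.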


+2
−0

It is not the most elegant solution, but you may be able to use the string escape function of fish, as in:

echo abc | sed -E (string escape 's/b+/X/')

This would still escape the special characters, but in a "hidden" way - the user-visible RegEx is not cluttered with backslashes and the sed command sees the result of the command substitution, which is properly escaped.

So far, I don't know of any shell that has a feature to turn off interpretation of special characters (if anyone knows, it would likely be Stéphane Chazelas over at U&L SE).

Note that the + metacharacter is extended regular expression syntax, which sed doesn't understand unless invoked with the -E option. When called "bare", it uses basic regular expressions, where "one or more" has to be spelled out as an interval expression, so you would need to write

echo abc | sed (string escape 's/b\{1,\}/X/')

where the \ are now an actual part of the RegEx and therefore indispensable.

Update: A comment by @Quasímodo notified me that the GNU extension of BRE as used by GNU sed actually does implement the + quantifier, but - similar to { and } - it must be given its special meaning by preceding it with a \. So in this case the \ is an integral part of the RegEx and must be literally passed to sed in the form b\+ if you don't want to invoke it with -E to make use of ERE.
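Leaving string escape aside, the three spellings of "one or more b" discussed here can be compared directly. This is a minimal sketch in POSIX shell syntax (not fish), assuming GNU sed for the \+ form; the input string and variable names are hypothetical.

```shell
input='abbbc'

posix_bre=$(echo "$input" | sed 's/b\{1,\}/X/')   # POSIX BRE interval
gnu_bre=$(echo "$input" | sed 's/b\+/X/')         # GNU extension to BRE
ere=$(echo "$input" | sed -E 's/b+/X/')           # ERE via -E

printf '%s\n' "$posix_bre" "$gnu_bre" "$ere"
```

Each variant turns abbbc into aXc. The interval form is the most portable in BRE, while the -E form needs the fewest backslashes.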


