Comments on Can I enter raw strings in fish to avoid escaping regexes for sed?

Parent

Can I enter raw strings in fish to avoid escaping regexes for sed?

+4
−0

When running sed through fish, I often encounter a problem with regexes. Many commonly used regex control characters like []{}().+ need to be escaped, even if I type the regex in a single quoted string. For example:

$ echo abc | sed 's/b+/X/'
abc

$ echo abc | sed 's/b\+/X/'
aXc

This constant \ in front of many characters makes my regexes absolutely unreadable. It gets even worse if the regex itself needs to escape characters: the whole thing becomes a jumble of backslashes.

Surely there's a better way to enter complex strings in the shell? I don't know if this is a fish problem, or a generic shell problem, or a sed problem, but I would love to know how to avoid it.

As an example of what I expect: Python has an elegant solution to this, called "raw strings". r"b+" is taken literally, without any backslash-escape processing at all. Is there a raw string option in fish, or other shells? Does sed have an alternate input mode to get around this annoying problem?


1 comment thread

Regarding new tag (2 comments)
Post
+2
−0

It is not the most elegant solution, but you may be able to use the string escape function of fish, as in:

echo abc | sed -E (string escape 's/b+/X/')

This would still escape the special characters, but in a "hidden" way - the user-visible RegEx is not cluttered with backslashes and the sed command sees the result of the command substitution, which is properly escaped.

So far, I don't know of any shell that has a feature to turn off interpretation of special characters (if anyone knows, it would likely be Stéphane Chazelas over at U&L SE).

Note that the + metacharacter is extended regular expression (ERE) syntax, which sed doesn't understand unless invoked with the -E option. When called "bare", it uses basic regular expressions (BRE), where "one or more" has to be expressed explicitly with an interval expression, so you would need to write

echo abc | sed (string escape 's/b\{1,\}/X/')

where the \ are now an actual part of the RegEx and therefore indispensable.
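The difference between the two regex flavors can be checked directly, without any shell-side helper. The following is a minimal sketch in POSIX sh (not fish), assuming a sed that accepts -E, as GNU and BSD sed do; the variable names are only for illustration.

```shell
# Plain sed uses BRE: an unescaped + is just a literal plus sign, so nothing matches.
bre_literal=$(printf '%s\n' abc | sed 's/b+/X/')

# BRE interval expression: \{1,\} means "one or more" of the preceding item.
bre_interval=$(printf '%s\n' abc | sed 's/b\{1,\}/X/')

# With -E, sed uses ERE, where + is a quantifier and needs no backslash.
ere_plus=$(printf '%s\n' abc | sed -E 's/b+/X/')

printf '%s\n' "$bre_literal" "$bre_interval" "$ere_plus"
# prints:
# abc
# aXc
# aXc
```

In all three cases the single quotes already hand the expression to sed unchanged; only the regex flavor decides whether a backslash is needed.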

Update: A comment by @Quasímodo notified me that the GNU extension of BRE as used by GNU sed actually does implement the + quantifier, but - similar to { and } - it must be given its special meaning by preceding it with a \. So in this case the \ is an integral part of the RegEx and must be literally passed to sed in the form b\+ if you don't want to invoke it with -E to make use of ERE.


1 comment thread

Not working for me? (5 comments)
Not working for me?
matthewsnyder wrote 10 months ago

That syntax would actually solve my problem, except that I couldn't get it to work. b+ is a valid regex meaning "1 or more b's" and the s/b+/X/ should instruct sed to replace the first occurrence of b+ on each line with X. My input abc should therefore be transformed to aXc.

But I get:

$ echo abc | sed (string escape 's/b+/X/')
abc
AdminBee wrote 10 months ago · edited 10 months ago

Be careful, + is ERE syntax. However, when invoked without the -E flag, sed will use BRE, which doesn't know +. There, you would need to use explicit range indicators as in \{1,\} (and here, the \ would actually be part of the RegEx syntax).

Quasímodo wrote 10 months ago

Although the answer is accurate, I think the asker is confused because there is a misunderstanding in the question itself.

b\+ is actually the correct way to match "one or more 'b'" for that default GNU Sed invocation; that is literally what Sed must "see". Note, however, that this is not standard, as neither + nor \+ has a special meaning in the POSIX basic regular expressions (as mentioned in the previous comment).

So the asker must actually always write that backslash in that case; it's unavoidable. The backslash in that case is not for the shell, and is not processed by the shell, as

$ fish
> echo 's/b\+/X/'
s/b\+/X/

shows.
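The same check can be sketched in POSIX sh, here with the portable interval form instead of the GNU-only \+ (that substitution, and the variable names, are assumptions made for this illustration). Single quotes leave every backslash untouched, so sed receives them as part of the regex.

```shell
# Single quotes: the shell performs no backslash processing at all.
pattern='s/b\{1,\}/X/'

# printf with %s prints the argument verbatim, backslashes included.
shown=$(printf '%s\n' "$pattern")
printf '%s\n' "$shown"

# sed receives exactly that text, so the interval expression works.
result=$(printf '%s\n' abc | sed "$pattern")
printf '%s\n' "$result"
# prints:
# s/b\{1,\}/X/
# aXc
```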

And off-topic: Happy to see you over here, AdminBee!

AdminBee wrote 10 months ago

@Quasimodo Funny, I had never noticed that GNU sed implements something like \+. Now that you notified me (thank you, btw), I looked up the documentation and - of course it's written there. I should include a notice on that in the answer. And nice to see you (and many others, as I found out), too!

matthewsnyder wrote 10 months ago

@Quasímodo You are right that part of the question stems from a confusion over what's due to missing shell escapes, what's due to missing regex escapes, and what's because of whatever flavor of regex is being used for the sed invocation. Naturally, an answer should address this point as well.

To clarify, I am not looking for technical details here as much as for practical advice. So if multiple ways to do regex are possible, I am most interested in whichever one requires the least escaping.
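As a rough illustration of the "least escaping" point, here is the same substitution written once for plain sed (BRE) and once for sed -E (ERE). This is a sketch in POSIX sh under the assumption that sed supports -E; the input string and variable names are made up for the example.

```shell
# Hypothetical input: a lowercase name followed by a number in parentheses.
input='foo(123)'

# BRE: groups and intervals need backslashes; literal parentheses do not.
bre_out=$(printf '%s\n' "$input" | sed 's/\([a-z]\{1,\}\)(\([0-9]\{1,\}\))/\2:\1/')

# ERE: groups and + are unescaped; only the literal parentheses need backslashes.
ere_out=$(printf '%s\n' "$input" | sed -E 's/([a-z]+)\(([0-9]+)\)/\2:\1/')

printf '%s\n' "$bre_out" "$ere_out"
# prints:
# 123:foo
# 123:foo
```

With -E, backslashes appear only where a character is meant literally, which tends to keep patterns that are heavy on grouping and quantifiers more readable.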