greedy capture with sed
I am trying to greedily capture text with sed. For example, I have the string abbbc, and I want to capture all of the repeated b characters, so that my result is bbb.
Here's an attempt at a solution:
$ sed -n 's/.*\(b\+\).*/\1/p' <<< abbbc
b
As shown in the output of the command, the capture only obtains a single b rather than my desired result bbb.
I know I could prepend and append the "not b" pattern ([^b]) to my capture, which would give me the desired result:
$ sed -n 's/.*[^b]\(b\+\)[^b].*/\1/p' <<< abbbc
bbb
However, this solution is a bit inelegant, and may become much more complicated when the match is not as simple. So I'm hoping there's another way to force the capture to be greedy.
1 answer
The b\+ part of the regex is already greedy. In sed, all repetitions are greedy. Your problem is that the initial .* is also greedy, and because it comes first it gobbles up the a and as many bs as it can, leaving only the single b that b\+ needs in order to match at all. For this example, you can change that part to [^b]*:
$ sed -n 's/[^b]*\(b\+\).*/\1/p' <<< abbbc
bbb
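To see how the two patterns divide the string, it can help to capture the leading part as well. The following is a minimal sketch rather than part of the original answer: the bracketed output format and the variable names greedy and fixed are made up for illustration, and the repetition is spelled \{1,\} (the POSIX form of the GNU-specific \+) on the assumption that it should also run on a non-GNU sed.

```shell
# Group 1 is the leading part, group 2 is the run of b characters.

# With a leading .*, group 1 takes as much as it can and leaves one b for group 2.
greedy=$(printf '%s\n' abbbc | sed -n 's/\(.*\)\(b\{1,\}\).*/[\1][\2]/p')
printf '%s\n' "$greedy"    # prints [abb][b]

# With a leading [^b]*, group 1 has to stop before the first b.
fixed=$(printf '%s\n' abbbc | sed -n 's/\([^b]*\)\(b\{1,\}\).*/[\1][\2]/p')
printf '%s\n' "$fixed"     # prints [a][bbb]
```

In both cases b\{1,\} matches as much as it is offered; the only difference is how much the preceding pattern has already consumed.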
For more complicated situations, sed is unlikely to cut it. grep might be a more natural fit for what you're trying to do anyway.
$ grep -o 'b\+' <<< abbbc
bbb
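As a small illustration of why grep can be the more natural fit, the -o option prints every match on its own line, so an input with several runs of b needs no extra pattern work. This is only a sketch: the input abbbcbb and the variable name matches are invented for the example, and -E is used so that + can be written without a backslash.

```shell
# Each run of b characters in the input becomes one line of output.
matches=$(printf '%s\n' abbbcbb | grep -oE 'b+')
printf '%s\n' "$matches"
# prints:
# bbb
# bb
```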