How to extract string from file, run filter, and replace in file with new value?


posted 6mo ago by Canina

Answer
#1: Initial revision by Canina · 2023-11-25T13:23:35Z (6 months ago)
> - iterate through all `chapter-*.xhtml` files in a directory

Assuming bash, and assuming that at least one such file exists in the current directory (otherwise adjust the path and/or set `shopt -s nullglob`), you can use a simple `for` loop to do this.

    for filename in chapter-*.xhtml; do
        ...
    done
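To see what `nullglob` changes, here is a minimal bash sketch. It runs in a scratch directory created only for this illustration, and the variable names `empty_count` and `file_count` are made up for the example.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# With nullglob set, a pattern that matches nothing expands to nothing,
# so the loop body is skipped instead of running once with the literal
# string "chapter-*.xhtml".
shopt -s nullglob

empty_count=0
for filename in chapter-*.xhtml; do
    empty_count=$((empty_count + 1))
done
echo "iterations with no matching files: $empty_count"

# Create two sample files and loop again.
touch chapter-1.xhtml chapter-2.xhtml
file_count=0
for filename in chapter-*.xhtml; do
    file_count=$((file_count + 1))
done
echo "iterations with two matching files: $file_count"
```

Without `shopt -s nullglob`, the first loop would run once with `filename` set to the unexpanded pattern, and the later `awk` and `sed` calls would fail on a file that does not exist.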

> - extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags)

Since you know that the string is always on line 12, the easiest-to-read way to extract it is probably `awk`:

    line12=$(awk 'NR==12{print;exit;}' "$filename")

Do note that the resulting `$line12` will include whitespace and tags in addition to the textual content of the tag. If this is a problem, you can use [`pup`](https://github.com/ericchiang/pup/blob/master/README.md) to extract only the text from within the `<p>` tag:

    line12=$(awk 'NR==12{print;exit;}' "$filename" | pup 'p text{}')

in which case of course you will need to adjust the replacement step accordingly.

(`pup` is a tool to parse HTML and extract portions of it based on CSS selectors.)
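If `pup` is not installed, a rough alternative is to let `sed` strip the tags. This is only a sketch: it does no real HTML parsing and relies on line 12 having exactly the `<p>...</p>` shape described in the question. The sample file and the `title` variable are introduced just for this illustration.

```shell
workdir=$(mktemp -d)
filename="$workdir/chapter-1.xhtml"

# Build a sample file: eleven filler lines, the title on line 12, a closing tag.
{
    for i in 1 2 3 4 5 6 7 8 9 10 11; do echo "<xx>"; done
    echo "  <p>HERE IS MY TITLE</p>"
    echo "</html>"
} >"$filename"

# -n suppresses the default output; the trailing p prints line 12 only if
# the substitution matched. \(.*\) captures the text between the tags.
title=$(sed -n '12s#^[[:space:]]*<p>\(.*\)</p>[[:space:]]*$#\1#p' "$filename")
echo "$title"
```

If line 12 does not match the pattern, `title` ends up empty, which is a convenient thing to check for before replacing anything.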

> - run my "external" `titlecase` filter on that string

Assuming that `titlecase` is executable, accepts the old title on standard input, and emits the new title on standard output, you can pipe the output from `awk` above into `titlecase`, as in:

    newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)
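The contract assumed here (old title on standard input, new title on standard output) can be tried out with a stand-in. The function below is a hypothetical, naive substitute and not the real `titlecase` tool; it only capitalizes the first letter of each whitespace-separated word.

```shell
# Hypothetical stand-in for the external titlecase filter.
# Note that a shell function with this name shadows any real titlecase
# command for the rest of the shell session.
titlecase() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # Uppercase the first character, lowercase the rest of the word.
            $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2))
        }
        print
    }'
}

result=$(printf '%s\n' 'HERE IS MY TITLE' | titlecase)
echo "$result"
```

Fed the raw line 12, tags included, a word-based filter like this one would treat `<p>HERE` as a single word, which is one reason you may want to strip the tags before filtering.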

> - replace the new string for the original one in the source file

There are many ways to do this, but assuming that the replacement doesn't contain characters that are special to `sed` (a backslash, `&`, the `#` delimiter used below, or a newline), you can do something similar to:

    sed -i '12s#^.*$#'"$newline12"'#' "$filename"

This will replace the entirety of line 12 in the file with the contents of the `$newline12` shell variable. Adjust the `12` if you need to replace a differently numbered line. I use `#` as the delimiter here because the traditional `/` would conflict with the `/` in the `</p>` end tag.
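If the new title might contain a backslash, `&`, or the `#` delimiter, those characters need a backslash in front of them before they reach `sed`. A minimal sketch, where the sample file and the `escaped` variable are introduced only for this illustration:

```shell
workdir=$(mktemp -d)
filename="$workdir/chapter-1.xhtml"

# Build a sample file: eleven filler lines, the title on line 12, a closing tag.
{
    for i in 1 2 3 4 5 6 7 8 9 10 11; do echo "<xx>"; done
    echo "  <p>OLD TITLE</p>"
    echo "</html>"
} >"$filename"

newline12='  <p>Salt & Pepper #1</p>'

# Prefix every backslash, ampersand, and # with a backslash so that sed
# treats them as literal characters in the replacement text.
escaped=$(printf '%s\n' "$newline12" | sed 's/[\\&#]/\\&/g')

# Write to a copy rather than editing in place, so the original stays intact.
sed '12s#^.*$#'"$escaped"'#' "$filename" >"$filename.new"
sed -n '12p' "$filename.new"
```

This does not handle a replacement that spans several lines; a title on a single line is assumed, as in the question.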

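`-i` is in-place editing mode (as written, this is the GNU `sed` form; BSD and macOS `sed` expect a backup suffix argument after `-i`, which may be empty). If you omit it, `sed` will print the result on standard output, which you can redirect to another file: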

    sed '12s#^.*$#'"$newline12"'#' "$filename" >"$filename".new
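A middle ground between editing in place and writing a separate file is to attach a suffix to `-i`, which makes `sed` keep a copy of the original next to the edited file. The `.bak` suffix and the sample file are arbitrary choices for this sketch.

```shell
workdir=$(mktemp -d)
filename="$workdir/chapter-1.xhtml"

# Build a sample file: eleven filler lines, the title on line 12, a closing tag.
{
    for i in 1 2 3 4 5 6 7 8 9 10 11; do echo "<xx>"; done
    echo "  <p>HERE IS MY TITLE</p>"
    echo "</html>"
} >"$filename"

newline12='  <p>Here Is My Title</p>'

# Edit in place, but keep the untouched original as chapter-1.xhtml.bak.
sed -i.bak '12s#^.*$#'"$newline12"'#' "$filename"

# Show what changed. diff exits non-zero when the files differ,
# which is the expected case here, so the exit status is ignored.
diff "$filename.bak" "$filename" || true
```

Once you are satisfied with the result, the `.bak` files can be deleted.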

---

## Putting it all together:

    for filename in chapter-*.xhtml; do
        newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)
        sed -i '12s#^.*$#'"$newline12"'#' "$filename"
    done

## Example:

### Input `chapter-1.xhtml`

    <html>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
      <p>HERE IS MY TITLE</p>
    
    </html>

### Execution

Here an alias stands in for your actual `titlecase` filter, but the principle is exactly the same:

    $ alias titlecase='tr A-Z a-z'
    $ for filename in chapter-*.xhtml; do
        newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)
        sed -i '12s#^.*$#'"$newline12"'#' "$filename"
      done

### Output `chapter-1.xhtml`

    <html>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
    <xx>
      <p>here is my title</p>
    
    </html>