Linux Systems

iterate through all chapter-*.xhtml files in a directory

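Assuming bash, and assuming that at least one such file exists in the current directory, you can do this with a simple for loop. If the files live elsewhere, adjust the path; if there might be no matches at all, run shopt -s nullglob first so that the loop body is skipped instead of running once with the literal pattern.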

for filename in chapter-*.xhtml; do
    ...
done
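
If you would rather not change shell options, a common alternative to nullglob is to test each name inside the loop. A minimal sketch, wrapped in a hypothetical function `list_chapters` (not part of the original answer) so that it can be tried against any directory:

```shell
# list_chapters DIR: print every chapter-*.xhtml file in DIR, one per line.
# (Hypothetical helper name, for illustration only.)
list_chapters() {
    for filename in "$1"/chapter-*.xhtml; do
        # With no matches the pattern stays literal; skip that non-existent name.
        [ -e "$filename" ] || continue
        printf '%s\n' "$filename"
    done
}
```

Calling `list_chapters .` in an empty directory prints nothing, whereas the unguarded loop would run once with the literal string `./chapter-*.xhtml`.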

extract my string (ALWAYS line 12 in the file, only string on line, between <p>...</p> tags)

Since you know that this will always be line 12, the easiest-to-read way to do this is probably awk:

line12=$(awk 'NR==12{print;exit;}' "$filename")

Do note that the resulting $line12 will include whitespace and tags in addition to the textual content of the tag. If this is a problem, you can use pup to extract only the text from within the <p> tag:

line12=$(awk 'NR==12{print;exit;}' "$filename" | pup p text{})

in which case of course you will need to adjust the replacement step accordingly.

(pup is a tool to parse HTML and extract portions of it based on CSS selectors.)
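
If pup is not installed, sed alone can do the same extraction for this simple case. A rough sketch, assuming line 12 holds exactly one <p>...</p> pair with no attributes on the tag; `extract_p_text` is a hypothetical name, not something from the question:

```shell
# extract_p_text FILE: print the text between <p> and </p> on line 12 of FILE.
# Prints nothing if line 12 does not match the assumed <p>...</p> layout.
extract_p_text() {
    sed -n '12{
        s#^[[:space:]]*<p>\(.*\)</p>[[:space:]]*$#\1#p
        q
    }' "$1"
}
```

It would be used as `line12=$(extract_p_text "$filename")`. Unlike a real HTML parser, this is only pattern matching, so it is best kept to files whose layout you control.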

run my "external" titlecase filter on that string

Assuming that titlecase is executable, accepts the old title on standard input, and emits the new title on standard output, you can pipe the output from awk above into titlecase, as in:

newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)

replace the new string for the original one in the source file

There are many ways to do this, but assuming that the replacement doesn't contain characters that are special in a sed replacement (the # delimiter used here, & or a backslash), you can do something like:

sed -i '12s#^.*$#'"$newline12"'#' "$filename"

This replaces the whole of line 12 in the file with the contents of the shell variable $newline12. Adjust the 12 if you need to replace a different line. I use # as the delimiter here because the traditional / would conflict with the end-tag marker in </p>.
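
If the new line might contain `#`, `&` or a backslash, those characters can be escaped before the string is handed to sed. A minimal sketch; `escape_sed_replacement` is a hypothetical helper name, and a replacement containing newlines is not handled:

```shell
# escape_sed_replacement STRING: print STRING with \, & and # backslash-escaped,
# so it can be used as the replacement part of s#...#...# without being
# interpreted by sed. (Hypothetical helper, for illustration only.)
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's/[\\&#]/\\&/g'
}

# Possible use with the command above:
#   safe=$(escape_sed_replacement "$newline12")
#   sed -i '12s#^.*$#'"$safe"'#' "$filename"
```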

-i is in-place editing mode; if you omit it, sed will print the result on standard output, which you can redirect to another file:

sed '12s#^.*$#'"$newline12"'#' "$filename" >"$filename".new
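
The redirect-and-rename pattern can be wrapped up so that the original is only overwritten if sed succeeded. A sketch under the same no-special-characters assumption; `replace_line12` is a hypothetical name:

```shell
# replace_line12 FILE NEWLINE: replace line 12 of FILE with NEWLINE.
# The result is written to FILE.new first and renamed over FILE only
# if sed exited successfully.
replace_line12() {
    sed '12s#^.*$#'"$2"'#' "$1" >"$1.new" &&
        mv -- "$1.new" "$1"
}
```

Note that the renamed file is a newly created one, so its permissions follow your umask rather than those of the original.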

Putting it all together:

for filename in chapter-*.xhtml; do
    newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)
    sed -i '12s#^.*$#'"$newline12"'#' "$filename"
done

Example:

Input `chapter-1.xhtml`

<html>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
  <p>HERE IS MY TITLE</p>

</html>

Execution

I use an alias in place of your likely actual titlecase here, but the principle is exactly the same:

$ alias titlecase='tr A-Z a-z'
$ for filename in chapter-*.xhtml; do
    newline12=$(awk 'NR==12{print;exit;}' "$filename" | titlecase)
    sed -i '12s#^.*$#'"$newline12"'#' "$filename"
  done

Output `chapter-1.xhtml`

<html>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
<xx>
  <p>here is my title</p>

</html>
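
If only the text was extracted (for example with pup), the replacement step can leave the tags and indentation alone and swap just what lies between them. A sketch, again assuming the new text contains no `#`, `&` or backslash; `replace_p_text` is a hypothetical name, and it prints to standard output rather than editing in place:

```shell
# replace_p_text FILE NEWTEXT: print FILE with the text between <p> and </p>
# on line 12 replaced by NEWTEXT; the tags and any indentation are kept.
replace_p_text() {
    sed '12s#\(<p>\).*\(</p>\)#\1'"$2"'\2#' "$1"
}
```

Because the pattern is not anchored to the start of the line, whatever precedes <p> (such as the two leading spaces in the example file) is left untouched.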

posted over 1 year ago

CC BY-SA 4.0

Canina‭
