Linux Systems

#5: Post edited by

David · 2023-11-27T12:36:18Z
typo tweak

**TASK**
I am coding up ebooks to a specific standard, and have a script that converts a string into the correct titlecase for this publisher. When working with some public domain source files, one often gets this for a chapter title string:
&emsp;&emsp;&emsp;`<p>HERE IS MY TITLE</p>`
Using VSCodium (FOSS VS Code alternative), I can open each file, select the string between the `p` tags, then run the `titlecase` script with a hotkey that I've assigned it to. I end up with
&emsp;&emsp;&emsp;`<p>Here Is My Title</p>`
(VSCodium's native titlecase filter isn't up to this job.) I save the file, and go on to the next one.
If you only have a few of these to do, that's fine. But sometimes there can be dozens, and it gets very tedious.
**QUESTION**
Is there a way that I can script this? I have scratched my head over both `awk` and `sed`, thinking that these are my prime options. But (as a rank amateur) I cannot work out how to:
1. iterate through all `chapter-*.xhtml` files in a directory,
1. extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags),
1. **run my "external" `titlecase` filter on that string**,
1. substitute the new string for the original one in the source file,
1. for all those files. :)
(The step in bold is the one that is my biggest stumbling block.)
**UPDATE**: Note that my titlecase filter can take ONLY the string between the tags, so step #2 (*extracting* the string) is **mandatory**. Both the answers so far look very promising, but is it possible to do something like a regex on `sed -n '12p'` in one answer?
The other answer suggests using [`pup`](https://github.com/ericchiang/pup/blob/master/README.md), although it would be helpful not to need extra packages if a simple regex would do.
**UPDATE 2**: for "real" data, one could download the ZIP of [this commit in a GitHub repo](https://github.com/standardebooks/r-d-blackmore_lorna-doone/tree/cde77dba9b8e85536fd262c3e44ecee82b6c3ded); the files in question are at `/src/epub/text/chapter-*.xhtml`, and the title is the 12th line of every `chapter-nn.xhtml` file.
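For illustration only, the five steps might be sketched in plain shell along these lines. This is a rough sketch under assumptions, not a tested solution. It assumes the filter reads the bare title on stdin and prints the converted title on stdout (the question does not say how `titlecase` is invoked), and that line 12 holds a single `<p>...</p>` pair. The function name `retitle_chapters` and the `NEWLINE` variable are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: title-case the <p>...</p> text on line 12 of each chapter-*.xhtml file.
# ASSUMPTION: the filter command reads the bare title on stdin and prints the
# converted title on stdout; adjust the call if your script takes an argument.

retitle_chapters() {
    [ "$#" -ge 2 ] || { echo "usage: retitle_chapters DIR FILTER [ARGS...]" >&2; return 2; }
    local dir=$1; shift                        # remaining arguments: the filter command
    local file line prefix suffix old new tmp

    for file in "$dir"/chapter-*.xhtml; do     # step 1: every chapter file
        [ -f "$file" ] || continue             # the glob matched nothing

        line=$(sed -n '12p' "$file")           # step 2: fetch line 12
        case $line in
            *'<p>'*'</p>'*) ;;                 # looks as expected
            *) echo "skipped (no <p>...</p> on line 12): $file" >&2; continue ;;
        esac

        prefix=${line%%'<p>'*}                 # indentation before <p>
        old=${line#*'<p>'}                     # drop everything up to and including <p>
        suffix=${old##*'</p>'}                 # anything after </p>
        old=${old%'</p>'*}                     # drop </p> and what follows: bare title

        new=$(printf '%s\n' "$old" | "$@") || continue   # step 3: external filter

        # Step 4: rewrite line 12 only. The text reaches awk through ENVIRON so
        # that characters such as & or / in the title are not treated specially.
        tmp=$(mktemp) || return 1
        NEWLINE="${prefix}<p>${new}</p>${suffix}" \
            awk 'NR == 12 { print ENVIRON["NEWLINE"]; next } { print }' "$file" > "$tmp" &&
            cat "$tmp" > "$file"               # overwrite in place, keeping permissions
        rm -f "$tmp"
    done                                       # step 5: repeat for all files
}

# Hypothetical usage: retitle_chapters src/epub/text titlecase
```

Running it on a copy, or inside a clean git checkout so that `git diff` shows exactly what changed, would be a sensible first test. Files whose line 12 does not match the expected shape are reported and left untouched.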

#4: Post edited by

David · 2023-11-27T12:34:33Z
add location of test data

**TASK**
I am coding up ebooks to a specific standard, and have a script that converts a string into the correct titlecase for this publisher. When working with some public domain source files, one often gets this for a chapter title string:
&emsp;&emsp;&emsp;`<p>HERE IS MY TITLE</p>`
Using VSCodium (FOSS VS Code alternative), I can open each file, select the string between the `p` tags, then run the `titlecase` script with a hotkey that I've assigned it to. I end up with
&emsp;&emsp;&emsp;`<p>Here Is My Title</p>`
(VSCodium's native titlecase filter isn't up to this job.) I save the file, and go on to the next one.
If you only have a few of these to do, that's fine. But sometimes there can be dozens, and it gets very tedious.
**QUESTION**
Is there a way that I can script this? I have scratched my head over both `awk` and `sed`, thinking that these are my prime options. But (as a rank amateur) I cannot work out how to:
1. iterate through all `chapter-*.xhtml` files in a directory,
1. extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags),
1. **run my "external" `titlecase` filter on that string**,
1. replace the new string for the original one in the source file,
1. for all those files. :)
(The step in bold is the one that is my biggest stumbling block.)
**UPDATE**: Note that for my titlecase filter, ONLY the string between the tags) can be used, so that step #2 (extracting the string) is **mandatory**. Both the answers so far look very promising, but is it possible to do something like e.g. a regex on `sed -n '12p'` in one answer?
The other answer suggests using [`pup`](https://github.com/ericchiang/pup/blob/master/README.md) although it would be helpful not to need extra packages if a simple regex would do.
**UPDATE 2**: for "real" data, one could download the ZIP of [this commit in a Github repo](https://github.com/standardebooks/r-d-blackmore_lorna-doone/tree/cde77dba9b8e85536fd262c3e44ecee82b6c3ded) - the files in question are found at: `/src/epub/text/chapter-*.xhtml` = the 12th line of every "chapter-nn.xhtml" file.

#3: Post edited by

David · 2023-11-27T12:31:10Z
update with further clarification

**TASK**
I am coding up ebooks to a specific standard, and have a script that converts a string into the correct titlecase for this publisher. When working with some public domain source files, one often gets this for a chapter title string:
&emsp;&emsp;&emsp;`<p>HERE IS MY TITLE</p>`
Using VSCodium (FOSS VS Code alternative), I can open each file, select the string between the `p` tags, then run the `titlecase` script with a hotkey that I've assigned it to. I end up with
&emsp;&emsp;&emsp;`<p>Here Is My Title</p>`
(VSCodium's native titlecase filter isn't up to this job.) I save the file, and go on to the next one.
If you only have a few of these to do, that's fine. But sometimes there can be dozens, and it gets very tedious.
**QUESTION**
Is there a way that I can script this? I have scratched my head over both `awk` and `sed`, thinking that these are my prime options. But (as a rank amateur) I cannot work out how to:
1. iterate through all `chapter-*.xhtml` files in a directory,
1. extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags),
1. **run my "external" `titlecase` filter on that string**,
1. replace the new string for the original one in the source file,
1. for all those files. :)
(The step in bold is the one that is my biggest stumbling block.)
**UPDATE**: Note that for my titlecase filter, ONLY the string between the tags) can be used, so that step #2 (extracting the string) is **mandatory**. Both the answers so far look very promising, but is it possible to do something like e.g. a regex on `sed -n '12p'` in one answer?
The other answer suggests using [`pup`](https://github.com/ericchiang/pup/blob/master/README.md) although it would be helpful not to need extra packages if a simple regex would do.

#2: Post edited by

Andreas demands justice for humanity · 2023-11-27T09:46:44Z
Remove fluff

How to extract string from file, run filter, and replace in file with new value?

**TASK**
I am coding up ebooks to a specific standard, and have a script that converts a string into the correct titlecase for this publisher. When working with some public domain source files, one often gets this for a chapter title string:
&emsp;&emsp;&emsp;`<p>HERE IS MY TITLE</p>`
Using VSCodium (FOSS VS Code alternative), I can open each file, select the string between the `p` tags, then run the `titlecase` script with a hotkey that I've assigned it to. I end up with
&emsp;&emsp;&emsp;`<p>Here Is My Title</p>`
(VSCodium's native titlecase filter isn't up to this job.) I save the file, and go on to the next one.
If you only have a few of these to do, that's fine. But sometimes there can be dozens, and it gets very tedious.
**QUESTION**
Is there a way that I can script this? I have scratched my head over both `awk` and `sed`, thinking that these are my prime options. But (as a rank amateur) I cannot work out how to:
- iterate through all `chapter-*.xhtml` files in a directory,
- extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags),
- **run my "external" `titlecase` filter on that string**,
- replace the new string for the original one in the source file,
- for all those files. :)
(The step in bold is the one that is my biggest stumbling block.)

scripting sed text-processing

#1: Initial revision by

David · 2023-11-25T12:12:31Z

How to extract string from file, run filter, and replace in file with new value?

**TASK**

I am coding up ebooks to a specific standard, and have a script that converts a string into the correct titlecase for this publisher. When working with some public domain source files, one often gets this for a chapter title string: 

&emsp;&emsp;&emsp;`<p>HERE IS MY TITLE</p>`

Using VSCodium (FOSS VS Code alternative), I can open each file, select the string between the `p` tags, then run the `titlecase` script with a hotkey that I've assigned it to. I end up with 

&emsp;&emsp;&emsp;`<p>Here Is My Title</p>`

(VSCodium's native titlecase filter isn't up to this job.) I save the file, and go on to the next one.

If you only have a few of these to do, that's fine. But sometimes there can be dozens, and it gets very tedious.

**QUESTION**

Is there a way that I can script this? I have scratched my head over both `awk` and `sed`, thinking that these are my prime options. But (as a rank amateur) I cannot work out how to:

- iterate through all `chapter-*.xhtml` files in a directory,
- extract my string (ALWAYS line 12 in the file, only string on line, between `<p>...</p>` tags),
- **run my "external" `titlecase` filter on that string**,
- replace the new string for the original one in the source file,
- for all those files. :)

(The step in bold is the one that is my biggest stumbling block.)

I would be grateful for any help with this! If I've omitted any relevant information, please say and I'll remedy that a.s.a.p.

David / Fife, UK

scripting sed text-processing
