modify usfm for chapter-level drafting to avoid import issues; move remarks to chapters by mshannon-sil · Pull Request #285 · sillsdev/machine.py

mshannon-sil · 2026-03-26T21:46:59Z

This PR addresses issue #284.

Mostly looking for high-level feedback about the approach at the moment. As we were discussing, is get_usfm() the right place for this functionality, essentially as a post-processing step? Or should we implement this feature in process_tokens() (and maybe move the remark logic there as well)?

Some initial thoughts:

Pros for putting it in get_usfm():

- The code is together in a cohesive unit, making it potentially easier to maintain than if it were spread across process_token().
- If it's just for the purposes of importing, then it can be thought of as a kind of "view" that Paratext needs to avoid import issues, while the true model is kept unmodified in handler._tokens. This leaves the option of accessing the unmodified USFM if needed in the future.

Pros for putting it in process_token():

- Faster execution time, since it's all part of the same iteration.
- If thought of as an essential change to the USFM structure such that alternative views are unnecessary, it could make more structural sense to include it here.
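To make the "view" option concrete, here is a minimal sketch, not the PR's actual code. `SketchHandler` and the `(marker, text)` tuple are hypothetical stand-ins for the real handler and `UsfmToken`, and keeping everything before the first `\c` is an assumption made only for illustration.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical stand-in for a USFM token: (marker, text), e.g. ("c", "2").
Token = Tuple[str, str]


class SketchHandler:
    """Toy handler that keeps the full token list and filters only on output."""

    def __init__(self, tokens: List[Token]) -> None:
        self._tokens = list(tokens)  # the unmodified "true model"

    def get_usfm(self, chapters: Optional[Set[int]] = None) -> str:
        tokens = list(self._tokens)  # work on a copy; never mutate self._tokens
        if chapters is not None:
            tokens = self._filter(tokens, chapters)
        return "\n".join(f"\\{marker} {text}".rstrip() for marker, text in tokens)

    @staticmethod
    def _filter(tokens: List[Token], chapters: Set[int]) -> List[Token]:
        kept: List[Token] = []
        keep = True  # assumption: material before the first \c is always kept
        for marker, text in tokens:
            if marker == "c":
                keep = text.isdigit() and int(text) in chapters
            if keep:
                kept.append((marker, text))
        return kept
```

Calling `get_usfm({2})` and then `get_usfm()` on the same handler returns the filtered and the full text respectively, which is the flexibility the first list argues for. The cost is the extra pass over the tokens noted in the second list.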


ddaspit

@ddaspit reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93 and mshannon-sil).

machine/corpora/update_usfm_parser_handler.py line 345 at r1 (raw file):

        tokens = list(self._tokens)
        if chapters is not None:
            tokens = self._get_incremental_draft_tokens(tokens, chapters)

I think we can do something similar, but before we parse instead of after. Instead of calling parse_usfm in update_usfm, we can do something like this:

tokenizer = UsfmTokenizer(self._settings.stylesheet)
tokens = tokenizer.tokenize(usfm)
tokens = filter_tokens_by_chapter(tokens, chapters)
parser = UsfmParser(tokens, handler, self._settings.stylesheet, self._settings.versification)
parser.process_tokens()

This would avoid updating the whole book.
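For reference, a rough sketch of what a `filter_tokens_by_chapter` helper could look like. `TokenType` and `Token` are simplified stand-ins for `UsfmTokenType` and `UsfmToken` so the example runs on its own, and the policy shown (always keep `\id` and its text, drop everything else outside the requested chapters) is an assumption, not necessarily what the PR ends up doing.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List, Optional, Set


class TokenType(Enum):  # hypothetical stand-in for UsfmTokenType
    BOOK = auto()
    CHAPTER = auto()
    VERSE = auto()
    PARAGRAPH = auto()
    TEXT = auto()


@dataclass(frozen=True)
class Token:  # hypothetical stand-in for UsfmToken
    type: TokenType
    data: Optional[str] = None
    text: Optional[str] = None


def filter_tokens_by_chapter(tokens: Iterable[Token], chapters: Set[int]) -> List[Token]:
    """Keep the \\id line plus every token that belongs to a requested chapter."""
    kept: List[Token] = []
    in_id = False       # True while we are still inside the \id line
    in_chapter = False  # True while we are inside a requested chapter
    for token in tokens:
        if token.type is TokenType.BOOK:
            in_id = True
            kept.append(token)
            continue
        if in_id and token.type is TokenType.TEXT:
            kept.append(token)  # text that belongs to \id
            continue
        in_id = False
        if token.type is TokenType.CHAPTER:
            data = token.data or ""
            in_chapter = data.isdigit() and int(data) in chapters
        if in_chapter:
            kept.append(token)
    return kept
```

Because the filtering happens on the token list before the parser sees it, the handler only ever processes the requested chapters.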

mshannon-sil · 2026-04-15T21:09:50Z

machine/corpora/update_usfm_parser_handler.py line 345 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I think we can do something similar, but before we parse instead of after. Instead of calling parse_usfm in update_usfm, we can do something like this:
tokenizer = UsfmTokenizer(self._settings.stylesheet)
tokens = tokenizer.tokenize(usfm)
tokens = filter_tokens_by_chapter(tokens, chapters)
parser = UsfmParser(tokens, handler, self._settings.stylesheet, self._settings.versification)
parser.process_tokens()
This would avoid updating the whole book.

I updated it accordingly. How does it look now?

If we change parse_usfm to accept a Sequence[UsfmToken] as well as str, we could call it here and let it instantiate the parser and process the tokens, which would avoid some code duplication. I'm not sure, though, whether there's a reason parse_usfm only accepts str.
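A minimal sketch of that overload idea, using toy stand-ins rather than the real API: `parse_usfm_sketch`, `tokenize`, and the `Token = str` alias are all hypothetical. One detail worth noting is that `str` is itself a `Sequence`, so the `str` case has to be checked first.

```python
from typing import Callable, List, Sequence, Union

Token = str  # hypothetical stand-in for UsfmToken


def tokenize(usfm: str) -> List[Token]:
    """Toy tokenizer standing in for UsfmTokenizer.tokenize."""
    return usfm.split()


def parse_usfm_sketch(
    usfm: Union[str, Sequence[Token]],
    handle: Callable[[Token], None],
) -> None:
    """Accept raw text or pre-tokenized input and feed each token to the handler."""
    # A str is also a Sequence, so test for str before treating input as tokens.
    tokens = tokenize(usfm) if isinstance(usfm, str) else list(usfm)
    for token in tokens:
        handle(token)
```

With this shape, a caller that has already filtered the tokens passes the list straight in, and existing callers that pass a string are unaffected.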


Enkidu93

@Enkidu93 reviewed 4 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on ddaspit and mshannon-sil).

machine/corpora/paratext_project_text_updater_base.py line 97 at r2 (raw file):

            in_id_marker = False
        elif token.type == UsfmTokenType.CHAPTER:
            if token.data and int(token.data) in chapters:

I think this may be safe now after some recent changes, but you may want to double-check what happens with bad chapter references like "\c 1." if you haven't already and it isn't already covered by tests.
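To illustrate the concern: in Python, `int("1.")` raises `ValueError`, so a bare `int(token.data)` would fail if the chapter data ever came through in that form. Whether the tokenizer can actually produce such data is not something this sketch establishes. The helper names below are hypothetical.

```python
from typing import Optional, Set


def parse_chapter_number(data: Optional[str]) -> Optional[int]:
    """Return the chapter number, or None if the \\c data is missing or malformed."""
    if not data:
        return None
    try:
        return int(data)
    except ValueError:
        # e.g. "1." or "one": treat as an unusable chapter reference
        return None


def is_selected_chapter(data: Optional[str], chapters: Set[int]) -> bool:
    """True only when the data parses cleanly and names a requested chapter."""
    number = parse_chapter_number(data)
    return number is not None and number in chapters
```

Treating a malformed reference as "not selected" is only one possible policy; raising or logging may be preferable depending on how the rest of the updater reports bad input.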

machine/corpora/update_usfm_parser_handler.py line 348 at r2 (raw file):

                remark_tokens.append(UsfmToken(UsfmTokenType.TEXT, text=remark))
            if len(tokens) > 0:
                for index, token in enumerate(tokens):

Don't we want to preserve the ability to add book-level remarks? Am I reading this correctly that we'd no longer be able to do so with this change? Peter has also put in a PR related to the per-chapter remarks (sillsdev/machine#408); are you guys coordinating on this? I think if we did something like he has there, we would have a bit more flexibility.
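As a rough illustration of supporting both kinds of remark at once (this is a hypothetical sketch, not the approach taken in sillsdev/machine#408 or in this PR): book-level remarks go after `\id`, per-chapter remarks go after the matching `\c`, and in both cases new remarks are placed after any `\rem` lines that already follow the anchor. `add_remarks` and the `(marker, text)` token shape are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Token = Tuple[str, str]  # hypothetical stand-in: (marker, text)


def add_remarks(
    tokens: Sequence[Token],
    book_remarks: Sequence[str],
    chapter_remarks: Dict[int, List[str]],
) -> List[Token]:
    """Insert \\rem tokens after \\id and after each \\c, following existing remarks."""
    result: List[Token] = []
    pending: List[str] = []  # remarks waiting to be placed after the current anchor
    for marker, text in tokens:
        # Flush once we have moved past the anchor and any existing \rem lines.
        if pending and marker != "rem":
            result.extend(("rem", remark) for remark in pending)
            pending = []
        result.append((marker, text))
        if marker == "id":
            pending = list(book_remarks)
        elif marker == "c" and text.isdigit():
            pending = list(chapter_remarks.get(int(text), []))
    # Handle an anchor that is the last token (or is followed only by remarks).
    result.extend(("rem", remark) for remark in pending)
    return result
```

Passing an empty `book_remarks` or an empty `chapter_remarks` mapping disables that kind of remark, so callers can choose either or both.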

modify usfm for chapter-level drafting to avoid import issues; move remarks to chapters

d8ca02d

mshannon-sil requested review from Enkidu93 and ddaspit March 26, 2026 21:47

ddaspit reviewed Mar 30, 2026


mshannon-sil added 2 commits April 15, 2026 14:02

move filtering before token processing

aef5d5d

add test case for chapter filtering

e423708

mshannon-sil added 3 commits April 16, 2026 10:02

make sure all text in \id is included

1e2e999

update remark test and ensure remarks are added at the end of existing chapter remarks

707119c

add test case for including chapter 1 and header information

e1865ea

Enkidu93 mentioned this pull request Apr 17, 2026

Add support for per-chapter remarks sillsdev/machine#408

Open

Enkidu93 reviewed Apr 17, 2026


mshannon-sil wants to merge 6 commits into main from incremental_draft
