Add abstract `__len__` method to `MultilingualCorpus` by menicacci · Pull Request #39 · translated/larakit

menicacci · 2026-04-17T15:21:19Z

No description provided.

Copilot

Pull request overview

This PR introduces a standard `__len__` contract for `MultilingualCorpus` and adds an implementation and tests for `ParallelCorpus`, so that `len(corpus)` works.

Changes:

- Add an abstract `__len__` method to `MultilingualCorpus`.
- Implement `ParallelCorpus.__len__` using a line-counting helper.
- Add unit tests validating `len(ParallelCorpus)` for empty and populated corpora.
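Here is a minimal sketch of the contract described above. It assumes a corpus stored as line-aligned source and target files with one segment per line. The class bodies and the `_count_lines` helper are hypothetical stand-ins, not larakit's actual code. `abc.abstractmethod` makes any subclass that omits `__len__` fail at instantiation with a `TypeError`.

```python
import abc
import os


class MultilingualCorpus(abc.ABC):
    # Hypothetical stand-in for the real base class: every corpus must report its size.
    @abc.abstractmethod
    def __len__(self) -> int:
        """Return the number of segments in the corpus."""


def _count_lines(path: str) -> int:
    # Hypothetical helper. Assumption: a missing file is treated as an empty corpus.
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


class ParallelCorpus(MultilingualCorpus):
    # Toy corpus: source and target files hold one aligned segment per line.
    def __init__(self, source: str, target: str) -> None:
        self._source = source
        self._target = target

    def __len__(self) -> int:
        # The files are line-aligned, so counting one side is enough.
        # Nothing is cached here, so the result always reflects the file on disk.
        return _count_lines(self._source)
```

Treating a missing file as an empty corpus is an assumption made here so that `len()` returns 0 before anything has been written.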

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
`src/larakit/corpus/_base.py`	Adds `__len__` as an abstract requirement for all corpora.
`src/larakit/corpus/_parallel.py`	Implements `ParallelCorpus.__len__` using a cached line count.
`tests/corpus/parallel.py`	Adds tests asserting `len()` behavior for empty and written corpora.


davidecaroselli · 2026-04-19T09:29:08Z

```diff
     def writer(self) -> ParallelCorpusWriter:
         return ParallelCorpusWriter(self._language, self._source, self._target)
+
+    def __len__(self) -> int:
```


You cannot cache the length like this: if the user writes lines to the parallel corpus, the `_size` field becomes stale.
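A minimal reproduction of the problem follows. It uses a hypothetical `CachedCorpus` whose `append()` stands in for a writer. The size is counted once in `__init__`, so `len()` never sees a later write.

```python
import os
import tempfile


class CachedCorpus:
    # Hypothetical toy corpus that caches its length once, at construction time.
    def __init__(self, path: str) -> None:
        self._path = path
        self._size = self._count()  # computed once, never refreshed

    def _count(self) -> int:
        if not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as f:
            return sum(1 for _ in f)

    def append(self, line: str) -> None:
        # Stand-in for a writer: adds a segment but leaves _size untouched.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def __len__(self) -> int:
        return self._size


path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
corpus = CachedCorpus(path)
corpus.append("hello")
print(len(corpus))  # prints 0: the cached value predates the write
```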

That's true, but `JTMCorpus` and `TMXCorpus` both assume the file won't change after the corpus object is constructed. They cache the length the same way, and their cached value also goes stale after a write. I agree it's not a good implementation, but if we want to change this behavior, we should plan to change it for all of them.
Do you agree?

As we discussed offline, let's invalidate the cache when a writer is created.
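Here is a sketch of that approach. It assumes a hypothetical `Corpus` with a `writer()` factory; the names loosely mirror `ParallelCorpus.writer()` but are not larakit's code. Handing out a writer resets the cached size to `None`, and the next call to `__len__` recounts lazily.

```python
import os
from typing import Optional


class CorpusWriter:
    # Hypothetical writer that appends one segment per line.
    def __init__(self, path: str) -> None:
        self._file = open(path, "a", encoding="utf-8")

    def write(self, line: str) -> None:
        self._file.write(line + "\n")

    def close(self) -> None:
        self._file.close()

    def __enter__(self) -> "CorpusWriter":
        return self

    def __exit__(self, *exc) -> None:
        self.close()


class Corpus:
    def __init__(self, path: str) -> None:
        self._path = path
        self._size: Optional[int] = None  # None means "not computed yet"

    def writer(self) -> CorpusWriter:
        # Invalidate before handing out a writer, so a later len() recounts.
        self._size = None
        return CorpusWriter(self._path)

    def __len__(self) -> int:
        if self._size is None:
            self._size = self._count()
        return self._size

    def _count(self) -> int:
        if not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as f:
            return sum(1 for _ in f)
```

Invalidating only at creation has a gap. If `len()` is called while a writer is still open, it re-caches an intermediate count that may miss buffered writes, and later writes through that same writer make the cache stale again. Also resetting the cache when the writer closes would cover that case. One way to apply the same fix to `JTMCorpus` and `TMXCorpus` is to put this logic in a shared base-class helper.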

Add abstract method __len__ to MultilingualCorpus

a274f4e

menicacci self-assigned this Apr 17, 2026

menicacci requested a review from Copilot April 17, 2026 15:21

Copilot started reviewing on behalf of menicacci April 17, 2026 15:21

Copilot AI reviewed Apr 17, 2026


Comment thread src/larakit/corpus/_base.py Outdated

Comment thread src/larakit/corpus/_parallel.py

Comment thread src/larakit/corpus/_parallel.py

Comment thread src/larakit/corpus/_parallel.py

menicacci added 2 commits April 17, 2026 17:30

minor

f40bfa2

add return types to len

5cdcb02

menicacci requested a review from davidecaroselli April 17, 2026 15:34

davidecaroselli requested changes Apr 19, 2026


menicacci changed the title ~~Add abstract method __len__ to MultilingualCorpus~~ Add abstract __len__ method to MultilingualCorpus Apr 20, 2026

Bugfix on corpus size/properties caching

fc58ef5

menicacci requested a review from davidecaroselli April 20, 2026 16:13

menicacci wants to merge 4 commits into `main` from `features/corpus-len`


3 participants
