Conversation
This added a fair bit of complexity, and notable made the Phase classes dynamically generated. However, by doing this, we no longer include "process the token using the rules for" phases in the debug log.
This allows us to define the argument as an int in Cython
This is in preparation for Cython using C function pointers for _state
The current _ascii module is a placeholder, because I accidentally deleted the original implementation of it (but I needed to rewrite it to be even quicker anyway!)
This makes duplicate checking much quicker, and avoids the conversion to a dict at the end
| # then stop | ||
| if self.chunkOffset != self.chunkSize: | ||
| while True: | ||
| # this really should be a slice of self.chunk, but https://github.com/cython/cython/issues/3536 |
| with cython.boundscheck(False), cython.wraparound(False): | ||
| c = self.chunk[i] | ||
| if c > 0x7F and opposite or c <= 0x7F and (bitmap[index(c)] & bit(c)): | ||
| cyrv += self.chunk[self.chunkOffset:i] |
There was a problem hiding this comment.
ref cython/cython#3887 (IIRC, this was a significant perf issue)
|
Thanks @gsnedders - I see that the attribute names/values are no longer accessed via tuple/list indexes (a nice improvement), and that the element attributes are initialized as attribute maps (rather than being converted into them at the end of the process). Those are similar to the approach taken in #521 as well. What's the general approach to review, quality/performance assurance and merge for each pull request? It looks tricky to review this changeset as one combined change (given the size of the diff) - could we create a list of the proposed changes (with dependencies / ordering between them) and queue them up for review? |
Yeah, I was mostly just throwing this up as one PR so at least the branch exists somewhere that isn't my local disk! Also not sure why it fails on 2.7 or 3.6 on CI; it doesn't fail locally! |
Use Cython to make the parser quicker; see #445. This builds on top of #272.
This is a long way from ready to land, but shows potential. We probably also want to split this up so many of the earlier changes land first if they make performance sense without Cython.
The change to attribute representation especially might be of interest to #521 (cc @jayaddison).
There's also some API changes towards the end of the branch which we may well want to delay landing even beyond the rest of the Cython stuff.