Update stanford-corenlp #974

@kwalcock

We're thinking of moving stanford-corenlp from 3.9.2 to 4.2.0. The former wanted lucene 4.10.3 and the latter wants lucene 7.5.0. For the record, timenorm would like 6.6.6.

Andrew,

Regarding

Keith: creates an Eidos branch that uses the processors branch that uses corenlp 4.2.0

Andrew: starts looking at the unit tests that fail

“nmod” and “nmod_*” become “obl” and “obl_*”

“dobj” becomes “obj”
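
As a rough illustration of the renaming listed above, here is a minimal Python sketch. It is not code from processors or Eidos, and the name `convert_label` is hypothetical. It handles only the two renames mentioned here and assumes suffixed labels are joined with an underscore (for example "nmod_of").

```python
# Hypothetical helper: only the two renames named in this issue are handled.
RENAMES = {"nmod": "obl", "dobj": "obj"}


def convert_label(label: str) -> str:
    """Map an old dependency label to its new name, keeping any suffix."""
    # Split at the first underscore so "nmod_such_as" keeps "such_as" intact.
    base, sep, suffix = label.partition("_")
    return RENAMES.get(base, base) + sep + suffix
```

Labels that are not in the table pass through unchanged, so a helper like this could be run over every label used in a rule file to see which ones would be affected.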


A number of caveats will follow, but making branches and compiling them is fairly straightforward. There is already a branch of processors called updateStanford, which you would need to check out with "git checkout updateStanford". That needs to be published locally with something like "sbt publishLocal". The result is processors 8.2.7-SNAPSHOT on your local drive in an ~/.ivy2/local directory.

Then there is an eidos branch also called updateStanford. In an eidos directory that is up to date, "git checkout updateStanford" should take care of that. It is configured to use the snapshot version of processors. It should build and try to run.

It seems like the major concern with the tags was that our odin rules keep up.  If those tags have been used in Actions and Finders, some code might need to change.  There is a TagSet class that was meant to handle some differences like that.

Caveats

It will not pass all the tests, but not only because of our stuff.  stanford-corenlp 3.9.2 depended on lucene 4.10.3.  Our geonorm wants 6.6.6 and that "evicted" the earlier version.  Some touching up of the assembly process took care of that.

stanford-corenlp 4.2.0 depends on lucene 7.5.0, or at least prefers to.  This overrides geonorm's preference of 6.6.6.  That may or may not be a problem directly, but the version change does result in a crash elsewhere that will prevent a full evaluation of our rules.
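
To make the eviction concrete, here is a small Python sketch of a "highest version wins" rule. That the build resolves conflicts this way is an assumption (it is believed to be the sbt default), and the function names are hypothetical. The version numbers are the ones reported in this issue.

```python
def parse_version(version: str) -> tuple:
    """Turn "7.5.0" into (7, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requests: dict) -> tuple:
    """Return (winner, evicted) for a mapping of requester -> lucene version."""
    winner = max(requests.values(), key=parse_version)
    evicted = sorted(
        {v for v in requests.values() if v != winner}, key=parse_version
    )
    return winner, evicted


# Versions as reported in this issue.
old_build = {"stanford-corenlp 3.9.2": "4.10.3", "geonorm": "6.6.6"}
new_build = {"stanford-corenlp 4.2.0": "7.5.0", "geonorm": "6.6.6"}

print(resolve(old_build))  # ('6.6.6', ['4.10.3'])
print(resolve(new_build))  # ('7.5.0', ['6.6.6'])
```

Under that assumption, geonorm got the lucene version it asked for with corenlp 3.9.2 but would be handed 7.5.0 with corenlp 4.2.0.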

I don't remember the details of the geonorm dependency on lucene and maybe that's a deal breaker.

I think the crash is related to yet another version of lucene being included, under a different name, because some project had a license issue and there was a fork, etc.  There is some resource file in a top level directory that conflicts and might need to be patched.
