Hyphen space needs to be handled differently #65

@kwalcock

We're seeing documents in which the parts of a hyphenated word are separated by a space rather than by \n. Something in a pipeline appears to have put each paragraph on a single line by replacing \n with a space, without accounting for end-of-line hyphens. The converter does not expect this, so a special pass over the text is needed to look for these cases. Here is a list of many of the suspicious instances:

| wrong | right | notes |
| --- | --- | --- |
| ac- counting | accounting | |
| appro- priately | appropriately | |
| com- partmental | compartmental | |
| com- partments | compartments | |
| com- putational | computational | |
| cov- erage | coverage | |
| COVID- 19 | COVID-19 | remove space only |
| cu- mulative | cumulative | |
| cumu- lative | cumulative | |
| Death- Only | Death-Only | remove space only |
| develop- ment | development | |
| distri- bution | distribution | |
| Dormand- Prince | Dormand-Prince | remove space only |
| effec- tively | effectively | |
| epi- demic | epidemic | |
| epidemi- ological | epidemiological | |
| ev- idence | evidence | |
| Fixed- Detection | Fixed-Detection | remove space only |
| Fore- cast | Forecast | |
| fore- casts | forecasts | |
| forecast- ing | forecasting | |
| im- plementation | implementation | |
| includ- ing | including | |
| loca- tion | location | |
| log- odds | log-odds | remove space only |
| Maclau- rin | Maclaurin | |
| Mech- Bayes | MechBayes | |
| nonpara- metric | nonparametric | |
| observ- able | observable | |
| one- to | one- to | do not remove even space |
| param- eters | parameters | |
| population- wide | population-wide | remove space only |
| pos- terior | posterior | |
| pre- diction | prediction | |
| Prepa- ration | Preparation | |
| prereq- uisite | prerequisite | |
| prob- abilistic | probabilistic | |
| probabil- ity | probability | |
| proper- ties | properties | |
| ra- tio | ratio | |
| re- productive | reproductive | |
| rea- son | reason | |
| rel- ative | relative | |
| report- ing | reporting | |
| res- piratory | respiratory | |
| respon- sibility | responsibility | |
| set- ting | setting | |
| strate- gies | strategies | |
| time- varying | time-varying | remove space only |
| un- certainty | uncertainty | |
| vari- ables | variables | |
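
One way the special pass might look is sketched below in Python. This is not the converter's actual code: the function name `repair_hyphen_space`, the `vocabulary` set of known unhyphenated words, and the `keep_hyphen` set of known hyphenated forms are all hypothetical, and where those word lists would come from is left open. The sketch covers the three outcomes in the list above: join the parts, remove the space only, or leave the text alone.

```python
import re

# Hypothetical pattern: letters, a hyphen, exactly one space, then letters or digits.
HYPHEN_SPACE = re.compile(r"\b([A-Za-z]+)- ([A-Za-z0-9]+)\b")


def repair_hyphen_space(text, vocabulary, keep_hyphen=frozenset()):
    """Repair "hyphen space" sequences left behind when \\n was replaced by a space.

    vocabulary:  lowercase words that are valid once the parts are joined
                 (assumed to be supplied by the caller).
    keep_hyphen: lowercase forms that are legitimately hyphenated, so only
                 the space is removed (also assumed).
    """

    def fix(match):
        left, right = match.group(1), match.group(2)
        hyphenated = left + "-" + right
        joined = left + right
        if hyphenated.lower() in keep_hyphen:
            return hyphenated      # "COVID- 19" -> "COVID-19"
        if joined.lower() in vocabulary:
            return joined          # "ac- counting" -> "accounting"
        return match.group(0)      # unknown, e.g. "one- to": leave untouched

    return HYPHEN_SPACE.sub(fix, text)


vocabulary = {"accounting", "forecast"}
keep_hyphen = {"covid-19", "time-varying"}
print(repair_hyphen_space("ac- counting for COVID- 19 in one- to two-week windows",
                          vocabulary, keep_hyphen))
```

Leaving unknown sequences untouched is the conservative default: suspended hyphens such as "one- to" survive without a special case, at the cost of missing words that are absent from both lists (for example "Mech- Bayes" unless "mechbayes" is added to the vocabulary).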
