Extracting Wikipedia tables into CSV files. Once the repository is cloned:
cd wikimatrix
mvn test
An "OutOfMemoryError: Java heap space" can occur when running all the tests, so we recommend trying first:
mvn -Dtest=SingleTest test
which runs the extractors only on this page: https://en.wikipedia.org/wiki/Comparison_of_Canon_EOS_digital_cameras
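If the full test run still exhausts the heap, one common workaround is to raise the memory of the JVM that Maven Surefire forks for the tests. This is a hedged suggestion, not something the project documents; the 2g value is illustrative:

```shell
# Surefire runs tests in a forked JVM; argLine passes JVM flags to it.
# Adjust -Xmx to whatever your machine can spare.
mvn -DargLine="-Xmx2g" test
```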
We provide 300+ Wikipedia URLs, and the challenge is to:
- integrate the extractors' code (HTML)
- extract as many relevant tables as possible
- serialize the results into CSV files (within output1/ and output2/, one folder for each of the two extractors)
Java 1.8 (JVM 8) or later is required.
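The CSV serialization step has one subtlety worth noting: Wikipedia table cells often contain commas, quotes, or line breaks, which must be escaped for the output files to stay valid CSV. A minimal sketch of that escaping, following RFC 4180 quoting rules; the class and method names here (CsvSketch, csvEscape, writeRow) are illustrative and not part of the wikimatrix code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvSketch {
    // Quote a cell if it contains a delimiter, quote, or newline;
    // embedded quotes are doubled, per RFC 4180.
    static String csvEscape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    // Join one table row into a single CSV line.
    static String writeRow(List<String> cells) {
        return cells.stream()
                .map(CsvSketch::csvEscape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // A row with a plain cell, a comma, and an embedded quote.
        List<String> row = Arrays.asList("Canon EOS 5D", "35.8 x 23.9 mm, full-frame", "a \"quoted\" note");
        System.out.println(writeRow(row));
        // -> Canon EOS 5D,"35.8 x 23.9 mm, full-frame","a ""quoted"" note"
    }
}
```

Each extracted table then becomes one such file per row list, written under the extractor's output folder.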