Extracting Wikipedia tables into CSV files. Once the repository is cloned:
cd wikimatrix
mvn test
An "OutOfMemoryError: Java heap space" can occur when running all the tests, so we recommend trying first:
mvn -Dtest=SingleTest test
which runs the extractors only on this page: https://en.wikipedia.org/wiki/Comparison_of_Canon_EOS_digital_cameras
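If the full test run still exhausts the heap, one common workaround is to raise the memory of the JVM that Maven Surefire forks for the tests. This is a hedged suggestion, not something the project documents; the 2g value is illustrative:

```shell
# Surefire runs tests in a forked JVM; argLine passes JVM flags to it.
# Adjust -Xmx to whatever your machine can spare.
mvn -DargLine="-Xmx2g" test
```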
We provide 300+ Wikipedia URLs, and the challenge is to:
- integrate the extractors' code (HTML)
- extract as many relevant tables as possible
- serialize the results into CSV files (within output1/ and output2/, one folder for each of the two extractors)
Java 1.8 (JVM 8) or later is required.
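The CSV serialization step has one subtlety worth noting: Wikipedia table cells often contain commas, quotes, or line breaks, which must be escaped for the output files to stay valid CSV. A minimal sketch of that escaping, following RFC 4180 quoting rules; the class and method names here (CsvSketch, csvEscape, writeRow) are illustrative and not part of the wikimatrix code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvSketch {
    // Quote a cell if it contains a delimiter, quote, or newline;
    // embedded quotes are doubled, per RFC 4180.
    static String csvEscape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    // Join one table row into a single CSV line.
    static String writeRow(List<String> cells) {
        return cells.stream()
                .map(CsvSketch::csvEscape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // A row with a plain cell, a comma, and an embedded quote.
        List<String> row = Arrays.asList("Canon EOS 5D", "35.8 x 23.9 mm, full-frame", "a \"quoted\" note");
        System.out.println(writeRow(row));
        // -> Canon EOS 5D,"35.8 x 23.9 mm, full-frame","a ""quoted"" note"
    }
}
```

Each extracted table then becomes one such file per row list, written under the extractor's output folder.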