- [ ] Download from http://opencitations.net/download
- [ ] Add to HDFS
- [ ] Preprocess the files into Parquet format
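The checklist above could be scripted roughly as follows. This is a minimal Python sketch, not part of the original plan. It assumes the dumps from the download page have already been fetched and unpacked into a local directory as CSV files with a header row. The directory names, the `*.csv` pattern and the helpers `hdfs_upload_commands`, `run` and `csv_to_parquet` are all hypothetical. The HDFS step only builds the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands. The Parquet step relies on PySpark's `spark.read.csv(...)` and `DataFrame.write.parquet(...)` with a SparkSession you create yourself.

```python
import shlex
import subprocess
from pathlib import Path


def hdfs_upload_commands(local_dir, hdfs_dir, pattern="*.csv"):
    """Build (but do not run) the commands that copy local dump files into HDFS.

    Hypothetical helper: `local_dir`, `hdfs_dir` and the `*.csv` pattern are
    assumptions about how the downloaded dumps are laid out.
    """
    files = sorted(str(p) for p in Path(local_dir).glob(pattern))
    # Create the target directory (and any missing parents) first.
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    if files:
        # `-put` accepts several local sources when the destination is a
        # directory; `-f` overwrites files that already exist there.
        commands.append(["hdfs", "dfs", "-put", "-f", *files, hdfs_dir])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


def csv_to_parquet(spark, src, dst):
    """Preprocessing step: read the CSV dumps from HDFS and write Parquet.

    Hypothetical helper; assumes `spark` is an existing SparkSession and that
    the CSV files have a header row.
    """
    df = spark.read.csv(src, header=True)
    df.write.mode("overwrite").parquet(dst)
```

For example, `run(hdfs_upload_commands("opencitations_dump", "/data/opencitations"))` only prints the commands; passing `dry_run=False` executes them. The Parquet step could then be `csv_to_parquet(spark, "/data/opencitations/*.csv", "/data/opencitations_parquet")`. Building argument lists rather than shell strings avoids quoting problems with unusual file names.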