This repo contains the code for extracting structured data from CONSORT flow diagrams in PDF files reporting randomized trials.
conda create -n flow python==3.11.11 -y
conda activate flow
git clone https://github.com/EPPI-Centre/flowchart-data-extraction.git
cd flowchart-data-extraction
pip install -e .

In the root of this repo, create a file called .env and register your OpenAI API key in it as follows:
OPENAI_API_KEY=COPY_AND_PASTE_YOUR_API_KEY_HERE

Windows (PowerShell):
$Env:OUTPUT_IMAGE_FORMAT = "PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

Mac/Linux:
export OUTPUT_IMAGE_FORMAT="PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

python classify_images_as_flowchart.py
python parse_flowchart_images.py
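
Since the scripts depend on the OpenAI API key registered in .env, it may help to check that file before running them. The following is a minimal sketch, not part of this repo: the helper names parse_env and api_key_is_set are hypothetical, it uses only the Python standard library, and it assumes .env holds plain KEY=VALUE lines. How the repo itself loads the key is not shown here and may differ.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_is_set(text: str) -> bool:
    """Return True if OPENAI_API_KEY is present and is not the placeholder."""
    key = parse_env(text).get("OPENAI_API_KEY", "")
    return bool(key) and key != "COPY_AND_PASTE_YOUR_API_KEY_HERE"


if __name__ == "__main__":
    # Assumption: this is run from the root of the repo, where .env lives.
    env_file = Path(".env")
    if not env_file.is_file():
        print("No .env file found in the current directory.")
    elif api_key_is_set(env_file.read_text(encoding="utf-8")):
        print("OPENAI_API_KEY looks set.")
    else:
        print("OPENAI_API_KEY is missing or still the placeholder.")
```

Saved under any file name in the repo root and run with python, this only reports whether the key appears to be filled in; it does not verify that the key is valid with OpenAI.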