
EPPI-Centre/flowchart-data-extraction


flowchart-data-extraction


This repo contains the code for extracting structured data from CONSORT flow diagrams in PDF files reporting randomized trials.

Setup

Create environment

conda create -n flow python==3.11.11 -y
conda activate flow

Install

git clone https://github.com/EPPI-Centre/flowchart-data-extraction.git
cd flowchart-data-extraction
pip install -e .

Assign OpenAI API Key

In the root of this repo, create a file called .env and add your OpenAI API key to it as follows:

OPENAI_API_KEY=COPY_AND_PASTE_YOUR_API_KEY_HERE

Quickstart

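Extract Figures From PDFs

Marker converts the PDFs in INPUT_DIR and writes its output, including the extracted figures, to OUTPUT_DIR. Replace both placeholders with your own paths; setting OUTPUT_IMAGE_FORMAT to PNG makes the extracted figures PNG images.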

Windows (PowerShell):

$Env:OUTPUT_IMAGE_FORMAT = "PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

Mac/Linux:

export OUTPUT_IMAGE_FORMAT="PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

Identify CONSORT Flowcharts in the Images Directory

python classify_images_as_flowchart.py

Parse CONSORT Flowcharts From the Images Directory

python parse_flowchart_images.py
