GitHub workflow: generate people JSON + Astro MD files from BibTeX entries #29

@gokceuludogan

📌 Summary

We currently have publication metadata under data/bib/*.bib (one file per PI).
We want to automatically infer people info from these BibTeX files and:

  1. Export a JSON file with basic people info.
  2. Generate Astro content files under src/content/people/*.md with a minimal frontmatter.

This should be implemented as a GitHub Actions workflow.


🎯 Goals

  • Parse all data/bib/*.bib files.

  • Extract unique people (authors) across all publications.

  • Infer an advisor for each person (based on co-authorship frequency with PIs).

  • Export:

    • A machine-readable JSON file with people info.
    • Astro-compatible markdown files in src/content/people/ with minimal frontmatter.

📂 Inputs

  • BibTeX files:

    data/bib/*.bib
    

    Each file corresponds to a PI (e.g., arzucan-ozgur.bib, tunga-gungor.bib, suzan-uskudarli.bib) and contains multiple @article, @inproceedings, etc. entries with author fields.

  • PI mapping (explicit or inferred):

    • Names / slugs of PIs (e.g., from data/googlescholar.json or filename):

      • arzucan-ozgur
      • tunga-gungor
      • suzan-uskudarli
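If the PI set is taken from the filenames, a minimal sketch could look like the following. The function name `discover_pi_slugs` is hypothetical, and it assumes every `.bib` file in the directory is named after exactly one PI slug.

```python
from pathlib import Path


def discover_pi_slugs(bib_dir="data/bib"):
    """Return the sorted PI slugs, taken from the *.bib file stems."""
    # Assumption: each file is named <pi-slug>.bib, e.g. arzucan-ozgur.bib.
    return sorted(path.stem for path in Path(bib_dir).glob("*.bib"))
```

A config file (or data/googlescholar.json) would be the safer source if the directory ever holds non-PI `.bib` files.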

🧠 Logic / Requirements

1. People extraction

  • For each BibTeX entry in data/bib/*.bib:

    • Parse the author field.
    • Split into individual author names (e.g., "A. Özgür and T. Güngör and A. Köksal" → 3 people).
  • Normalize names:

    • Preserve accents (e.g., Özgür, Köksal).
    • Use full names as they appear in BibTeX when possible.
  • Build a global map:

    {
      "abdullatif-koksal": {
        "name": "Abdullatif Köksal",
        "advisor": "Arzucan Özgür" | null,
        "category": "student" | "alumni" | etc. (initially just one value),
        // other fields can be added later
      },
      ...
    }
  • Slug generation:

    • Lowercase
    • Transliterate accented characters to ASCII (e.g., ö → o, ı → i)
    • Replace spaces with hyphens
    • Strip punctuation
    • E.g., "Abdullatif Köksal" → abdullatif-koksal.
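The author splitting and slug rules above can be sketched roughly as follows. The helper names `split_authors` and `slugify` are hypothetical. The sketch assumes the `.bib` files are UTF-8 (LaTeX-escaped accents such as `{\"O}` are not decoded) and handles only the plain "First Last" and "Last, First" name forms.

```python
import re
import unicodedata

# Letters with no Unicode decomposition must be mapped by hand
# (the Turkish dotless ı is the important one here).
_EXTRA = str.maketrans({"ı": "i", "İ": "I", "ø": "o", "ß": "ss"})


def split_authors(author_field):
    """Split a BibTeX author field on the ' and ' separator."""
    names = []
    for part in re.split(r"\s+and\s+", author_field.strip()):
        part = " ".join(part.split())  # collapse newlines and repeated spaces
        if not part:
            continue
        if "," in part:
            # "Last, First" -> "First Last" (the "von Last, Jr, First"
            # form is NOT handled in this sketch).
            last, first = (p.strip() for p in part.split(",", 1))
            part = f"{first} {last}".strip()
        names.append(part)  # accents are preserved in the display name
    return names


def slugify(name):
    """Lowercase ASCII slug: 'Abdullatif Köksal' -> 'abdullatif-koksal'."""
    text = unicodedata.normalize("NFKD", name.translate(_EXTRA))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())  # strip punctuation
    return re.sub(r"[\s-]+", "-", text).strip("-")
```

Note that abbreviated and full forms of the same person ("A. Köksal" vs. "Abdullatif Köksal") produce different slugs here; merging them would need an extra alias step that this issue does not specify.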

2. Advisor inference

  • A set of PIs is known (from filenames or config), e.g.:

    arzucan-ozgur
    tunga-gungor
    suzan-uskudarli
    
  • For each non-PI person:

    • Look at all papers where they are a co-author.
    • Count co-occurrences with each PI.
    • Advisor = PI with highest co-authorship count.
    • If no PI appears with that person, leave advisor empty ("", matching the JSON and frontmatter examples below).
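One way to sketch the counting step, assuming each paper has already been reduced to a list of author slugs. The function name `infer_advisors` is hypothetical, and the alphabetical tie-break is an assumption, since the issue does not say how ties should be resolved.

```python
from collections import Counter, defaultdict


def infer_advisors(papers, pi_slugs):
    """Map each non-PI slug to the PI slug they co-author with most often.

    papers: iterable of author-slug lists, one list per publication.
    pi_slugs: collection of known PI slugs.
    Returns "" for people who never appear alongside a PI.
    """
    pi_slugs = set(pi_slugs)
    counts = defaultdict(Counter)
    for authors in papers:
        unique = set(authors)
        pis_here = unique & pi_slugs
        for person in unique - pi_slugs:
            # Indexing creates an (empty) Counter even when no PI is present,
            # so every non-PI person ends up in the result.
            counts[person].update(pis_here)

    advisors = {}
    for person, counter in counts.items():
        if not counter:
            advisors[person] = ""
            continue
        # Highest count wins; ties broken alphabetically (assumption).
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        advisors[person] = ranked[0][0]
    return advisors
```

Because a paper co-authored by two PIs may appear in both of their `.bib` files, the input should probably be deduplicated first (for example by DOI or normalized title), otherwise such papers are counted twice.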

3. JSON export

  • Write a JSON file, e.g.:

    data/people.generated.json
    
  • JSON schema (minimal for now):

    {
      "abdullatif-koksal": {
        "name": "Abdullatif Köksal",
        "advisor": "Arzucan Özgür",
        "category": "student"
      },
      "tunga-gungor": {
        "name": "Tunga Güngör",
        "advisor": "",
        "category": "pi"
      }
    }
  • Category:

    • For now, set only one categorical field:

      • PIs → "pi" (or "faculty")
      • Non-PIs → "student" by default (can be refined later).
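A minimal sketch of the export, assuming a `names` map (slug → display name) and an `advisors` map (slug → advisor slug or "") already exist. The function names are hypothetical. `ensure_ascii=False` keeps names such as "Köksal" readable in the file instead of writing `\u` escapes.

```python
import json
from pathlib import Path


def build_people_map(names, advisors, pi_slugs):
    """Combine names, inferred advisors, and PI membership into one dict."""
    people = {}
    for slug in sorted(names):
        is_pi = slug in pi_slugs
        advisor_slug = "" if is_pi else advisors.get(slug, "")
        people[slug] = {
            "name": names[slug],
            # Store the advisor's display name, as in the schema example.
            "advisor": names.get(advisor_slug, ""),
            "category": "pi" if is_pi else "student",
        }
    return people


def write_people_json(people, path="data/people.generated.json"):
    """Write the people map as UTF-8 JSON with stable key order."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(people, ensure_ascii=False, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")
```

Sorting the keys keeps the generated file stable between runs, which makes the diffs in any follow-up PR easier to review.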

4. Astro markdown generation

For each person (key = slug), create a file:

src/content/people/<slug>.md
  • Example: src/content/people/abdullatif-koksal.md

  • Frontmatter rules:

    • Fill only:

      • name
      • advisor (if inferred)
      • category
    • Leave the rest EMPTY/blank for now (title, photo, bio, email, order, degree, body content).

  • Template format:

    ---
    name: "Abdullatif Köksal"
    title: ""
    photo: ""
    bio: ""
    email: ""
    category: "student"
    order: 
    advisor: "Arzucan Özgür"
    degree: ""
    ---
    
  • If advisor is unknown:

    advisor: ""
  • Do not auto-generate description text for now; keep the body empty.

  • For reference, the example currently used on the site (not to be fully filled in now):

    ---
    name: "Abdullatif Köksal"
    title: "MS Student"
    photo: "/images/people/abdullatif-koksal.jpg"
    bio: "MS student at Boğaziçi University, working on natural language processing under the supervision of Arzucan Özgür."
    email: "abdullatif.koksal@boun.edu.tr"
    category: "student"
    order: 10
    advisor: "Arzucan Özgür"
    degree: "MS"
    ---
    
    Abdullatif Köksal is an MS student at the Computer Engineering Department of Boğaziçi University, working under the supervision of Arzucan Özgür.
    
    ## Research Interests
    
    - Natural Language Processing
    - Machine Learning
    - Cross-lingual NLP
    - Text Classification
    
    ## Advisor
    
    - **Advisor:** Arzucan Özgür

    👉 In this issue, we only want name, advisor, and category populated.
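The minimal template could be rendered along these lines. The names `render_person_md` and `write_people_md` are hypothetical. Two assumptions are baked in: `order` is written with no value (as in the template), and existing files are skipped by default so that hand-written profiles like the reference example are not overwritten. Whether the site's content schema accepts an empty `order` is not confirmed here.

```python
import json
from pathlib import Path

# Field order follows the template in this issue.
FIELD_ORDER = ["name", "title", "photo", "bio", "email",
               "category", "order", "advisor", "degree"]


def render_person_md(person):
    """Render the frontmatter-only markdown for one person (empty body)."""
    lines = ["---"]
    for key in FIELD_ORDER:
        if key == "order":
            lines.append("order:")  # left blank, as in the template
        else:
            # JSON string quoting is also valid for YAML double-quoted scalars.
            value = json.dumps(person.get(key, ""), ensure_ascii=False)
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def write_people_md(people, out_dir="src/content/people", overwrite=False):
    """Write <slug>.md per person; returns the list of slugs written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, person in people.items():
        target = out_dir / f"{slug}.md"
        if target.exists() and not overwrite:
            continue  # assumption: never clobber hand-edited profiles
        target.write_text(render_person_md(person), encoding="utf-8")
        written.append(slug)
    return written
```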


⚙️ GitHub Actions workflow

Implement a workflow, e.g. .github/workflows/generate-people-from-bib.yml:

  • Trigger:

    • workflow_dispatch (manual for now; a schedule trigger can be added later).
  • Steps:

    1. Checkout repo.

    2. Set up Python.

    3. Install dependencies (bibtexparser or similar).

    4. Run a Python script, e.g. scripts/generate_people_from_bib.py, that:

      • Parses data/bib/*.bib.

      • Builds the people map.

      • Writes:

        • data/people.generated.json
        • src/content/people/*.md
    5. Optionally:

      • Commit changes on a new branch and open a PR (similar style to existing Scholar workflows).
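As a starting point, the workflow file might look like the sketch below. The action versions, Python version, branch name, and the use of `peter-evans/create-pull-request` for the optional PR step are all assumptions; the existing Scholar workflows may use a different approach and should take precedence.

```yaml
# .github/workflows/generate-people-from-bib.yml (sketch)
name: Generate people from BibTeX

on:
  workflow_dispatch:

permissions:
  contents: write
  pull-requests: write

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install bibtexparser

      - name: Generate people JSON and markdown
        run: python scripts/generate_people_from_bib.py

      # Optional: open a PR with the generated files (assumed action).
      - name: Open pull request
        uses: peter-evans/create-pull-request@v6
        with:
          branch: chore/generate-people
          title: "Generate people files from BibTeX"
          commit-message: "chore: regenerate people data from BibTeX"
```

The `permissions` block is needed only for the PR step; without it, the default token may not be allowed to push a branch or open a pull request.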

✅ Acceptance Criteria

  • Running the workflow on existing data/bib/*.bib generates:

    • data/people.generated.json
    • A set of files src/content/people/<slug>.md for all discovered people.
  • Each .md file has:

    • Correct name.
    • advisor populated with the most frequent co-author PI, if any.
    • category set appropriately (at least pi vs student).
    • Other fields exist but are empty ("" or omitted as agreed).
  • Existing Astro site builds successfully using these generated people files.

  • Workflow is documented in the repo (short note in README or CONTRIBUTING.md).
