GitHub workflow: generate people JSON + Astro MD files from BibTeX entries

### 📌 Summary

We currently have publication metadata under `data/bib/*.bib` (one file per PI).
We want to **automatically infer people info from these BibTeX files** and:

1. Export a **JSON file** with basic people info.
2. Generate **Astro content files** under `src/content/people/*.md` with a minimal frontmatter.

This should be implemented as a **GitHub Actions workflow**.

---

### 🎯 Goals

* Parse all `data/bib/*.bib` files.
* Extract unique people (authors) across all publications.
* Infer an **advisor** for each person (based on co-authorship frequency with PIs).
* Export:

  * A machine-readable JSON file with people info.
  * Astro-compatible markdown files in `src/content/people/` with minimal frontmatter.

---

### 📂 Inputs

* BibTeX files:

  ```text
  data/bib/*.bib
  ```

  Each file corresponds to a PI (e.g., `arzucan-ozgur.bib`, `tunga-gungor.bib`, `suzan-uskudarli.bib`) and contains multiple `@article`, `@inproceedings`, etc. entries with `author` fields.

* PI mapping (explicit or inferred):

  * Names / slugs of PIs (e.g., from `data/googlescholar.json` or filename):

    * `arzucan-ozgur`
    * `tunga-gungor`
    * `suzan-uskudarli`
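
If the PI set is inferred from filenames, a minimal sketch could look like the following. The function name `discover_pi_slugs` is hypothetical, and the sketch assumes every `*.bib` file in the directory belongs to exactly one PI and is already named with that PI's slug.

```python
from pathlib import Path


def discover_pi_slugs(bib_dir: str = "data/bib") -> list[str]:
    """Return PI slugs derived from .bib filenames (assumption: one file per PI).

    `arzucan-ozgur.bib` becomes `arzucan-ozgur`; non-.bib files are ignored.
    The result is sorted so repeated runs produce the same order.
    """
    return sorted(path.stem for path in Path(bib_dir).glob("*.bib"))
```

If `data/googlescholar.json` is used as the source instead, only this function would need to change.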

---

### 🧠 Logic / Requirements

#### 1. People extraction

* For each BibTeX entry in `data/bib/*.bib`:

  * Parse the `author` field.
  * Split into individual author names (e.g., `"A. Özgür and T. Güngör and A. Köksal"` → 3 people).

* Normalize names:

  * Preserve accents (e.g., `Özgür`, `Köksal`).
  * Use full names as they appear in BibTeX when possible.

* Build a global map:

  ```jsonc
  {
    "abdullatif-koksal": {
      "name": "Abdullatif Köksal",
      "advisor": "Arzucan Özgür", // "" when no advisor can be inferred
      "category": "student" // "student", "alumni", etc. (initially just one value)
      // other fields can be added later
    }
    // ... one entry per person
  }
  ```

* Slug generation:

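  * Lowercase
  * Transliterate accented characters to ASCII for the slug only (e.g., `ö` → `o`, `ı` → `i`); the `name` field keeps its accents
  * Replace spaces with hyphens
  * Strip punctuation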
  * E.g., `"Abdullatif Köksal"` → `abdullatif-koksal`.
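
The splitting and slug rules above can be sketched with the standard library alone. The names `split_authors`, `slugify`, and `_SPECIAL` are hypothetical, and the character table is an assumed starting point: Unicode decomposition does not reduce letters such as the Turkish dotless `ı` to ASCII, so they need an explicit mapping. The sketch also assumes `Last, First` entries should be reordered to `First Last`, and it does not decode LaTeX escapes such as `{\"o}`.

```python
import re
import unicodedata

# Letters that NFKD decomposition does not turn into ASCII (assumed list; extend as needed).
_SPECIAL = str.maketrans(
    {"ı": "i", "İ": "I", "ø": "o", "Ø": "O", "ß": "ss", "ł": "l", "Ł": "L"}
)


def split_authors(author_field: str) -> list[str]:
    """Split a BibTeX author field on ' and ' and return names as 'First Last'."""
    collapsed = " ".join(author_field.split())  # collapse newlines and repeated spaces
    names = []
    for raw in re.split(r"\s+and\s+", collapsed, flags=re.IGNORECASE):
        raw = raw.strip().strip("{}")
        if not raw:
            continue
        if "," in raw:  # "Köksal, Abdullatif" -> "Abdullatif Köksal"
            last, first = (part.strip() for part in raw.split(",", 1))
            raw = f"{first} {last}".strip()
        names.append(raw)
    return names


def slugify(name: str) -> str:
    """Build an ASCII slug: transliterate, lowercase, strip punctuation, hyphenate."""
    text = unicodedata.normalize("NFKD", name.translate(_SPECIAL))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")
```

Note that an abbreviated name such as `A. Özgür` yields the slug `a-ozgur`, which would not match the PI slug `arzucan-ozgur`; matching abbreviated and full forms of the same person is outside this sketch.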

#### 2. Advisor inference

* A set of PIs is known (from filenames or config), e.g.:

  ```text
  arzucan-ozgur
  tunga-gungor
  suzan-uskudarli
  ```

* For each non-PI person:

  * Look at all papers where they are a co-author.
  * Count co-occurrences with each PI.
  * **Advisor** = PI with highest co-authorship count.
  * If no PI appears with that person, leave `advisor` empty.
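
A minimal sketch of this counting rule is shown below. The function name `infer_advisors` and the input shape (one list of author slugs per paper) are assumptions. The issue does not say how to break ties, so the sketch picks the alphabetically first PI slug purely to keep output deterministic. It also assumes a paper that appears in more than one PI's `.bib` file has already been de-duplicated; otherwise it would be counted twice.

```python
from collections import Counter, defaultdict


def infer_advisors(papers: list[list[str]], pi_slugs: set[str]) -> dict[str, str]:
    """Map each non-PI author slug to the PI slug they co-author with most often.

    papers: one list of author slugs per publication.
    Returns "" for people who never share a paper with any PI.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for authors in papers:
        unique = set(authors)  # ignore duplicate names within one entry
        pis_on_paper = unique & pi_slugs
        for person in unique - pi_slugs:
            person_counts = counts[person]  # registers the person even with no PI
            for pi in pis_on_paper:
                person_counts[pi] += 1

    advisors = {}
    for person, person_counts in counts.items():
        if not person_counts:
            advisors[person] = ""
            continue
        best = max(person_counts.values())
        # Assumed tie-break: alphabetically first PI slug among the top counts.
        advisors[person] = min(pi for pi, n in person_counts.items() if n == best)
    return advisors
```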

#### 3. JSON export

* Write a JSON file, e.g.:

  ```text
  data/people.generated.json
  ```

* JSON schema (minimal for now):

  ```jsonc
  {
    "abdullatif-koksal": {
      "name": "Abdullatif Köksal",
      "advisor": "Arzucan Özgür",
      "category": "student"
    },
    "tunga-gungor": {
      "name": "Tunga Güngör",
      "advisor": "",
      "category": "pi"
    }
  }
  ```

* **Category**:

  * For now, set **only one categorical field**:

    * PIs → `"pi"` (or `"faculty"`)
    * Non-PIs → `"student"` by default (can be refined later).
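
The export step could be sketched as follows. `build_people` and `write_people_json` are hypothetical names, and the inputs (a slug-to-name map, the PI slug set, and a slug-to-advisor-slug map) are assumed to come from the earlier steps. The sketch uses `"pi"` and `"student"` as the two category values and writes UTF-8 without ASCII escaping so that names such as `Köksal` stay readable in the file.

```python
import json
from pathlib import Path


def build_people(names: dict[str, str], pi_slugs: set[str],
                 advisors: dict[str, str]) -> dict[str, dict[str, str]]:
    """Assemble the minimal people map keyed by slug."""
    people = {}
    for slug in sorted(names):  # sorted keys keep diffs between runs small
        advisor_slug = advisors.get(slug, "")
        people[slug] = {
            "name": names[slug],
            # Store the advisor's display name; "" when none was inferred.
            "advisor": names.get(advisor_slug, ""),
            "category": "pi" if slug in pi_slugs else "student",
        }
    return people


def write_people_json(people: dict, path="data/people.generated.json") -> None:
    """Write the people map as pretty-printed UTF-8 JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(people, ensure_ascii=False, indent=2)
    path.write_text(text + "\n", encoding="utf-8")
```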

#### 4. Astro markdown generation

For each person (key = slug), create a file:

```text
src/content/people/<slug>.md
```

* Example: `src/content/people/abdullatif-koksal.md`

* **Frontmatter rules**:

  * Fill **only**:

    * `name`
    * `advisor` (if inferred)
    * `category`
  * Leave the rest EMPTY/blank for now (`title`, `photo`, `bio`, `email`, `order`, `degree`, body content).

* Template format:

  ```md
  ---
  name: "Abdullatif Köksal"
  title: ""
  photo: ""
  bio: ""
  email: ""
  category: "student"
  order: 
  advisor: "Arzucan Özgür"
  degree: ""
  ---

  ```

* If advisor is unknown:

  ```md
  advisor: ""
  ```

* Do **not** auto-generate description text for now; keep the body empty.

* For reference, a fully populated file currently used on the site (the generator should not fill in all of these fields now):

  ```md
  ---
  name: "Abdullatif Köksal"
  title: "MS Student"
  photo: "/images/people/abdullatif-koksal.jpg"
  bio: "MS student at Boğaziçi University, working on natural language processing under the supervision of Arzucan Özgür."
  email: "abdullatif.koksal@boun.edu.tr"
  category: "student"
  order: 10
  advisor: "Arzucan Özgür"
  degree: "MS"
  ---

  Abdullatif Köksal is an MS student at the Computer Engineering Department of Boğaziçi University, working under the supervision of Arzucan Özgür.

  ## Research Interests

  - Natural Language Processing
  - Machine Learning
  - Cross-lingual NLP
  - Text Classification

  ## Advisor

  - **Advisor:** Arzucan Özgür
  ```

  👉 In this issue, we only want **name, advisor, and category** populated.
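
A minimal sketch of the frontmatter rendering, following the template field order above. `render_person_md`, `write_person_md`, and `FIELDS` are hypothetical names. String values are quoted with `json.dumps`, since a JSON string is also a valid double-quoted YAML scalar. Skipping files that already exist is an assumption made here so that hand-edited pages (such as the reference example) are not overwritten; the issue does not specify this behaviour.

```python
import json
from pathlib import Path

# Field order taken from the template in this issue.
FIELDS = ("name", "title", "photo", "bio", "email", "category", "order", "advisor", "degree")


def render_person_md(person: dict[str, str]) -> str:
    """Render frontmatter with only the provided fields filled; body stays empty."""
    lines = ["---"]
    for field in FIELDS:
        if field == "order":
            lines.append("order:")  # left blank, as in the template
        else:
            value = json.dumps(person.get(field, ""), ensure_ascii=False)
            lines.append(f"{field}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def write_person_md(slug: str, person: dict[str, str],
                    out_dir="src/content/people", overwrite: bool = False) -> bool:
    """Write <out_dir>/<slug>.md; return False if an existing file was kept."""
    path = Path(out_dir) / f"{slug}.md"
    if path.exists() and not overwrite:
        return False  # assumed policy: never clobber hand-edited files
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_person_md(person), encoding="utf-8")
    return True
```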

---

### ⚙️ GitHub Actions workflow

Implement a workflow, e.g. `.github/workflows/generate-people-from-bib.yml`:

* Trigger:

  * `workflow_dispatch` (manual for now; later optionally `schedule`).
* Steps:

  1. Checkout repo.
  2. Set up Python.
  3. Install dependencies (`bibtexparser` or similar).
  4. Run a Python script, e.g. `scripts/generate_people_from_bib.py`, that:

     * Parses `data/bib/*.bib`.
     * Builds the people map.
     * Writes:

       * `data/people.generated.json`
       * `src/content/people/*.md`
  5. Optionally:

     * Commit changes on a new branch and open a PR (similar style to existing Scholar workflows).
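
For the parsing part of step 4, the sketch below shows one way the script could collect raw `author` fields. It is a standard-library stand-in, not the `bibtexparser` API: it only scans for `author = {...}` or `author = "..."` with brace matching, and it does not expand `@string` macros or decode LaTeX escapes. The names `extract_author_fields` and `load_author_fields` are hypothetical.

```python
import re
from pathlib import Path

_AUTHOR_START = re.compile(r"\bauthor\s*=\s*", re.IGNORECASE)


def _read_value(text: str, start: int):
    """Read a {...} or "..." value beginning at text[start]; None if malformed."""
    opener = text[start]
    if opener not in '{"':
        return None
    depth = 0
    for pos in range(start, len(text)):
        ch = text[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if opener == "{" and depth == 0:
                return text[start + 1:pos]
        elif ch == '"' and opener == '"' and depth == 0 and pos > start:
            return text[start + 1:pos]  # quotes inside braces do not end the value
    return None


def extract_author_fields(bib_text: str) -> list[str]:
    """Return the raw value of every author field found in the text."""
    fields = []
    for match in _AUTHOR_START.finditer(bib_text):
        if match.end() >= len(bib_text):
            continue
        value = _read_value(bib_text, match.end())
        if value is not None:
            fields.append(value)
    return fields


def load_author_fields(bib_dir="data/bib") -> dict[str, list[str]]:
    """Map each .bib file stem (assumed to be the PI slug) to its author fields."""
    return {
        path.stem: extract_author_fields(path.read_text(encoding="utf-8"))
        for path in sorted(Path(bib_dir).glob("*.bib"))
    }
```

A dedicated parser is likely the more robust choice for the real script; this version mainly illustrates the data the later steps need.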

---

### ✅ Acceptance Criteria

* [ ] Running the workflow on existing `data/bib/*.bib` generates:

  * [ ] `data/people.generated.json`
  * [ ] A set of files `src/content/people/<slug>.md` for all discovered people.
* [ ] Each `.md` file has:

  * [ ] Correct `name`.
  * [ ] `advisor` populated with the **most frequent co-author PI**, if any.
  * [ ] `category` set appropriately (at least `pi` vs `student`).
  * [ ] Other fields exist but are empty (`""` or omitted as agreed).
* [ ] Existing Astro site builds successfully using these generated people files.
* [ ] Workflow is documented in the repo (short note in README or `CONTRIBUTING.md`).

