jdy8739/flaskraper

Flaskraper

Flaskraper is a Flask web app that searches job boards and displays scraped listings in the browser. Pick a source, enter a keyword, and get back job titles, companies, descriptions, and links — all fetched live from the target site.

Features

  • Search-engine-style home page with centered title and search bar
  • Multi-site support via a scraper registry
  • Results table with title, company, description, and a link to the original posting
  • Pluggable scraper architecture — each site defines its own URL builder and HTML parsers

Supported sources

Source                 Example query
Berlin Startup Jobs    python, javascript, typescript
Web3 Careers           python, rust, solidity
We Work Remotely       python, react, design

Tech stack

Project structure

flaskraper/
├── app.py                  # WSGI entry point for gunicorn
├── main.py                 # Local dev entry point
├── requirements.txt
├── render.yaml             # Render deployment config
├── templates/
│   ├── home.html           # Home / search form
│   └── search.html         # Search results
└── flaskraper/
    ├── __init__.py         # create_app() and routes
    ├── pages/
    │   ├── home.py         # Home page context
    │   └── search.py       # Search page context
    └── scrapers/
        ├── registry.py     # Scraper registry and config
        ├── scrapper.py     # Generic pagination scraper
        ├── runner.py       # Runs a scraper with error handling
        ├── berlin_startup_jobs.py
        ├── web3_careers.py
        └── we_work_remotely.py

How it works

  1. The user submits a keyword and source from the home page (/).
  2. Flask routes the request to /search?q=...&site=....
  3. The selected scraper builds a target URL from the query.
  4. The Scrapper class fetches the page, detects pagination, and collects job listings.
  5. Each job is normalized to a dict with title, company, description, and link.
  6. Results are rendered in search.html.
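
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the actual Scrapper class: `collect_jobs`, its parameters, and the fake in-memory "site" are hypothetical names chosen for the example, and the network fetch is passed in as a plain function so the sketch runs offline.

```python
from typing import Callable, Iterable

Job = dict[str, str]


def collect_jobs(
    first_url: str,
    fetch: Callable[[str], str],
    find_page_urls: Callable[[str], Iterable[str]],
    parse_jobs: Callable[[str], list[Job]],
) -> list[Job]:
    """Fetch the first page, follow its pagination links, and gather all jobs."""
    first_page = fetch(first_url)
    jobs = list(parse_jobs(first_page))

    seen = {first_url}
    for url in find_page_urls(first_page):
        if url in seen:  # skip the page already parsed and any duplicate links
            continue
        seen.add(url)
        jobs.extend(parse_jobs(fetch(url)))
    return jobs


# Hypothetical stand-in for a job board: "page:" lines are pagination links,
# every other line is "title,company".
FAKE_SITE = {
    "https://example.com/jobs?q=python": (
        "page:https://example.com/jobs?q=python&page=2\n"
        "Backend Developer,Acme"
    ),
    "https://example.com/jobs?q=python&page=2": "Data Engineer,Globex",
}


def fake_find_page_urls(document: str) -> list[str]:
    prefix = "page:"
    return [
        line[len(prefix):]
        for line in document.splitlines()
        if line.startswith(prefix)
    ]


def fake_parse_jobs(document: str) -> list[Job]:
    jobs = []
    for line in document.splitlines():
        if line.startswith("page:"):
            continue
        title, company = line.split(",")
        jobs.append(
            {"title": title, "company": company, "description": "", "link": ""}
        )
    return jobs


jobs = collect_jobs(
    "https://example.com/jobs?q=python",
    FAKE_SITE.__getitem__,  # dict lookup plays the role of an HTTP GET
    fake_find_page_urls,
    fake_parse_jobs,
)
```

Passing `fetch` in as an argument keeps the pagination logic testable without hitting a live job board.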

Getting started

Prerequisites

  • Python 3.11+

Install

git clone <repository-url>
cd flaskraper
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run locally

python3 main.py

Open http://localhost:3333.

The dev server binds to 0.0.0.0 and listens on port 3333 by default. To use a different port, set the PORT environment variable:

PORT=8000 python3 main.py
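
For illustration, the port lookup might look something like the sketch below. It assumes only what is stated above (default 3333, overridable through PORT); `resolve_port` and `DEFAULT_PORT` are hypothetical names and may not match what main.py actually does.

```python
import os
from typing import Mapping

DEFAULT_PORT = 3333  # default stated in this README


def resolve_port(environ: Mapping[str, str] = os.environ) -> int:
    """Return the port from PORT, falling back to the default when unset or empty."""
    raw = environ.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

The result would then be handed to the dev server, for example as the `port` argument of Flask's `app.run`.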

Run with gunicorn

gunicorn --bind 0.0.0.0:3333 app:app

Deployment

The repo includes a Render blueprint in render.yaml. Render installs dependencies and starts the app with:

gunicorn --bind 0.0.0.0:$PORT app:app

Adding a new scraper

  1. Create a module under flaskraper/scrapers/ with two functions:
    • scrap_pages_in_<site>(document) — returns pagination elements
    • scrap_jobs_in_<site>(scrapper) — returns a list of job dicts
  2. Register the scraper in flaskraper/scrapers/registry.py:
"mysite": ScraperConfig(
    id="mysite",
    name="My Site",
    build_url=lambda query: f"https://example.com/jobs?q={query}",
    scrap_pages=scrap_pages_in_mysite,
    scrap_jobs=scrap_jobs_in_mysite,
    search_placeholder="Try keyword, role, or skill",
),
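
To make the registry entry concrete, here is a self-contained sketch of what a config type and lookup could look like. The field names come from the entry above; the dataclass definition itself, the `SCRAPERS` dict, `get_scraper`, and the stub parser bodies are assumptions for illustration, not the contents of registry.py. The sketch also URL-encodes the query with `urllib.parse.quote_plus`, an optional hardening the entry above does not show.

```python
from dataclasses import dataclass
from typing import Any, Callable
from urllib.parse import quote_plus


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical shape of a registry entry, mirroring the fields above."""

    id: str
    name: str
    build_url: Callable[[str], str]
    scrap_pages: Callable[[Any], Any]
    scrap_jobs: Callable[[Any], list[dict]]
    search_placeholder: str = ""


def scrap_pages_in_mysite(document: Any) -> list:
    return []  # stub: a real parser would return pagination elements


def scrap_jobs_in_mysite(scrapper: Any) -> list[dict]:
    return []  # stub: a real parser would return job dicts


SCRAPERS = {
    "mysite": ScraperConfig(
        id="mysite",
        name="My Site",
        # quote_plus keeps spaces and symbols from breaking the query string
        build_url=lambda query: f"https://example.com/jobs?q={quote_plus(query)}",
        scrap_pages=scrap_pages_in_mysite,
        scrap_jobs=scrap_jobs_in_mysite,
        search_placeholder="Try keyword, role, or skill",
    ),
}


def get_scraper(site_id: str) -> ScraperConfig:
    """Look up a scraper by id, failing loudly on unknown sites."""
    try:
        return SCRAPERS[site_id]
    except KeyError:
        raise ValueError(f"Unknown site: {site_id!r}") from None
```

With this in place, `get_scraper("mysite").build_url("data engineer")` produces a URL whose query is encoded as `data+engineer`.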

Each job dict should include:

{
    "title": "...",
    "company": "...",
    "description": "...",
    "link": "...",
}
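
A scraper can guard this shape with a small helper before returning its results. The sketch below is one possible approach under stated assumptions: `Job`, `JOB_KEYS`, and `normalize_job` are hypothetical names, and treating title and link as required is a choice made for the example, not a documented rule of the project.

```python
from typing import Mapping, TypedDict


class Job(TypedDict):
    title: str
    company: str
    description: str
    link: str


JOB_KEYS = ("title", "company", "description", "link")


def normalize_job(raw: Mapping[str, object]) -> Job:
    """Keep only the four expected keys and collapse stray whitespace."""
    cleaned = {
        # str.split() with no arguments also drops newlines and tabs
        key: " ".join(str(raw.get(key) or "").split())
        for key in JOB_KEYS
    }
    # Assumption for this sketch: a row without a title or link is unusable.
    if not cleaned["title"] or not cleaned["link"]:
        raise ValueError("job needs at least a title and a link")
    return Job(
        title=cleaned["title"],
        company=cleaned["company"],
        description=cleaned["description"],
        link=cleaned["link"],
    )
```

Missing optional fields come back as empty strings, so the results template can render every column without checking for absent keys.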

License

MIT
