Flaskraper is a Flask web app that searches job boards and displays scraped listings in the browser. Pick a source, enter a keyword, and get back job titles, companies, descriptions, and links — all fetched live from the target site.
- Search-engine-style home page with centered title and search bar
- Multi-site support via a scraper registry
- Results table with title, company, description, and a link to the original posting
- Pluggable scraper architecture — each site defines its own URL builder and HTML parsers
| Source | Example query |
|---|---|
| Berlin Startup Jobs | python, javascript, typescript |
| Web3 Careers | python, rust, solidity |
| We Work Remotely | python, react, design |
- Flask — web framework
- BeautifulSoup + lxml — HTML parsing
- requests — HTTP client
- gunicorn — production WSGI server
flaskraper/
├── app.py               # WSGI entry point for gunicorn
├── main.py              # Local dev entry point
├── requirements.txt
├── render.yaml          # Render deployment config
├── templates/
│   ├── home.html        # Home / search form
│   └── search.html      # Search results
└── flaskraper/
    ├── __init__.py      # create_app() and routes
    ├── pages/
    │   ├── home.py      # Home page context
    │   └── search.py    # Search page context
    └── scrapers/
        ├── registry.py  # Scraper registry and config
        ├── scrapper.py  # Generic pagination scraper
        ├── runner.py    # Runs a scraper with error handling
        ├── berlin_startup_jobs.py
        ├── web3_careers.py
        └── we_work_remotely.py
- The user submits a keyword and source from the home page (`/`).
- Flask routes the request to `/search?q=...&site=...`.
- The selected scraper builds a target URL from the query.
- The `Scrapper` class fetches the page, detects pagination, and collects job listings.
- Each job is normalized to a dict with `title`, `company`, `description`, and `link`.
- Results are rendered in `search.html`.
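
The lookup, URL-building, and normalization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `ScraperConfigSketch`, `SCRAPERS`, `normalize_job`, and `resolve_target_url` are hypothetical names, the `example.com` URL is a placeholder, and URL-encoding the keyword with `quote_plus` is an assumption about how a query might be escaped.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import quote_plus


@dataclass(frozen=True)
class ScraperConfigSketch:
    """Hypothetical, trimmed-down config; a real entry would also carry parser callables."""
    id: str
    name: str
    build_url: Callable[[str], str]


# Illustrative registry with a placeholder site, not the project's real entries.
SCRAPERS = {
    "mysite": ScraperConfigSketch(
        id="mysite",
        name="My Site",
        # quote_plus escapes spaces and reserved characters in the keyword.
        build_url=lambda query: f"https://example.com/jobs?q={quote_plus(query)}",
    ),
}

JOB_FIELDS = ("title", "company", "description", "link")


def normalize_job(raw: dict) -> dict:
    """Return a dict with exactly the four expected keys, as stripped strings."""
    return {field: str(raw.get(field) or "").strip() for field in JOB_FIELDS}


def resolve_target_url(site: str, query: str) -> str:
    """Look up the selected scraper and build its target URL for the query."""
    config = SCRAPERS.get(site)
    if config is None:
        raise KeyError(f"unknown site: {site!r}")
    return config.build_url(query)
```

Missing or `None` fields become empty strings here, so a results template could render every row without checking for absent keys. Whether the real scrapers behave this way is not stated in this README.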
- Python 3.11+
git clone <repository-url>
cd flaskraper
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 main.py

Open http://localhost:3333.
The dev server binds to 0.0.0.0 and uses port 3333 by default. Override with the PORT environment variable:
PORT=8000 python3 main.py

For production, run the app with gunicorn:
gunicorn --bind 0.0.0.0:3333 app:app

The repo includes a Render blueprint in `render.yaml`. Render installs dependencies and starts the app with:
gunicorn --bind 0.0.0.0:$PORT app:app

- Create a module under `flaskraper/scrapers/` with two functions:
  - `scrap_pages_in_<site>(document)`: returns pagination elements
  - `scrap_jobs_in_<site>(scrapper)`: returns a list of job dicts
- Register the scraper in `flaskraper/scrapers/registry.py`:
"mysite": ScraperConfig(
    id="mysite",
    name="My Site",
    build_url=lambda query: f"https://example.com/jobs?q={query}",
    scrap_pages=scrap_pages_in_mysite,
    scrap_jobs=scrap_jobs_in_mysite,
    search_placeholder="Try keyword, role, or skill",
),

Each job dict should include:
{
    "title": "...",
    "company": "...",
    "description": "...",
    "link": "...",
}

License: MIT