A desktop GUI application for searching New Zealand news articles by keyword — both current articles via RSS feeds and historical articles archived by the Wayback Machine.
Built with Python and Tkinter. No browser required.
- Search current NZ news articles by keyword in real time
- Searches across multiple NZ news sources simultaneously
- Results displayed as clickable cards with headline and snippet
- Click any card to open the full article in your browser
- Comma-separate multiple keywords (e.g. `housing, labour, budget`)
- Built-in debug log to diagnose feed issues
Sources searched:
| Source | Coverage |
|---|---|
| NZ Herald | Top Stories, NZ, Business, Sport, World, Entertainment, Tech, Lifestyle, Travel, Rugby |
| RNZ | National, Political, Business, World, Sport, Te Ao Māori, Pacific |
| The Spinoff | All articles |
| Scoop | General, Politics, Business, Science & Tech |
- Search archived NZ news articles going back to the early 2000s
- Powered by the Wayback Machine CDX API
- Set a custom date range (from/to year)
- Control how many results to fetch per site
- Results link directly to the archived snapshot — see the page exactly as it appeared on that date
Sources searched:
| Source | Notes |
|---|---|
| Stuff | Full domain archive |
| NZ Herald | Full domain archive |
| RNZ | Full domain archive |
| Scoop | Full domain archive |
| Newstalk ZB | Full domain archive |
| The Spinoff | Full domain archive |
| Newsroom | Full domain archive |
| 1 News | Full domain archive |
| NZ Govt | govt.nz domain |
- Python 3.9 or later (tested on Python 3.14)
- `tkinter` (included with most Python installations)
- `requests`
- `beautifulsoup4`
1. Clone the repository

       git clone https://github.com/TheZeekA/nz-news-scraper.git
       cd nz-news-scraper

2. Install dependencies

       pip install requests beautifulsoup4

3. Run the script

       python NZ-News-Scraper.py

You can package the app into a standalone .exe using PyInstaller, so no Python installation is required on the target machine.
1. Install PyInstaller

       pip install pyinstaller

2. Build the executable

       pyinstaller --onefile --windowed --name "NZ News Scraper" --icon="icon.ico" NZ-News-Scraper.py

3. Find your exe

   The finished executable will be at:

       dist/NZ News Scraper.exe
Note: Windows Defender may flag the exe as malware. This is a common false positive with PyInstaller-built apps. You can add an exclusion for it in Windows Security settings.
- RSS feeds only carry the most recent 20–50 articles per section
- For best results use short, broad keywords: `housing` works better than `Auckland housing crisis 2024`
- Use the Debug button to see exactly which feeds responded and how many articles were fetched
- The Wayback Machine matches your keyword against the article URL slug, so place names and proper nouns work best (e.g. `waihi`, `christchurch`, `ardern`)
- Wider date ranges return more results
- Not every article was crawled — Wayback Machine coverage improves significantly after 2005
- Use the Debug button to see how many URLs were scanned per site
The app fetches RSS/Atom feeds directly from each news source and parses them using Python's built-in xml.etree.ElementTree. Articles are keyword-matched against the title and description fields.
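The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `parse_keywords` and `match_items` are hypothetical, the sample feed is made up, and only plain RSS 2.0 `<item>` elements are handled (Atom feeds use namespaced `<entry>` elements and would need extra handling).

```python
import xml.etree.ElementTree as ET

# Made-up sample feed so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Housing consents fall in Auckland</title>
      <link>https://example.com/housing-consents</link>
      <description>New figures show fewer homes were consented.</description>
    </item>
    <item>
      <title>Squad named for northern tour</title>
      <link>https://example.com/squad-named</link>
      <description>The selectors have announced the touring side.</description>
    </item>
  </channel>
</rss>"""


def parse_keywords(raw):
    """Split a comma-separated query into lowercase keywords, dropping blanks."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def match_items(rss_text, keywords):
    """Return items whose title or description contains any keyword."""
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = f"{title} {description}".lower()
        if any(keyword in haystack for keyword in keywords):
            matches.append({
                "title": title,
                "link": item.findtext("link", default=""),
                "snippet": description,
            })
    return matches


results = match_items(SAMPLE_RSS, parse_keywords("housing, budget"))
for article in results:
    print(article["title"], "->", article["link"])
```

Matching is a case-insensitive substring test, which is why short, broad keywords tend to hit more often than long phrases.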
The app queries the Wayback Machine CDX API for each NZ news domain. It fetches up to 500 archived URLs per site, then filters them in Python by checking whether any keyword appears in the URL path. Matched results link to the Wayback Machine playback URL for that snapshot.
The CDX API's built-in regex filtering (`filter=original:.*keyword.*`) does not work reliably and returns empty results, so URL filtering is done client-side.
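As a rough sketch of the historical search flow: build a CDX query for a domain, then filter the returned rows client-side by URL path. The helper names `build_cdx_url` and `filter_rows` are hypothetical, the choice of `matchType=domain` and `fl=timestamp,original` is an assumption about how the script queries the API, and the sample rows are invented so the example runs offline.

```python
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, year_from, year_to, limit=500):
    """Build a CDX query URL for one domain (parameter choice is an assumption)."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains and all paths
        "output": "json",            # rows as JSON arrays, header row first
        "fl": "timestamp,original",  # only the fields needed for playback links
        "from": str(year_from),
        "to": str(year_to),
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_rows(rows, keywords):
    """Keep rows whose URL path contains any keyword; skip the header row."""
    results = []
    for timestamp, original in rows[1:]:
        path = urlsplit(original).path.lower()
        if any(keyword in path for keyword in keywords):
            results.append({
                "url": original,
                "snapshot": f"https://web.archive.org/web/{timestamp}/{original}",
            })
    return results


# Invented rows in the shape the CDX API returns with output=json.
sample_rows = [
    ["timestamp", "original"],
    ["20110223001500", "http://www.example.co.nz/national/christchurch-earthquake/story"],
    ["20110301000000", "http://www.example.co.nz/sport/rugby/match-report"],
]

query_url = build_cdx_url("example.co.nz", 2010, 2012)
hits = filter_rows(sample_rows, ["christchurch"])
for hit in hits:
    print(hit["snapshot"])
```

In a real run, the JSON body fetched from `query_url` (for example with `requests.get(query_url, timeout=30).json()`) would take the place of `sample_rows`.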
    nz-news-scraper/
    ├── NZ-News-Scraper.py    # Main application
    ├── icon.ico              # App icon (optional)
    ├── README.md             # This file
    └── dist/                 # Generated by PyInstaller (not committed)
        └── NZ News Scraper.exe
- Live RSS feeds are limited to recent articles only — for anything older, use the Historical tab
- Some NZ Herald articles are paywalled and only headlines/snippets are available via RSS
- Stuff.co.nz does not currently expose a working public RSS feed and is only available via the Historical tab
- Historical search matches keywords in URLs only, not in article body text
- The Wayback Machine CDX API can be slow under heavy load — allow 10–30 seconds for historical searches
Pull requests welcome. If an RSS feed URL breaks or a new NZ news source should be added, open an issue or submit a PR updating the SOURCES or search_targets dictionaries in the script.
MIT License — free to use, modify, and distribute.
- Wayback Machine CDX API by the Internet Archive
- RNZ, NZ Herald, The Spinoff, Scoop for providing public RSS feeds
© 2026 Teddy Jones