TheZeekA/NZ-News-Web-Scraper

📰 NZ News Keyword Scraper

A desktop GUI application for searching New Zealand news articles by keyword — both current articles via RSS feeds and historical articles archived by the Wayback Machine.

Built with Python and Tkinter. No browser required.

Features

🔴 Live RSS Tab

  • Search current NZ news articles by keyword in real time
  • Searches across multiple NZ news sources simultaneously
  • Results displayed as clickable cards with headline and snippet
  • Click any card to open the full article in your browser
  • Comma-separate multiple keywords (e.g. housing, labour, budget)
  • Built-in debug log to diagnose feed issues

Sources searched:

Source       | Coverage
NZ Herald    | Top Stories, NZ, Business, Sport, World, Entertainment, Tech, Lifestyle, Travel, Rugby
RNZ          | National, Political, Business, World, Sport, Te Ao Māori, Pacific
The Spinoff  | All articles
Scoop        | General, Politics, Business, Science & Tech

🔵 Historical Archive Tab

  • Search archived NZ news articles going back to the early 2000s
  • Powered by the Wayback Machine CDX API
  • Set a custom date range (from/to year)
  • Control how many results to fetch per site
  • Results link directly to the archived snapshot — see the page exactly as it appeared on that date

Sources searched:

Source       | Notes
Stuff        | Full domain archive
NZ Herald    | Full domain archive
RNZ          | Full domain archive
Scoop        | Full domain archive
Newstalk ZB  | Full domain archive
The Spinoff  | Full domain archive
Newsroom     | Full domain archive
1 News       | Full domain archive
NZ Govt      | govt.nz domain

Requirements

  • Python 3.9 or later (tested on Python 3.14)
  • tkinter (included with most Python installations)
  • requests
  • beautifulsoup4

Installation

1. Clone the repository

git clone https://github.com/TheZeekA/NZ-News-Web-Scraper.git
cd NZ-News-Web-Scraper

2. Install dependencies

pip install requests beautifulsoup4
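
To confirm that the dependencies are importable before launching the app, a quick check along these lines may help. This is a minimal sketch and not part of the script: the helper name missing_modules is hypothetical. Note that beautifulsoup4 is imported under the module name bs4.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Modules listed under Requirements (beautifulsoup4 installs as "bs4").
REQUIRED = ["tkinter", "requests", "bs4"]
print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means everything under Requirements can be imported. If tkinter is reported missing, it usually has to be installed through the operating system's package manager rather than pip.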

3. Run the script

python NZ-News-Scraper.py

Building a Windows Executable

You can package the app into a standalone .exe using PyInstaller — no Python installation required on the target machine.

1. Install PyInstaller

pip install pyinstaller

2. Build the executable

pyinstaller --onefile --windowed --name "NZ News Scraper" --icon="icon.ico" NZ-News-Scraper.py

3. Find your exe

The finished executable will be at:

dist/NZ News Scraper.exe

Note: Windows Defender may flag the exe as a false positive — this is common with PyInstaller-built apps. You can whitelist it in Windows Security settings.


Usage Tips

Live RSS Search

  • RSS feeds only carry the most recent 20–50 articles per section
  • For best results use short, broad keywords: housing works better than Auckland housing crisis 2024
  • Use the Debug button to see exactly which feeds responded and how many articles were fetched
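
Comma-separated input such as housing, labour, budget can be turned into a clean keyword list in a few lines. The sketch below assumes case-insensitive matching and uses a hypothetical helper name, parse_keywords. The script's own handling may differ.

```python
def parse_keywords(raw):
    """Split a comma-separated string into trimmed, lower-cased keywords."""
    # Empty entries (for example from a trailing comma) are dropped.
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


print(parse_keywords("housing, Labour, budget,"))
```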

Historical Archive Search

  • The Wayback Machine matches your keyword against the article URL slug — so place names and proper nouns work best (e.g. waihi, christchurch, ardern)
  • Wider date ranges return more results
  • Not every article was crawled — Wayback Machine coverage improves significantly after 2005
  • Use the Debug button to see how many URLs were scanned per site

How It Works

Live RSS

The app fetches RSS/Atom feeds directly from each news source and parses them using Python's built-in xml.etree.ElementTree. Articles are keyword-matched against the title and description fields.
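
The matching step can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function name match_items and the sample feed are made up, only RSS 2.0 item elements are handled (Atom feeds use namespaced entry elements and would need extra handling), and the network fetch is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Housing costs rise</title><link>https://example.com/a</link>
<description>Prices are up in Auckland.</description></item>
<item><title>Rugby result</title><link>https://example.com/b</link>
<description>Weekend match report.</description></item>
</channel></rss>"""


def match_items(feed_xml, keywords):
    """Return items whose title or description contains any keyword."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        link = item.findtext("link", default="")
        # Case-insensitive substring match; keywords must be lower-case.
        haystack = f"{title} {description}".lower()
        if any(keyword in haystack for keyword in keywords):
            hits.append({"title": title, "link": link})
    return hits


print(match_items(SAMPLE_FEED, ["housing"]))
```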

Historical Archive

The app queries the Wayback Machine CDX API for each NZ news domain. It fetches up to 500 archived URLs per site, then filters them in Python by checking whether any keyword appears in the URL path. Matched results link to the Wayback Machine playback URL for that snapshot.

The CDX API's built-in regex filtering (filter=original:.*keyword.*) does not work reliably and returns empty results — URL filtering is therefore done client-side.
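
A rough sketch of this flow is shown below. The function names (build_cdx_params, filter_rows, fetch_rows) are hypothetical and the parameter choices are assumptions. The script may query the API differently. The CDX endpoint, the output=json response (whose first row lists the field names), and the web.archive.org/web/<timestamp>/<url> playback format are standard Wayback Machine behaviour.

```python
from urllib.parse import urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_params(domain, from_year, to_year, limit=500):
    """Query parameters for one domain; no server-side keyword filter."""
    return {
        "url": domain,
        "matchType": "domain",        # include subdomains such as www.
        "output": "json",
        "fl": "timestamp,original",   # only the fields needed below
        "from": str(from_year),
        "to": str(to_year),
        "limit": str(limit),
    }


def filter_rows(rows, keywords):
    """Keep rows whose URL path contains a keyword; return playback URLs."""
    hits = []
    for timestamp, original in rows[1:]:  # rows[0] is the header row
        path = urlsplit(original).path.lower()
        if any(keyword in path for keyword in keywords):
            hits.append(f"https://web.archive.org/web/{timestamp}/{original}")
    return hits


def fetch_rows(domain, from_year, to_year, limit=500, timeout=30):
    """Download CDX rows for a domain (needs network access)."""
    import requests  # third-party; imported here so the helpers above run without it

    params = build_cdx_params(domain, from_year, to_year, limit)
    response = requests.get(CDX_ENDPOINT, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

A call such as filter_rows(fetch_rows("stuff.co.nz", 2005, 2010), ["waihi"]) would then return playback links for archived Stuff URLs whose path mentions waihi.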


Project Structure

nz-news-scraper/
├── NZ-News-Scraper.py   # Main application
├── icon.ico             # App icon (optional)
├── README.md            # This file
└── dist/                # Generated by PyInstaller (not committed)
    └── NZ News Scraper.exe

Known Limitations

  • Live RSS feeds are limited to recent articles only — for anything older, use the Historical tab
  • Some NZ Herald articles are paywalled and only headlines/snippets are available via RSS
  • Stuff.co.nz does not currently expose a working public RSS feed and is only available via the Historical tab
  • Historical search matches keywords in URLs only, not in article body text
  • The Wayback Machine CDX API can be slow under heavy load — allow 10–30 seconds for historical searches
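
Because CDX responses can be slow, anyone adapting the script might wrap the request in a timeout plus a simple retry. The helper below is a generic sketch with a hypothetical name, with_retries. It is not taken from the script, and the attempt count and delay are arbitrary.

```python
import time


def with_retries(fetch, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # in real use, catch narrower exception types
            last_error = exc
            if attempt < attempts - 1:
                # Wait a little longer after each failure.
                sleep(delay * (attempt + 1))
    raise last_error


# Possible use with requests (assumed URL and params, not run here):
# response = with_retries(lambda: requests.get(cdx_url, params=params, timeout=30))
```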

Contributing

Pull requests welcome. If an RSS feed URL breaks or a new NZ news source should be added, open an issue or submit a PR updating the SOURCES or search_targets dictionaries in the script.


License

MIT License — free to use, modify, and distribute.


Acknowledgements


© 2026 Teddy Jones

About

A simple script that searches NZ news RSS feeds and the Wayback Machine for articles matching your keywords. Useful for research.
