This project is an advanced Python web scraping script that extracts detailed book information from multiple pages of the website:
The script automatically navigates through all catalogue pages and collects structured data from each book's detail page.
Key features:
- Pagination scraping
- Multi-page data extraction
- Scrapes individual book detail pages
- Extracts structured product information
- Handles missing descriptions safely
- Saves all data into CSV format
- Uses urljoin to resolve relative links into absolute URLs
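The pagination and urljoin features can be sketched as a loop that keeps following the "next" link until there is none. This is a minimal sketch, not the script's actual code. It assumes the pager marks the next-page link as `<li class="next"><a href="...">`, and the function name `iter_catalogue_pages` and the injected `fetch` callable are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterator
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iter_catalogue_pages(
    start_url: str, fetch: Callable[[str], str]
) -> Iterator[tuple[str, BeautifulSoup]]:
    """Yield (url, parsed_page) for each catalogue page (hypothetical helper).

    `fetch` is any callable that returns the HTML for a URL.
    """
    url: str | None = start_url
    seen: set[str] = set()
    while url and url not in seen:
        seen.add(url)  # stop if the pager ever links back to a visited page
        soup = BeautifulSoup(fetch(url), "html.parser")
        yield url, soup
        # Assumed markup: <li class="next"><a href="page-2.html">next</a></li>
        next_link = soup.select_one("li.next > a")
        # Relative hrefs are resolved against the URL of the page they appear on.
        url = urljoin(url, next_link["href"]) if next_link else None
```

In practice `fetch` could be `lambda u: requests.get(u, timeout=10).text`. Passing it in as a parameter keeps the loop testable without network access.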
This script extracts:
- Book Name
- Star Rating
- Product Description
- UPC
- Product Type
- Price (excl. tax)
- Price (incl. tax)
- Tax
- Availability
- Number of Reviews
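Extracting these fields from a single detail page might look like the sketch below. The markup is assumed, not taken from the original script: the title in `<h1>`, the rating as a class such as `<p class="star-rating Three">`, the description in the `<p>` after `<div id="product_description">`, and the remaining fields in a table of `<th>`/`<td>` rows. The function name `parse_book_detail` and the conversion of rating words to 1 to 5 are also assumptions.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

# Assumed mapping from the rating word used in the CSS class to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_book_detail(html: str) -> dict[str, object]:
    """Extract book fields from one detail page (hypothetical helper, assumed markup)."""
    soup = BeautifulSoup(html, "html.parser")
    book: dict[str, object] = {}

    title = soup.find("h1")
    book["Book Name"] = title.get_text(strip=True) if title else ""

    # The rating is assumed to be encoded in the class list, e.g. ["star-rating", "Three"].
    rating_tag = soup.select_one("p.star-rating")
    classes = rating_tag.get("class", []) if rating_tag else []
    book["Star Rating"] = next((RATING_WORDS[c] for c in classes if c in RATING_WORDS), None)

    # Some books have no description block, so fall back to an empty string.
    desc_header = soup.find(id="product_description")
    desc = desc_header.find_next_sibling("p") if desc_header else None
    book["Product Description"] = desc.get_text(strip=True) if desc else ""

    # Product information table: each row is assumed to be <tr><th>label</th><td>value</td></tr>.
    for row in soup.select("table tr"):
        label, value = row.find("th"), row.find("td")
        if label and value:
            book[label.get_text(strip=True)] = value.get_text(strip=True)
    return book
```

Reading the table generically keeps the labels exactly as the page prints them (for example "Price (excl. tax)"), so new rows are picked up without code changes.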
Technologies used:
- Python
- Requests
- BeautifulSoup
- csv module
- urllib.parse (urljoin)
The script generates:
book_pagination_data.csv
This CSV file contains structured information about all books from the website.
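Writing the collected records can be done with `csv.DictWriter`, as in the minimal sketch below. The column names are an assumption (they follow the field list above, with "Number of reviews" spelled as a product table label might print it), and `save_books_csv` is a hypothetical name; each book is assumed to be a dict keyed by those names.

```python
import csv
from typing import Iterable, Mapping

# Assumed column order, following the field list in this README.
FIELDNAMES = [
    "Book Name", "Star Rating", "Product Description", "UPC", "Product Type",
    "Price (excl. tax)", "Price (incl. tax)", "Tax", "Availability", "Number of reviews",
]


def save_books_csv(
    books: Iterable[Mapping[str, object]], path: str = "book_pagination_data.csv"
) -> int:
    """Write one row per book and return the number of rows written (hypothetical helper)."""
    count = 0
    # newline="" avoids blank lines on Windows; utf-8 keeps symbols such as "£" intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        for book in books:
            writer.writerow(book)  # missing keys become empty cells; unknown keys are ignored
            count += 1
    return count
```

`extrasaction="ignore"` drops any extra keys a page happens to produce instead of raising `ValueError`, and missing fields (such as an absent description) are written as empty cells.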
Install required libraries:
pip install requests beautifulsoup4
Run the script:
python books_pagination_scraper.py
This project demonstrates advanced web scraping techniques including:
- Pagination scraping
- Nested page scraping (following links from catalogue pages to each book's detail page)
- Structured table data extraction
It was created as a learning project for real-world data extraction workflows.
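The nested step, going from a catalogue page to the individual detail pages, can be sketched as collecting each book's link and resolving it with urljoin. This assumes each book is listed as `<article class="product_pod">` with its link in `<h3><a href="...">`; both the selector and the function name `detail_links` are assumptions, not the script's confirmed code.

```python
from __future__ import annotations

from urllib.parse import urljoin

from bs4 import BeautifulSoup


def detail_links(listing_html: str, page_url: str) -> list[str]:
    """Return absolute detail-page URLs found on one catalogue page (hypothetical helper)."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    # Assumed markup: <article class="product_pod"><h3><a href="...">Title</a></h3></article>
    for anchor in soup.select("article.product_pod h3 a"):
        # Hrefs may be relative (e.g. "../../book_1/index.html"); urljoin resolves "../" segments.
        links.append(urljoin(page_url, anchor["href"]))
    return links
```

Each returned URL can then be fetched and parsed as a detail page. Resolving against the URL of the listing page, rather than the site root, is what makes `../` style links point to the right place.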
