
sudip-python-dev/books-pagination-scraper


Books Pagination Scraper

This project is a Python web scraping script that extracts detailed book information from every catalogue page of the website:

https://books.toscrape.com

The script automatically navigates through all catalogue pages and collects structured data from individual book detail pages.


Features

  • Pagination scraping
  • Multi-page data extraction
  • Scrapes individual book detail pages
  • Extracts structured product information
  • Handles missing descriptions safely
  • Saves all data into CSV format
  • Uses urljoin for safe URL handling
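The pagination and urljoin features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `iter_page_urls`, `find_next_href`, `max_pages`, and the `fake_next_links` table are hypothetical names, and the relative hrefs are assumed examples. In a real run, `find_next_href` would download the page with Requests and read the "next" link with BeautifulSoup.

```python
from urllib.parse import urljoin


def iter_page_urls(start_url, find_next_href, max_pages=100):
    """Yield each catalogue page URL, following 'next' links until none is left."""
    url = start_url
    for _ in range(max_pages):  # safety cap against accidental loops
        yield url
        href = find_next_href(url)  # relative href of the "next" link, or None
        if not href:
            break
        url = urljoin(url, href)  # resolve the relative href against the current page


# Stand-in for fetching and parsing a page (hypothetical data, no network access).
fake_next_links = {
    "https://books.toscrape.com/catalogue/page-1.html": "page-2.html",
    "https://books.toscrape.com/catalogue/page-2.html": "page-3.html",
}

pages = list(iter_page_urls(
    "https://books.toscrape.com/catalogue/page-1.html",
    fake_next_links.get,
))

# The same call resolves a relative book link found on a catalogue page.
book_url = urljoin(pages[0], "example-book_1/index.html")
```

Resolving every href against the URL of the page it was found on avoids hand-built string concatenation, which tends to break when links are relative to different directories.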

Data Extracted

This script extracts:

  • Book Name
  • Star Rating
  • Product Description
  • UPC
  • Product Type
  • Price (excl. tax)
  • Price (incl. tax)
  • Tax
  • Availability
  • Number of Reviews
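As a rough sketch of how the fields listed above might be pulled from one book detail page, the example below parses an inline HTML snippet with BeautifulSoup. The markup in `SAMPLE_HTML` is an assumption modelled on a typical product page, and `parse_book` and the "No description" fallback are hypothetical choices, not necessarily what the script does. The sample deliberately omits the description to show the missing-description case.

```python
from bs4 import BeautifulSoup

# Assumed, simplified markup for a book detail page (no description present).
SAMPLE_HTML = """
<div class="product_main">
  <h1>Example Book</h1>
  <p class="star-rating Three"></p>
</div>
<table class="table table-striped">
  <tr><th>UPC</th><td>abc123</td></tr>
  <tr><th>Product Type</th><td>Books</td></tr>
  <tr><th>Availability</th><td>In stock (5 available)</td></tr>
</table>
"""


def parse_book(html):
    """Return a dict of book fields extracted from one detail page."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}

    title = soup.find("h1")
    record["Book Name"] = title.get_text(strip=True) if title else ""

    # The rating is assumed to be encoded as a second CSS class, e.g. "star-rating Three".
    rating = soup.find("p", class_="star-rating")
    classes = rating.get("class", []) if rating else []
    record["Star Rating"] = next((c for c in classes if c != "star-rating"), "")

    # Fall back to a placeholder when the description block is absent.
    header = soup.find(id="product_description")
    desc = header.find_next_sibling("p") if header else None
    record["Product Description"] = desc.get_text(strip=True) if desc else "No description"

    # Each table row is a (label, value) pair such as UPC or Availability.
    for row in soup.select("table tr"):
        th, td = row.find("th"), row.find("td")
        if th and td:
            record[th.get_text(strip=True)] = td.get_text(strip=True)
    return record


book = parse_book(SAMPLE_HTML)
```

Checking each lookup for `None` before calling `get_text` keeps one incomplete page from stopping the whole run.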

Tools Used

  • Python
  • Requests
  • BeautifulSoup
  • CSV Module
  • urllib.parse (urljoin)

Output

The script generates:

book_pagination_data.csv

This CSV file contains structured information about all books from the website.
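Writing the collected records to CSV could look something like the sketch below, which uses `csv.DictWriter` from the standard library. The column order in `FIELDNAMES` follows the "Data Extracted" list, while `write_books` and the sample row are hypothetical. The demo writes to an in-memory buffer so that it runs without touching the file system.

```python
import csv
import io

FIELDNAMES = [
    "Book Name", "Star Rating", "Product Description", "UPC", "Product Type",
    "Price (excl. tax)", "Price (incl. tax)", "Tax", "Availability",
    "Number of Reviews",
]


def write_books(rows, fileobj):
    """Write book dicts as CSV; fields missing from a row are left empty."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical record with only some fields filled in.
sample_rows = [{"Book Name": "Example Book", "Star Rating": "Three", "UPC": "abc123"}]

buffer = io.StringIO()
write_books(sample_rows, buffer)
csv_text = buffer.getvalue()
```

To produce the real file, the same function can be given a file opened with `open("book_pagination_data.csv", "w", newline="", encoding="utf-8")`; passing `newline=""` is what the `csv` module documentation recommends to avoid blank lines on Windows.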


How to Run

Install required libraries:

pip install requests beautifulsoup4

Run the script:

python books_pagination_scraper.py


Project Purpose

This project demonstrates several web scraping techniques, including:

  • Pagination scraping
  • Nested page scraping
  • Structured table data extraction

It was created as part of learning real-world data extraction workflows.

Sample Output

[Image: Sample Output]
