A Web API in C# (.NET 8) that performs web scraping on the Books to Scrape sandbox site.
Users can query books by category (e.g., Travel, Mystery, Fiction) with optional filters such as maximum price, minimum rating, and item limits.
This project demonstrates clean architecture, scraping best practices, and a production-style API while being entirely educational and safe for portfolio use.
- REST endpoint (/api/books) to retrieve books.
- Select multiple categories in one request.
- Optional filters:
  - Minimum rating (minRating)
  - Maximum price (maxPrice)
  - Limit of items per category (maxItemsPerCategory)
- Automatic pagination (scrapes all pages of a category).
- Random delay between requests (polite scraping).
- Safe cancellation with CancellationToken.
- Returns clean JSON (BookDto).
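The optional filters can be read as a small in-memory pipeline applied to each category's scraped results. The sketch below is an assumption about how that might look, not the project's actual code: the Book record and the ApplyFilters helper are hypothetical names, the third sample row is invented for illustration, and treating maxPrice as inclusive is a guess. Only the three parameter names come from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for one category's scraped results.
// The first two rows mirror the README's example response; the third is made up.
var scraped = new List<Book>
{
    new("Sharp Objects", "Mystery", 47.82m, 4),
    new("In the Woods", "Mystery", 36.95m, 3),
    new("Hypothetical Cheap Read", "Mystery", 12.50m, 2),
};

var filtered = ApplyFilters(scraped, minRating: 3, maxPrice: 40m, maxItemsPerCategory: 5);
foreach (var book in filtered)
    Console.WriteLine($"{book.Title}: rating {book.Rating}, price {book.Price}");

// Applies each filter only when its parameter was supplied (null means "no filter").
static List<Book> ApplyFilters(
    IEnumerable<Book> books,
    int? minRating = null,
    decimal? maxPrice = null,
    int? maxItemsPerCategory = null)
{
    var query = books;
    if (minRating is int r) query = query.Where(b => b.Rating >= r);
    if (maxPrice is decimal p) query = query.Where(b => b.Price <= p); // inclusive: an assumption
    if (maxItemsPerCategory is int n) query = query.Take(n);           // limit applied last
    return query.ToList();
}

// Hypothetical shape used only for this sketch; the project's real model is BookDto.
record Book(string Title, string Category, decimal Price, int Rating);
```

Applying the item limit last means it caps the number of matching books rather than the number of books inspected, which seems the more useful reading of "limit of items per category".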
- .NET 8 – ASP.NET Core Web API
- AngleSharp – HTML parsing and DOM traversal
- Swagger / OpenAPI – interactive documentation and testing
GET /api/books?categories=Travel,Mystery&minRating=3&maxItemsPerCategory=5
[
  {
    "title": "Sharp Objects",
    "category": "Mystery",
    "price": 47.82,
    "inStock": true,
    "rating": 4,
    "detailUrl": "https://books.toscrape.com/catalogue/sharp-objects_997/index.html",
    "imageUrl": "https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg"
  },
  {
    "title": "In the Woods",
    "category": "Mystery",
    "price": 36.95,
    "inStock": true,
    "rating": 3,
    "detailUrl": "https://books.toscrape.com/catalogue/in-the-woods_979/index.html",
    "imageUrl": "https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg"
  }
]

# clone the repo
git clone https://github.com/your-username/books-scraper-api.git
cd books-scraper-api
# restore packages
dotnet restore
# run the API
dotnet run

Open Swagger UI at:
👉 http://localhost:5087/swagger
- GET /api/books?categories=Travel → all Travel books
- GET /api/books?categories=Fiction&maxPrice=20 → Fiction books under £20
- GET /api/books?categories=Poetry,Classics&minRating=4 → Poetry & Classics with rating ≥ 4
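The example requests pass several categories as one comma-separated value. How the real controller binds that parameter is not shown here, so the following is only a sketch of one plausible approach; ParseCategories is a hypothetical helper name, and trimming whitespace and dropping case-insensitive duplicates are assumptions rather than documented behavior.

```csharp
using System;
using System.Linq;

string[] parsed = ParseCategories(" Poetry, Classics,,poetry ");
Console.WriteLine(string.Join(" | ", parsed));

// Splits the raw query value on commas, trims each entry, drops empty entries,
// and removes duplicates regardless of letter case.
static string[] ParseCategories(string? raw) =>
    (raw ?? string.Empty)
        .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToArray();
```

A missing or empty value yields an empty array, which leaves the caller free to decide whether that means "no categories" or a validation error.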
- Controllers → HTTP endpoints (BooksController)
- Services → Scraping logic (BooksScraper)
- Models → DTOs (BookDto, ScrapeRequest)
- Infrastructure → Configurable scraping options (delays, concurrency)
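To make the Models layer concrete, here is a minimal sketch of what the two DTOs might look like. The BookDto properties are inferred from the example JSON response; the ScrapeRequest members are an assumption based on the query parameters, since the real class is not shown in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var book = new BookDto(
    Title: "Sharp Objects",
    Category: "Mystery",
    Price: 47.82m,
    InStock: true,
    Rating: 4,
    DetailUrl: "https://books.toscrape.com/catalogue/sharp-objects_997/index.html",
    ImageUrl: "https://books.toscrape.com/media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg");

// Mirrors the example request: categories=Travel,Mystery&minRating=3&maxItemsPerCategory=5
var request = new ScrapeRequest(new[] { "Travel", "Mystery" }, MinRating: 3, MaxItemsPerCategory: 5);

// JsonSerializerDefaults.Web produces camelCase property names,
// matching the casing of the example response.
var options = new JsonSerializerOptions(JsonSerializerDefaults.Web) { WriteIndented = true };
string json = JsonSerializer.Serialize(book, options);
Console.WriteLine(json);

// Property names inferred from the example JSON response.
record BookDto(
    string Title, string Category, decimal Price, bool InStock,
    int Rating, string DetailUrl, string ImageUrl);

// Hypothetical shape; only the parameter names are taken from the README.
record ScrapeRequest(
    IReadOnlyList<string> Categories,
    int? MinRating = null,
    decimal? MaxPrice = null,
    int? MaxItemsPerCategory = null);
```

Positional records keep the DTOs immutable and concise, which suits objects that only carry data between the scraper and the controller.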
- Concurrency limit: prevents too many parallel requests.
- Random delays: mimics human browsing and avoids stressing the server.
- Cancellation support: aborts long operations cleanly.
- Configurable filters: user controls what to scrape.
- Separation of concerns: controllers, services, and models clearly separated.
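The first three practices above combine naturally in a single fetch helper. The sketch below is an illustration under stated assumptions, not the project's BooksScraper: the setting names and values are hypothetical, FetchPolitelyAsync is an invented name, and the HTTP call is replaced by a stand-in so the example runs without network access.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical settings; the project's real option names and defaults may differ.
const int maxConcurrency = 2;
const int minDelayMs = 50;
const int maxDelayMs = 150;

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10)); // overall time budget
using var gate = new SemaphoreSlim(maxConcurrency);                    // caps parallel requests
var sync = new object();
int inFlight = 0, peak = 0;

var urls = Enumerable.Range(1, 6).Select(i => $"page-{i}.html");
string[] pages = await Task.WhenAll(urls.Select(u => FetchPolitelyAsync(u, cts.Token)));
Console.WriteLine($"Fetched {pages.Length} pages; peak concurrency was {peak}.");

async Task<string> FetchPolitelyAsync(string url, CancellationToken ct)
{
    await gate.WaitAsync(ct); // throws OperationCanceledException if cancelled while waiting
    try
    {
        // Track how many fetches run at once, purely to show the limit is respected.
        lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }

        // Random pause before each request; Task.Delay also observes the token.
        await Task.Delay(Random.Shared.Next(minDelayMs, maxDelayMs + 1), ct);

        // Stand-in for a real call such as httpClient.GetStringAsync(url, ct).
        return $"<html>{url}</html>";
    }
    finally
    {
        lock (sync) { inFlight--; }
        gate.Release(); // always free the slot, even on cancellation or failure
    }
}
```

Passing the same token to both the semaphore wait and the delay means a cancelled request stops promptly wherever it happens to be waiting.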
This project was built for educational and portfolio purposes.
Books to Scrape is a public sandbox site created specifically for practicing scraping — no real data is involved.
- Persist scraped data in SQLite/PostgreSQL.
- Add caching to avoid repeated requests.
- Write unit tests for the scraping service.
- Create a Go version scraping Scrape This Site.
This project is intended for educational use only.
Do not use scraping techniques against websites without permission.