
simpsearch/simpsearch-python


SimpSearch

SimpSearch: a lightweight Python library for extracting clean text and structured search results.

SimpSearch is a developer-friendly Python library for running web searches through Brave Search and extracting readable text from webpages with the W3C HTML-to-Text (HTML2TXT) service.

The library converts webpages into clean plain text and also supports structured search result parsing.


✨ Features

  • 🔎 Web search using Brave
  • 🌐 Website text extraction
  • 📄 Raw search output
  • 🧹 Clean text extraction
  • 🔗 Link extraction
  • 📊 Structured parsed search results
  • 📑 Pagination support
  • ⚡ Lightweight and fast
  • 🧩 Simple Python API
  • 🪶 Minimal dependencies

📦 Installation

Install from PyPI:

pip install simpsearch

Or install from source:

git clone https://github.com/simpsearch/simpsearch-python.git
cd simpsearch-python
pip install -e .

🚀 Quick Start

from simpsearch import SimpSearch

client = SimpSearch()

results = client.parsed("hello world")

print(results)

🧠 How It Works

SimpSearch works in two stages:

  1. Brave Search generates search results
  2. The W3C HTML2TXT service converts HTML pages into readable text

This makes it possible to extract clean text without needing a full HTML parser.
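
The two stages above can be sketched as plain URL construction. This is only an illustration, not SimpSearch's actual implementation: it assumes the search results page itself is passed through the converter, and the endpoint URLs, the query parameter names (`q`, `offset`, `url`, `noinlinerefs`, `nonums`), and both helper functions are assumptions that should be checked against the Brave Search and W3C HTML2TXT documentation.

```python
from urllib.parse import urlencode

# Assumed endpoints; verify against each service's own documentation.
BRAVE_SEARCH_URL = "https://search.brave.com/search"
HTML2TXT_URL = "https://www.w3.org/services/html2txt"


def build_search_url(query: str, offset: int = 0) -> str:
    """Stage 1: build a search results page URL (hypothetical helper)."""
    params = {"q": query}
    if offset:
        params["offset"] = offset
    return f"{BRAVE_SEARCH_URL}?{urlencode(params)}"


def build_html2txt_url(page_url: str, no_inline_refs: bool = False,
                       remove_numbers: bool = False) -> str:
    """Stage 2: wrap a page URL in an HTML-to-text request (hypothetical helper)."""
    params = {"url": page_url}
    if no_inline_refs:
        params["noinlinerefs"] = "on"  # assumed flag name
    if remove_numbers:
        params["nonums"] = "on"  # assumed flag name
    return f"{HTML2TXT_URL}?{urlencode(params)}"


# Chain the two stages: the search page URL becomes the page to convert.
search_url = build_search_url("hello world")
request_url = build_html2txt_url(search_url, no_inline_refs=True)
print(request_url)
```

Fetching `request_url` with `requests.get` would then return plain text rather than HTML, which is why no HTML parser is needed on the client side.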


📚 Usage Examples

Below are multiple small examples showing different ways to use SimpSearch.


Example 1: Basic Search

from simpsearch import SimpSearch

client = SimpSearch()

print(client.raw("python programming"))

Example 2: Clean Text Search Results

Remove URLs from the search results.

client.text("python programming")

Example 3: Extract Only Links

links = client.links("machine learning")

print(links)

Example output:

[
 "https://example.com",
 "https://another-site.com"
]
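
Because the links come back as a plain list of strings, they can be post-processed with the standard library alone. The sketch below groups links by host name; `group_by_domain` is a hypothetical helper, not part of SimpSearch, and the sample list stands in for a real `client.links(...)` call so the snippet runs offline.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_domain(links):
    """Group URLs by host name, keeping first-seen order (hypothetical helper)."""
    grouped = defaultdict(list)
    for link in links:
        host = urlsplit(link).netloc.lower()
        grouped[host].append(link)
    return dict(grouped)


# Sample data in the documented shape; swap in client.links("machine learning").
sample_links = [
    "https://example.com",
    "https://another-site.com",
    "https://example.com/docs",
]

grouped = group_by_domain(sample_links)
print(grouped)
```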

Example 4: Structured Search Results

results = client.parsed("open source projects")

for r in results:
    print(r["title"])
    print(r["link"])
    print(r["snippet"])

Example 5: Extract Website Text

text = client.extract("https://example.com")

print(text)

Example 6: Extract Text Without Inline References

client.extract(
    "https://example.com",
    no_inline_refs=True
)

Example 7: Extract Text Without Numbers

client.extract(
    "https://example.com",
    remove_numbers=True
)

Example 8: Pagination Search

Retrieve the next page of search results.

client.parsed("python tutorials", offset=10)
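
To walk through several pages in a row, the `offset` argument can be advanced in a loop. The sketch below is one possible approach under stated assumptions: `iter_results` is a hypothetical helper, the page size of 10 is inferred from the example above rather than documented, and `FakeClient` is a stand-in so the code runs without network access.

```python
def iter_results(client, query, page_size=10, max_pages=3):
    """Yield results page by page until a page comes back empty (hypothetical helper)."""
    for page in range(max_pages):
        batch = client.parsed(query, offset=page * page_size)
        if not batch:
            break
        yield from batch


class FakeClient:
    """Offline stand-in that mimics parsed(query, offset=...)."""

    def __init__(self, total):
        self._items = [
            {"title": f"Result {i}", "link": f"https://example.com/{i}", "snippet": ""}
            for i in range(total)
        ]

    def parsed(self, query, offset=0):
        # Return at most 10 items starting at the requested offset.
        return self._items[offset:offset + 10]


all_results = list(iter_results(FakeClient(total=25), "python tutorials"))
print(len(all_results))
```

With a real client, pass a `SimpSearch()` instance instead of `FakeClient`, and keep `max_pages` small to avoid sending many requests in quick succession.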

Example 9: Iterate Over Links

for link in client.links("ai tools"):
    print(link)

Example 10: Build a Simple Search Tool

from simpsearch import SimpSearch

client = SimpSearch()
query = "best python libraries"

# Print each result as "title -> link"
for r in client.parsed(query):
    print(f"{r['title']} -> {r['link']}")

📄 API Reference

Main Class

SimpSearch

Create a client instance.

client = SimpSearch()

Methods

extract()

Extract readable text from any webpage.

client.extract(url)

Optional parameters:

  • no_inline_refs=True
  • remove_numbers=True

raw()

Return the full raw text search result.

client.raw("query")

text()

Return search results with links removed.

client.text("query")

links()

Return only links extracted from results.

client.links("query")

parsed()

Return structured search results.

client.parsed("query")

Example output:

[
 {
  "title": "Example",
  "link": "https://example.com",
  "snippet": "Example snippet"
 }
]
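
Since each result is a dictionary with `title`, `link`, and `snippet` keys, the output can be typed and serialized with the standard library. The sketch below is illustrative only: `SearchResult` and `to_json` are hypothetical names, not part of the SimpSearch API, and the sample list mirrors the example output above.

```python
import json
from typing import List, TypedDict


class SearchResult(TypedDict):
    """Hypothetical type describing one entry returned by parsed()."""
    title: str
    link: str
    snippet: str


def to_json(results: List[SearchResult]) -> str:
    """Serialize results to pretty-printed JSON (hypothetical helper)."""
    return json.dumps(results, indent=2, ensure_ascii=False)


# Sample data in the documented shape; swap in client.parsed("query").
sample: List[SearchResult] = [
    {"title": "Example", "link": "https://example.com", "snippet": "Example snippet"}
]

encoded = to_json(sample)
print(encoded)
```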

📂 Project Structure

simpsearch/
    simpsearch/
        __init__.py
        client.py
        search.py
        extract.py
        parser.py
        utils.py
        exceptions.py
    examples/
        search_example.py
        extract_example.py
    tests/
        test_search.py
        test_extract.py

⚙️ Dependencies

SimpSearch intentionally uses minimal dependencies.

Required (third-party):

requests

Standard library modules used (no installation needed):

re
urllib.parse

🧪 Running Tests

pytest

🛠 Development Setup

git clone https://github.com/yourusername/simpsearch
cd simpsearch
pip install -e .

🀝 Contributing

Contributions are welcome!

Steps:

  1. Fork the repository
  2. Create a new feature branch
  3. Commit your changes
  4. Push your branch
  5. Open a Pull Request

📈 Future Roadmap

Possible improvements for the project:

  • Async support
  • CLI tool
  • Multiple search engines
  • Built‑in caching
  • Rate limiting
  • AI‑ready structured outputs

📜 License

MIT License

You are free to use, modify, and distribute this software.


⭐ Support the Project

If you like this project:

  • Give it a ⭐ on GitHub
  • Share it with developers
  • Contribute improvements

Open‑source projects grow through community support.

About

A lightweight Python library for fast and simple search, filtering, and data retrieval.
