SimpSearch

SimpSearch — Lightweight Python library for extracting clean text and structured search results.

SimpSearch is a developer-friendly Python library for running web searches (via Brave Search) and extracting readable text from webpages using the W3C html2txt (HTML-to-text) service.

The library converts webpages into clean plain text and also supports structured search result parsing.

✨ Features

🔎 Web search using Brave
🌐 Website text extraction
📄 Raw search output
🧹 Clean text extraction
🔗 Link extraction
📊 Structured parsed search results
📑 Pagination support
⚡ Lightweight and fast
🧩 Simple Python API
🪶 Minimal dependencies

📦 Installation

Install from PyPI:

pip install simpsearch

Or install from source:

git clone https://github.com/simpsearch/simpsearch-python.git
cd simpsearch-python
pip install -e .

🚀 Quick Start

from simpsearch import SimpSearch

client = SimpSearch()

results = client.parsed("hello world")

print(results)

🧠 How It Works

SimpSearch works in two stages:

Brave Search generates the search results.
The W3C html2txt service converts the resulting HTML pages into readable text.

Because the HTML-to-text conversion happens in the W3C service, SimpSearch can return clean text without needing a full HTML parser of its own.
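The two stages can be pictured as URL composition: the URL of a Brave Search results page is handed to the W3C html2txt service, which returns that page as plain text. The sketch below only builds the URLs and fetches nothing. The endpoint URLs and the query-parameter names (`q`, `offset`, `url`, `noinlinerefs`, `nonums`) are assumptions for illustration, not taken from SimpSearch's source; the two boolean flags mirror the `extract()` options listed in the API reference.

```python
from urllib.parse import urlencode

# Assumed endpoints for illustration; SimpSearch's own URLs may differ.
BRAVE_SEARCH_URL = "https://search.brave.com/search"
W3C_HTML2TXT_URL = "https://www.w3.org/services/html2txt"


def build_search_url(query: str, offset: int = 0) -> str:
    """Stage 1: URL of a Brave Search results page (parameter names assumed)."""
    params = {"q": query}
    if offset:
        params["offset"] = offset  # assumed pagination parameter
    return f"{BRAVE_SEARCH_URL}?{urlencode(params)}"


def build_html2txt_url(page_url: str, no_inline_refs: bool = False,
                       remove_numbers: bool = False) -> str:
    """Stage 2: ask the W3C service to render page_url as plain text.

    The option names "noinlinerefs" and "nonums" are assumptions.
    """
    params = {"url": page_url}  # urlencode percent-escapes the nested URL
    if no_inline_refs:
        params["noinlinerefs"] = "on"
    if remove_numbers:
        params["nonums"] = "on"
    return f"{W3C_HTML2TXT_URL}?{urlencode(params)}"


# Chaining the stages: the search results page itself is converted to text.
print(build_html2txt_url(build_search_url("hello world")))
```

Passing the composed URL to `requests.get` would run both stages in a single request; how SimpSearch actually issues its requests may differ.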

📚 Usage Examples

The small examples below show different ways to use SimpSearch. From Example 2 onward, the snippets reuse the client object created in Example 1.

Example 1 — Basic Search

from simpsearch import SimpSearch

client = SimpSearch()

print(client.raw("python programming"))

Example 2 — Clean Text Search Results

Return the search results as text with URLs removed.

print(client.text("python programming"))
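As a rough illustration of what "links removed" can mean, the sketch below strips http(s) URLs from a block of text with a regular expression and tidies the leftover whitespace. `strip_urls` is a hypothetical helper, not SimpSearch's implementation; the real `text()` method may clean results differently.

```python
import re

# Matches an http(s) URL up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text: str) -> str:
    """Remove URLs, then tidy the whitespace left behind on each line."""
    without_urls = URL_PATTERN.sub("", text)
    # Splitting and re-joining collapses double spaces and trims each line.
    lines = (" ".join(line.split()) for line in without_urls.splitlines())
    return "\n".join(lines)


sample = "Python docs https://docs.python.org/3/ official site\nPyPI https://pypi.org"
print(strip_urls(sample))
```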

Example 3 — Extract Only Links

links = client.links("machine learning")

print(links)

Example output:

[
 "https://example.com",
 "https://another-site.com"
]
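A list like the one above can be produced from raw result text with a regular expression plus order-preserving de-duplication. The sketch below shows one way to do that; `extract_links` and its pattern are hypothetical and may not match how SimpSearch's `links()` works internally.

```python
import re

# http(s) URLs up to whitespace, angle brackets, or quote characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(text: str) -> list[str]:
    """Return http(s) URLs in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:)")  # drop punctuation that ends a sentence
        seen.setdefault(url, None)
    return list(seen)


sample = "See https://example.com, then https://another-site.com. Again: https://example.com"
print(extract_links(sample))
```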

Example 4 — Structured Search Results

results = client.parsed("open source projects")

for r in results:
    print(r["title"])
    print(r["link"])
    print(r["snippet"])

Example 5 — Extract Website Text

text = client.extract("https://example.com")

print(text)
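Extraction goes over the network (to the W3C service and to the target site), so a single bad URL can raise an error. When extracting several pages, one option is to pass `client.extract` into a small wrapper that records failures instead of stopping. `extract_many` is a hypothetical helper; the package ships an `exceptions.py`, but its exception classes are not documented here, so the sketch catches `Exception` broadly.

```python
from typing import Callable


def extract_many(extract: Callable[[str], str],
                 urls: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Call extract() on each URL, collecting texts and per-URL error messages."""
    texts: dict[str, str] = {}
    errors: dict[str, str] = {}
    for url in urls:
        try:
            texts[url] = extract(url)
        except Exception as exc:  # SimpSearch's specific exception types are not documented here
            errors[url] = f"{type(exc).__name__}: {exc}"
    return texts, errors
```

With a client, a call would look like `texts, errors = extract_many(client.extract, ["https://example.com", "https://another-site.com"])`.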

Example 6 — Extract Text Without Inline References

text = client.extract(
    "https://example.com",
    no_inline_refs=True,
)

print(text)

Example 7 — Extract Text Without Numbers

text = client.extract(
    "https://example.com",
    remove_numbers=True,
)

print(text)

Example 8 — Pagination Search

Retrieve the next page of search results by passing an offset.

next_page = client.parsed("python tutorials", offset=10)

print(next_page)
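Building on the `offset` parameter, the sketch below walks through several pages until an empty page comes back. It assumes that `offset` counts individual results, that a page holds 10 results (inferred from `offset=10` above, not confirmed), and that `parsed()` returns an empty list once results run out. `iter_pages` is a hypothetical helper.

```python
from typing import Callable, Iterator


def iter_pages(search: Callable[..., list], query: str, page_size: int = 10,
               max_pages: int = 5) -> Iterator[list]:
    """Yield successive result pages, stopping at an empty page or max_pages."""
    for page in range(max_pages):
        # Assumption: offset is a result count, so page N starts at N * page_size.
        results = search(query, offset=page * page_size)
        if not results:
            break
        yield results
```

For example, `for page in iter_pages(client.parsed, "python tutorials"): ...` would visit at most five pages. The `max_pages` cap keeps a misbehaving offset from looping indefinitely.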

Example 9 — Iterate Over Links

for link in client.links("ai tools"):
    print(link)
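Because `links()` returns plain URL strings, standard-library tools work on them directly. The sketch below groups links by host name with `urllib.parse.urlparse`, treating a leading `www.` as the same site. `group_by_domain` is a hypothetical helper, not part of SimpSearch.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_domain(links: list[str]) -> dict[str, list[str]]:
    """Group URLs by host name, ignoring a leading "www."."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for link in links:
        host = urlparse(link).hostname or ""  # hostname is lower-cased
        if host.startswith("www."):
            host = host[len("www."):]
        groups[host].append(link)
    return dict(groups)
```

Used with the client, this would be `group_by_domain(client.links("ai tools"))`.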

Example 10 — Build a Simple Search Tool

query = "best python libraries"

results = client.parsed(query)

for r in results:
    print(f"{r['title']} -> {r['link']}")
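To reuse the formatting from this example, it can live in its own function that tolerates missing fields. `format_results` is a hypothetical helper; it assumes each result is a dict with the `title`, `link`, and `snippet` keys shown in the API reference, any of which might be absent or empty.

```python
def format_results(results: list[dict], limit: int = 10) -> str:
    """Render parsed() results as numbered "title -> link" lines plus snippets."""
    lines = []
    for i, r in enumerate(results[:limit], start=1):
        title = r.get("title") or "(no title)"
        link = r.get("link", "")
        lines.append(f"{i}. {title} -> {link}")
        snippet = r.get("snippet")
        if snippet:  # skip empty or missing snippets
            lines.append(f"   {snippet}")
    return "\n".join(lines)
```

With the client, `print(format_results(client.parsed(query)))` prints the same title and link pairs as the loop above, numbered and followed by snippets.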

📄 API Reference

Main Class

SimpSearch

Create a client instance.

client = SimpSearch()

Methods

extract()

Extract readable text from a webpage.

client.extract(url)

Optional keyword arguments:

no_inline_refs=True
remove_numbers=True

raw()

Return the full search results as raw text.

client.raw("query")

text()

Return search results with links removed.

client.text("query")

links()

Return only the links extracted from the search results.

client.links("query")

parsed()

Return structured search results.

client.parsed("query")

Example output:

[
 {
  "title": "Example",
  "link": "https://example.com",
  "snippet": "Example snippet"
 }
]
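For type checking in your own code, the documented result shape can be written down as a `TypedDict`. This is a sketch for user code, not a type exported by SimpSearch; it assumes all three fields are strings, as in the example output above.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    """Shape of one parsed() result, as shown in the example output."""
    title: str
    link: str
    snippet: str


def is_search_result(item: object) -> bool:
    """Check at runtime that an item has the three documented string fields."""
    return isinstance(item, dict) and all(
        isinstance(item.get(key), str) for key in SearchResult.__annotations__
    )
```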

📂 Project Structure

simpsearch/
    .github/
        workflows/
    simpsearch/
        __init__.py
        client.py
        search.py
        extract.py
        parser.py
        utils.py
        exceptions.py
    examples/
        search_example.py
        extract_example.py
    tests/
        test_search.py
        test_extract.py
    README.md
    pyproject.toml
    requirements.txt
    setup.py

⚙️ Dependencies

SimpSearch intentionally uses minimal dependencies.

Required third-party package:

requests

Standard-library modules used (nothing to install):

re
urllib.parse

🧪 Running Tests

pytest

🛠 Development Setup

git clone https://github.com/yourusername/simpsearch
cd simpsearch
pip install -e .

🤝 Contributing

Contributions are welcome!

Steps:

Fork the repository
Create a new feature branch
Commit your changes
Push your branch
Open a Pull Request

📈 Future Roadmap

Possible improvements for the project:

Async support
CLI tool
Multiple search engines
Built‑in caching
Rate limiting
AI‑ready structured outputs

📜 License

MIT License

You are free to use, modify, and distribute this software under the terms of the MIT License.

⭐ Support the Project

If you like this project:

Give it a ⭐ on GitHub
Share it with developers
Contribute improvements

Open‑source projects grow through community support.
