SimpSearch - Lightweight Python library for extracting clean text and structured search results.
SimpSearch is a developer-friendly Python library that lets you run web searches and extract readable text from webpages using the W3C HTML-to-Text service.
The library converts webpages into clean plain text and also supports structured search result parsing.
- Web search using Brave
- Website text extraction
- Raw search output
- Clean text extraction
- Link extraction
- Structured parsed search results
- Pagination support
- Lightweight and fast
- Simple Python API
- Minimal dependencies
Install from PyPI:
pip install simpsearch
Or install from source:
git clone https://github.com/simpsearch/simpsearch-python.git
cd simpsearch-python
pip install -e .

Quick start:
from simpsearch import SimpSearch
client = SimpSearch()
results = client.parsed("hello world")
print(results)

SimpSearch works in two stages:
- Brave Search generates search results
- The W3C HTML2TXT service converts HTML pages into readable text
This makes it possible to extract clean text without needing a full HTML parser.
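
The two stages above can be pictured as two URLs chained together. The following is a minimal sketch, not SimpSearch's actual implementation: the endpoint URLs, the `q` and `url` query parameter names, and all function names are assumptions made for illustration. The sketch only builds URLs and performs no network requests.

```python
from urllib.parse import urlencode

# Assumed endpoints: the README names the services but not their URLs.
BRAVE_SEARCH_URL = "https://search.brave.com/search"
W3C_HTML2TXT_URL = "https://www.w3.org/services/html2txt"


def build_search_url(query: str) -> str:
    """Stage 1: URL of the Brave results page for a query (hypothetical helper)."""
    return f"{BRAVE_SEARCH_URL}?{urlencode({'q': query})}"


def build_html2txt_url(page_url: str) -> str:
    """Stage 2: URL asking the W3C service to render a page as plain text."""
    return f"{W3C_HTML2TXT_URL}?{urlencode({'url': page_url})}"


def build_pipeline_url(query: str) -> str:
    """Chain both stages: the search results page, rendered as plain text."""
    return build_html2txt_url(build_search_url(query))


if __name__ == "__main__":
    print(build_pipeline_url("hello world"))
```

Under these assumptions, fetching the final URL (for example with `requests.get`) would return the search results page as plain text, which is why no local HTML parser is needed.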
Below are multiple small examples showing different ways to use SimpSearch.
from simpsearch import SimpSearch
client = SimpSearch()
print(client.raw("python programming"))

Remove URLs from the search results.
client.text("python programming")

links = client.links("machine learning")
print(links)
Example output:
[
"https://example.com",
"https://another-site.com"
]

results = client.parsed("open source projects")
for r in results:
    print(r["title"])
    print(r["link"])
    print(r["snippet"])

text = client.extract("https://example.com")
print(text)

client.extract(
    "https://example.com",
    no_inline_refs=True
)

client.extract(
    "https://example.com",
    remove_numbers=True
)

Retrieve the next page of search results.
client.parsed("python tutorials", offset=10)

for link in client.links("ai tools"):
    print(link)

query = "best python libraries"
results = client.parsed(query)
for r in results:
    print(f"{r['title']} -> {r['link']}")

SimpSearch
Create a client instance.
client = SimpSearch()

Extract readable text from any webpage.
client.extract(url)
Optional parameters:
- no_inline_refs=True
- remove_numbers=True
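
The README does not spell out what these options do. As a rough illustration only, assuming `no_inline_refs=True` yields text without bracketed reference markers such as `[3]`, the effect would resemble the local post-processing below. The function name `strip_inline_refs` is hypothetical and is not part of SimpSearch.

```python
import re

# Matches a bracketed numeric marker such as "[3]", plus one preceding space if present.
_REF_MARKER = re.compile(r" ?\[\d+\]")


def strip_inline_refs(text: str) -> str:
    """Remove "[n]" style reference markers from plain text (illustrative only)."""
    return _REF_MARKER.sub("", text)


if __name__ == "__main__":
    print(strip_inline_refs("See the docs [1] and the FAQ[2]."))
```

Non-numeric brackets such as `[beta]` are left untouched, since the pattern only matches digits.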
Return the full raw text search result.
client.raw("query")

Return search results with links removed.
client.text("query")

Return only links extracted from results.
client.links("query")

Return structured search results.
client.parsed("query")
Example output:
[
{
"title": "Example",
"link": "https://example.com",
"snippet": "Example snippet"
}
]

simpsearch/
    simpsearch/
        __init__.py
        client.py
        search.py
        extract.py
        parser.py
        utils.py
        exceptions.py
    examples/
        search_example.py
        extract_example.py
    tests/
        test_search.py
        test_extract.py
SimpSearch intentionally uses minimal dependencies.
Required:
- requests
Standard library (no installation needed):
- re
- urllib.parse
For running the tests:
- pytest

Development install:
git clone https://github.com/yourusername/simpsearch
cd simpsearch
pip install -e .

Contributions are welcome!
Steps:
- Fork the repository
- Create a new feature branch
- Commit your changes
- Push your branch
- Open a Pull Request
Possible improvements for the project:
- Async support
- CLI tool
- Multiple search engines
- Built-in caching
- Rate limiting
- AI-ready structured outputs
MIT License
You are free to use, modify, and distribute this software.
If you like this project:
- Give it a star on GitHub
- Share it with developers
- Contribute improvements
Open-source projects grow through community support.