PDFChatAPI

Don't have time to read through all your book or any PDF document? Can't find time to read through the pages with a lot of details? Well, this API has got your back.

The API uses an LLM (commonly GPT 3.5 Turbo) to summarize the document according to your own queries. You first provide the API with your PDF document. Then once the model has been trained on the document, you can query the document regarding any topic you want summarized. The prompt best practices apply here too, as a sidenote.

This API uses Retrieval Augmented Generation (RAG) methodology to overcome the shortcomings inherent in LLMs (hallucinations, cut-off date beyond which LLM lacks knowledge of the newer data).

LangChain, which is an open-source library for developing generative AI applications, has been used in this API (Python version). It provides various utilities and APIs out-of-the-box for parsing PDF documents (and various other ones, for that matter), API for storing chat history, and provides modules for using LLMs as an API. The REST API has been created in FastAPI, an asynchronous, modern, fast Python framework for creating APIs.

How to install

You can use the API yourself on your local system.

Clone the repository on your system.
Install the requirements for running this API and powering the endpoints on your localhost using pip install -r requirements.txt This will install all the required modules and packages by itself. Its better to create a Python virtual environment and install the packages in that environment.

You will need to create a .env file on your system in the project folder (containing the main.py script file). You will need to add these 5 fields in the file.

OPENAI_API_KEY (your API Key for interacting with the LLM provided by the vendor).
EMBEDDINGS_MODEL (the model you want to use for generating embeddings for text, default is text-embedding-ada-002
CHAT_MODEL (by default we used gpt-3.5-turbo)
MONGO_CONNECTION_STRING (for storing chat history, for using local client, it is simply: mongodb://localhost:27017)
DB_PATH (use: ./trained_db for storing in main project directory)

Once all the packages have been installed and the environment variables defined, navigate to the project directory, run the script using: python main.py and power up the local client using the link provided in the terminal or command prompt window. Append /docs to the provided URL to find all the API endpoints.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
routers		routers
.gitignore		.gitignore
README.md		README.md
init.py		init.py
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDFChatAPI

How to install

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDFChatAPI

How to install

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages