SmartDoc AI 🤖
A self-hosted AI document summarizer and Q&A system that processes documents locally - no API keys needed.
- 🔎 Document upload and text extraction (PDF/TXT)
- 📝 Automatic document summarization
- ❓ Question answering system
- 🔍 Semantic search using FAISS
- 💻 Local processing with no external APIs
- ⚡ FastAPI backend ready for React frontend
- Frontend: React/Next.js, TailwindCSS
- Backend: FastAPI
- AI Models:
  - Summarization: `t5-small` (~300MB)
    `summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small", framework="pt")`
  - Q&A: `distilbert-base-uncased-distilled-squad` (~250MB)
    `qa_model = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", framework="pt")`
  - Embeddings: `all-MiniLM-L6-v2` (~90MB)
    `embedding_model = SentenceTransformer("all-MiniLM-L6-v2")`
- Vector Search: FAISS

Total model size: ~640MB
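
These one-liners are the standard Hugging Face `pipeline` and `sentence-transformers` constructors used by the backend. As a rough sketch of how the pieces can fit together for semantic search and Q&A (the chunk handling and the `build_index`/`answer` helpers below are illustrative assumptions, not the actual `smartdoc_backend` code):

```python
# Illustrative sketch: wiring the three models and FAISS together.
# Function and variable names here are hypothetical, not taken from smartdoc_backend.py.
import faiss
from transformers import pipeline
from sentence_transformers import SentenceTransformer

summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small", framework="pt")
qa_model = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", framework="pt")
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks: list[str]) -> faiss.IndexFlatL2:
    """Embed document chunks and store them in a FAISS index."""
    vectors = embedding_model.encode(chunks).astype("float32")
    index = faiss.IndexFlatL2(vectors.shape[1])  # 384 dimensions for all-MiniLM-L6-v2
    index.add(vectors)
    return index

def answer(question: str, chunks: list[str], index: faiss.IndexFlatL2, k: int = 3) -> str:
    """Retrieve the most similar chunks, then run extractive Q&A over them."""
    query_vec = embedding_model.encode([question]).astype("float32")
    _, ids = index.search(query_vec, min(k, len(chunks)))
    context = " ".join(chunks[i] for i in ids[0])
    return qa_model(question=question, context=context)["answer"]
```

A summary would be produced the same way, e.g. `summarizer(text, max_length=150, min_length=30)[0]["summary_text"]`.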
- Create a virtual environment:
  `python -m venv venv`
- Activate the virtual environment:
  `venv\Scripts\activate` (Windows) or `source venv/bin/activate` (Linux/macOS)
- Install the required packages:
  `pip install fastapi uvicorn python-multipart PyPDF2 transformers sentence-transformers faiss-cpu torch numpy`
  or run `pip install -r requirements.txt` to install all of the dependencies at once (a sample file is sketched below).
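
That requirements.txt should cover the same packages as the explicit command above; a minimal, unpinned version would look like this (exact version pins are left to the repository):

```
fastapi
uvicorn
python-multipart
PyPDF2
transformers
sentence-transformers
faiss-cpu
torch
numpy
```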
Run the backend server:

`uvicorn smartdoc_backend:app --reload`

The server will be available at http://127.0.0.1:8000. If uvicorn binds to a different address or port, check the terminal output; it prints the address and port it is actually serving on.
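
To confirm the backend is reachable once uvicorn is running, you can hit FastAPI's interactive docs page, which is served at /docs by default:

```python
import requests

# FastAPI exposes interactive API docs at /docs unless they are disabled.
resp = requests.get("http://127.0.0.1:8000/docs")
print(resp.status_code)  # 200 means the backend is up
```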
When you are finished, exit the virtual environment with the `deactivate` command.
- Navigate to the frontend directory: `cd .\frontend\`
- Install Node.js dependencies: `npm install`
- Start the development server: `npm run dev`

The frontend will be available at http://localhost:3000.
Models are cached in:
- Windows: `C:\Users\<YourUsername>\.cache\huggingface\hub`
- Linux/MacOS: `~/.cache/huggingface/hub`
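
Instantiating the pipelines once is enough to populate this cache, so you can optionally pre-download everything before the first request. A small convenience script (model names taken from the Tech Stack section above; this is not part of the backend itself):

```python
# Optional one-off script: downloads the three models into the Hugging Face cache
# so the backend's first request doesn't block on ~640MB of downloads.
from transformers import pipeline
from sentence_transformers import SentenceTransformer

pipeline("summarization", model="t5-small", tokenizer="t5-small", framework="pt")
pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", framework="pt")
SentenceTransformer("all-MiniLM-L6-v2")
print("Models downloaded and cached.")
```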
- `POST /upload` - Upload PDF/text documents
- `GET /documents` - List all documents
- `GET /document/{doc_id}` - Get document metadata
- `GET /document/{doc_id}/summary` - Generate document summary
- `POST /document/{doc_id}/query` - Ask questions about document content
- `GET /document/{doc_id}/chunks` - Get document chunks (debug)
Example usage with Python and the `requests` library:

    import requests

    # Upload a document
    with open('document.pdf', 'rb') as f:
        response = requests.post('http://127.0.0.1:8000/upload', files={'file': f})
    doc_id = response.json()['doc_id']

    # Get a summary
    summary = requests.get(f'http://127.0.0.1:8000/document/{doc_id}/summary')

    # Ask a question
    query = {'query': 'What is this document about?'}
    answer = requests.post(f'http://127.0.0.1:8000/document/{doc_id}/query', json=query)
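
The remaining endpoints follow the same pattern; for example, continuing with the same `doc_id` (assuming JSON responses, as above):

```python
# List all uploaded documents
docs = requests.get('http://127.0.0.1:8000/documents').json()

# Fetch metadata for a single document
meta = requests.get(f'http://127.0.0.1:8000/document/{doc_id}').json()

# Inspect the stored chunks (debug endpoint)
chunks = requests.get(f'http://127.0.0.1:8000/document/{doc_id}/chunks').json()
```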
MIT License - See LICENSE for more details.