Open Source Alternative to NotebookLM / Perplexity / Glean, connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more.

MODSetter/SurfSense

SurfSense

Tools like NotebookLM and Perplexity are effective for researching any topic or query. SurfSense extends that capability by integrating with your personal knowledge base. It is a highly customizable AI research agent connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, and GitHub, with more to come.

Video

Surfsense_v006.mp4

Key Features

1. Latest

💡 Idea:

Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.

📁 Multiple File Format Uploading Support

Save content from your own files (documents, images, and other formats across 27 supported file extensions) to your personal knowledge base.

🔍 Powerful Search

Quickly research or find anything in your saved content.

💬 Chat with your Saved Content

Interact in natural language and get cited answers.

📄 Cited Answers

Get cited answers, just like Perplexity.

🔔 Privacy & Local LLM Support

Works with local LLMs through Ollama.

🏠 Self Hostable

Open source and easy to deploy locally.

📊 Advanced RAG Techniques

  • Supports 150+ LLMs.
  • Supports 6000+ embedding models.
  • Supports all major rerankers (Pinecone, Cohere, Flashrank, etc.).
  • Uses hierarchical indices (two-tiered RAG setup).
  • Utilizes hybrid search (semantic + full-text search combined with Reciprocal Rank Fusion).
  • RAG-as-a-Service API backend.
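
To illustrate the hybrid search item above, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not SurfSense's implementation: the function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions made for illustration. Each document receives a score of `1 / (k + rank)` from every ranked list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is an assumed default, not a SurfSense value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
semantic_hits = ["doc_a", "doc_b", "doc_c"]   # vector-similarity order
full_text_hits = ["doc_b", "doc_d", "doc_a"]  # full-text search order

fused = reciprocal_rank_fusion([semantic_hits, full_text_hits])
print(fused)
```

Documents that rank well in both lists (`doc_a` and `doc_b` here) rise above documents found by only one retriever, which is the point of combining semantic and full-text results.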

ℹ️ External Sources

  • Search Engines (Tavily)
  • Slack
  • Linear
  • Notion
  • YouTube videos
  • GitHub
  • and more to come.

🔖 Cross Browser Extension

  • The SurfSense extension can be used to save any webpage you like.
  • Its main use case is saving webpages that sit behind authentication.

2. Temporarily Deprecated

Podcasts

  • The SurfSense Podcast feature is currently being reworked for better UI and stability. Expect it soon.

FEATURE REQUESTS AND FUTURE

SurfSense is actively being developed. While it's not yet production-ready, you can help us speed up the process.

Join the SurfSense Discord and help shape the future of SurfSense!

How to get started?

Installation Options

SurfSense provides two installation methods:

  1. Docker Installation (Recommended) - The easiest way to get SurfSense up and running with all dependencies containerized.

  2. Manual Installation - For users who prefer more control over their setup or need to customize their deployment.

Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.

Before installation, make sure to complete the prerequisite setup steps including:

  • PGVector setup
  • Google OAuth configuration
  • Unstructured.io API key
  • Other required API keys

Screenshots

Search Spaces

search_spaces

Manage Documents

documents

Research Agent

researcher

Agent Chat

chat

Browser Extension

ext1

ext2

Tech Stack

BackEnd

  • FastAPI: Modern, fast web framework for building APIs with Python

  • PostgreSQL with pgvector: Database with vector search capabilities for similarity searches

  • SQLAlchemy: SQL toolkit and ORM (Object-Relational Mapping) for database interactions

  • Alembic: A database migrations tool for SQLAlchemy.

  • FastAPI Users: Authentication and user management with JWT and OAuth support

  • LangGraph: Framework for developing AI agents.

  • LangChain: Framework for developing AI-powered applications.

  • LLM Integration: Integration with LLMs through LiteLLM

  • Rerankers: Advanced result ranking for improved search relevance

  • Hybrid Search: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)

  • Vector Embeddings: Document and text embeddings for semantic search

  • pgvector: PostgreSQL extension for efficient vector similarity operations

  • Chonkie: Advanced document chunking and embedding library

      • Uses AutoEmbeddings for flexible embedding model selection

      • Uses LateChunker for optimized document chunking based on the embedding model's max sequence length
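
The idea of sizing chunks to an embedding model's max sequence length can be sketched as follows. This is a simplified illustration, not Chonkie's API and not late chunking itself: the function `chunk_by_token_budget` is hypothetical, and whitespace-separated words stand in for the model's real tokenizer.

```python
def chunk_by_token_budget(text, max_tokens, overlap=0):
    """Split text into chunks of at most max_tokens whitespace tokens.

    Hypothetical helper: a real pipeline would count tokens with the
    embedding model's own tokenizer rather than str.split().
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once the current window has reached the end of the text.
        if start + max_tokens >= len(tokens):
            break
    return chunks


sample = "one two three four five six seven"
chunks = chunk_by_token_budget(sample, max_tokens=3, overlap=1)
print(chunks)
```

Keeping every chunk within the model's limit avoids silent truncation at embedding time; the optional overlap repeats a few tokens between neighbours so that content near a boundary appears in both chunks.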


FrontEnd

  • Next.js 15.2.3: React framework featuring App Router, server components, automatic code-splitting, and optimized rendering.

  • React 19.0.0: JavaScript library for building user interfaces.

  • TypeScript: Static type-checking for JavaScript, enhancing code quality and developer experience.

  • Vercel AI SDK Kit UI Stream Protocol: Used to build a scalable chat UI.

  • Tailwind CSS 4.x: Utility-first CSS framework for building custom UI designs.

  • Shadcn: Headless components library.

  • Lucide React: Icon set implemented as React components.

  • Framer Motion: Animation library for React.

  • Sonner: Toast notification library.

  • Geist: Font family from Vercel.

  • React Hook Form: Form state management and validation.

  • Zod: TypeScript-first schema validation with static type inference.

  • @hookform/resolvers: Resolvers for using validation libraries with React Hook Form.

  • @tanstack/react-table: Headless UI for building powerful tables & datagrids.

Extension

Manifest v3 on Plasmo

Future Work

  • Add More Connectors.
  • Patch minor bugs.
  • Implement Canvas.
  • Complete Hybrid Search. [Done]
  • Add support for file uploads QA. [Done]
  • Shift to WebSockets for Streaming responses. [Deprecated in favor of AI SDK Stream Protocol]
  • Based on feedback, I will work on making it compatible with local models. [Done]
  • Cross Browser Extension [Done]
  • Critical Notifications [Done | PAUSED]
  • Saving Chats [Done]
  • Basic keyword search page for saved sessions [Done]
  • Multi & Single Document Chat [Done]

Contribute

Contributions are very welcome! A contribution can be as small as a ⭐ or as simple as finding and reporting issues. Help with fine-tuning the backend is always appreciated.