Search That Thinks, Summaries That Matter
BM25 ranking, RRF fusion, and executive summaries transform how you find and understand bookmarks.
Finding a bookmark shouldn’t feel like searching for a needle in a haystack. And reading an AI summary shouldn’t feel like reading filler text generated to hit a word count.
Today’s update tackles both problems with a complete overhaul of Arivu’s search algorithm and a reimagined approach to AI summaries.
The search engine now combines BM25 keyword scoring with semantic similarity through Reciprocal Rank Fusion, two techniques widely used in production search systems. The AI summaries have been restructured around what matters: a sharp one-sentence takeaway and an executive summary you’d actually want to read.
Search That Understands Context
The previous search algorithm was simple: match keywords in titles and descriptions, then rerank with semantic similarity. It worked, but it had blind spots. A search for “react hooks” might miss an article titled “Custom State Management Patterns” even if the content was exactly what you needed.
The new search engine uses BM25, the default ranking function in Elasticsearch and Apache Lucene. BM25 doesn’t just count keyword matches: it weighs term frequency against document length and corpus statistics. A 10,000-word essay that mentions “machine learning” once scores lower than a short article that’s densely focused on the topic.
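To make the length and frequency trade-off concrete, here is a minimal sketch of the standard Okapi BM25 formula in Python. It is an illustration rather than Arivu’s actual code: the function name, the whitespace tokenization, and the k1 and b values are assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common textbook defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus: a focused note, a long page with one passing mention, and an unrelated note.
docs = [
    "machine learning models need machine learning data".split(),
    ("filler " * 200 + "machine learning").split(),
    "notes on gardening".split(),
]
scores = bm25_scores(["machine", "learning"], docs)
```

In this toy corpus the focused note scores highest, the long page with a single mention scores lower, and the unrelated note scores zero.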
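But keyword matching alone isn’t enough. That’s where Reciprocal Rank Fusion comes in. RRF combines multiple ranked lists (one ordered by BM25, one by semantic similarity, one by entity overlap) into a single unified ranking. It works from each result’s position in a list rather than its raw score, so signals measured on different scales can be merged without normalization. Each algorithm votes on which results matter, and the fusion balances their strengths.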
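The fusion step itself is small. The sketch below shows plain Reciprocal Rank Fusion, where each list contributes 1 / (k + rank) for every result it contains. The function name, the sample bookmark IDs, and k = 60 (the constant commonly used in the RRF literature) are assumptions for illustration, not Arivu’s implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each list adds 1 / (k + rank) to the score of every ID it contains,
    so only positions matter, never the raw scores behind them.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical rankings from the three signals named above.
bm25_ranking = ["a", "b", "c"]
semantic_ranking = ["b", "c", "a"]
entity_ranking = ["b", "a", "d"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking, entity_ranking])
```

Bookmark "b" comes out on top because it sits near the head of all three lists, even though BM25 alone preferred "a".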
The result: searches feel smarter without you having to think about query syntax. Type natural language, get relevant results. The system figures out whether you’re looking for exact matches or conceptual relatives.
Query-Adaptive Weighting
Not all queries are equal. A search for “RFC 7519” is clearly looking for exact matches: you want the JWT specification, not articles about authentication philosophy. A search for “how to handle user sessions” is semantic: you want conceptual matches even if they don’t use those exact words.
Arivu now detects query type automatically. Technical queries with version numbers, file extensions, or code patterns get boosted keyword weight. Natural language questions get boosted semantic weight. The thresholds adapt to your collection’s score distribution, so results stay relevant whether you have 50 bookmarks or 5,000.
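One way to picture the detection step is a handful of pattern checks that pick a keyword/semantic weight pair. The sketch below is a guess at the general shape, not Arivu’s actual rules: the regular expressions, the function name, and the 0.7/0.3 split are all assumptions, and it leaves out the collection-adaptive thresholds entirely.

```python
import re

# Hypothetical signals that a query wants exact matches.
TECHNICAL_PATTERNS = [
    re.compile(r"\bv?\d+(\.\d+)+\b"),        # version numbers such as 2.5 or v1.2.3
    re.compile(r"\b[A-Z]{2,}\s?\d{2,}\b"),   # identifiers such as RFC 7519
    re.compile(r"\b\w+\.[a-z]{1,5}\b"),      # file names such as config.yaml
    re.compile(r"[_(){}<>]|::|=>"),          # code-like punctuation
]

# Hypothetical signal that a query is a natural language question.
QUESTION_PATTERN = re.compile(
    r"^\s*(how|what|why|when|where|which|who)\b|\?\s*$", re.IGNORECASE
)

def query_weights(query):
    """Return (keyword_weight, semantic_weight); the two always sum to 1."""
    if any(p.search(query) for p in TECHNICAL_PATTERNS):
        return (0.7, 0.3)   # boost exact keyword matching
    if QUESTION_PATTERN.search(query):
        return (0.3, 0.7)   # boost semantic matching
    return (0.5, 0.5)       # no strong signal either way
```

With these rules, "RFC 7519" leans toward keywords, "how to handle user sessions" leans toward semantics, and "react hooks" stays balanced. The weights could then scale each ranked list’s contribution during fusion.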
Summaries Worth Reading
AI summaries have a reputation problem. Too often they’re bloated rewrites that take longer to read than scanning the original. Or they’re so vague they could describe any article on the topic.
We rebuilt the summary system from scratch with a new philosophy: density over length.
The old system generated five separate outputs: a one-sentence summary, three bullet points, a long-form summary, highlights, and tags. That’s a lot of AI calls, and frankly, the bullet points and long-form summary often said the same things in different formats.
The new system generates four outputs that each serve a distinct purpose:
One-Sentence TL;DR — Not “this article discusses machine learning.” Instead: “Transformer architectures outperform RNNs on sequence tasks by processing tokens in parallel rather than sequentially.” Specific. Factual. The one thing you’d remember if you could only remember one thing.
Executive Summary — Two to three paragraphs written for a busy professional. The first paragraph states the core argument. The second provides key evidence. The third (if needed) covers implications. No filler phrases. No meta-commentary about what the article “explores” or “discusses.”
Key Highlights — Four to six standalone insights worth remembering. Each one should make sense without context. These are the facts, quotes, or findings you’d highlight if you were reading with a marker.
Smart Tags — Precise, hyphenated tags that actually help you find things. “react-hooks” not “javascript”. “distributed-systems” not “technology”. Mix of topic tags, format tags (tutorial, opinion, research), and domain tags.
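The four outputs above map naturally onto a small record with a few shape checks. The sketch below is one possible representation, not Arivu’s schema: only the exec_summary name appears elsewhere in this post, while the class name, the other field names, and the validation rules are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Lowercase words joined by single hyphens, e.g. "react-hooks".
TAG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

@dataclass
class BookmarkSummary:
    tldr: str                       # one-sentence takeaway
    exec_summary: str               # paragraphs separated by blank lines
    highlights: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of ways this summary departs from the intended shape."""
        issues = []
        paragraphs = [p for p in self.exec_summary.split("\n\n") if p.strip()]
        if not 2 <= len(paragraphs) <= 3:
            issues.append("exec_summary should have two or three paragraphs")
        if not 4 <= len(self.highlights) <= 6:
            issues.append("expected four to six highlights")
        bad = [t for t in self.tags if not TAG_PATTERN.match(t)]
        if bad:
            issues.append("tags not in lowercase hyphenated form: " + ", ".join(bad))
        return issues

# Placeholder content, used only to exercise the checks.
summary = BookmarkSummary(
    tldr="Transformer architectures outperform RNNs on sequence tasks.",
    exec_summary="Core argument.\n\nKey evidence.",
    highlights=["insight 1", "insight 2", "insight 3", "insight 4"],
    tags=["transformers", "sequence-models", "research"],
)
```

Calling summary.problems() on the example returns an empty list. A summary with one paragraph, one highlight, or a tag like "JavaScript" would report each issue separately.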
Bigger Context, Better Summaries
The previous system truncated content at 4,000 characters before sending it to the AI. That’s roughly 800 words—often less than half of a typical article. The AI was summarizing an excerpt, not the full piece.
Now we send up to 50,000 characters to the AI (roughly 10,000 words by the same ratio, or on the order of 12,000 tokens). Gemini 2.5 Flash handles million-token contexts, so this is well within its capabilities. The summaries now capture ideas from the conclusion, not just the introduction.
We also store the full extracted content in the database. This enables future features like offline reading and full-text search within bookmarks—your content, always available.
Knowledge Graph Intelligence
The Knowledge Graph got smarter too. The /explore-knowledge-graph endpoint now calculates entity and concept importance using IDF-weighted scoring. Terms that appear in many bookmarks get lower importance (they’re generic). Terms that appear in just a few get higher importance (they’re distinctive).
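The intuition behind IDF weighting fits in a few lines. The sketch below uses a smoothed inverse document frequency over each bookmark’s entity list. The function name, the smoothing choice, and the sample data are assumptions for illustration, and the real endpoint may weight things differently.

```python
import math
from collections import Counter

def idf_importance(bookmark_entities):
    """Map each entity to a smoothed IDF score.

    bookmark_entities holds one list of entity names per bookmark.
    Entities found in fewer bookmarks receive higher scores.
    """
    n = len(bookmark_entities)
    # Count each entity at most once per bookmark.
    df = Counter(e for entities in bookmark_entities for e in set(entities))
    return {e: math.log((1 + n) / (1 + count)) + 1 for e, count in df.items()}

# Hypothetical collection: "javascript" is everywhere, "raft" is rare.
collection = [
    ["javascript", "react"],
    ["javascript", "node"],
    ["javascript", "raft"],
    ["javascript", "react"],
]
importance = idf_importance(collection)
```

Here "raft" scores highest because it appears in one bookmark, "react" lands in the middle, and "javascript" scores lowest because it appears in all four.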
The graph also identifies related bookmarks using embedding similarity. Each bookmark now shows its top 3 most similar neighbors—not based on shared tags or folders, but on semantic meaning. An article about “distributed consensus” might surface as related to one about “database replication” because the concepts overlap, even if you never connected them manually.
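Finding the nearest neighbors comes down to comparing embedding vectors. The sketch below ranks bookmarks by cosine similarity with a brute-force scan in plain Python. The function names and the three-dimensional toy vectors are assumptions for illustration, since real embeddings have hundreds of dimensions and Arivu may use a different similarity measure or an index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_neighbors(embeddings, bookmark_id, k=3):
    """Return the k bookmark IDs most similar to bookmark_id, best first."""
    target = embeddings[bookmark_id]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != bookmark_id
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Toy embeddings, invented for the example.
embeddings = {
    "distributed-consensus": [0.9, 0.1, 0.0],
    "database-replication": [0.8, 0.3, 0.1],
    "raft-paper": [0.95, 0.05, 0.0],
    "css-grid": [0.0, 0.1, 0.9],
    "sourdough": [0.1, 0.9, 0.2],
}
neighbors = top_neighbors(embeddings, "distributed-consensus")
```

In the toy data, the consensus article’s closest neighbors are the Raft paper and the replication article, with no shared tags or folders involved.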
A new /knowledge-graph/expand-query endpoint helps you discover related searches. Type a query, and Arivu shows you entities and concepts from your collection that might refine or expand your search. Looking for “authentication”? The expansion might suggest “JWT”, “OAuth 2.0”, and “session management” based on what’s in your bookmarks.
Streamlined UI
The bookmark detail page now shows just two tabs instead of three: Summary and Highlights. The Summary tab combines the TL;DR with the executive summary—everything you need to understand a bookmark at a glance. The Highlights tab shows the extracted insights in a cleaner format, each one in a visually distinct card.
We folded the separate “Quick” and “Detailed” tabs into a single Summary tab because the split created artificial separation. Now the summary flows naturally: a one-liner to orient you, then the deeper context if you want it.
Other Changes
Full content storage — Arivu now stores the complete extracted text from bookmarked pages (previously capped at 10K characters, now unlimited). This enables future offline reading and improves search accuracy.
Improved tag parsing — Tags now preserve hyphenation for multi-word concepts like “machine-learning” and “api-design”. The parser handles edge cases better and maintains order while removing duplicates.
Backward compatibility — Existing bookmarks with the old summary format continue to work. The long_form field maps to exec_summary for older data, so nothing breaks.
Performance optimizations — BM25 scoring happens in-memory with precomputed document frequencies. Search latency stays under 100ms even on large collections.
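The tag parsing behavior listed above (keep hyphens, keep order, drop duplicates) can be sketched in a short function. This is an illustration under assumptions, not Arivu’s parser: the function name, the comma and newline separators, and the cleanup rules are all hypothetical.

```python
import re

def parse_tags(raw):
    """Split a raw tag string into clean, hyphenated, de-duplicated tags.

    Order of first appearance is preserved.
    """
    tags = []
    seen = set()
    for part in re.split(r"[,\n]", raw):
        tag = part.strip().strip("#\"'").lower()
        tag = re.sub(r"[\s_]+", "-", tag)          # join multi-word tags with hyphens
        tag = re.sub(r"[^a-z0-9-]", "", tag)       # drop stray punctuation
        tag = re.sub(r"-{2,}", "-", tag).strip("-")
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags

tags = parse_tags("machine-learning, API Design, machine-learning, #tutorial")
```

The example input yields machine-learning, api-design, and tutorial, in that order, with the repeated tag removed.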
Search smarter. Read less. Know more.