Reading Feed

Articles I've read with my notes and highlights

Agentic Search over Graphs of Long Documents (or LAD-RAG++) by Pierce Lamb
  • I had recently read this blog: The RAG Obituary, where the author argued that retrieving (or investigating) over long documents was better suited to a Claude Code approach: provide the raw document data in a file system and just give an agent some foundational tools to interact with that raw document data.
  • Luckily, LAD-RAG’s approach to inference was exactly this: process the document into this graph structure, then provide an “agent” a set of tools to retrieve/explore that graph system and let it decide how it wants to proceed. So LAD-RAG was ticking a lot of boxes: use the chunking mechanism the author intended via layout; maintain semantic connections across pages; provide this data via a set of tools to an agent to answer questions.
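The loop described above (document graph plus a few exploration tools handed to an agent) can be sketched roughly like this. This is a minimal illustration, not LAD-RAG’s actual API — node kinds, tool names, and the toy documents are all invented:

```python
# Illustrative sketch: a document graph with search/read/expand tools an agent
# could call. Node kinds and tool names are hypothetical, not LAD-RAG's API.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str                                   # e.g. "section", "table", "figure"
    text: str
    neighbors: list = field(default_factory=list)  # semantic/layout links by id

class DocumentGraph:
    def __init__(self):
        self.nodes = {}

    def add(self, node, links=()):
        self.nodes[node.node_id] = node
        for other in links:                     # link both directions
            node.neighbors.append(other)
            self.nodes[other].neighbors.append(node.node_id)

    # --- tools exposed to the agent ---
    def search(self, query):
        """Naive keyword search over node text."""
        return [n.node_id for n in self.nodes.values()
                if query.lower() in n.text.lower()]

    def read(self, node_id):
        return self.nodes[node_id].text

    def expand(self, node_id):
        """Follow semantic links, e.g. a table continued across a page break."""
        return self.nodes[node_id].neighbors

g = DocumentGraph()
g.add(Node("s1", "section", "Revenue grew 12% in Q3."))
g.add(Node("t1", "table", "Quarterly revenue table, continued on next page."),
      links=["s1"])
```

The agent decides how to chain these: search for a term, read the hits, then expand across page-boundary links instead of relying on fixed-size chunks.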
Data Engineering After AI by Ananth Packkildurai
How I got Claude to teach me dbt
  • LLMs are rightly infamous for confidently asserting complete nonsense, and whilst it’s got a lot better in recent months, Claude is still not perfect, as I found when I challenged another aspect of its implementation ideas:

But then…Claude saves itself by owning its error, and then going to check what the actual values of the field are for itself…nice!

  • Could I edit a file by hand by figuring out the ASCII byte values to write to disk with dd? Umm, I guess? Does that mean I don’t use a text editor? Of course not. It’s about understanding the abstraction, the capability of the tools, and making an active, conscious, and educated decision about how to use them.
  • The risks? Plenty. Getting distracted and taking Claude on a flight of fantasy that may be fun but ultimately a waste of time. Working with technology which is at the edges (or beyond) Claude’s training dataset. Not having enough context for the area and trusting blindly what Claude tells you.
Drastically Reducing Out-of-Memory Errors in Apache Spark at Pinterest by Pinterest Engineering
Anthropic Study: AI Coding Assistance Reduces Developer Skill Mastery by 17% by Steef-Jan Wiggers
  • Anthropic recently published a randomized controlled trial showing developers using AI coding assistance scored 17% lower on comprehension tests than those coding manually, with productivity gains failing to reach statistical significance.
  • I wonder if we’re going to have a future where the juniors never gain the skills and experience to work well by themselves, and instead become entirely reliant on AI.
  • AI is incredibly useful as a personal tutor.
  • AI can reduce task completion time by 80% for tasks where developers already have relevant skills.
AI “Vibe Coding” Threatens Open Source as Maintainers Face Crisis by Steef-Jan Wiggers
Linear walkthroughs by Simon Willison
Bruteforcing the Bitwarden master password I forgor
The AI Vampire by Steve Yegge
Introduction to PostgreSQL Indexes
  • Although it handles hash collisions gracefully, it works best when hash values are evenly distributed, and is most suited to unique or mostly unique data
  • Nodes in BRIN indexes store the minimum and maximum values of a range of values present in the page referred by the index. This makes the index more compact and cache friendly, but restricts the use cases for it.
  • Generalized inverted index is appropriate for when you want to search for an item in composite data, such as finding a word in a blob of text, an item in an array or an object in a JSON
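The three index types above map to different DDL choices. As a sketch (table and column names invented), held as strings here just to show the shape of each:

```python
# Hypothetical tables; each DDL string matches one of the index types above:
# hash for unique-ish lookup keys, BRIN for naturally ordered block ranges,
# GIN for items inside composite data such as JSONB.
ddl = {
    "hash": "CREATE INDEX ON users USING hash (api_key);",
    "brin": "CREATE INDEX ON events USING brin (created_at);",
    "gin":  "CREATE INDEX ON docs USING gin (payload jsonb_path_ops);",
}
```

BRIN pays off on append-only tables where `created_at` correlates with physical row order; GIN with `jsonb_path_ops` supports containment queries like `payload @> '{"tag": "x"}'`.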
Your agents need runbooks, not bigger context windows by Ben Lorica 罗瑞卡
  • Context File System (CFS). You might also hear this more broadly categorized as an Operational Skill Store. This architecture separates the expensive reasoning of a large language model from the actual storage of operational knowledge. It mirrors the way a mature engineering team works.
Lance table format explained simply
Context Management for Deep Agents by LangChain Accounts
  • Context compression refers to techniques that reduce the volume of information in an agent’s working memory while preserving the details relevant to completing the task.
  • Offloading large tool results: We offload large tool responses to the filesystem whenever they occur.
  • Offloading large tool inputs: When the context size crosses a threshold, we offload old write/edit arguments from tool calls to the filesystem.
  • Summarization: When the context size crosses the threshold, and there is no more context eligible for offloading, we perform a summarization step to compress the message history.
Inside OpenAI’s in-house data agent
Performance Tips Using Postgres and pgvector | Crunchy Data Blog
  • Have enough RAM to build new indexes. Building indexes with larger lists requires higher settings for maintenance_work_mem — if you do not have enough memory you’ll get an error. Building the lists = 2000 index above required 1.3GB of maintenance_work_mem.
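In practice that tip amounts to raising maintenance_work_mem for the session before the build. A sketch (table and column names invented; the 2GB figure is just an example comfortably above the ~1.3GB measured for lists = 2000):

```python
# Session-level build recipe for an ivfflat index; run these in order in one
# session so the memory setting applies to the index build.
build_steps = [
    "SET maintenance_work_mem = '2GB';",
    "CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops)"
    " WITH (lists = 2000);",
]
```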
Demystifying evals for AI agents
  • The agent shouldn’t be able to easily “cheat” the eval. Tasks and graders should be designed so that passing genuinely requires solving the problem rather than exploiting unintended loopholes.
  • Like the Swiss Cheese Model from safety engineering, no single evaluation layer catches every issue. With multiple methods combined, failures that slip through one layer are caught by another.
  • The patterns vary by agent type, but the fundamentals described here are constant. Start early and don’t wait for the perfect suite. Source realistic tasks from the failures you see. Define unambiguous, robust success criteria. Design graders thoughtfully and combine multiple types. Make sure the problems are hard enough for the model. Iterate on the evaluations to improve their signal-to-noise ratio. Read the transcripts!
Deep Blue by Simon Willison
  • I’ve even faced accusations from my peers that I am actively harming their future careers through my work helping people understand how well AI-assisted programming can work.
A Single Reason To Not Vibe Code
  • The risk of cognitive-skill atrophy amongst vibe coders is something that, IMHO, should be looked at more closely.
AI Doesn’t Reduce Work—It Intensifies It by Simon Willison
AI fatigue is real and nobody talks about it by Siddhant Khare
  • Before AI, I might spend a full day on one design problem. I’d sketch on paper, think in the shower, go for a walk, come back with clarity. The pace was slow but the cognitive load was manageable. One problem. One day. Deep focus. Now? I might touch six different problems in a day. Each one “only takes an hour with AI.” But context-switching between six problems is brutally expensive for the human brain. The AI doesn’t get tired between problems. I do.
  • The cruel irony is that AI-generated code requires more careful review than human-written code.
  • If we can’t review everything AI produces - and we can’t, not at scale - then we need systems that constrain what agents can do in the first place.
  • The engineers I’ve talked to who handle this best are the ones who’ve made peace with it. They treat AI output like a first draft from a smart but unreliable intern. They expect to rewrite 30% of it. They budget time for that rewriting. They don’t get frustrated when the output is wrong because they never expected it to be right. They expected it to be useful. There’s a difference.
  • I now treat every AI output as a rough draft. A starting point. Raw material. I mentally label it “draft” the moment it appears, and that framing change alone reduced my frustration by half.
  • I’d been outsourcing my first-draft thinking to AI for so long that my ability to think from scratch had degraded.
Eight more months of agents
  • Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese. Don’t worry, it’s only for a few years.
Beyond agentic coding
  • One of those design principles is my personal “master cue”, which is:

A good tool or interface should keep the user in a flow state as long as possible

Continuous AI in practice: What developers can automate today with agentic CI by GitHub Staff
  • Pattern 4: Debuggability will win over complexity

Developers will adopt agentic patterns that are transparent, auditable, and diff-based—not opaque systems that act without visibility.

Mitchell Hashimoto: My AI Adoption Journey by Simon Willison
A sane but extremely bull case on Clawdbot / OpenClaw | Brandon Wang
How AI assistance impacts the formation of coding skills
  • The participants who showed stronger mastery used AI assistance not just to produce code but to build comprehension while doing so—whether by asking follow-up questions, requesting explanations, or posing conceptual questions while coding independently.
  • It is possible that AI both accelerates productivity on well-developed skills and hinders the acquisition of new ones, though more research is needed to understand this relationship.
Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys
Google DeepMind Introduces ATLAS Scaling Laws for Multilingual Language Models by Robert Krzaczyński
  • Google DeepMind researchers have introduced ATLAS, a set of scaling laws for multilingual language models that formalize how model size, training data volume, and language mixtures interact as the number of supported languages increases.
  • Results show that fine-tuning is more compute-efficient at lower token budgets, while pre-training becomes advantageous once training data and compute exceed a language-dependent threshold. For 2B-parameter models, this crossover typically occurs between about 144B and 283B tokens, providing a practical guideline for selecting an approach based on available resources.
  • Rather than an enormous model that is trained on redundant data from every language, how large would a purely translation model need to be, and how much smaller would it make the base model?
Google Introduces TranslateGemma Open Models for Multilingual Translation by Daniel Dominguez
Why DuckDB is my first choice for data processing
How to parametrize exception testing in PyTest? by Kacper Borucki
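The core pattern from that article can be sketched like this (function and test names are illustrative): parametrize the expected exception type alongside the inputs, and assert with pytest.raises.

```python
import pytest

def divide(a, b):
    return a / b

# Parametrize the inputs together with the exception each should raise.
@pytest.mark.parametrize(
    "a, b, expected",
    [
        (1, 0, ZeroDivisionError),   # division by zero
        ("1", 2, TypeError),         # wrong operand type
    ],
)
def test_divide_errors(a, b, expected):
    with pytest.raises(expected):
        divide(a, b)
```

For mixing error cases and success cases in the same parameter table, the usual companion trick is contextlib.nullcontext as the "expectation" for inputs that should not raise.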
Tips for getting coding agents to write good Python tests by Simon Willison
Why Stoicism is one of the best mind-hacks ever devised by Lary Wallace
  • Only by envisioning the bad can we truly appreciate the good; gratitude does not arrive when we take things for granted. It’s precisely this gratitude that leaves us content to cede control of what the world has already removed from our control anyway.
AI’s trillion-dollar opportunity: Context graphs by Ashu Garg, Jaya Gupta
  • We call the accumulated structure formed by those traces a context graph: not “the model’s chain-of-thought,” but a living record of decision traces stitched across entities and time so precedent becomes searchable. Over time, that context graph becomes the real source of truth for autonomy – because it explains not just what happened, but why it was allowed to happen.
  • Once you have decision records, the “why” becomes first-class data. Over time, these records naturally form a context graph: the entities the business already cares about (accounts, renewals, tickets, incidents, policies, approvers, agent runs) connected by decision events (the moments that matter) and “why” links. Companies can now audit and debug autonomy and turn exceptions into precedent instead of re-learning the same edge case in Slack every quarter.
  • The orchestration layer sees the full picture: what inputs were gathered, what policies applied, what exceptions were granted, and why. Because it’s executing the workflow, it can capture that context at decision time – not after the fact via ETL, but in the moment, as a first-class record.

That’s the context graph, and that will be the single most valuable asset for companies in the era of AI.

  • High headcount. If a company has 50 people doing a workflow manually (routing tickets, triaging requests, or reconciling data between systems), that’s a signal. The labor exists because the decision logic is too complex to automate with traditional tooling.
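The "decision record" idea above can be sketched as a small data structure: a record captured at decision time, linked to the entities it touched, with the "why" as first-class data. Field names here are invented for illustration, not from the article:

```python
# Sketch of a decision record linked to business entities, plus a lookup that
# turns past decisions into searchable precedent. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    decision_id: str
    entities: list            # e.g. ["ticket:1042", "policy:refunds-v3"]
    action: str
    why: str                  # rationale captured in the moment, not via ETL
    exception_granted: bool = False

class ContextGraph:
    def __init__(self):
        self.records = []

    def record(self, rec):
        self.records.append(rec)

    def precedent(self, entity):
        """All past decisions touching an entity, so the 'why' is reusable."""
        return [r for r in self.records if entity in r.entities]

g = ContextGraph()
g.record(DecisionRecord("d1", ["ticket:1042", "policy:refunds-v3"],
                        "refund_approved", "VIP account, under $500 limit",
                        exception_granted=True))
```

The point of the structure: the next time an agent hits a similar refund edge case, `precedent("policy:refunds-v3")` surfaces the earlier exception and its rationale instead of re-litigating it in Slack.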
Memory: How Agents Learn
  • Here’s the dirty secret: when building agents with the API, we’ve made them capable, but we haven’t yet figured out how to make them learn.
  • Pattern 1: Session Memory Store messages in a database, retrieve them before every response, add them to the context. Agno gives you this out of the box — just give your agent a database.
  • Pattern 2: User Memory Remember facts about the user across sessions. The MemoryManager extracts preferences automatically and stores them in the database.
  • Pattern 3: Learned Memory Now let’s add learned memory: insights that apply beyond just one user. The key is a custom tool that saves learnings to a knowledge base
  • The quality of your knowledge base determines the quality of learning. Garbage in, garbage out. The solution: the agent proposes learnings, but only saves with explicit user approval.
  • A learning is worth saving if it’s:

Specific: “Tech P/E ratios typically range 20-35x”, not “P/E varies”
Actionable: can be applied to future queries
Generalizable: useful beyond this one conversation
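Pattern 1 (session memory) is the simplest of the three and can be sketched in a few lines — sqlite3 standing in for whatever database the framework actually uses, with the message shape assumed:

```python
# Sketch of session memory: persist messages to a database, load them back
# before every response. sqlite3 is a stand-in for the real store.
import sqlite3

class SessionMemory:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages"
            " (session TEXT, role TEXT, content TEXT)"
        )

    def add(self, session, role, content):
        self.db.execute("INSERT INTO messages VALUES (?, ?, ?)",
                        (session, role, content))
        self.db.commit()

    def history(self, session):
        rows = self.db.execute(
            "SELECT role, content FROM messages WHERE session = ?"
            " ORDER BY rowid", (session,))
        return [{"role": r, "content": c} for r, c in rows]

mem = SessionMemory()
mem.add("s1", "user", "What's the P/E of ACME?")
mem.add("s1", "assistant", "Around 25x.")
```

Patterns 2 and 3 layer on top of the same idea: a separate store keyed by user rather than session, and a knowledge base the agent writes to only with explicit approval.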