This entry of Machine Readable is a collaboration between Alexander Huras, Founding Engineer, Halcyon, and Jeff Fisher, Communications Lead, Halcyon. Jeff's Twitter addiction contributed to the inspiration for this post, and Alexander's subject-matter expertise and domain knowledge contributed to its substance.
Information platforms that interact with large language models (LLMs) are built with a fundamental advantage: users can engage with them in natural language through written queries and requests. They can do so not just quickly and iteratively, but creatively as well. Asking a question through generative AI invites users to try novel phrasing, to adopt different tones, and to recast rote questions as something unexpected (e.g., "respond in Spanish!").
The ideal result of an LLM query is both informative and intricate. It contains the information that a user seeks, and it offers depth and detail when that is what the user wants. An ideal result is also delightful, in a way: it brings information to a user more quickly than they would otherwise get it, and with a welcome clarity.
But - and this is a big “but” - large language models on their own have no context. They are machines built for broad coverage, not for specific depth. By design, they pull towards general purposes, and they often respond accordingly, with general or generalist answers.
This is why Halcyon focuses on retrieval-augmented generation (RAG), a technique for grounding large language models in specific areas of knowledge. RAG changes the type of task an LLM can be used for, from essentially free-form response generation to constrained generation that provides more value to a user with specialized interests (a contrast sketched in code after the quote below). In Halcyon’s case, we are grounded in today’s publicly available policy, regulatory, finance, and research information relevant to the energy transition. Without RAG, in the words of Box CEO Aaron Levie,
“You're at the mercy of how good that computer is at finding the right information to give the AI to answer the question. Which means, of course, you're also at the mercy of how good, accurate, up-to-date, and authoritative your underlying information is that you're feeding the AI model in the prompt.”
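To make that contrast concrete, here is a minimal sketch of what grounding looks like at the prompt level. The template wording, the example passage, and the helper names are illustrative assumptions for this post, not Halcyon’s implementation.

```python
# Hypothetical illustration: the same question posed two ways. The
# free-form prompt leaves the model to answer from whatever it absorbed
# in training; the grounded prompt constrains it to retrieved context.

def free_form_prompt(question: str) -> str:
    # No grounding: the model's general-purpose pull takes over.
    return f"Question: {question}\nAnswer:"

def grounded_prompt(question: str, passages: list[str]) -> str:
    # Grounding: the model is asked to answer only from curated context.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Using only the context below, answer the question and cite your source.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

question = "Which federal tax credits apply to new utility-scale solar projects?"
passages = [
    "The Inflation Reduction Act extends investment and production tax "
    "credits to clean electricity projects, including utility-scale solar."
]
print(free_form_prompt(question))
print("---")
print(grounded_prompt(question, passages))
```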
If LLMs pull towards general answers, RAG pulls queries back in specific directions thanks to the underlying data it uses. As Rick Merritt of Nvidia describes it, RAG is the court clerk of generative AI, working hard to deliver authoritative sources that informed judges can use to craft an opinion. Retrieval-augmented generation is fairly new; the research paper that coined the term was published in 2020. But for Halcyon, and for anyone seeking to bring focus and specificity to LLM output, it is essential.
RAG involves three steps - retrieval, augmentation, and generation - but it is the first that accounts for the overwhelming majority of the work a RAG implementor faces. The construction of a meaningful and relevant catalog is heavy lifting, but it is where the informational value in a system is created, and where that value resides. Optimizing for relevance, specificity, and correctness all happens upstream of generating a response. In RAG, the large language model that provides generation is essentially the last mile.
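As a rough illustration of that division of labor, the sketch below spends nearly all of its lines on the catalog and the retrieval step, and treats what comes after as a single hand-off. The toy catalog and the keyword-overlap scorer are assumptions made for illustration, not Halcyon’s retrieval stack.

```python
# Retrieval sketch: the heavy-lifting half of RAG. The catalog entries
# and the overlap-based scorer are illustrative stand-ins; a real system
# would rank a curated document store with a dense or hybrid index.

CATALOG = [
    {"id": "doc-1", "text": "FERC Order 2023 reforms the interconnection queue for new generators."},
    {"id": "doc-2", "text": "The Inflation Reduction Act extends tax credits for wind and solar projects."},
    {"id": "doc-3", "text": "Green hydrogen costs depend heavily on electrolyzer utilization rates."},
]

def score(query: str, text: str) -> float:
    """Crude relevance: the fraction of query words that appear in the passage.
    Stands in for embedding similarity or a learned ranker."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank the whole catalog against the query and keep the top k passages."""
    ranked = sorted(CATALOG, key=lambda d: score(query, d["text"]), reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    hits = retrieve("Which policy extends solar tax credits")
    # These passages become the context for a grounded prompt like the one
    # sketched earlier; the LLM call that answers from them is the last mile.
    for hit in hits:
        print(hit["id"], "->", hit["text"])
```

Swap in a better scorer, a bigger catalog, or richer metadata and the structure stays the same; the generation call at the end of the pipeline barely changes.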
We are constantly adding content and data to our information catalog. A growing catalog naturally pulls large language models towards more general responses, which only increases the need for RAG’s benefits. The ideal outcome is a virtuous cycle: an increasing volume of accessible and interconnected information, and an increasing understanding of how that information relates to itself.
It’s not hard to imagine how software platforms’ native network effects can jumpstart this flywheel: as more users contribute both new information sources and new syntax with which to query those sources, the system gets bigger, smarter, and more useful. While we’re starting with data and information focused on the energy transition, our vision is to expand to more topics and surface areas broadly relevant to decarbonization while still preserving context and specificity.
There is another key element of Halcyon’s system at play in enabling this, one that works to surface the best possible information with the most informed sense of relevance: our knowledge graph. More to come on that in future posts.
A final thought for our readers. At Halcyon we have an internal debate about not just ideal LLM responses, but also the ideal LLM respondent. Should it act more like search software with a search bar, or more like a “smart/ever-improving human,” as we describe it in internal chats? We are constantly testing responses today; we look forward to testing respondents in the future too.