Peering Into The Black Box of Large Language Models

3 July 2024 at 14:00

Large Language Models (LLMs) can produce extremely human-like communication, but their inner workings are something of a mystery. Not a mystery in the sense that we don’t know how an LLM works, but a mystery in the sense that the exact process of turning a particular input into a particular output is something of a black box.

This “black box” trait is common to neural networks in general, and LLMs are very deep neural networks. It is not really possible to explain precisely why a specific input produces a particular output, and not something else.

Why? Because neural networks are neither databases, nor lookup tables. In a neural network, discrete activation of neurons cannot be meaningfully mapped to specific concepts or words. The connections are complex, numerous, and multidimensional to the point that trying to tease out their relationships in any straightforward way simply does not make sense.

Neural Networks are a Black Box

In a way, this shouldn’t be surprising. After all, the entire umbrella of “AI” is about using software to tackle the sorts of problems that humans generally struggle to solve by writing an explicit program. It’s perhaps no wonder that the end product has some level of inscrutability.

This isn’t what most of us expect from software, but as humans we can relate to the black box aspect more than we might realize. Take, for example, the process of elegantly translating a phrase from one language to another.

As an example, I’d like to borrow an idea from an article by Lance Fortnow in Quanta Magazine about the ubiquity of computation in our world. Lance asks us to imagine a woman named Sophie who grew up speaking French and English and works as a translator. Sophie can easily take any English text and produce a sentence of equivalent meaning in French. Sophie’s brain follows some kind of process to perform this conversion, but Sophie likely doesn’t understand the entire process. She might not even think of it as a process at all. It’s something that just happens. Sophie, like most of us, is intimately familiar with black box functionality.

The difference is that while many of us (perhaps grudgingly) accept this aspect of our own existence, we are understandably dissatisfied with it as a feature of our software. New research has made progress towards changing this.

Identifying Conceptual Features in Language Models

We know perfectly well how LLMs work, but that doesn’t help us pick apart individual transactions. Opening the black box while it’s working yields only a mess of discrete neural activations that cannot be meaningfully mapped to particular concepts, words, or whatever else. Until now, that is.

A small sample of features activated when an LLM is prompted with questions such as “What is it like to be you?” and “What’s going on in your head?” (source: *Extracting Interpretable Features from Claude 3 Sonnet*)

Recent developments have made the black box much less opaque, thanks to tools that can map and visualize LLM internal states during computation. This creates a conceptual snapshot of what the LLM is — for lack of a better term — thinking in the process of putting together its response to a prompt.

Anthropic have recently shared details on their success in mapping the mind of their Claude 3 Sonnet model by finding a way to match patterns of neuron activations to concrete, human-understandable concepts called features.

A feature can be just about anything: a person, a place, an object, or something more abstract like the idea of upper case, or function calls. A feature being activated does not mean it factors directly into the output, but it does mean it played some role in the path the model took to reach that output.
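For a rough picture of how raw activations might be turned into features: the Claude 3 Sonnet paper describes training a sparse autoencoder (a form of dictionary learning) on the model's internal activations. The Python sketch below is only a toy version under loud assumptions. The weights are hand-picked rather than learned, the activation has just two dimensions, and every name (`encode`, `decode`, `ENC_WEIGHTS`, and so on) is hypothetical rather than taken from Anthropic's code.

```python
# Toy sparse-autoencoder sketch. All weights are hand-picked assumptions;
# a real dictionary is learned from millions of activations.

def relu(x):
    """Keep positive values, zero out the rest."""
    return x if x > 0.0 else 0.0

def encode(activation, enc_weights, enc_bias):
    """Map a dense activation vector to non-negative feature strengths."""
    return [
        relu(sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]

def decode(features, dec_directions, dec_bias):
    """Rebuild the activation as a weighted sum of feature directions."""
    out = list(dec_bias)
    for strength, direction in zip(features, dec_directions):
        for i, d in enumerate(direction):
            out[i] += strength * d
    return out

# Hypothetical dictionary: 3 features for a 2-dimensional activation.
# Having more features than dimensions is the point of the technique.
ENC_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ENC_BIAS = [0.0, 0.0, 0.0]
DEC_DIRECTIONS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
DEC_BIAS = [0.0, 0.0]

activation = [-1.5, 0.5]
features = encode(activation, ENC_WEIGHTS, ENC_BIAS)
reconstruction = decode(features, DEC_DIRECTIONS, DEC_BIAS)
print(features, reconstruction)
```

Only some of the features fire for a given input, and the decoder rebuilds the activation from those alone. In a trained autoencoder, that sparsity is what makes each feature a candidate for a single human-readable concept.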

With a way to map groups of activations to features — a significant engineering challenge — one can meaningfully interpret the contents of the black box. It is also possible to measure a sort of relational “distance” between features, and therefore get an even better idea of what a given state of neural activation represents in conceptual terms.
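One common way to put a number on the "distance" between two features is the cosine similarity of their direction vectors. The sketch below assumes that measure, and the feature names and four-dimensional vectors are made up purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_features(name, features):
    """Rank every other feature by similarity to the named one."""
    target = features[name]
    ranked = [
        (other, cosine_similarity(target, vec))
        for other, vec in features.items()
        if other != name
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical feature directions (invented numbers, not real model data).
FEATURES = {
    "bridge": [0.9, 0.1, 0.0, 0.2],
    "island": [0.8, 0.2, 0.1, 0.3],
    "function_call": [0.0, 0.1, 0.9, 0.1],
}

for other, score in nearest_features("bridge", FEATURES):
    print(f"{other}: {score:.3f}")
```

With these invented vectors, "island" ranks far closer to "bridge" than "function_call" does, which is the kind of neighborhood structure the article describes.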

Making Sense of it all

One way this can be used is to produce a heat map that highlights how heavily different features were involved in Claude’s responses. Artificially manipulating the weighting of different concepts changes Claude’s responses in predictable ways (video), demonstrating that the features are indeed reasonably accurate representations of the LLM’s internal state. More details on this process are available in the paper *Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet*.
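Both ideas, the heat map and the artificial manipulation, can be pictured with a few lines of Python. This is a simplified sketch and not necessarily how Anthropic implement either step. It assumes a feature's involvement in a token is the non-negative projection of that token's activation onto the feature direction, and that "manipulating the weighting" means adding a scaled copy of that direction. The tokens, activations, and function names are all hypothetical.

```python
def feature_activation(activation, direction):
    """Non-negative projection of an activation onto a feature direction."""
    return max(0.0, sum(a * d for a, d in zip(activation, direction)))

def heat_map(tokens, activations, direction, shades=" .:*#"):
    """Pair each token with a shade character; darker means more involved."""
    scores = [feature_activation(act, direction) for act in activations]
    peak = max(scores) or 1.0  # avoid dividing by zero if nothing fires
    rows = []
    for token, score in zip(tokens, scores):
        level = int(round((len(shades) - 1) * score / peak))
        rows.append((token, shades[level]))
    return rows

def steer(activation, direction, strength):
    """Push an activation along a feature direction (assumed steering rule)."""
    return [a + strength * d for a, d in zip(activation, direction)]

# Invented per-token activations for a two-dimensional toy model.
TOKENS = ["The", "bridge", "is", "red"]
ACTIVATIONS = [[0.0, 0.1], [1.0, 0.0], [0.1, 0.3], [0.5, 0.2]]
BRIDGE_DIRECTION = [1.0, 0.0]  # hypothetical feature direction

for token, shade in heat_map(TOKENS, ACTIVATIONS, BRIDGE_DIRECTION):
    print(f"{shade} {token}")

boosted = steer(ACTIVATIONS[0], BRIDGE_DIRECTION, strength=2.0)
print(feature_activation(boosted, BRIDGE_DIRECTION))
```

Steering the first token raises its score for the hypothetical feature from zero to a large value. In a real model, that change would carry forward into later layers and shift the generated text.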

Mapping the mind of a state-of-the-art LLM like Claude may be a nontrivial undertaking, but that doesn’t mean the process is entirely the domain of tech companies with loads of resources. Inspectus by [labml.ai] is a visualization tool that works similarly to provide insight into the behavior of LLMs during processing. There is a tutorial on using it with a GPT-2 model, but don’t let that turn you off. GPT-2 may be older, but it is still relevant.

Research like this offers new ways to understand (and potentially manipulate, or fine-tune) these powerful tools, making LLMs more transparent and more useful, especially in applications where lack of operational clarity is hard to accept.

Mistral AI

Easy With AI

EasyWithAI

11 January 2024 at 14:42

Mistral AI is a large language model and chat assistant tool. You can access the chatbot via the Mistral website by clicking on “Talk to le Chat”, or if you prefer a local setup then you can download and run the model files on your own hardware. The creators of Mistral describe it as an […]

Source

Athina AI

Easy With AI

EasyWithAI

7 March 2024 at 13:36

Athina is a powerful monitoring and evaluation platform designed for companies deploying large language models (LLMs) in production environments. Its main use case is to allow users to detect hallucinations, analyze their LLM accuracy, and debug outputs through features like prompt management, performance tracking over time, and custom evaluation metrics. Athina integrates seamlessly with popular […]

Source

Stable Beluga 2

Easy With AI

EasyWithAI

31 July 2023 at 19:31

Stable Beluga 2 is a new open-source LLM developed by Stability AI and is based on Meta AI’s Llama 2 model with 70 billion parameters. This LLM is currently leading the chart on Hugging Face’s Open LLM Leaderboard. Like most other LLMs, you’ll need an interface installed to run Stable Beluga 2 on […]

Source

Stability AI, Team Behind Stable Diffusion Announces First LLM With ChatGPT-Like Capabilities

Easy With AI

EasyWithAI

20 April 2023 at 00:11

Stability AI, the team behind the popular AI art tool Stable Diffusion, has announced the launch of its latest creation: StableLM, a suite of text-generating AI models designed to rival systems like OpenAI’s GPT-4 and ChatGPT. Available in “alpha” on GitHub and Hugging Face, StableLM can generate both code and text and has been trained […]

Source

Code Llama

Easy With AI

EasyWithAI

19 September 2023 at 13:50

Code Llama is a suite of large language models released by Meta AI for generating and enhancing code. It includes foundation models for general coding, Python specializations, and models tailored for following instructions. Key features include state-of-the-art performance, code infilling, large context support up to 100K tokens, and zero-shot ability to follow instructions for programming […]

Source

ChatGLM-6B

Easy With AI

EasyWithAI

18 September 2023 at 18:02

ChatGLM-6B is an open-source, bilingual conversational AI LLM based on the General Language Model (GLM) framework. It has 6.2 billion parameters and can be deployed locally with only 6GB of GPU memory. This model allows for natural language processing in both Chinese and English, question answering, task-oriented dialogue, and easy integration via API and demo […]

Source

Infermatic

Easy With AI

EasyWithAI

19 January 2024 at 14:26

Infermatic offers developers and researchers seamless access to leading large language models through a unified platform. Its user-friendly design makes AI experimentation easy for anyone while still providing advanced users with enterprise-scale capabilities. Infermatic’s free version, TotalGPT Free, offers up to 300 requests per day with a 60 token limit. You can check out the […]

Source

Perplexity AI

Easy With AI

EasyWithAI

4 May 2023 at 01:25

Perplexity AI is an AI chat and search engine that uses advanced technology to provide direct answers to your queries. It delivers accurate answers using large language models and even includes links to citations and related topics. It is available for free via web browser and also on mobile via the Apple App Store. Using […]

Source

NetBSD Bans AI-Generated Code From Commits

Hackaday

Maya Posch

18 May 2024 at 08:00

A recently announced change to the NetBSD commit guidelines states that code generated by Large Language Models (LLMs) or similar technologies, such as ChatGPT, Microsoft’s Copilot or Meta’s Code Llama, is presumed to be tainted code. The amendment extends the existing section about tainted code, which originally referred to any code not written directly by the person committing it, and which exists because of licensing concerns. The obvious reasoning is that code might otherwise be copied into the NetBSD codebase despite having been released under an incompatible (or proprietary) license.

In the case of LLM-based code generators like those mentioned above, the problem stems from the fact that they are trained on millions of lines of code from all over the internet, which are naturally released under a wide variety of licenses. Inevitably, some of that code will be covered by a license that’s not acceptable for the NetBSD codebase. Although the guideline mentions that these auto-generated code commits may still be admissible, they require written permission from core developers, and presumably an in-depth audit of the code’s heritage. This should leave non-trivial commits that got churned out by ChatGPT and kin out in the cold.

The debate about the validity of works produced by current-gen “artificial intelligence” software is only just beginning, but there’s little question that NetBSD has made the right call here. From a legal and software engineering perspective this policy makes perfect sense, as LLM-generated code simply doesn’t meet the project’s standards. That said, code produced by humans brings with it a whole different set of potential problems.

Train a GPT-2 LLM, Using Only Pure C Code

Hackaday

Donald Papp

28 April 2024 at 08:00

[Andrej Karpathy] recently released llm.c, a project that focuses on LLM training in pure C, once again showing that working with these tools isn’t necessarily reliant on sprawling development environments. GPT-2 may be older but is perfectly relevant, being the granddaddy of modern LLMs (large language models) with a clear heritage to more modern offerings.

LLMs are fantastically good at communicating despite not actually knowing what they are saying, and training them usually relies on the PyTorch deep learning library, which is typically driven from Python. llm.c takes a simpler approach by implementing the neural network training algorithm for GPT-2 directly in C. The result is highly focused and surprisingly short: about a thousand lines of C in a single file. It is a highly elegant process that does the same thing the bigger, clunkier methods accomplish. It can run entirely on a CPU, or it can take advantage of GPU acceleration, where available.

This isn’t the first time [Andrej Karpathy] has bent his considerable skills and understanding towards boiling down these sorts of concepts into bare-bones implementations. We previously covered a project of his that is the “hello world” of GPT, a tiny model that predicts the next bit in a given sequence and offers low-level insight into just how GPT (generative pre-trained transformer) models work.
