
Llama.ttf is AI, in a Font

27 June 2024 at 02:00

It’s a great joke, and like all great jokes it makes you think. [Søren Fuglede Jørgensen] managed to cram a 15 M parameter large language model into a completely valid TrueType font: llama.ttf. Being an LLM-in-a-font means that it’ll do its magic across applications – in your photo editor as well as in your text editor.

What magic, we hear you ask? Say you have some text, written in some non-AI-enabled font. Highlight that, and swap over to llama.ttf. The first thing it does is to change all “o” characters to “ø”s, just like [Søren]’s parents did with his name. But the real magic comes when you type a string of exclamation points. In any normal font, they’re just exclamation points, but llama.ttf replaces them with the output of the TinyStories LLM, run locally in the font. Switching back to another font reveals them to be exclamation points after all. Bønkers!

This is all made possible by the HarfBuzz font extensions library. In the name of making custom ligatures and other text shaping possible, HarfBuzz allows fonts to contain Web Assembly code and runs it in a virtual machine at rendering time. This gives font designers the flexibility to render various Unicode combinations as unique glyphs, which is useful for languages like Persian. But it can just as well turn all “o”s into “ø”s or run all exclamation points through an LLM.
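
The shaper itself ships as WebAssembly inside the font, but the substitutions it performs are easy to sketch outside of it. Here is a purely illustrative Python sketch of the two tricks described above; the generate_story stub stands in for the embedded TinyStories model and is our assumption, not [Søren]’s actual WASM code.

```python
import re

def generate_story(prompt: str, max_chars: int) -> str:
    """Hypothetical stand-in for the TinyStories LLM baked into llama.ttf."""
    return "Once upon a time there was a tiny font that could talk."[:max_chars]

def shape(text: str) -> str:
    # Replace each run of exclamation points with generated text,
    # roughly proportional to the number of '!' characters typed.
    def expand(match: re.Match) -> str:
        return generate_story(prompt="", max_chars=8 * len(match.group(0)))
    text = re.sub(r"!{3,}", expand, text)
    # Render every "o" as an "ø", as the font does with [Søren]'s name.
    return text.replace("o", "ø").replace("O", "Ø")

print(shape("Tell me a story!!!!!!"))
```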

Something screams mischief about running arbitrary WASM while you type, but we remind you that since PostScript, font rendering engines have been able to run code in order to help with the formatting problem. This ability was inherited by PDF, and has kept malicious PDFs in the top-10 infiltration vectors for the last fifteen years. [Citation needed.] So if you can model a CPU in PDF, why not an LLM in TTF? Or a Pokemon clone in an OpenType font?

We don’t think [Søren] was making a security point here, we think he was just having fun. You can see how much fun in his video demo embedded below.

Testing Large Language Models for Circuit Board Design Aid

By: Maya Posch
24 June 2024 at 11:00

Beyond bothering large language models (LLMs) with funny questions, there’s the general idea that they can act as supporting tools. Theoretically they should be able to assist with parsing and summarizing documents, while answering questions about e.g. electronic design. To test this assumption, [Duncan Haldane] employed three of the more highly praised LLMs to assist with circuit board design. These LLMs were GPT-4o (OpenAI), Claude 3 Opus (Anthropic) and Gemini 1.5 (Google).

The tasks ranged from ‘stupid questions’, like asking the delay per unit length of a trace on a PCB, to finding parts for a design, to designing an entire circuit. Of these tasks, only the datasheet-parsing task could be considered successful. This involved uploading the datasheet for a component (nRF5340) and asking the LLM to make a symbol and footprint, in this case for the text-centric JITX format, though KiCad or Altium should be possible too. This did require a few passes, as there were glitches and omissions in the generated footprint.
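
For the curious, that datasheet-to-footprint workflow boils down to stuffing datasheet text into a prompt and asking for a structured part definition. Below is a rough sketch of that kind of request using the OpenAI Python client; the file name, prompt wording, and output handling are our assumptions for illustration, not [Duncan]’s actual setup.

```python
# Sketch of asking an LLM to draft a symbol/footprint from datasheet text.
# File name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

datasheet_text = open("nrf5340_datasheet.txt").read()[:100_000]  # trim to fit context

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You generate JITX component definitions from datasheet text."},
        {"role": "user",
         "content": "Create a symbol and footprint (pin names, numbers, pad sizes) "
                    "for the nRF5340 aQFN-94 package:\n\n" + datasheet_text},
    ],
)
print(response.choices[0].message.content)
```

As the article notes, expect to iterate: pin assignments and pad dimensions tend to come back with glitches and omissions that need to be checked against the datasheet.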

When it came to picking components for a design, it’s clear that you’re out of luck here unless you’re trying to create a design that a million others have made before you in exactly the same way. This problem got worse when trying to design a circuit and ultimately spit out a netlist, with the best LLM (Claude 3 Opus) giving nonsensical suggestions for filter designs and mucking up even basic amplifier designs, including by sticking decoupling capacitors and random resistors just about everywhere.

Effectively, as a text-searching tool LLMs can have some use for engineers who are tired of digging through yet another few hundred pages of poorly formatted and non-indexed PDF datasheets, but you still need to be on your toes every step of the way, as the output from the LLM will all too often range from slightly to hilariously wrong.

Uncovering ChatGPT Usage in Academic Papers Through Excess Vocabulary

By: Maya Posch
22 June 2024 at 20:00
Frequencies of PubMed abstracts containing certain words. Black lines show counterfactual extrapolations from 2021–22 to 2023–24. The first six words are affected by ChatGPT; the last three relate to major events that influenced scientific writing and are shown for comparison. (Credit: Kobak et al., 2024)

That students these days love to use ChatGPT for assistance with reports and other writing tasks is hardly a secret, but in academia it’s becoming ever more prevalent as well. This raises the question of whether ChatGPT-assisted academic writing can be distinguished somehow. According to [Dmitry Kobak] and colleagues this is the case, with a strong sign of ChatGPT use being the presence of a lot of flowery excess vocabulary in the text. As detailed in their prepublication paper, the frequency of certain style words marks a remarkable change in the vocabulary of the published works they examined.

For their study they looked at over 14 million biomedical abstracts from 2010 to 2024 obtained via PubMed. These abstracts were then analyzed for word usage and frequency, which shows both natural increases in word frequency (e.g. from the SARS-CoV-2 pandemic and Ebola outbreak), as well as massive spikes in excess vocabulary that coincide with the public availability of ChatGPT and similar LLM-based tools.

In total 774 unique excess words were annotated. Here ‘excess’ means ‘outside of the norm’, following the pattern of ‘excess mortality’, where mortality during one period noticeably deviates from patterns established during previous periods. In this regard the bump in words like ‘respiratory’ is logical, but the surge in style words like ‘intricate’ and ‘notably’ would seem to be due to LLMs having a penchant for such flowery, overly dramatized language.
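
To make the ‘excess’ idea concrete, here is a minimal sketch of the calculation with made-up numbers: fit a trend on the pre-ChatGPT years, extrapolate it forward, and call anything above that line excess usage. The linear counterfactual and the example frequencies are our simplifications for illustration, not the paper’s exact method or data.

```python
# Sketch: estimate "excess" usage of a word by extrapolating a
# pre-ChatGPT trend, in the spirit of excess-mortality calculations.
# The counts below are made-up illustrative numbers, not data from the paper.
import numpy as np

years = np.array([2018, 2019, 2020, 2021, 2022, 2023, 2024])
# Fraction of abstracts per year containing the word "intricate" (illustrative).
freq = np.array([0.010, 0.011, 0.011, 0.012, 0.013, 0.022, 0.038])

# Fit a linear trend on the pre-LLM years and extrapolate it forward.
pre = years <= 2022
slope, intercept = np.polyfit(years[pre], freq[pre], deg=1)
expected = slope * years + intercept

excess = freq - expected
for y, e in zip(years[~pre], excess[~pre]):
    print(f"{y}: excess frequency {e:+.3f}")
```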

The researchers have made the analysis code available for those interested in giving it a try on another corpus. The main author also addressed the question of whether ChatGPT might be influencing people to write more like an LLM. At this point it’s still an open question whether people will become more inclined to use ChatGPT-like vocabulary or will actively seek to avoid sounding like an LLM.

McDonald’s Terminates Its Drive-Through Ordering AI Assistant

By: Maya Posch
18 June 2024 at 08:00

McDonald’s recently announced that it will be scrapping the voice-assistant which it has installed at over 100 of its drive-throughs after a two-year trial run. In the email that was sent to franchises, McDonald’s did say that they are still looking at voice ordering solutions for automated order taking (AOT), but it appears that for now the test was a disappointment. Judging by the many viral videos of customers struggling to place an order through the AOT system, it’s not hard to see why.

This AOT attempt began in 2019, when McDonald’s acquired AI company Apprente to create its McD Tech Labs, only to sell it off to IBM, which was then contracted to create the technology for McDonald’s fast-food joints. When the system launched in 2021, it was expected that McDonald’s drive-through ordering lanes would eventually all be serviced by AOT, with an experience akin to the Alexa and Siri voice assistants that everyone knows and loves (to yell at).

With the demise of this test at McDonald’s, it would seem that the biggest change is likely to be in the wider automation of preparing fast-food instead, with robots doing the burger flipping and freedom frying rather than a human. That said, would you prefer the McD voice assistant when going through a Drive-Thru® over a human voice?

EMO: Alibaba’s Diffusion Model-Based Talking Portrait Generator

By: Maya Posch
10 June 2024 at 23:00

Alibaba’s EMO (or Emote Portrait Alive) framework is a recent entry in a series of attempts to generate a talking head using existing audio (spoken word or vocal audio) and a reference portrait image as inputs. At its core it uses a diffusion model that is trained on 250 hours of video footage and over 150 million images. But unlike previous attempts, it adds what the researchers call a speed controller and a face region controller. These serve to stabilize the generated frames, along with an additional module to stop the diffusion model from outputting frames that feature a result too distinct from the reference image used as input.

In the related paper by [Linrui Tian] and colleagues a number of comparisons are shown between EMO and other frameworks, claiming significant improvements over these. A number of examples of talking and singing heads generated using this framework are provided by the researchers, which gives some idea of what are probably the ‘best case’ outputs. With some examples, like [Leslie Cheung Kwok Wing] singing ‘Unconditional’, big glitches are obvious and there’s a definite mismatch between the vocal track and facial motions. Despite this, it’s quite impressive, especially with fairly realistic movement of the head, including blinking of the eyes.

Meanwhile some seem extremely impressed, such as in a recent video by [Matthew Berman] on EMO where he states that Alibaba releasing this framework to the public might be ‘too dangerous’. The level-headed folks over at PetaPixel however also note the obvious visual imperfections that are a dead give-away for this kind of generative technology. Much like other diffusion model-based generators, it would seem that EMO is still very much stuck in the uncanny valley, with no clear path to becoming a real human yet.

Thanks to [Daniel Starr] for the tip.

What If

9 June 2024 at 08:00

We’ve noticed a recent YouTube trend of producing trailers for shows and movies as if they were produced in the 1950s, even when they weren’t. The results are impressive and, as you might expect, leverage AI generation tools. While we enjoy watching them, we were especially interested in [Patrick Gibney’s] peek behind the curtain of how he makes them, as you can see below. If you want to see an example of the result first, check out the second video, showing a 1950s-era The Matrix.

Of course, you could do some of it yourself, but if you want the full AI experience, [Patrick] suggests using ChatGPT to produce a script, though he admits that if he did that, he would tweak the results. Other AI tools create the pictures used and the announcer-style narration. Another tool produces cinematographic shots that include the motion of the “actors” and other things in the scene. More tools create the background music.

Once you have all that, it is straightforward to edit it together as a video. If you want to try your hand, many of the tools have some free tier, although you might not be able to do everything you want in one shot with free tools. [Patrick] reports he spends about $70 a month to get full access to the tools he uses, but he also mentions some other alternatives.

You have to wonder how long it will be before you can just get an AI filmmaker tool that does the whole thing in one swoop. However, doing it in pieces like this does give you a bit more control. In particular, we were interested that some of the “secret sauce” was using negative prompts to prevent certain behaviors in certain tools.
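
Negative prompts are a standard knob in most diffusion-model front ends. As a hedged illustration of the idea, here is how it looks with the Hugging Face diffusers library; the model ID and prompt text are placeholders, not the specific tools [Patrick] uses.

```python
# Illustrative use of a negative prompt with a Stable Diffusion pipeline.
# Model ID and prompts are placeholders, not the toolchain from the video.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="1950s black-and-white movie still, detective in a trench coat, film grain",
    negative_prompt="color, modern clothing, text, watermark, extra fingers",
    num_inference_steps=30,
).images[0]
image.save("retro_frame.png")
```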

We were hoping [Patrick] would send up Star Trek, but for that, we had to check out [Rafa Reels]. Of course, you don’t have to limit yourself to the 1950s. For example, [Patrick] also wondered what it would be like if Star Wars were made in the 1990s with [Sir Sean Connery] as [Obi Wan]. Thanks to him, you don’t have to wonder.

Can You Hear Me Now? Try These Headphones

30 May 2024 at 05:00

When you are young, you take it for granted that you can pick out a voice in a crowded room or a factory floor. But as you get older, your hearing often gets to the point where a noisy room merges into a mishmash of sounds. University of Washington researchers have developed what they call Target Speech Hearing. In plain English, it is an AI-powered headphone that lets you look at someone and pull their voice out of the chatter. For best results, however, you have to enroll the speaker’s voice first, so it wouldn’t make a great eavesdropping device.

If you want to dive into the technical details, their paper goes into how it works. The prototype uses a Sony noise-cancelling headset. However, the system requires binaural input, so additional microphones are attached to the outside of the headphones.

Given training data, we wonder if traditional correlation methods would be just as effective. In other words, you could use facial recognition to figure out who’s talking and pull their voice out using more traditional signal processing techniques. However, this system can potentially pick up sound from unknown speakers, figuring direction from the binaural microphones, so even if the correlation method worked well on known speakers, the new system is likely superior in new situations.
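
Here is a small sketch of the classic signal-processing piece of that idea: estimating which direction a voice comes from by cross-correlating the two ear microphones to find the time difference of arrival. This is our illustration of the ‘traditional’ approach with synthetic signals, not the University of Washington code.

```python
# Sketch: estimate time-difference-of-arrival (and thus rough bearing)
# between left/right ear microphones using cross-correlation.
# Synthetic signals stand in for real binaural recordings.
import numpy as np

fs = 16_000                      # sample rate in Hz
rng = np.random.default_rng(0)
voice = rng.standard_normal(fs)  # stand-in for one second of speech

true_delay = 6                   # samples the sound arrives later at the right ear
left = voice
right = np.roll(voice, true_delay) + 0.1 * rng.standard_normal(fs)

# Cross-correlate and find the lag with maximum correlation.
corr = np.correlate(right, left, mode="full")
lags = np.arange(-len(left) + 1, len(left))
tdoa = lags[np.argmax(corr)] / fs

# Convert delay to an angle, assuming ~0.18 m between ears and 343 m/s sound speed.
angle = np.degrees(np.arcsin(np.clip(tdoa * 343 / 0.18, -1, 1)))
print(f"estimated delay {tdoa*1e3:.2f} ms, bearing of about {angle:.1f} degrees")
```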

There’s more to noise-cancelling headgear than you might think. Or you can just go low-tech.

Feast Your Eyes on These AI-Generated Sounds

By: Tom Nardi
28 May 2024 at 11:00

The radio hackers in the audience will be familiar with a spectrogram display, but for the uninitiated, it’s basically a visual representation of how the energy across a range of frequencies changes with time. Usually such a display is used to identify a clear transmission in a sea of noise, but with the right software, it’s possible to generate a signal that shows up as text or an image when viewed as a spectrogram. Musicians even occasionally use the technique to hide images in their songs (a minimal sketch of that classic trick appears at the end of this post). Unfortunately, the audio side of such a trick generally sounds like gibberish to human ears.

Or at least, it used to. Students from the University of Michigan have found a way to use diffusion models to not only create a spectrogram image for a given prompt, but to do it with audio that actually makes sense given what the image shows. So for example if you asked for a spectrogram of a race car, you might get an audio track that sounds like a revving engine.

The first step of the technique is easy enough — two separate pre-trained models are used, Stable Diffusion to create the image, and Auffusion4 to produce the audio. The results are then combined via a weighted average and fed into an iterative denoising process to refine the end result. Normally the process produces a grayscale image, but as the paper explains, a third model can be kicked in to produce a more visually pleasing result without impacting the audio itself.

Ultimately, neither the visual nor the audio component is perfect. But they both get close enough that you get the idea, and that alone is pretty impressive. We won’t hazard a guess at what practical applications exist for this technique, but the paper does hint at some potential use for steganography. Perhaps something to keep in mind the next time we try to hide data in an episode of the Hackaday Podcast.
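
And since we mentioned the classic way of hiding a picture in a spectrogram, here is a minimal Python sketch of that non-AI version: each image row becomes a sine wave at its own frequency, and each column becomes a slice of time. It’s our illustration of the general trick, not the students’ pipeline, and “input.png” is a placeholder for whatever grayscale image you want to hide.

```python
# Sketch: encode a grayscale image so it appears in a spectrogram.
# Each image row maps to a frequency; pixel brightness sets that tone's
# amplitude during the corresponding time slice. Illustrative only.
import numpy as np
from PIL import Image
from scipy.io import wavfile

fs = 22_050
img = np.asarray(Image.open("input.png").convert("L").resize((200, 100))) / 255.0
img = np.flipud(img)              # so the picture isn't upside-down in the spectrogram

cols = img.shape[1]
slice_len = int(0.05 * fs)        # 50 ms of audio per image column
freqs = np.linspace(1_000, 8_000, img.shape[0])  # one frequency per image row

audio = np.zeros(cols * slice_len)
t = np.arange(slice_len) / fs
for c in range(cols):
    tones = (img[:, c, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
    audio[c * slice_len:(c + 1) * slice_len] = tones

audio /= np.abs(audio).max()      # normalize to avoid clipping
wavfile.write("hidden_image.wav", fs, (audio * 32767).astype(np.int16))
```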

Try Image Classification Running In Your Browser, Thanks to WebGPU

20 May 2024 at 11:00

When something does zero-shot image classification, that means it’s able to make judgments about the contents of an image without the user needing to train the system beforehand on what to look for. Watch it in action with this online demo, which uses WebGPU to implement CLIP (Contrastive Language–Image Pre-training) running in one’s browser, using the input from an attached camera.

By giving the program some natural language visual concept labels (such as ‘person’ or ‘cat’) that fit a hypothetical template for the image content, the system will output — in real-time — its judgement on the appropriateness of such labels to what the camera sees. Again, all of this runs locally.

It’s maybe a little bit unintuitive, but what’s happening in the demo is that the system is deciding which of the user-provided labels (“a photo of a cat” vs “a photo of a bald man”, for example) is most appropriate to what the camera sees. The more a particular label is judged a good fit for the image, the higher the number beside it.
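
The scoring step in the demo maps closely onto the standard CLIP recipe. As a rough Python equivalent (running with a regular deep learning stack rather than in the browser, and a single image file standing in for a camera frame), this is what the label-versus-image comparison looks like; the checkpoint and labels here are illustrative.

```python
# Sketch: zero-shot label scoring with CLIP, the same idea the WebGPU demo
# runs in the browser. A single image file stands in for a camera frame.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a bald man", "a photo of a keyboard"]
image = Image.open("frame.jpg")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)[0]

for label, p in zip(labels, probs.tolist()):
    print(f"{p:.2f}  {label}")
```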

This kind of process benefits greatly from shoveling the hard parts of the computation onto compatible graphics cards, which is exactly what WebGPU provides by allowing the browser access to a local GPU. WebGPU is relatively recent, but we’ve already seen it used to run LLMs (Large Language Models) directly in the browser.

Wondering what makes GPUs so very useful for AI-type applications? It’s all about their ability to work with enormous amounts of data very quickly.

NetBSD Bans AI-Generated Code From Commits

By: Maya Posch
18 May 2024 at 08:00

A recent change to the NetBSD commit guidelines states that code generated by Large Language Models (LLMs) or similar technologies, such as ChatGPT, Microsoft’s Copilot, or Meta’s Code Llama, is presumed to be tainted code. This amendment extends the existing section about tainted code, which originally referred to any code that was not written directly by the person committing it, a rule put in place over licensing concerns. The obvious reason behind this is that otherwise code may be copied into the NetBSD codebase under an incompatible (or proprietary) license.

In the case of LLM-based code generators like the above-mentioned, the problem stems from the fact that they are trained on millions of lines of code from all over the internet, which are naturally released under a wide variety of licenses. Invariably, some of that code will be covered by a license that’s not acceptable for the NetBSD codebase. Although the guideline mentions that these auto-generated code commits may still be admissible, they require written permission from core developers, and presumably an in-depth audit of the code’s heritage. This should leave non-trivial commits that got churned out by ChatGPT and kin out in the cold.

The debate about the validity of works produced by current-gen “artificial intelligence” software is only just beginning, but there’s little question that NetBSD has made the right call here. From a legal and software engineering perspective this policy makes perfect sense, as LLM-generated code simply doesn’t meet the project’s standards. That said, code produced by humans brings with it a whole different set of potential problems.

AI-Created Coffee Blend Isn’t Terrible

11 May 2024 at 02:01
Kaffa Roastery founder Svante Hampf shows a bag of their AI-conic coffee blend.

Weren’t we just talking about coffee-based sacrilege the other day? Here’s something to make the single-origin bean snobs chew their espresso cups: an artisan roastery in Helsinki is offering a coffee blend created by artificial intelligence called AI-conic. The idea, of course, is that technology will lighten the workload needed to produce coffee.

This is an interesting development because Finland consumes the most coffee per capita in the world, according to the International Coffee Organization. Coffee roasting is a highly-valued traditional artisan profession there, so it stands to reason that they might turn to technology for help.

Just like with scotch whisky, there’s nothing wrong with coffee blends outright. Bean blends are good for consistency, when you want every cup to taste pretty much exactly the same. Single-origin beans, though, are traceable to one location, and as a result, they usually have a distinct flavor based on the climate they’re grown in.

If you’re new to coffee, blends are a nice, safe way to start out. And, interestingly, the AI chose to make the blend out of four different types of beans instead of the usual two or three, despite being tasked with creating a blend that would suit the palates of coffee enthusiasts. But the coffee experts agreed that the AI blend was “perfect” and needed no human intervention. We probably won’t be getting to Finland anytime soon, so if you try it, let us know how it tastes!

Do you like cold brew? How would you like to be able to brew some in just three minutes?

Mitre Wants the Feds to Play in Its Sandbox

8 May 2024 at 02:00

If you haven’t worked with the US government, you might not know Mitre, a non-profit government research organization. Formed in 1958 by the U.S. Air Force as a company to guide the SAGE computer, they are often the research experts who oversee government contracts or evaluate proposals. Now they are building a $20 million “AI Sandbox” for the Federal government to build AI prototypes.

Partnered with NVIDIA, the sandbox will use an NVIDIA DGX SuperPOD system capable of an exaFLOP of 8-bit AI computation. Mitre reports this will increase their compute power for AI by two orders of magnitude.

Access to the sandbox will be through one of the six federally funded R&D centers that Mitre operates on behalf of the government. These include centers that support the FAA, the IRS, Homeland Security, Social Security, health services, and cybersecurity with NIST. Of course, the DoD is likely in that mix, too.

So what do they (or the government) think they are going to do with all this AI power? We don’t know. But we are sure we’ll see some colorful guesses in the comments. The fact that it is through the R&D centers makes us think an AI might soon be sifting through your taxes or maybe routing your next airplane ride. We aren’t sure if that makes us feel better or worse.

AI servers seem to be the new supercomputer. The scary part is that what one generation considers a supercomputer, the next generation carries in their pocket.

AI Can Now Compress Text

By: Jenny List
29 April 2024 at 05:00

There are many claims in the air about the capabilities of AI systems, as the technology continues to ascend the dizzy heights of the hype cycle. Some of them are true, others stretch definitions a little, while yet more cross the line into the definitely bogus. [J] has one that is backed up by real code though, a compression scheme for text using an AI, and while there may be limitations in its approach, it demonstrates an interesting feature of large language models.

The compression works by assuming that for a sufficiently large model, it’s likely that many source texts will exist somewhere in the training data. Using llama.cpp it’s possible to extract the tokenization information of a piece of text contained in its training data and store that as the compressed output. The decompressor can then use that tokenization data as a series of keys to reassemble the original from its training data. We’re not AI experts but we are guessing that a source text which has little in common with any training text would fare badly, and we expect that the same model would have to be used for both compression and decompression. It remains a worthy technique though, and no doubt because it has AI pixie dust, somewhere there’s a hype-blinded venture capitalist who would pay millions for it. What a world we live in!
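
The write-up builds on llama.cpp, but the underlying idea (let the model predict the next token and store only how surprising the real one was) can be sketched with any causal LLM. Below is a toy rank coder using GPT-2 via Hugging Face transformers; it is a conceptual illustration of LLM-assisted compression in general, not [J]’s code, and the stored ranks would still need an entropy coder behind them to actually shrink on disk.

```python
# Toy sketch of LLM-based text compression: store, for each token, its rank
# in the model's predicted distribution. Predictable text turns into long
# runs of small ranks, which compress well. Decompression replays the same
# model and picks the token at each stored rank.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def compress(text: str) -> list[int]:
    ids = tok.encode(text)
    ranks = []
    with torch.no_grad():
        for i in range(1, len(ids)):
            logits = model(torch.tensor([ids[:i]])).logits[0, -1]
            order = torch.argsort(logits, descending=True)
            ranks.append((order == ids[i]).nonzero().item())
    return [ids[0]] + ranks          # first token stored verbatim

def decompress(data: list[int]) -> str:
    ids = [data[0]]
    with torch.no_grad():
        for rank in data[1:]:
            logits = model(torch.tensor([ids])).logits[0, -1]
            order = torch.argsort(logits, descending=True)
            ids.append(order[rank].item())
    return tok.decode(ids)

coded = compress("The quick brown fox jumps over the lazy dog.")
print(coded)
print(decompress(coded))
```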

Oddly this isn’t the first time we’ve looked at AI text compression.

Train a GPT-2 LLM, Using Only Pure C Code

28 April 2024 at 08:00

[Andrej Karpathy] recently released llm.c, a project that focuses on LLM training in pure C, once again showing that working with these tools isn’t necessarily reliant on sprawling development environments. GPT-2 may be older but is perfectly relevant, being the granddaddy of modern LLMs (large language models) with a clear lineage leading to more modern offerings.

LLMs are fantastically good at communicating despite not actually knowing what they are saying, and training them usually relies on the PyTorch deep learning library, itself written in Python. llm.c takes a simpler approach by implementing the neural network training algorithm for GPT-2 directly. The result is highly focused and surprisingly short: about a thousand lines of C in a single file. It is a highly elegant process that does the same thing the bigger, clunkier methods accomplish. It can run entirely on a CPU, or it can take advantage of GPU acceleration, where available.
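
For orientation, this is roughly the loop that llm.c spells out by hand in C: forward pass, cross-entropy loss, backward pass, AdamW update. The Python/PyTorch sketch below (with random token IDs standing in for a real tokenized dataset) is only a map of the moving parts, not the project’s code.

```python
# High-level sketch of the GPT-2 training loop that llm.c re-implements in C:
# forward pass, cross-entropy loss, backward pass, AdamW update.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

model = GPT2LMHeadModel(GPT2Config())            # randomly initialized GPT-2 small
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def get_batch(batch_size=4, seq_len=64):
    # Placeholder: random token IDs stand in for a real tokenized dataset.
    return torch.randint(0, model.config.vocab_size, (batch_size, seq_len))

for step in range(100):
    tokens = get_batch()
    # Labels equal inputs; the model shifts them internally for next-token loss.
    out = model(input_ids=tokens, labels=tokens)
    out.loss.backward()                          # backward pass through all layers
    optimizer.step()                             # AdamW parameter update
    optimizer.zero_grad()
    if step % 10 == 0:
        print(f"step {step}: loss {out.loss.item():.3f}")
```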

This isn’t the first time [Andrej Karpathy] has bent his considerable skills and understanding towards boiling down these sorts of concepts into bare-bones implementations. We previously covered a project of his that is the “hello world” of GPT, a tiny model that predicts the next bit in a given sequence and offers low-level insight into just how GPT (generative pre-trained transformer) models work.
