Exploring Byte Level Transformers.

Last December, Meta AI released a paper describing their Byte Latent Transformer (BLT), and today (13 May 2025) they released the weights to Hugging Face. Let’s break down the paper and explore what makes BLT a special model. What is BLT? Let’s break down the name:

• Byte: This signifies that the architecture operates directly on raw byte data
• Latent: This refers to the way BLT processes the byte data. Instead of processing every individual byte in the main computation layer (which would be prohibitively costly), BLT groups bytes into latent patches that the main transformer operates on
• Transformer: This indicates that BLT is an LLM architecture based on the Transformer model ...
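As a minimal illustration of what "byte-level" means here (this is not Meta's implementation, just a sketch): instead of mapping text to subword tokens from a learned vocabulary, the model's input is simply the sequence of raw UTF-8 byte values, so the vocabulary is fixed at 256 symbols.

```python
# Sketch: byte-level input is just the raw UTF-8 bytes of the text,
# one integer in 0-255 per byte, with no learned tokenizer vocabulary.
text = "Byte Latent Transformer"
byte_ids = list(text.encode("utf-8"))

print(byte_ids[:5])   # byte values for "Byte " -> [66, 121, 116, 101, 32]
print(len(byte_ids))  # sequence length in bytes (23 here, all ASCII)

# Non-ASCII characters expand to multiple bytes, which is one reason
# processing every byte in the main layers would be costly:
print(list("é".encode("utf-8")))  # one character, two bytes
```

Note that byte sequences are longer than subword-token sequences for the same text, which is exactly why BLT pushes per-byte work into lightweight layers and reserves the main transformer for latent patches.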

May 12, 2025 · 4 min · James Malcolm

Learning and applying Deepseek techniques

In January 2025, Deepseek made headlines with the release of their Deepseek R1 models and a suite of smaller models distilled from the larger R1 variant. The announcement sent shockwaves through the market—shaking NASDAQ and causing NVIDIA shares to drop nearly 20% in a single day. Although the performance of these models wasn’t the only factor, Deepseek’s innovation called into question the competitive advantage long held by US-based AI giants. ...

February 1, 2025 · 5 min · James Malcolm

Generative AI reflections for 2024

The year 2024 was an interesting one for Generative AI. It certainly wasn’t bigger than 2022 and 2023, when Generative AI went mainstream through the release of ChatGPT. On the other hand, it wasn’t a ‘nothing’ year either. Sure, Generative AI is more widespread and prevalent than in previous years, helped by both Android and Apple deploying generative AI on mobile devices. But, as I look back, I feel it was a year of consolidation: consolidation in the big players, consolidation and confidence in the technology, and consolidation in regulation. ...

December 16, 2024 · 4 min · James Malcolm