Exploring Byte-Level Transformers

Last December, Meta AI released a paper describing their Byte Latent Transformer (BLT), and today (13 May 2025) they released the weights to Hugging Face. Let’s break down the paper and explore what makes BLT a special model. What is BLT? Start with the name:

• Byte: This signifies that the architecture operates directly on raw byte data
• Latent: This refers to the way BLT processes the byte data. Instead of processing every individual byte in the main computation layer (which would be prohibitively costly), BLT groups bytes into dynamically sized patches and runs its main transformer over those latent patch representations
• Transformer: This indicates that BLT is an LLM architecture based on the Transformer model ...
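To make the patching idea concrete, here is a toy sketch of entropy-based byte patching. This is my own illustration, not Meta's implementation: the real BLT cuts patches where a small learned byte-level language model assigns high next-byte entropy, whereas this stand-in uses a simple frequency-count entropy over the current patch.

```python
# Toy illustration of entropy-based byte patching (not Meta's implementation).
import math
from collections import Counter

def byte_entropy(window: bytes) -> float:
    """Shannon entropy (bits) of a unigram model over the bytes in `window`.
    A stand-in for BLT's learned next-byte entropy model."""
    if not window:
        return 8.0  # maximum uncertainty for a byte
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def patch_bytes(data: bytes, threshold: float = 1.5) -> list[bytes]:
    """Group raw bytes into variable-length patches, starting a new
    patch once the current patch's entropy exceeds `threshold`."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if byte_entropy(data[start:i]) > threshold:
            patches.append(data[start:i])
            start = i
    if start < len(data):
        patches.append(data[start:])
    return patches

print(patch_bytes(b"aaaabbbb hello world"))
```

Low-entropy runs (like repeated bytes) end up in long patches, while unpredictable regions get cut into shorter ones, which is the intuition behind spending the large model's compute per patch rather than per byte.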

May 12, 2025 · 4 min · James Malcolm

Learning and applying Deepseek techniques

In January 2025, Deepseek made headlines with the release of their Deepseek R1 models and a suite of smaller models distilled from the larger R1 variant. The announcement sent shockwaves through the market, shaking the NASDAQ and causing NVIDIA shares to drop nearly 20% in a single day. Although the performance of these models wasn’t the only factor, Deepseek’s innovation called into question the competitive advantage long held by US-based AI giants. ...
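The excerpt mentions distillation, the technique behind those smaller models. As a generic illustration only (not Deepseek's actual recipe, which fine-tuned smaller models on R1-generated outputs), here is a minimal logit-distillation sketch in PyTorch; the teacher, student, and batch are dummy stand-ins:

```python
# Generic knowledge-distillation sketch: a small "student" learns to
# match a larger frozen "teacher's" softened output distribution.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 100, 32
teacher = torch.nn.Linear(dim, vocab)   # stand-in for a large frozen model
student = torch.nn.Linear(dim, vocab)   # smaller model being trained
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's distribution

x = torch.randn(8, dim)  # dummy batch of hidden states
with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)
student_log_probs = F.log_softmax(student(x) / T, dim=-1)
# KL divergence between student and teacher, scaled by T^2 (Hinton et al.)
loss = F.kl_div(student_log_probs, teacher_probs,
                reduction="batchmean") * T**2
loss.backward()
opt.step()
print(f"distillation loss: {loss.item():.4f}")
```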

February 1, 2025 · 5 min · James Malcolm

Generative AI reflections for 2024

The year 2024 was an interesting one for Generative AI. It certainly wasn’t bigger than 2022 and 2023, when Generative AI went mainstream through the release of ChatGPT. On the other hand, it wasn’t a ‘nothing’ year either. Sure, Generative AI is more widespread and prevalent than in previous years, helped by both Android and Apple deploying generative AI on mobile devices. But as I look back, I feel it was a year of consolidation: consolidation among the big players, consolidation and confidence in the technology, and consolidation in regulation. ...

December 16, 2024 · 4 min · James Malcolm

Creating Private LLMs

I want to open this post by stating that privacy within large language models (LLMs) is a mammoth topic that spans much more than can be covered in a single post. Instead, I want to narrow the focus to showcase some approaches to introducing proprietary data into LLMs, with the privacy and safety of sensitive data at the forefront. In a study by the AI Accelerator Institute, respondents cited data privacy as the second-biggest barrier to adopting LLMs within their company ...
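One family of approaches keeps proprietary data out of the model weights entirely: retrieve relevant passages at query time and supply them in the prompt. Below is a minimal sketch of that retrieval step, my own illustration rather than the post's method, using a toy character-count embedding; a real system would use a privately hosted embedding model and vector store.

```python
# Sketch of retrieval over a private document store: relevant passages
# are found locally and only the assembled prompt reaches the LLM.
import numpy as np

documents = [
    "Our refund policy allows returns within 30 days.",
    "Internal API keys rotate every 90 days.",
    "The sales playbook recommends a two-call close.",
]

def embed(text: str) -> np.ndarray:
    """Toy bag-of-characters embedding; swap in a real sentence-embedding
    model hosted inside your own infrastructure."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_vecs = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = doc_vecs @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How long do customers have to return a product?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would be sent to the (private) LLM
```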

December 18, 2023 · 4 min · James Malcolm

What's next? Next word prediction with PyTorch

Today, I will take you through a simple next-word prediction model built using PyTorch. The inspiration for this is, of course, predictive text, or more specifically Google’s Smart Compose. At its core, the Google Smart Compose model is a form of language model: it takes the few words a user has typed and predicts the words or sentences that follow in the email being written. Google details how they built the Smart Compose feature in their research blog post here. From this, I want to pull out some requirements for building a successful Smart Compose model: ...
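To give a flavour of what the post builds, here is a minimal next-word language model in PyTorch. The tiny corpus, architecture, and hyperparameters are illustrative placeholders, not the post's actual code:

```python
# Minimal next-word prediction: an LSTM language model trained to
# predict each word in a corpus from the words before it.
import torch
import torch.nn as nn

corpus = "thanks for your email i will get back to you soon".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}

# Build (current word -> next word) training pairs from the corpus.
xs = torch.tensor([stoi[w] for w in corpus[:-1]])
ys = torch.tensor([stoi[w] for w in corpus[1:]])

class NextWordLSTM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens).unsqueeze(0))
        return self.out(h.squeeze(0))  # logits at every position

model = NextWordLSTM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):  # overfit the tiny corpus as a sanity check
    opt.zero_grad()
    loss = loss_fn(model(xs), ys)
    loss.backward()
    opt.step()

# Predict the word following "i will get back to"
prefix = torch.tensor([stoi[w] for w in "i will get back to".split()])
pred = model(prefix)[-1].argmax().item()
print(vocab[pred])  # expected: "you"
```

A production Smart Compose-style model adds the requirements the post lists, such as low-latency inference and conditioning on the email's context, but the training loop above is the same next-word objective at its core.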

November 8, 2023 · 7 min · James Malcolm