Saturday, January 27, 2024

Introduction: What are Large Language Models?

Happy New Year!

Welcome back!

At the end of 2023, I wrote about ChatGPT:

Merry Christmas! ... Also my thoughts about GPT

To show what makes GPT models so special, I'd like to give a basic introduction to large language models (LLMs) in general, focusing on building simple LLMs with T5 models.

You may be wondering: "What is an LLM?", "What are T5 models?", or "You talked about GPT last month, so why use T5 models instead?"

What is an LLM?

An LLM is a type of AI that can generate text-based answers.  ChatGPT uses GPT models, which are a type of LLM.  You have your LLM, you ask it a question, and it gives you back an answer in text.

What are T5 models?

T5 (Text-To-Text Transfer Transformer) models are another type of LLM, released by Google.

You talked about GPT last month, so why use T5 models instead?

  • Intuitively easier to understand LLM concepts:  I'll write about this in more detail in future blog posts, but building a T5 model requires a basic understanding of how an LLM reads, understands, and generates human text.  T5 models are SUPER easy to build, so you can have the satisfaction of building your own AI while learning how it works at the same time!  GPT models are very easy to use, but most of how the LLM generates text isn't made apparent.  (GPT models are really cool in this respect, but more on this in the future.)
  • Cost: OpenAI gives an initial $5 credit for its models, including GPT models, but after that's spent we have to pay to use them.  T5 models are completely free to build!
  • Simple coding:  Basic T5 models only require a few lines of code!  

Starting next month, I'll be introducing key concepts related to how LLMs generate text:
  • Tokenization
  • Encoding
  • Decoding
Don't know what these mean?  Stay tuned!  In February, we're going to be learning about "tokenization."  Get ready to build your own LLM!

Have any questions?  Any requests?  Do you just want to chat?  Comment down below!







