minGPT

Python ★ 25k updated 1y ago

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

A minimal, readable Python implementation of GPT that teaches how language models work. The Transformer model definition is about 300 lines of code, and it ships alongside a tokenizer and a training loop.

Python · PyTorch · setup: easy · complexity 2/5

minGPT is a stripped-down, educational reimplementation of GPT, the type of AI model behind ChatGPT, written by Andrej Karpathy, a prominent AI researcher. GPT (Generative Pretrained Transformer) is a family of language models that take a sequence of text as input and predict what comes next. minGPT is not meant to be the most capable or efficient implementation; it is meant to be the most readable one, so that people can understand what happens inside these models.

The core implementation is split across three files: the model definition (the Transformer neural network itself, about 300 lines of Python), a tokenizer (which converts text into numbers the model can process), and a generic training loop. The Transformer is the architecture that modern large language models are built on. It processes a sequence by letting each token "attend" to other tokens to understand context; in GPT, attention is causal, so each token attends only to itself and the tokens that come before it.
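To make the idea of attention concrete, here is a minimal, dependency-free sketch of single-head causal attention. It is not minGPT's code, which uses PyTorch tensors and learned projection layers; the function names `softmax` and `causal_attention` are illustrative, and the sketch assumes the queries, keys, and values are simply the raw input vectors rather than learned projections of them.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    queries, keys, values: lists of T vectors (each a list of floats).
    Returns (outputs, weights). weights[t][s] is how much position t
    attends to position s, and is 0.0 for every s > t.
    """
    T = len(queries)
    d = len(queries[0])
    outputs, weights = [], []
    for t in range(T):
        # Score position t against positions 0..t only (the causal mask).
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        w = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w[s] * values[s][i] for s in range(t + 1))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        # Pad with zeros so every row has length T.
        weights.append(w + [0.0] * (T - t - 1))
    return outputs, weights


# Toy input: three token vectors of dimension 2 (made-up numbers).
# Assumption: q = k = v = x; a real GPT derives them via learned projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
```

The first token can only see itself, so its attention weight on itself is 1 and its output equals its own value vector. Later tokens spread their weight over everything up to and including their own position, which is what lets the model predict the next token without looking ahead.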

The repo includes several small demonstrations: training a GPT from scratch to add numbers, training one as a character-level text generator on any text file, and loading OpenAI's pretrained GPT-2 weights to generate text from a prompt.
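As a rough illustration of what a character-level setup involves, the sketch below turns a string into integer ids and slices it into (input, target) training pairs, where each target is the input shifted by one character. This is a simplified sketch, not the repo's demo code; the helper names (`build_vocab`, `encode`, `decode`, `make_examples`) and the sample text are hypothetical.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Convert a string into a list of integer ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Convert a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)


def make_examples(ids, block_size):
    """Build (input, target) pairs for next-character prediction.

    Each target is its input shifted one position to the left, so the
    model learns to predict character i+1 from characters up to i.
    """
    return [
        (ids[i:i + block_size], ids[i + 1:i + block_size + 1])
        for i in range(len(ids) - block_size)
    ]


text = "hello world"  # stand-in for the contents of any text file
stoi, itos = build_vocab(text)
ids = encode(text, stoi)
examples = make_examples(ids, block_size=4)
```

Because the vocabulary is just the set of characters that appear in the file, this approach works on any text without a pretrained tokenizer, at the cost of longer sequences than a subword tokenizer would produce.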

A machine learning student or researcher would use minGPT when they want to understand GPT from the ground up without wading through the complexity of production implementations. It is written in Python using PyTorch, a popular deep learning library. Note that the author has since created a successor called nanoGPT for those who want something similarly educational but more capable.

Where it fits