gitmyhub

minbpe

Python ★ 11k updated 1y ago

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

No plain-English explanation yet — one is being written right now. Check back in a minute.