AnderssonProgramming/llm-embeddings-text-preprocessing
LLM text preprocessing and embedding pipeline implementation for the Enterprise Architecture (AREP) course at Escuela Colombiana de Ingeniería Julio Garavito. Based on "Build a Large Language Model (From Scratch)," it covers BPE tokenization, sliding window sampling experiments, and positional embedding integration using PyTorch.
Stars
—
Forks
—
Language
Jupyter Notebook
License
MIT
Category
Last pushed
Feb 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/AnderssonProgramming/llm-embeddings-text-preprocessing"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
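For scripted access, the curl call above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the assumption that `embeddings` is a category segment followed by owner and repository is inferred from the single example URL, and the response is returned as raw text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL copied from the curl example; everything after it is an inferred layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, category: str = "embeddings") -> str:
    """Build the endpoint URL (hypothetical helper; path order assumed from the curl example)."""
    parts = (category, owner, repo)
    # Percent-encode each path segment so unusual characters cannot alter the path.
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the body as text (UTF-8 decoding is an assumption)."""
    with urlopen(build_quality_url(owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("AnderssonProgramming", "llm-embeddings-text-preprocessing"))
```

Without a key, the stated limit of 100 requests/day applies, so a script that polls this endpoint should cache results rather than refetch on every run.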
Higher-rated alternatives
ContextualAI/gritlm
Generative Representational Instruction Tuning
xlang-ai/instructor-embedding
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
liuqidong07/LLMEmb
[AAAI'25 Oral] The official implementation code of LLMEmb
hpcaitech/CachedEmbedding
A memory efficient DLRM training solution using ColossalAI
shobrook/weightgain
Train an adapter for any embedding model in under a minute