Key Terminology
Term | Definition | Simplified Explanation |
|---|---|---|
Artificial Intelligence (AI) | The broad field of creating machines that can perform tasks that typically require human intelligence. | Teaching computers to do smart things like recognizing faces or answering questions. |
Automatic Prompt Engineering (APE) | Using AI to generate better prompts automatically. | Letting the AI help write smarter prompts for itself. |
Beam Search / Top-K / Top-P Sampling | Decoding strategies for choosing the next token: beam search tracks the highest-scoring candidate sequences, while top-k and top-p sampling draw from a truncated probability distribution to balance quality and diversity. | These are ways the model picks the next word. Some methods go for the best match, others add randomness to sound more natural (see the decoding sketch after this table). |
Chain-of-Thought (CoT) Prompting | Strategy where the model is encouraged to show reasoning steps before giving a final answer. | This means asking the AI to explain its steps, not just give an answer, like showing work in math class (an example prompt appears after this table). |
Context Window | The maximum number of tokens a model can consider at one time for generating or interpreting text. | It’s how much text the model can remember and work with at once—like its short-term memory. |
Decoder-Only | Architecture used in generative models, which predicts the next token in a sequence using masked self-attention. | A model setup that focuses only on writing forward, like writing a story one word at a time. |
Direct Preference Optimization (DPO) | A fine-tuning method that optimizes the model directly on ranked human preference pairs, without training a separate reward model. | A way to teach models to prefer better answers by comparing good and bad ones. |
Embedding | Dense vector representation of words or tokens used as input to a neural network. | It’s how words get turned into numbers the model can understand. |
Encoder-Decoder | Original transformer design where the encoder processes input and the decoder generates output. | One part reads the input, the other part writes the output—like a translator. |
Feedforward Layer | Fully connected neural network layer applied position-wise after attention to further transform each token’s representation. | A part of the model that further processes each word’s meaning after attention has been applied. |
Fine-Tuning | Adjusting a pre-trained model's weights for a specific task or domain. | Tweaking a model that already knows language so it gets better at one specific job. |
Flash Attention | An exact attention implementation reorganized for efficient GPU memory access, improving speed and memory use in large models. | A faster, smarter way for the model to pay attention to words. |
Gradient Descent | Optimization algorithm used to train models by adjusting weights to reduce prediction error. | It’s how models learn from mistakes by slowly improving step by step (a tiny worked example follows this table). |
Inference | The process of using a trained model to generate predictions or outputs. | When the model 'thinks' based on what it’s learned. |
Instruction Tuning | Fine-tuning models on human-written instructions and responses to improve task performance. | Training the model using examples where people give it tasks and responses, so it learns how to follow instructions better. |
LLM (Large Language Model) | A deep learning model trained on massive text corpora to perform language-related tasks. | An AI that has read enormous amounts of text, so it can understand and write human-like language. |
Layer Normalization | Technique to normalize activations within layers, improving convergence and training stability. | It helps the model train smoothly by keeping values in check. |
LoRA (Low-Rank Adaptation) | Fine-tuning technique that updates only low-rank matrices rather than full model weights. | A clever shortcut for fine-tuning large models without using too much memory (sketched after this table). |
Loss Function | A function used to measure how far off a model's prediction is from the actual result. | A scorecard that tells the model how wrong it was so it can improve. |
Machine Learning (ML) | A subfield of AI where computers learn from data to make predictions or decisions without being explicitly programmed. | Like teaching by example—if you show a machine enough pictures of cats, it learns what a cat looks like. |
Mixture of Experts (MoE) | Architecture where only a subset of sub-models is activated per input, enabling scale with efficiency. | A model made up of many smaller expert parts, but only a few are used at a time to save effort. |
Multi-Head Attention | Using multiple sets of attention calculations in parallel to capture different relationships in data. | The model pays attention in several ways at once to better understand language. |
Neural Network | A model inspired by how the human brain works, consisting of layers of nodes (neurons). | A system of virtual “brain cells” that help the AI think. |
Overfitting | When a model learns the training data too well, including noise, and performs poorly on new data. | Like memorizing a textbook but failing the real-world test. |
Parameter Efficient Fine-Tuning (PEFT) | Approaches like LoRA and Adapters that allow tuning only a small portion of the model. | Ways to update just part of a big model instead of the whole thing. |
Positional Encoding | Adds position information to token embeddings, allowing transformers to understand word order. | It tells the model where each word appears in the sentence (sketched in code after this table). |
Pre-Training | The initial large-scale training phase on broad, unlabeled datasets. | The first stage where the model learns general language from reading a lot of text. |
Prompt | The input text given to an LLM to generate a response. | It’s the question or instruction you give to an AI to start the conversation. |
Prompt Engineering | The practice of crafting effective prompts to get the desired outputs from an LLM. | The art of asking better questions to get better answers from AI. |
QLoRA | LoRA applied on top of a quantized base model, further reducing memory and compute requirements. | An even lighter version of LoRA that saves memory and speeds up fine-tuning. |
Query, Key, Value (QKV) | Vectors used in self-attention to determine relevance of tokens. | Different roles each word takes when deciding how much attention to give other words. |
ReAct Prompting | Combines reasoning with actions (like web search or calculations). | The model can think and also 'do things' like look up facts or run code. |
Reinforcement Learning from Human Feedback (RLHF) | Training using human preferences as rewards to align model output with expectations. | A way to teach the model what people like by rewarding good answers. |
Residual Connections | Shortcut paths that preserve input signals in deep networks. | These act like memory lanes to help the model not forget what it already learned. |
Role Prompting | Assigns the model a specific identity (e.g., a teacher, lawyer). | Like giving the AI a costume or role to play. |
Sampling Temperature | A decoding parameter that rescales token probabilities before sampling; lower values make output more focused and deterministic, higher values more varied. | Think of it like adding spice—more temperature means spicier, more unpredictable answers. |
Self-Attention | Mechanism allowing the model to weigh the importance of different tokens in a sequence. | The model looks at all the words in a sentence and decides which ones matter most to each other (see the attention sketch after this table). |
Soft Prompting | Uses learnable vector prompts instead of full model updates for tuning. | Gives the model a gentle nudge using prompts rather than retraining it entirely. |
Speculative Decoding | Uses a small model to predict draft tokens, then confirms them with the main model. | A fast trick where a small model guesses ahead and the big model checks the work. |
Step-Back Prompting | Asks a broader question first to guide a deeper answer. | Like warming up with a big-picture question before solving a problem. |
Supervised Learning | ML technique where models are trained on labeled data (inputs with correct outputs). | It’s like giving the model a quiz with an answer key while it learns. |
System Prompting | Sets the overall tone and behavior of the model. | It tells the model 'who' it should act like. |
Tokenization | Breaking text into smaller units for model input. | Splitting sentences into chunks the model can understand (see the tokenization sketch after this table). |
Transformer | Neural architecture using self-attention for sequence modeling. | The foundation of modern language models—like the engine in a car. |
Tree of Thoughts (ToT) | Exploring multiple reasoning paths at once. | Instead of one thought train, it branches into a tree of ideas. |
Unsupervised Learning | Models learn patterns in data without labeled outcomes. | Letting the model explore data on its own to find hidden patterns. |
Zero-shot / One-shot / Few-shot Prompting | Prompting strategies that vary in the number of examples given to guide the model. | Zero = no example, one = one example, few = a couple of examples. Just like showing how to do something before asking for it (example prompts appear after this table). |
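Illustrative Code Sketches

The short Python sketches below illustrate several of the mechanisms defined above. They are minimal, toy-scale illustrations using NumPy, not production implementations; all function names, variable names, and data are invented for the examples.

Tokenization and embeddings. A minimal sketch of how text becomes token ids and then dense vectors, assuming a toy word-level vocabulary (real models use learned subword tokenizers such as BPE):

```python
import numpy as np

# A toy word-level tokenizer; real models use subword schemes such as BPE.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "mat": 4, "on": 5}

def tokenize(text):
    """Split text into words, then map each word to its integer id."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

rng = np.random.default_rng(0)
d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))  # one dense vector per token

ids = tokenize("The cat sat on the mat")
vectors = embedding_table[ids]      # token ids -> dense embedding vectors
print(ids)                          # [1, 2, 3, 5, 1, 4]
print(vectors.shape)                # (6, 8)
```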
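Positional encoding. A sketch of the sinusoidal positional encodings from the original transformer paper; these vectors are added to the token embeddings so word-order information survives attention:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings (assumes d_model is even)."""
    pos = np.arange(seq_len)[:, None]              # positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even embedding indices
    angles = pos / np.power(10000, i / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                  # even dimensions: sine
    enc[:, 1::2] = np.cos(angles)                  # odd dimensions: cosine
    return enc

# These vectors get added to the token embeddings before the first layer.
print(sinusoidal_positions(seq_len=4, d_model=8).shape)   # (4, 8)
```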
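Self-attention with Q, K, V. A single attention head in a few lines: each token is projected into query, key, and value vectors, and the scaled dot products between queries and keys decide how much of each value to mix in. The projection matrices here are random stand-ins for learned weights; multi-head attention runs several copies of this in parallel:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of embeddings x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project tokens into Q, K, V roles
    scores = q @ k.T / np.sqrt(k.shape[-1])      # relevance of each token to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                           # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))          # 4 token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (4, 8): one updated vector per token
```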
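Temperature, top-k, and top-p. A sketch of how the decoding knobs interact when picking the next token: temperature rescales the distribution, top-k keeps only the k most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches p. (Beam search, by contrast, is a deterministic search over whole candidate sequences and is not shown here.)

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Pick the next token id from raw logits using common decoding knobs."""
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature scaling
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                         # softmax

    if top_k is not None:  # keep only tokens at least as likely as the k-th best
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p is not None:  # keep the smallest set of tokens whose mass reaches p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        kept = np.zeros_like(probs)
        kept[keep] = probs[keep]
        probs = kept

    probs /= probs.sum()   # renormalize over the surviving tokens
    return np.random.choice(len(probs), p=probs)

# Toy vocabulary of 5 tokens: higher temperature flattens the distribution.
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0.7, top_k=3))
```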
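Gradient descent and loss. A tiny worked example: fitting a one-parameter line by repeatedly stepping the weight against the gradient of a mean-squared-error loss. The data points are made up for illustration:

```python
# Fit y = w*x with gradient descent on a mean-squared-error loss.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]      # roughly y = 2x
w, lr = 0.0, 0.01              # initial weight and learning rate

for step in range(200):
    # Loss: mean of (w*x - y)^2; its gradient w.r.t. w is the mean of 2*x*(w*x - y).
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad             # step downhill on the loss surface

print(round(w, 3))             # converges near 2.0
```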
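LoRA. A sketch of the low-rank update idea: the frozen pre-trained weight matrix W stays untouched, and only the small matrices A and B are trained (the alpha/r scaling follows the LoRA paper; the dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # full dimension vs. low rank
W = rng.normal(size=(d, d))          # frozen pre-trained weight matrix
A = rng.normal(size=(r, d)) * 0.01   # small trainable matrix
B = np.zeros((d, r))                 # B starts at zero, so the update begins as a no-op
alpha = 16                           # LoRA scaling factor

def lora_forward(x):
    """Frozen weight plus a low-rank trainable correction."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d))
print(lora_forward(x).shape)   # (2, 512)
# Only A and B (2*d*r values) would be trained, versus d*d for full fine-tuning.
```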
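Zero-shot, few-shot, and chain-of-thought prompts. These are plain strings, shown here as Python literals; the review snippets and the math problem are invented for illustration:

```python
# Zero-shot: no examples, just the task.
zero_shot = "Classify the sentiment of this review as positive or negative: 'Great battery life.'"

# Few-shot: a couple of worked examples guide the format and behavior.
few_shot = """Review: 'Terrible screen.' -> negative
Review: 'Love the camera.' -> positive
Review: 'Great battery life.' ->"""

# Chain-of-thought: ask for the reasoning steps, not just the answer.
chain_of_thought = (
    "A store sells pens in packs of 12. If I buy 3 packs and give away 7 pens, "
    "how many do I have left? Think step by step, then state the final answer."
)
```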