Key Terminology
Term | Definition | Simplified Explanation |
|---|---|---|
Artificial Intelligence (AI) | The broad field of creating machines that can perform tasks that typically require human intelligence. | Teaching computers to do smart things like recognizing faces or answering questions. |
Automatic Prompt Engineering (APE) | Using AI to generate better prompts automatically. | Letting the AI help write smarter prompts for itself. |
Beam Search / Top-K / Top-P Sampling | Decoding strategies for choosing the next token: beam search tracks the highest-scoring candidate sequences, while top-k and top-p sampling draw from a truncated probability distribution to balance quality and diversity. | These are ways the model picks the next word. Some methods go for the best match, others add randomness to sound more natural (see the decoding sketch after this table). |
Chain-of-Thought (CoT) Prompting | Strategy where the model is encouraged to show reasoning steps before giving a final answer. | This means asking the AI to explain its steps, not just give an answer, like showing work in math class (an example prompt appears after this table). |
Context Window | The maximum number of tokens a model can consider at one time for generating or interpreting text. | It’s how much text the model can remember and work with at once—like its short-term memory. |
Decoder-Only | Architecture used in generative models, which predicts the next token in a sequence using masked self-attention. | A model setup that focuses only on writing forward, like writing a story one word at a time. |
Direct Preference Optimization (DPO) | A fine-tuning method that optimizes the model directly on ranked human preference pairs, without training a separate reward model. | A way to teach models to prefer better answers by comparing good and bad ones. |
Embedding | Dense vector representation of words or tokens used as input to a neural network. | It’s how words get turned into numbers the model can understand. |
Encoder-Decoder | Original transformer design where the encoder processes input and the decoder generates output. | One part reads the input, the other part writes the output—like a translator. |
Feedforward Layer | Fully connected neural network layer applied position-wise after attention to further transform each token’s representation. | A part of the model that further processes each word’s meaning after attention has been applied. |
Fine-Tuning | Adjusting a pre-trained model's weights for a specific task or domain. | Tweaking a model that already knows language so it gets better at one specific job. |
Flash Attention | An exact attention implementation reorganized for efficient GPU memory access, improving speed and memory use in large models. | A faster, smarter way for the model to pay attention to words. |
Gradient Descent | Optimization algorithm used to train models by adjusting weights to reduce prediction error. | It’s how models learn from mistakes by slowly improving step by step (a tiny worked example follows this table). |
Inference | The process of using a trained model to generate predictions or outputs. | When the model 'thinks' based on what it’s learned. |
Instruction Tuning | Fine-tuning models on human-written instructions and responses to improve task performance. | Training the model using examples where people give it tasks and responses, so it learns how to follow instructions better. |
LLM (Large Language Model) | A deep learning model trained on massive text corpora to perform language-related tasks. | An AI that has read enormous amounts of text, so it can understand and write human-like language. |
Layer Normalization | Technique to normalize activations within layers, improving convergence and training stability. | It helps the model train smoothly by keeping values in check. |
LoRA (Low-Rank Adaptation) | Fine-tuning technique that updates only low-rank matrices rather than full model weights. | A clever shortcut for fine-tuning large models without using too much memory (sketched after this table). |
Loss Function | A function used to measure how far off a model's prediction is from the actual result. | A scorecard that tells the model how wrong it was so it can improve. |
Machine Learning (ML) | A subfield of AI where computers learn from data to make predictions or decisions without being explicitly programmed. | Like teaching by example—if you show a machine enough pictures of cats, it learns what a cat looks like. |
Mixture of Experts (MoE) | Architecture where only a subset of sub-models is activated per input, enabling scale with efficiency. | A model made up of many smaller expert parts, but only a few are used at a time to save effort. |
Multi-Head Attention | Using multiple sets of attention calculations in parallel to capture different relationships in data. | The model pays attention in several ways at once to better understand language. |
Neural Network | A model inspired by how the human brain works, consisting of layers of nodes (neurons). | A system of virtual “brain cells” that help the AI think. |
Overfitting | When a model learns the training data too well, including noise, and performs poorly on new data. | Like memorizing a textbook but failing the real-world test. |
Parameter Efficient Fine-Tuning (PEFT) | Approaches like LoRA and Adapters that allow tuning only a small portion of the model. | Ways to update just part of a big model instead of the whole thing. |
Positional Encoding | Adds position information to token embeddings, allowing transformers to understand word order. | It tells the model where each word appears in the sentence (sketched in code after this table). |
Pre-Training | The initial large-scale training phase on broad, unlabeled datasets. | The first stage where the model learns general language from reading a lot of text. |
Prompt | The input text given to an LLM to generate a response. | It’s the question or instruction you give to an AI to start the conversation. |
Prompt Engineering | The practice of crafting effective prompts to get the desired outputs from an LLM. | The art of asking better questions to get better answers from AI. |
QLoRA | LoRA applied on top of a quantized base model, further reducing memory and compute requirements. | An even lighter version of LoRA that saves memory and speeds up fine-tuning. |
Query, Key, Value (QKV) | Vectors used in self-attention to determine relevance of tokens. | Different roles each word takes when deciding how much attention to give other words. |
ReAct Prompting | Combines reasoning with actions (like web search or calculations). | The model can think and also 'do things' like look up facts or run code. |
Reinforcement Learning from Human Feedback (RLHF) | Training using human preferences as rewards to align model output with expectations. | A way to teach the model what people like by rewarding good answers. |
Residual Connections | Shortcut paths that preserve input signals in deep networks. | These act like memory lanes to help the model not forget what it already learned. |
Role Prompting | Assigns the model a specific identity (e.g., a teacher, lawyer). | Like giving the AI a costume or role to play. |
Sampling Temperature | A decoding parameter that rescales token probabilities before sampling; lower values make output more focused and deterministic, higher values more varied. | Think of it like adding spice—more temperature means spicier, more unpredictable answers. |
Self-Attention | Mechanism allowing the model to weigh the importance of different tokens in a sequence. | The model looks at all the words in a sentence and decides which ones matter most to each other (see the attention sketch after this table). |
Soft Prompting | Uses learnable vector prompts instead of full model updates for tuning. | Gives the model a gentle nudge using prompts rather than retraining it entirely. |
Speculative Decoding | Uses a small model to predict draft tokens, then confirms them with the main model. | A fast trick where a small model guesses ahead and the big model checks the work. |
Step-Back Prompting | Asks a broader question first to guide a deeper answer. | Like warming up with a big-picture question before solving a problem. |
Supervised Learning | ML technique where models are trained on labeled data (inputs with correct outputs). | It’s like giving the model a quiz with an answer key while it learns. |
System Prompting | Sets the overall tone and behavior of the model. | It tells the model 'who' it should act like. |
Tokenization | Breaking text into smaller units for model input. | Splitting sentences into chunks the model can understand (see the tokenization sketch after this table). |
Transformer | Neural architecture using self-attention for sequence modeling. | The foundation of modern language models—like the engine in a car. |
Tree of Thoughts (ToT) | Exploring multiple reasoning paths at once. | Instead of one thought train, it branches into a tree of ideas. |
Unsupervised Learning | Models learn patterns in data without labeled outcomes. | Letting the model explore data on its own to find hidden patterns. |
Zero-shot / One-shot / Few-shot Prompting | Prompting strategies that vary in the number of examples given to guide the model. | Zero = no example, one = one example, few = a couple of examples. Just like showing how to do something before asking for it (example prompts appear after this table). |
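Illustrative Code Sketches

The short Python sketches below illustrate several of the mechanisms defined above. They are minimal, toy-scale illustrations using NumPy, not production implementations; all function names, variable names, and data are invented for the examples.

Tokenization and embeddings. A minimal sketch of how text becomes token ids and then dense vectors, assuming a toy word-level vocabulary (real models use learned subword tokenizers such as BPE):

```python
import numpy as np

# A toy word-level tokenizer; real models use subword schemes such as BPE.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "mat": 4, "on": 5}

def tokenize(text):
    """Split text into words, then map each word to its integer id."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

rng = np.random.default_rng(0)
d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))  # one dense vector per token

ids = tokenize("The cat sat on the mat")
vectors = embedding_table[ids]      # token ids -> dense embedding vectors
print(ids)                          # [1, 2, 3, 5, 1, 4]
print(vectors.shape)                # (6, 8)
```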
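Positional encoding. A sketch of the sinusoidal positional encodings from the original transformer paper; these vectors are added to the token embeddings so word-order information survives attention:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings (assumes d_model is even)."""
    pos = np.arange(seq_len)[:, None]              # positions 0..seq_len-1
    i = np.arange(0, d_model, 2)[None, :]          # even embedding indices
    angles = pos / np.power(10000, i / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                  # even dimensions: sine
    enc[:, 1::2] = np.cos(angles)                  # odd dimensions: cosine
    return enc

# These vectors get added to the token embeddings before the first layer.
print(sinusoidal_positions(seq_len=4, d_model=8).shape)   # (4, 8)
```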
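Self-attention with Q, K, V. A single attention head in a few lines: each token is projected into query, key, and value vectors, and the scaled dot products between queries and keys decide how much of each value to mix in. The projection matrices here are random stand-ins for learned weights; multi-head attention runs several copies of this in parallel:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of embeddings x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project tokens into Q, K, V roles
    scores = q @ k.T / np.sqrt(k.shape[-1])      # relevance of each token to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                           # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))          # 4 token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (4, 8): one updated vector per token
```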
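Temperature, top-k, and top-p. A sketch of how the decoding knobs interact when picking the next token: temperature rescales the distribution, top-k keeps only the k most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches p. (Beam search, by contrast, is a deterministic search over whole candidate sequences and is not shown here.)

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Pick the next token id from raw logits using common decoding knobs."""
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature scaling
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                         # softmax

    if top_k is not None:  # keep only tokens at least as likely as the k-th best
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)

    if top_p is not None:  # keep the smallest set of tokens whose mass reaches p
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]
        kept = np.zeros_like(probs)
        kept[keep] = probs[keep]
        probs = kept

    probs /= probs.sum()   # renormalize over the surviving tokens
    return np.random.choice(len(probs), p=probs)

# Toy vocabulary of 5 tokens: higher temperature flattens the distribution.
logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(sample_next_token(logits, temperature=0.7, top_k=3))
```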
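Gradient descent and loss. A tiny worked example: fitting a one-parameter line by repeatedly stepping the weight against the gradient of a mean-squared-error loss. The data points are made up for illustration:

```python
# Fit y = w*x with gradient descent on a mean-squared-error loss.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]      # roughly y = 2x
w, lr = 0.0, 0.01              # initial weight and learning rate

for step in range(200):
    # Loss: mean of (w*x - y)^2; its gradient w.r.t. w is the mean of 2*x*(w*x - y).
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad             # step downhill on the loss surface

print(round(w, 3))             # converges near 2.0
```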
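LoRA. A sketch of the low-rank update idea: the frozen pre-trained weight matrix W stays untouched, and only the small matrices A and B are trained (the alpha/r scaling follows the LoRA paper; the dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                        # full dimension vs. low rank
W = rng.normal(size=(d, d))          # frozen pre-trained weight matrix
A = rng.normal(size=(r, d)) * 0.01   # small trainable matrix
B = np.zeros((d, r))                 # B starts at zero, so the update begins as a no-op
alpha = 16                           # LoRA scaling factor

def lora_forward(x):
    """Frozen weight plus a low-rank trainable correction."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d))
print(lora_forward(x).shape)   # (2, 512)
# Only A and B (2*d*r values) would be trained, versus d*d for full fine-tuning.
```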
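Zero-shot, few-shot, and chain-of-thought prompts. These are plain strings, shown here as Python literals; the review snippets and the math problem are invented for illustration:

```python
# Zero-shot: no examples, just the task.
zero_shot = "Classify the sentiment of this review as positive or negative: 'Great battery life.'"

# Few-shot: a couple of worked examples guide the format and behavior.
few_shot = """Review: 'Terrible screen.' -> negative
Review: 'Love the camera.' -> positive
Review: 'Great battery life.' ->"""

# Chain-of-thought: ask for the reasoning steps, not just the answer.
chain_of_thought = (
    "A store sells pens in packs of 12. If I buy 3 packs and give away 7 pens, "
    "how many do I have left? Think step by step, then state the final answer."
)
```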