Glossary
- AI, artificial intelligence
Refers to the simulation of human intelligence in machines designed to learn and solve problems. AI enables computers to understand language, make decisions and improve from experience.
- Air gap
A security measure where a computer network is physically isolated from unsecured networks, including the public Internet.
- Batch size
The number of samples processed together in a single pass during model training or inference, affecting processing speed and resource utilization.
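As a sketch, batching simply means slicing the input stream into fixed-size groups; the helper below is a hypothetical illustration, not tied to any particular framework:

```python
def batched(samples, batch_size):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Ten samples with a batch size of 4: two full batches and a final partial one.
batches = list(batched(list(range(10)), batch_size=4))
```

Larger batches raise throughput at the cost of memory; the right size depends on the model and the hardware.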
- BYOC, bring your own certificate
A practice allowing users to provide their own SSL/TLS certificates for securing communications instead of using default or auto-generated ones.
- CA, certification authority
An entity that issues digital certificates to verify the identity of certificate holders and ensure secure communications.
- Chain-of-thought (CoT) prompting
A prompting technique that guides AI models to break down complex problems into step-by-step reasoning processes, improving response accuracy and transparency.
- Chat template
A structured format for organizing conversations between users and AI models, defining how system prompts, user inputs, and AI responses are formatted and processed.
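A minimal sketch of how a chat template might assemble a conversation into one prompt string. The `<|role|>`/`<|end|>` markers are illustrative assumptions only; real templates are model-specific and usually ship with the tokenizer:

```python
def render_chat(messages, system_prompt):
    """Render a conversation with hypothetical <|role|> ... <|end|> markers.

    The marker syntax here is invented for illustration; each model family
    defines its own chat template.
    """
    parts = [f"<|system|>{system_prompt}<|end|>"]
    for msg in messages:
        parts.append(f"<|{msg['role']}|>{msg['content']}<|end|>")
    parts.append("<|assistant|>")  # trailing cue for the model to respond
    return "".join(parts)

prompt = render_chat([{"role": "user", "content": "Hi"}], "Be concise.")
```

Using the wrong template for a model typically degrades output quality, since the model was trained on one specific conversation format.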
- Context window
The maximum amount of text (tokens) that an AI model can process at once, including both the input prompt and generated response.
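One practical consequence: when a conversation outgrows the context window, older messages must be dropped or summarized. A minimal sketch, assuming a caller-supplied token counter (the one-word-per-token counter below is a toy; real models use subword tokenizers):

```python
def fit_context(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the conversation fits the token budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest message first
    return kept

# Toy counter: one token per whitespace-separated word.
toy_count = lambda text: len(text.split())
history = ["hello there", "how are you today", "fine thanks"]
trimmed = fit_context(history, max_tokens=7, count_tokens=toy_count)
```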
- CRD, custom resource definition
An extension of the Kubernetes API that lets users define custom resource types in a Kubernetes cluster, which custom controllers can then watch and manage.
- CUDA, Compute Unified Device Architecture
NVIDIA's parallel computing platform and programming model used to accelerate AI workloads on GPU hardware.
- Data leakage
The unintended exposure of sensitive information through AI model responses, potentially compromising data security and privacy.
- Embeddings
Numerical representations of data (text, images, etc.) in a high-dimensional space that capture semantic relationships and enable AI models to process information effectively.
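Semantic closeness between embeddings is commonly measured with cosine similarity. A minimal sketch with toy three-dimensional vectors (real embeddings have hundreds or thousands of dimensions and come from a trained model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: semantically related words end up near each other.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.1, 0.0, 0.9]
```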
- Fine-tuning
The process of further training a pre-trained AI model on specific data to adapt it for particular tasks or domains, improving its performance for targeted applications.
- GenAI, generative AI
A type of artificial intelligence that can create new content such as text, images or music.
- GPU, graphics processing unit
Specialized hardware designed for parallel processing. In AI applications, GPUs accelerate model training and inference tasks.
- Hallucination
An AI behavior where the model generates false or unsupported information that appears plausible but has no basis in provided context or real facts.
- Helm
A package manager for Kubernetes that helps install and manage applications. Helm uses charts to define, install and upgrade complex Kubernetes applications.
- Helm chart
A package format for Kubernetes applications that contains all resource definitions needed to deploy and configure application workloads.
- IaC, infrastructure as code
The practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes.
- Inference
The process of using a trained AI model to make predictions or generate outputs based on new input data.
- Kubernetes pods
The smallest deployable units in Kubernetes that can host one or more containers, sharing networking and storage resources.
- LLM, large language model
An advanced AI model trained on vast amounts of text data to understand and generate human-like text. Can perform tasks like translation, summarization and answering questions.
- Model weights
The learned parameters of an AI model that determine how it processes inputs and generates outputs. These weights are adjusted during training to optimize model performance.
- NLG, natural language generation
A process of automatically generating human-like text from structured data or other forms of input. Designed to convert raw data into coherent and meaningful language easily understood by humans.
- NLU, natural language understanding
The process by which AI analyzes an input query to determine its meaning and intent.
- NVIDIA GPU driver
Software that enables communication between the operating system and NVIDIA graphics hardware, essential for GPU-accelerated AI workloads.
- NVIDIA GPU Operator
A Kubernetes operator that automates the management of NVIDIA GPUs in container environments, handling driver deployment, runtime configuration, and monitoring.
- Ollama
An open source framework for running and serving AI models locally. Ollama simplifies the process of downloading, running and managing large language models.
- OpenGL
A cross-platform API for rendering 2D and 3D graphics, commonly used in visualization applications and GPU-accelerated computing.
- Prompt engineering
The practice of crafting effective input queries to AI models to obtain desired and accurate outputs. Good prompt engineering helps prevent hallucinations and improves response quality.
- Prompt injection
A security vulnerability where malicious inputs attempt to override or bypass an AI model's system prompt or safety constraints.
- Quantization
A technique to reduce AI model size and computational requirements by converting model parameters to lower precision formats while maintaining acceptable performance.
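A minimal sketch of symmetric int8 quantization with a single scale factor; real schemes add per-channel scales, zero points and calibration, but the core idea is the same:

```python
def quantize_int8(weights):
    """Map float weights onto the int8 range [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in quantized]

weights = [0.02, -0.51, 0.33, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # each value is within scale/2 of the original
```

Storing int8 values instead of 32-bit floats cuts memory roughly fourfold, at the cost of the small rounding error visible above.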
- RAG, retrieval-augmented generation
A technique that enhances AI responses by retrieving relevant information from a knowledge base before generating answers, improving accuracy and reducing hallucinations.
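A minimal RAG sketch: rank documents against the query, then splice the best match into the prompt as grounding context. The word-overlap similarity below is a toy stand-in for embedding search against a vector database:

```python
def retrieve_and_prompt(query, documents, similarity, top_n=2):
    """Pick the most query-relevant documents and build a grounded prompt."""
    ranked = sorted(documents, key=lambda d: similarity(query, d), reverse=True)
    context = "\n".join(ranked[:top_n])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

# Toy similarity: number of shared lowercase words (real RAG compares embeddings).
def word_overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

docs = [
    "Helm manages Kubernetes charts",
    "GPUs accelerate training",
    "Paris is in France",
]
prompt = retrieve_and_prompt("What does Helm manage?", docs, word_overlap, top_n=1)
```

Because the model is instructed to answer only from the retrieved context, irrelevant knowledge is less likely to surface as a hallucination.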
- RBAC, role-based access control
A security model that restricts system access based on roles assigned to users, managing permissions and authorization in Kubernetes clusters.
- Semantic search
A search method using AI to understand the meaning and context of queries rather than just matching keywords, enabling more relevant results.
- System prompt
Initial instructions given to an AI model that define its behavior, role and response parameters. System prompts help maintain consistent and appropriate AI responses.
- Temperature
A parameter controlling the randomness in AI model outputs. Lower values produce more focused and deterministic responses, while higher values increase creativity and variability.
- Token
The basic unit of text processing in AI models, representing parts of words, characters or symbols. Models process text by breaking it into tokens for analysis and generation.
- Top-K
A parameter that limits token selection during text generation to the K most likely next tokens, helping control output quality and relevance.
- Top-P
Also known as nucleus sampling, a parameter that selects from the smallest set of tokens whose cumulative probability exceeds P, providing dynamic control over text generation diversity.
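The Temperature, Top-K and Top-P parameters described above can be combined in a single sampling step. A minimal sketch over a toy four-token vocabulary (real implementations operate on full-vocabulary probability tensors):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Temperature scaling, optional Top-K and Top-P filtering, then sampling.

    logits: mapping of token -> raw model score.
    """
    # Temperature: divide logits before softmax; lower T sharpens the distribution.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    probs = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}

    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the K most likely tokens
    if top_p is not None:
        kept, cum = [], 0.0
        for t, p in ranked:  # smallest set whose cumulative probability reaches P
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept

    tokens, weights = zip(*ranked)
    return rng.choices(tokens, weights=weights)[0]

logits = {"the": 4.0, "a": 2.0, "cat": 1.0, "xyzzy": -3.0}
token = sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9)
```

With `top_k=1` the choice becomes deterministic (greedy decoding); raising the temperature or loosening the filters lets less likely tokens through.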
- Vector database
A specialized database designed to store and efficiently query high-dimensional vectors that represent data in AI applications, enabling similarity searches and semantic operations.
- Vector store
A specialized storage system optimized for managing and querying vector embeddings, essential for semantic search and RAG implementations in AI applications.