Glossary #
- AI, artificial intelligence #
- The simulation of human intelligence in machines designed to learn and solve problems. Enables computers to understand language, make decisions and improve from experience. 
- Air gap #
- A security measure where a computer network is physically isolated from unsecured networks, including the public Internet. 
- Batch size #
- The number of samples processed simultaneously during model training or inference, affecting processing speed and resource utilization. 
- BYOC, bring your own certificate #
- A practice allowing users to provide their own SSL/TLS certificates for securing communications instead of using default or auto-generated ones. 
- CA, certification authority #
- An entity that issues digital certificates to verify the identity of certificate holders and ensure secure communications. 
- Chain-of-thought (CoT) prompting #
- A prompting technique that guides AI models to break down complex problems into step-by-step reasoning processes, improving response accuracy and transparency. 
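The difference is easiest to see side by side. The sketch below contrasts a plain prompt with a chain-of-thought variant; the exact wording is illustrative, as effective phrasings vary by model.

```python
# A plain prompt asks only for the answer.
plain_prompt = "A train travels 120 km in 1.5 hours. What is its average speed?"

# The chain-of-thought variant asks the model to expose intermediate
# reasoning steps, which tends to improve accuracy on multi-step problems.
cot_prompt = (
    "A train travels 120 km in 1.5 hours. What is its average speed?\n"
    "Think step by step: first identify the distance and the time, "
    "then apply speed = distance / time before stating the answer."
)

print(cot_prompt)
```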
- Chat template #
- A structured format for organizing conversations between users and AI models, defining how system prompts, user inputs, and AI responses are formatted and processed. 
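A minimal sketch of the idea: the role/content message layout below follows a widely used convention, while the `<|role|>` tag format is invented for illustration; real chat templates are model-specific.

```python
# Structured conversation: each message has a role and content.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is an air gap?"},
]

def render(messages):
    """Flatten structured messages into the single prompt string a model sees."""
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    parts.append("<|assistant|>\n")  # trailing cue tells the model to respond
    return "\n".join(parts)

print(render(messages))
```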
- Context window #
- The maximum amount of text (tokens) that an AI model can process at once, including both the input prompt and generated response. 
- CRD, custom resource definition #
- An extension of the Kubernetes API that allows users to define and manage their own resource types in a Kubernetes cluster. 
- CUDA, Compute Unified Device Architecture #
- NVIDIA's parallel computing platform and programming model used to accelerate AI workloads on GPU hardware. 
- Data leakage #
- The unintended exposure of sensitive information through AI model responses, potentially compromising data security and privacy. 
- Embeddings #
- Numerical representations of data (text, images, etc.) in a high-dimensional space that capture semantic relationships and enable AI models to process information effectively. 
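A toy illustration of the idea, using hand-made 3-dimensional vectors (real embeddings have hundreds to thousands of dimensions): semantically related items point in similar directions, which cosine similarity makes measurable.

```python
import math

# Invented toy "embeddings"; real values come from an embedding model.
emb = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related concepts score higher than unrelated ones.
assert cosine(emb["dog"], emb["puppy"]) > cosine(emb["dog"], emb["car"])
```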
- Fine-tuning #
- The process of further training a pre-trained AI model on specific data to adapt it for particular tasks or domains, improving its performance for targeted applications. 
- GenAI, generative AI #
- A type of artificial intelligence that can create new content such as text, images or music. 
- GPU, graphics processing unit #
- Specialized hardware designed for parallel processing. In AI applications, GPUs accelerate model training and inference tasks. 
- Hallucination #
- An AI behavior where the model generates false or unsupported information that appears plausible but has no basis in provided context or real facts. 
- Helm #
- A package manager for Kubernetes that helps install and manage applications. Helm uses charts to define, install and upgrade complex Kubernetes applications. 
- Helm chart #
- A package format for Kubernetes applications that contains all resource definitions needed to deploy and configure application workloads. 
- IaC, infrastructure as code #
- The practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes. 
- Inference #
- The process of using a trained AI model to make predictions or generate outputs based on new input data. 
- Kubernetes pods #
- The smallest deployable units in Kubernetes that can host one or more containers, sharing networking and storage resources. 
- LLM, large language model #
- An advanced AI model trained on vast amounts of text data to understand and generate human-like text. Can perform tasks like translation, summarization and answering questions. 
- Model weights #
- The learned parameters of an AI model that determine how it processes inputs and generates outputs. These weights are adjusted during training to optimize model performance. 
- NLG, natural language generation #
- A process of automatically generating human-like text from structured data or other forms of input, converting raw data into coherent and meaningful language that humans can easily understand. 
- NLU, natural language understanding #
- The process by which AI analyzes an input query to extract its meaning and intent. 
- NVIDIA GPU driver #
- Software that enables communication between the operating system and NVIDIA graphics hardware, essential for GPU-accelerated AI workloads. 
- NVIDIA GPU Operator #
- A Kubernetes operator that automates the management of NVIDIA GPUs in container environments, handling driver deployment, runtime configuration, and monitoring. 
- Ollama #
- An open source framework for running and serving AI models locally. Ollama simplifies the process of downloading, running and managing large language models. 
- OpenGL #
- A cross-platform API for rendering 2D and 3D graphics, commonly used in visualization applications and GPU-accelerated computing. 
- Prompt engineering #
- The practice of crafting effective input queries to AI models to obtain desired and accurate outputs. Good prompt engineering helps prevent hallucinations and improves response quality. 
- Prompt injection #
- A security vulnerability where malicious inputs attempt to override or bypass an AI model's system prompt or safety constraints. 
- Quantization #
- A technique to reduce AI model size and computational requirements by converting model parameters to lower precision formats while maintaining acceptable performance. 
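A minimal sketch of one common scheme, symmetric int8 quantization: float weights are mapped into the [-127, 127] integer range with a single per-tensor scale factor. The weight values are invented for illustration.

```python
# Example float32 weights (invented values).
weights = [0.42, -1.37, 0.08, 2.51, -0.93]

# One scale factor maps the largest-magnitude weight onto 127.
scale = max(abs(w) for w in weights) / 127

q = [round(w / scale) for w in weights]   # int8 representation (8 bits each)
dq = [v * scale for v in q]               # dequantized approximation

# Storage drops from 32 to 8 bits per weight; rounding error stays bounded.
max_err = max(abs(a - b) for a, b in zip(weights, dq))
assert max_err <= scale / 2
```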
- RAG, retrieval-augmented generation #
- A technique that enhances AI responses by retrieving relevant information from a knowledge base before generating answers, improving accuracy and reducing hallucinations. 
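The flow can be sketched in a few lines: retrieve the most relevant document for a query, then prepend it to the prompt as context. Retrieval here is naive keyword overlap for brevity; real systems use embedding similarity against a vector store. Documents and prompt wording are invented.

```python
docs = [
    "An air gap physically isolates a network from the public Internet.",
    "Temperature controls randomness in model outputs.",
]

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, docs):
    # Grounding the model in retrieved context reduces hallucinations.
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer using the context."

print(build_prompt("What is an air gap?", docs))
```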
- RBAC, role-based access control #
- A security model that restricts system access based on roles assigned to users, managing permissions and authorization in Kubernetes clusters. 
- Semantic search #
- A search method using AI to understand the meaning and context of queries rather than just matching keywords, enabling more relevant results. 
- System prompt #
- Initial instructions given to an AI model that define its behavior, role and response parameters. System prompts help maintain consistent and appropriate AI responses. 
- Temperature #
- A parameter controlling the randomness in AI model outputs. Lower values produce more focused and deterministic responses, while higher values increase creativity and variability. 
- Token #
- The basic unit of text processing in AI models, representing parts of words, characters or symbols. Models process text by breaking it into tokens for analysis and generation. 
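A toy illustration: real tokenizers (for example, BPE) learn their subword vocabulary from data, but the core idea is the same, since models see a sequence of integer token IDs rather than raw characters. The vocabulary below is invented.

```python
# Toy subword vocabulary mapping text pieces to integer token IDs.
vocab = {"un": 0, "believ": 1, "able": 2, "!": 3}

def tokenize(text, vocab):
    """Greedily match the longest known piece at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece, i):
                tokens.append(vocab[piece])
                i += len(piece)
                break
        else:
            i += 1  # skip characters not in the toy vocabulary
    return tokens

print(tokenize("unbelievable!", vocab))  # one word becomes four subword tokens
```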
- Top-K #
- A parameter that limits token selection during text generation to the K most likely next tokens, helping control output quality and relevance. 
- Top-P #
- Also known as nucleus sampling, a parameter that selects from the smallest set of tokens whose cumulative probability exceeds P, providing dynamic control over text generation diversity. 
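The three parameters above work together on the same next-token distribution. The sketch below shows the usual order of operations: temperature rescales logits before the softmax, then Top-K and Top-P prune unlikely tokens before sampling. The logits and tokens are invented.

```python
import math

# Invented raw model scores for candidate next tokens.
logits = {"the": 4.0, "a": 3.0, "cat": 1.0, "xylophone": -2.0}

def next_token_probs(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature: divide logits before softmax. Lower values sharpen the
    # distribution (more deterministic); higher values flatten it.
    exps = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(exps.values())
    ranked = sorted(((t, e / total) for t, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)

    if top_k is not None:          # Top-K: keep only the K most likely tokens
        ranked = ranked[:top_k]

    if top_p is not None:          # Top-P: smallest set whose cumulative
        kept, cum = [], 0.0        # probability reaches P (nucleus sampling)
        for t, p in ranked:
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept

    total = sum(p for _, p in ranked)          # renormalize the survivors
    return {t: p / total for t, p in ranked}

print(next_token_probs(logits, temperature=0.7, top_k=3, top_p=0.9))
```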
- Vector database #
- A specialized database designed to store and efficiently query high-dimensional vectors that represent data in AI applications, enabling similarity searches and semantic operations. 
- Vector store #
- A specialized storage system optimized for managing and querying vector embeddings, essential for semantic search and RAG implementations in AI applications.