claudeskill.wiki

ai-research

129 skills

ai-research

agents-langchain

581

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrations, ReAct agents, tool calling, memory management, and vector store retrieval. Use for building chatbots, question-answering systems, autonomous agents, or RAG applications. Best for rapid prototyping and production deployments.

npx claude-code-templates@latest --skill ai-research/agents-langchain
ai-research

agent-memory-systems

543

"Memory is the cornerstone of intelligent agents. Without it, every interaction starts from zero. This skill covers the architecture of agent memory: short-term (context window), long-term (vector stores), and the cognitive architectures that organize them. Key insight: Memory isn't just storage - it's retrieval. A million stored facts mean nothing if you can't find the right one. Chunking, embedding, and retrieval strategies determine whether your agent remembers or forgets. The field is fragm"

npx claude-code-templates@latest --skill ai-research/agent-memory-systems
ai-research

claude-code-guide

518

Master guide for using Claude Code effectively. Includes configuration templates, prompting strategies, "Thinking" keywords, debugging techniques, and best practices for interacting with the agent.

npx claude-code-templates@latest --skill ai-research/claude-code-guide
ai-research

ai-agents-architect

463

"Expert in designing and building autonomous AI agents. Masters tool use, memory systems, planning strategies, and multi-agent orchestration. Use when: build agent, AI agent, autonomous agent, tool use, function calling."

npx claude-code-templates@latest --skill ai-research/ai-agents-architect
ai-research

context7-auto-research

405

Automatically fetch the latest library/framework documentation for Claude Code via the Context7 API.

npx claude-code-templates@latest --skill ai-research/context7-auto-research
ai-research

prompt-engineering

345

Expert guide on prompt engineering patterns, best practices, and optimization techniques. Use when user wants to improve prompts, learn prompting strategies, or debug agent behavior.

npx claude-code-templates@latest --skill ai-research/prompt-engineering
ai-research

prompt-engineer

258

"Expert in designing effective prompts for LLM-powered applications. Masters prompt structure, context management, output formatting, and prompt evaluation. Use when: prompt engineering, system prompt, few-shot, chain of thought, prompt design."

npx claude-code-templates@latest --skill ai-research/prompt-engineer
ai-research

agents-autogpt

241

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI automation systems.

npx claude-code-templates@latest --skill ai-research/agents-autogpt
ai-research

agents-crewai

241

Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on complex tasks, when you need role-based agent collaboration with memory, or for production workflows requiring sequential/hierarchical execution. Built without LangChain dependencies for lean, fast execution.

npx claude-code-templates@latest --skill ai-research/agents-crewai
ai-research

agent-manager-skill

220

Manage multiple local CLI agents via tmux sessions (start/stop/monitor/assign) with cron-friendly scheduling.

npx claude-code-templates@latest --skill ai-research/agent-manager-skill
ai-research

agent-tool-builder

199

"Tools are how AI agents interact with the world. A well-designed tool is the difference between an agent that works and one that hallucinates, fails silently, or costs 10x more tokens than necessary. This skill covers tool design from schema to error handling. JSON Schema best practices, description writing that actually helps the LLM, validation, and the emerging MCP standard that's becoming the lingua franca for AI tools. Key insight: Tool descriptions are more important than tool implementa"

npx claude-code-templates@latest --skill ai-research/agent-tool-builder
ai-research

autonomous-agents

198

"Autonomous agents are AI systems that can independently decompose goals, plan actions, execute tools, and self-correct without constant human guidance. The challenge isn't making them capable - it's making them reliable. Every extra decision multiplies failure probability. This skill covers agent loops (ReAct, Plan-Execute), goal decomposition, reflection patterns, and production reliability. Key insight: compounding error rates kill autonomous agents. A 95% success rate per step drops to 60% b"

npx claude-code-templates@latest --skill ai-research/autonomous-agents
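The compounding-error claim above is simple arithmetic: if every step must succeed and steps fail independently, end-to-end success is the per-step rate raised to the number of steps. The sketch below assumes that independence, which real agent runs only approximate; `chain_success` and `max_steps` are illustrative names, not part of any skill or library.

```python
def chain_success(per_step: float, steps: int) -> float:
    """End-to-end success probability when each of `steps` independent steps must succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability")
    return per_step ** steps


def max_steps(per_step: float, target: float) -> int:
    """Largest step count whose end-to-end success is still >= target."""
    if not (0.0 < target <= 1.0 and 0.0 <= per_step < 1.0):
        raise ValueError("need 0 < target <= 1 and 0 <= per_step < 1")
    steps = 0
    while chain_success(per_step, steps + 1) >= target:
        steps += 1
    return steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95% each: {chain_success(0.95, n):.1%}")
```

At 95% per step the product is roughly 0.60 after 10 steps, and it stays at or above 0.5 only up to 13 steps, which is the reliability problem the description points at.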
ai-research

langgraph

178

"Expert in LangGraph - the production-grade framework for building stateful, multi-actor AI applications. Covers graph construction, state management, cycles and branches, persistence with checkpointers, human-in-the-loop patterns, and the ReAct agent pattern. Used in production at LinkedIn, Uber, and 400+ companies. This is LangChain's recommended approach for building agents. Use when: langgraph, langchain agent, stateful agent, agent graph, react agent."

npx claude-code-templates@latest --skill ai-research/langgraph
ai-research

agent-evaluation

176

"Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents score below 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent."

npx claude-code-templates@latest --skill ai-research/agent-evaluation
ai-research

research-engineer

171

"An uncompromising Academic Research Engineer. Operates with absolute scientific rigor, objective criticism, and zero flair. Focuses on theoretical correctness, formal verification, and optimal implementation across any required technology."

npx claude-code-templates@latest --skill ai-research/research-engineer
ai-research

agent-memory-mcp

167

A hybrid memory system that provides persistent, searchable knowledge management for AI agents (Architecture, Patterns, Decisions).

npx claude-code-templates@latest --skill ai-research/agent-memory-mcp
ai-research

conversation-memory

159

"Persistent memory systems for LLM conversations, including short-term, long-term, and entity-based memory. Use when: conversation memory, remember, memory persistence, long-term memory, chat history."

npx claude-code-templates@latest --skill ai-research/conversation-memory
ai-research

qa-test-planner

152

Generate comprehensive test plans, manual test cases, regression test suites, and bug reports for QA engineers. Includes Figma MCP integration for design validation.

npx claude-code-templates@latest --skill ai-research/qa-test-planner
ai-research

rag-engineer

148

"Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when: building RAG, vector search, embeddings, semantic search, document retrieval."

npx claude-code-templates@latest --skill ai-research/rag-engineer
ai-research

agents-llamaindex

144

Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features vector indices, query engines, agents, and multi-modal support. Use for document Q&A, chatbots, knowledge retrieval, or building RAG pipelines. Best for data-centric LLM applications.

npx claude-code-templates@latest --skill ai-research/agents-llamaindex
ai-research

prompt-engineering-guidance

133

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework

npx claude-code-templates@latest --skill ai-research/prompt-engineering-guidance
ai-research

context-window-management

130

"Strategies for managing LLM context windows, including summarization, trimming, routing, and avoiding context rot. Use when: context window, token limit, context management, context engineering, long context."

npx claude-code-templates@latest --skill ai-research/context-window-management
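Trimming, one of the strategies listed above, can be sketched as keeping the system prompt plus the newest turns that fit a token budget. This assumes an OpenAI-style list of `{"role", "content"}` dicts with the system message first, and uses a whitespace word count as a hypothetical stand-in for a real tokenizer.

```python
def trim_history(messages, budget, count_tokens=lambda text: len(text.split())):
    """Keep the system message plus the newest turns whose total cost fits `budget`.

    messages: [{"role": ..., "content": ...}, ...] with the system message first.
    count_tokens: rough whitespace word count by default; swap in a real tokenizer.
    """
    system, turns = messages[0], messages[1:]
    used = count_tokens(system["content"])
    kept = []
    for msg in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break  # stop at the first turn that does not fit, keeping history contiguous
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]  # restore chronological order
```

Stopping at the first turn that does not fit, rather than skipping it, keeps the retained history contiguous; summarizing the dropped prefix is a common next step.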
ai-research

computer-use-agents

124

"Build AI agents that interact with computers like humans do - viewing screens, moving cursors, clicking buttons, and typing text. Covers Anthropic's Computer Use, OpenAI's Operator/CUA, and open-source alternatives. Critical focus on sandboxing, security, and handling the unique challenges of vision-based control. Use when: computer use, desktop automation agent, screen control AI, vision-based agent, GUI automation."

npx claude-code-templates@latest --skill ai-research/computer-use-agents
ai-research

prompt-engineering-instructor

119

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream partial results with Instructor - battle-tested structured output library

npx claude-code-templates@latest --skill ai-research/prompt-engineering-instructor
ai-research

voice-ai-development

117

"Expert in building voice AI applications - from real-time voice agents to voice-enabled apps. Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. Knows how to build low-latency, production-ready voice experiences. Use when: voice ai, voice agent, speech to text, text to speech, realtime voice."

npx claude-code-templates@latest --skill ai-research/voice-ai-development
ai-research

autonomous-agent-patterns

116

"Design patterns for building autonomous coding agents. Covers tool integration, permission systems, browser automation, and human-in-the-loop workflows. Use when building AI agents, designing tool APIs, implementing permission systems, or creating autonomous coding assistants."

npx claude-code-templates@latest --skill ai-research/autonomous-agent-patterns
ai-research

jira

114

Use when the user mentions Jira issues (e.g., "PROJ-123"), asks about tickets, wants to create/view/update issues, check sprint status, or manage their Jira workflow. Triggers on keywords like "jira", "issue", "ticket", "sprint", "backlog", or issue key patterns.

npx claude-code-templates@latest --skill ai-research/jira
ai-research

voice-agents

97

"Voice agents represent the frontier of AI interaction - humans speaking naturally with AI systems. The challenge isn't just speech recognition and synthesis, it's achieving natural conversation flow with sub-800ms latency while handling interruptions, background noise, and emotional nuance. This skill covers two architectures: speech-to-speech (OpenAI Realtime API, lowest latency, most natural) and pipeline (STT→LLM→TTS, more control, easier to debug). Key insight: latency is the constraint. Hu"

npx claude-code-templates@latest --skill ai-research/voice-agents
ai-research

data-processing-ray-data

89

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

npx claude-code-templates@latest --skill ai-research/data-processing-ray-data
ai-research

gemini

85

Use when the user asks to run Gemini CLI for code review, plan review, or processing large contexts (>200k tokens). Ideal for comprehensive analysis requiring large context windows. Uses Gemini 3 Pro by default for state-of-the-art reasoning and coding.

npx claude-code-templates@latest --skill ai-research/gemini
ai-research

parallel-agents

80

Multi-agent orchestration patterns. Use when multiple independent tasks can run with different domain expertise or when comprehensive analysis requires multiple perspectives.

npx claude-code-templates@latest --skill ai-research/parallel-agents
ai-research

prompt-engineering-dspy

78

Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Stanford NLP's framework for systematic LM programming

npx claude-code-templates@latest --skill ai-research/prompt-engineering-dspy
ai-research

behavioral-modes

76

AI operational modes (brainstorm, implement, debug, review, teach, ship, orchestrate). Use to adapt behavior based on task type.

npx claude-code-templates@latest --skill ai-research/behavioral-modes
ai-research

prompt-library

76

"Curated collection of high-quality prompts for various use cases. Includes role-based prompts, task-specific templates, and prompt refinement techniques. Use when user needs prompt templates, role-play prompts, or ready-to-use prompt examples for coding, writing, analysis, or creative tasks."

npx claude-code-templates@latest --skill ai-research/prompt-library
ai-research

distributed-training-pytorch-lightning

75

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

npx claude-code-templates@latest --skill ai-research/distributed-training-pytorch-lightning
ai-research

distributed-training-accelerate

72

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

npx claude-code-templates@latest --skill ai-research/distributed-training-accelerate
ai-research

emerging-techniques-knowledge-distillation

72

Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-knowledge-distillation
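To make "temperature scaling" and "soft targets" concrete, here is a minimal pure-Python sketch of the distillation loss for a single prediction. It assumes raw logits from teacher and student over the same vocabulary; the `reverse` flag switches from the classic forward KL(teacher || student) to the token-level reverse KL(student || teacher) that MiniLLM-style methods build on. Function names are illustrative, not from any library, and in practice this term is usually mixed with an ordinary cross-entropy loss on hard labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0, reverse=False):
    """Distillation loss on temperature-softened distributions, scaled by T**2.

    reverse=False: forward KL(teacher || student), the classic soft-target loss.
    reverse=True:  reverse KL(student || teacher).
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    a, b = (q, p) if reverse else (p, q)
    kl = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return temperature ** 2 * kl  # T**2 keeps gradient scale comparable across temperatures
```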
ai-research

data-processing-nemo-curator

71

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

npx claude-code-templates@latest --skill ai-research/data-processing-nemo-curator
ai-research

rag-qdrant

70

High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.

npx claude-code-templates@latest --skill ai-research/rag-qdrant
ai-research

crewai

68

"Expert in CrewAI - the leading role-based multi-agent framework used by 60% of Fortune 500 companies. Covers agent design with roles and goals, task definition, crew orchestration, process types (sequential, hierarchical, parallel), memory systems, and flows for complex workflows. Essential for building collaborative AI agent teams. Use when: crewai, multi-agent team, agent roles, crew of agents, role-based agents."

npx claude-code-templates@latest --skill ai-research/crewai
ai-research

emerging-techniques-long-context

67

Extend context windows of transformer models using RoPE, YaRN, ALiBi, and position interpolation techniques. Use when processing long documents (32k-128k+ tokens), extending pre-trained models beyond original context limits, or implementing efficient positional encodings. Covers rotary embeddings, attention biases, interpolation methods, and extrapolation strategies for LLMs.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-long-context
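To illustrate how linear position interpolation extends a RoPE model, the sketch below computes rotary angles and applies them to adjacent channel pairs. The `scale` parameter is an assumed interpolation factor (target length divided by trained length); the adjacent-pair layout is one of several conventions, and some implementations rotate the two halves of the vector instead. YaRN and NTK-aware scaling adjust frequencies non-uniformly and are not shown.

```python
import math


def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotation angle for each of the dim // 2 channel pairs at `position`.

    scale > 1 applies linear position interpolation: positions are divided by
    scale so they stay inside the range seen during training.
    """
    p = position / scale
    return [p * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate adjacent pairs (x0, x1), (x2, x3), ... of `vec` by their RoPE angles."""
    out = []
    for i, a in enumerate(rope_angles(position, len(vec), base, scale)):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out += [x0 * math.cos(a) - x1 * math.sin(a), x0 * math.sin(a) + x1 * math.cos(a)]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

With `scale=4.0`, position 8192 receives exactly the angles position 2048 had, and because each pair is rotated by an angle proportional to position, query-key dot products depend only on the offset between positions.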
ai-research

dispatching-parallel-agents

66

Use when facing 2+ independent tasks that can be worked on without shared state or sequential dependencies

npx claude-code-templates@latest --skill ai-research/dispatching-parallel-agents
ai-research

data-scientist

63

Expert data scientist for advanced analytics, machine learning, and statistical modeling. Handles complex data analysis, predictive modeling, and business intelligence.

npx claude-code-templates@latest --skill ai-research/data-scientist
ai-research

distributed-training-deepspeed

59

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

npx claude-code-templates@latest --skill ai-research/distributed-training-deepspeed
ai-research

perplexity

59

Web search and research using Perplexity AI. Use when user says "search", "find", "look up", "ask", "research", or "what's the latest" for generic queries. NOT for library/framework docs (use Context7) or workspace questions.

npx claude-code-templates@latest --skill ai-research/perplexity
ai-research

llm-app-patterns

58

"Production-ready patterns for building LLM applications. Covers RAG pipelines, agent architectures, prompt IDEs, and LLMOps monitoring. Use when designing AI applications, implementing RAG, building agents, or setting up LLM observability."

npx claude-code-templates@latest --skill ai-research/llm-app-patterns
ai-research

ml-paper-writing

58

Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verification workflows.

npx claude-code-templates@latest --skill ai-research/ml-paper-writing
ai-research

evaluation-lm-evaluation-harness

56

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

npx claude-code-templates@latest --skill ai-research/evaluation-lm-evaluation-harness
ai-research

inference-serving-vllm

55

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

npx claude-code-templates@latest --skill ai-research/inference-serving-vllm
ai-research

distributed-training-megatron-core

53

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, and DeepSeek.

npx claude-code-templates@latest --skill ai-research/distributed-training-megatron-core
ai-research

loki-mode

52

Multi-agent autonomous startup system for Claude Code. Triggers on "Loki Mode". Orchestrates 100+ specialized agents across engineering, QA, DevOps, security, data/ML, business operations, marketing, HR, and customer success. Takes a PRD to a fully deployed, revenue-generating product with zero human intervention. Features the Task tool for subagent dispatch, parallel code review with 3 specialized reviewers, severity-based issue triage, a distributed task queue with dead-letter handling, automatic deployment to cloud providers, A/B testing, customer feedback loops, incident response, circuit breakers, and self-healing. Handles rate limits via distributed state checkpoints and auto-resume with exponential backoff. Requires the --dangerously-skip-permissions flag.

npx claude-code-templates@latest --skill ai-research/loki-mode
ai-research

distributed-training-pytorch-fsdp

51

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2

npx claude-code-templates@latest --skill ai-research/distributed-training-pytorch-fsdp
ai-research

emerging-techniques-model-merging

49

Merge multiple fine-tuned models using mergekit to combine capabilities without retraining. Use when creating specialized models by blending domain-specific expertise (math + coding + chat), improving performance beyond single models, or experimenting rapidly with model variants. Covers SLERP, TIES-Merging, DARE, Task Arithmetic, linear merging, and production deployment strategies.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-model-merging
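SLERP, one of the methods listed above, interpolates along the arc between two weight vectors instead of the straight line between them. Below is a minimal pure-Python sketch for one flattened tensor, assuming both models share the same architecture. It is not mergekit's code; the linear fallback for nearly (anti)parallel vectors and the `dot_threshold` value are assumptions reflecting a common safeguard.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    The angle is measured between the normalised vectors, but the original
    (unnormalised) vectors are interpolated. Nearly (anti)parallel inputs fall
    back to plain linear interpolation, where SLERP is numerically unstable.
    """
    n0 = math.sqrt(sum(a * a for a in v0)) + eps
    n1 = math.sqrt(sum(b * b for b in v1)) + eps
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    if abs(cos_omega) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    omega = math.acos(cos_omega)  # angle between the two weight vectors
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

A merge tool would apply this tensor by tensor, often with a different `t` per layer.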
ai-research

prompt-engineering-outlines

48

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library

npx claude-code-templates@latest --skill ai-research/prompt-engineering-outlines
ai-research

datadog-cli

46

Datadog CLI for searching logs, querying metrics, tracing requests, and managing dashboards. Use this when debugging production issues or working with Datadog observability.

npx claude-code-templates@latest --skill ai-research/datadog-cli
ai-research

emerging-techniques-speculative-decoding

46

Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-speculative-decoding
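The core of speculative decoding is the verification rule that keeps the output distributed exactly as the target model's: accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the renormalised residual max(0, p - q). A minimal single-token sketch, assuming the target and draft distributions are given as token-to-probability dicts; a real implementation verifies a whole block of draft tokens in one target forward pass.

```python
import random


def speculative_step(draft_token, p_target, q_draft, rng=random):
    """Verify one draft token so the result is distributed exactly as p_target.

    p_target, q_draft: dicts mapping token -> probability under the target and
    draft models at this position. Returns (token, accepted).
    """
    p = p_target.get(draft_token, 0.0)
    q = q_draft[draft_token]  # > 0, since the draft model sampled this token
    if rng.random() < min(1.0, p / q):
        return draft_token, True
    # Rejected: resample from the residual max(0, p - q), renormalised.
    residual = {t: max(0.0, pt - q_draft.get(t, 0.0)) for t, pt in p_target.items()}
    total = sum(residual.values())
    r = rng.random() * total
    for t, w in residual.items():
        if r < w:
            return t, False
        r -= w
    return max(residual, key=residual.get), False  # floating-point fallback
```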
ai-research

rag-chroma

46

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.

npx claude-code-templates@latest --skill ai-research/rag-chroma
ai-research

distributed-training-ray-train

45

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

npx claude-code-templates@latest --skill ai-research/distributed-training-ray-train
ai-research

rag-implementation

44

"Retrieval-Augmented Generation patterns, including chunking, embeddings, vector stores, and retrieval optimization. Use when: rag, retrieval augmented, vector search, embeddings, semantic search."

npx claude-code-templates@latest --skill ai-research/rag-implementation
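Chunking with overlap, the first pattern listed above, can be sketched in a few lines. Splitting by words is an assumption made for simplicity; token-based or structure-aware splitting is more common in practice, and the default sizes here are arbitrary.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one.

    Word counts stand in for tokens here; production pipelines usually chunk by
    tokenizer tokens or by document structure (headings, paragraphs) instead.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]
```

Overlap trades some index size for a lower chance that a fact is split across a chunk boundary and missed at retrieval time.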
ai-research

inference-serving-tensorrt-llm

43

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

npx claude-code-templates@latest --skill ai-research/inference-serving-tensorrt-llm
ai-research

fine-tuning-peft

42

Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.

npx claude-code-templates@latest --skill ai-research/fine-tuning-peft
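The "<1% of parameters" figure follows from LoRA's low-rank factorization: a frozen d_out x d_in weight gets two trainable factors holding rank * (d_out + d_in) parameters in total. A quick check for a single hypothetical 4096 x 4096 projection (the size is an illustrative assumption, and the helper is not part of the PEFT API):

```python
def lora_param_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of one d_out x d_in weight matrix's parameter count that LoRA trains.

    W stays frozen; LoRA learns B (d_out x rank) and A (rank x d_in), and the
    adapted layer computes W x + (alpha / rank) * B (A x).
    """
    return rank * (d_out + d_in) / (d_out * d_in)


for r in (8, 16, 64):
    print(f"rank {r:>2}: {lora_param_fraction(4096, 4096, r):.3%} of a 4096 x 4096 projection")
```

Rank 8 trains 1/256 (about 0.39%) of that matrix and rank 64 trains 1/32; whole-model fractions depend on which modules the adapter targets.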
ai-research

subagent-driven-development

42

Use when executing implementation plans with independent tasks in the current session

npx claude-code-templates@latest --skill ai-research/subagent-driven-development
ai-research

fine-tuning-unsloth

41

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

npx claude-code-templates@latest --skill ai-research/fine-tuning-unsloth
ai-research

emerging-techniques-moe-training

39

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-moe-training
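Top-k routing is what lets an MoE model grow capacity without proportional compute: a router scores every expert, but only the k best run for each token. Below is a minimal sketch of Mixtral-style top-2 routing for a single token, where `experts` is an assumed list of callables and the router logits are given directly; the auxiliary load-balancing loss mentioned above is omitted.

```python
import math


def top_k_route(router_logits, k=2):
    """Select the k highest-scoring experts for one token and softmax their logits.

    Returns {expert_index: gate_weight}; the weights sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)  # stabilise the exponentials
    exps = {e: math.exp(router_logits[e] - m) for e in top}
    total = sum(exps.values())
    return {e: w / total for e, w in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Gate-weighted sum of the selected experts' outputs; other experts are skipped."""
    out = [0.0] * len(x)
    for e, gate in top_k_route(router_logits, k).items():
        y = experts[e](x)  # only k experts run per token: the source of the compute savings
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```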
ai-research

evaluation-bigcode-evaluation-harness

38

Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.

npx claude-code-templates@latest --skill ai-research/evaluation-bigcode-evaluation-harness
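pass@k is usually computed with the unbiased estimator introduced alongside HumanEval: generate n samples per problem, count the c that pass the unit tests, and estimate 1 - C(n-c, k) / C(n, k). Evaluation harnesses typically compute this internally; the functions below are an illustrative sketch, not the harness's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Benchmark score: average pass@k over problems, given a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Sampling n > k and using this estimator gives a lower-variance score than literally drawing k samples once.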
ai-research

prompt-caching

37

"Caching strategies for LLM prompts, including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation). Use when: prompt caching, cache prompt, response cache, cag, cache augmented."

npx claude-code-templates@latest --skill ai-research/prompt-caching
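Response caching, as distinct from Anthropic's server-side prompt caching, can be as simple as keying stored replies on a hash of the exact request. A minimal client-side sketch; `call_model` is a hypothetical callable standing in for whatever API client is used, and reusing replies is only sensible for deterministic settings such as temperature 0.

```python
import hashlib
import json


class ResponseCache:
    """Exact-match cache: an identical (model, messages, params) request reuses the stored reply."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(model, messages, **params):
        # sort_keys makes the JSON (and therefore the hash) independent of dict ordering
        payload = json.dumps({"model": model, "messages": messages, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, call_model, model, messages, **params):
        """Return the cached reply, calling `call_model` only on a cache miss."""
        k = self.key(model, messages, **params)
        if k not in self._store:
            self._store[k] = call_model(model, messages, **params)
        return self._store[k]
```

An unbounded dict is the simplest store; a production cache would add eviction and expiry.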
ai-research

emerging-techniques-model-pruning

35

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

npx claude-code-templates@latest --skill ai-research/emerging-techniques-model-pruning
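Wanda's pruning score multiplies each weight's magnitude by the norm of the input feature it reads, then removes the lowest-scoring weights within each output row, with no retraining. Below is a pure-Python sketch for one linear layer, assuming the activation norms were already collected from a small calibration batch; with uniform norms it reduces to per-row magnitude pruning. SparseGPT, which also updates the remaining weights, is not shown.

```python
def wanda_prune(weights, act_norms, sparsity=0.5):
    """Wanda-style unstructured pruning of one linear layer.

    weights:   list of rows, one per output feature (d_out x d_in)
    act_norms: L2 norm of each input feature over a calibration batch (length d_in)
    Within every output row, the int(d_in * sparsity) weights with the smallest
    |W_ij| * ||X_j|| are set to zero.
    """
    pruned = []
    for row in weights:
        n_drop = int(len(row) * sparsity)
        scores = [abs(w) * n for w, n in zip(row, act_norms)]
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```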
ai-research

inference-serving-llama-cpp

35

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

npx claude-code-templates@latest --skill ai-research/inference-serving-llama-cpp
ai-research

fine-tuning-llama-factory

34

Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support

npx claude-code-templates@latest --skill ai-research/fine-tuning-llama-factory
ai-research

mlops-mlflow

33

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow - framework-agnostic ML lifecycle platform

npx claude-code-templates@latest --skill ai-research/mlops-mlflow
ai-research

mlops-weights-and-biases

32

Track ML experiments with automatic logging, visualize training in real-time, optimize hyperparameters with sweeps, and manage model registry with W&B - collaborative MLOps platform

npx claude-code-templates@latest --skill ai-research/mlops-weights-and-biases
ai-research

multimodal-whisper

30

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

npx claude-code-templates@latest --skill ai-research/multimodal-whisper
ai-research

rag-faiss

29

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.

npx claude-code-templates@latest --skill ai-research/rag-faiss
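A flat index performs exact search: every query is compared against every stored vector. The sketch below reproduces that behaviour in pure Python for squared L2 distance, only to show what is being computed. It is not the FAISS API; FAISS adds SIMD/GPU kernels plus approximate index types (IVF, HNSW) that avoid the full scan.

```python
import heapq


def flat_l2_search(database, query, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Scans every stored vector, as a flat index does. Returns [(distance, index), ...]
    sorted from nearest to farthest.
    """
    scored = (
        (sum((q - x) ** 2 for q, x in zip(query, vec)), i)
        for i, vec in enumerate(database)
    )
    return heapq.nsmallest(k, scored)
```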
ai-research

fine-tuning-axolotl

27

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

npx claude-code-templates@latest --skill ai-research/fine-tuning-axolotl
ai-research

mlops-tensorboard

27

Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile performance with TensorBoard - Google's ML visualization toolkit

npx claude-code-templates@latest --skill ai-research/mlops-tensorboard
ai-research

evaluation-nemo-evaluator

26

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

npx claude-code-templates@latest --skill ai-research/evaluation-nemo-evaluator
ai-research

infrastructure-modal

26

Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.

npx claude-code-templates@latest --skill ai-research/infrastructure-modal
ai-research

infrastructure-lambda-labs

25

Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training.

npx claude-code-templates@latest --skill ai-research/infrastructure-lambda-labs
ai-research

langfuse

25

"Expert in Langfuse - the open-source LLM observability platform. Covers tracing, prompt management, evaluation, datasets, and integration with LangChain, LlamaIndex, and OpenAI. Essential for debugging, monitoring, and improving LLM applications in production. Use when: langfuse, llm observability, llm tracing, prompt management, llm evaluation."

npx claude-code-templates@latest --skill ai-research/langfuse
ai-research

inference-serving-sglang

24

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

npx claude-code-templates@latest --skill ai-research/inference-serving-sglang
ai-research

multimodal-stable-diffusion

23

State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.

npx claude-code-templates@latest --skill ai-research/multimodal-stable-diffusion
ai-research

rag-pinecone

23

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

npx claude-code-templates@latest --skill ai-research/rag-pinecone
ai-research

infrastructure-skypilot

22

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.

npx claude-code-templates@latest --skill ai-research/infrastructure-skypilot
ai-research

multimodal-audiocraft

22

PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation.

npx claude-code-templates@latest --skill ai-research/multimodal-audiocraft
ai-research

observability-langsmith

21

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

npx claude-code-templates@latest --skill ai-research/observability-langsmith
ai-research

data-engineer

20

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.

npx claude-code-templates@latest --skill ai-research/data-engineer
ai-research

deep-research-notebooklm

20

"Deep research skill powered by NotebookLM MCP. Conducts structured multi-source research (market analysis, competitive intel, trend analysis, prospect research) using Google NotebookLM as the research engine, then delivers formatted briefs and optional studio artifacts (slides, audio podcasts, videos, infographics, reports, mind maps)."

npx claude-code-templates@latest --skill ai-research/deep-research-notebooklm
ai-research

mechanistic-interpretability-transformer-lens

20

Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when reverse-engineering model algorithms, studying attention patterns, or performing activation patching experiments.

npx claude-code-templates@latest --skill ai-research/mechanistic-interpretability-transformer-lens
ai-research

rag-sentence-transformers

20

Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.

npx claude-code-templates@latest --skill ai-research/rag-sentence-transformers
ai-research

openai-docs

19

"Use when the user asks how to build with OpenAI products or APIs and needs up-to-date official documentation with citations (for example: Codex, Responses API, Chat Completions, Apps SDK, Agents SDK, Realtime, model capabilities or limits); prioritize OpenAI docs MCP tools and restrict any fallback browsing to official OpenAI domains."

npx claude-code-templates@latest --skill ai-research/openai-docs
ai-research

optimization-flash-attention

19

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), when attention causes GPU memory issues, or when you need faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

npx claude-code-templates@latest --skill ai-research/optimization-flash-attention
ai-research

tokenization-huggingface-tokenizers

19

Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.

npx claude-code-templates@latest --skill ai-research/tokenization-huggingface-tokenizers
ai-research

deep-research

18

"Run autonomous research tasks that plan, search, read, and synthesize information into comprehensive reports."

npx claude-code-templates@latest --skill ai-research/deep-research
ai-research

optimization-gguf

18

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2- to 8-bit quantization without a GPU.

npx claude-code-templates@latest --skill ai-research/optimization-gguf
ai-research

gepetto

17

Creates detailed, sectionized implementation plans through research, stakeholder interviews, and multi-LLM review. Use when planning features that need thorough pre-implementation analysis.

npx claude-code-templates@latest --skill ai-research/gepetto
ai-research

model-architecture-nanogpt

17

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

npx claude-code-templates@latest --skill ai-research/model-architecture-nanogpt
ai-research

multimodal-clip

16

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

npx claude-code-templates@latest --skill ai-research/multimodal-clip
ai-research

multimodal-segment-anything

16

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

npx claude-code-templates@latest --skill ai-research/multimodal-segment-anything
ai-research

optimization-bitsandbytes

16

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

npx claude-code-templates@latest --skill ai-research/optimization-bitsandbytes
ai-research

observability-phoenix

15

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.

npx claude-code-templates@latest --skill ai-research/observability-phoenix
ai-research

optimization-gptq

15

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

npx claude-code-templates@latest --skill ai-research/optimization-gptq
ai-research

post-training-trl-fine-tuning

15

Fine-tune LLMs with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reinforcement-learning-based reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

npx claude-code-templates@latest --skill ai-research/post-training-trl-fine-tuning
ai-research

safety-alignment-constitutional-ai

15

Anthropic's method for training harmless AI through self-improvement. Two-phase approach: supervised learning with self-critique and revision, then RLAIF (RL from AI Feedback). Use for safety alignment and for reducing harmful outputs without human harmlessness labels. Used in training Anthropic's Claude models.

npx claude-code-templates@latest --skill ai-research/safety-alignment-constitutional-ai
ai-research

safety-alignment-nemo-guardrails

15

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, and toxicity detection. Uses the Colang 2.0 DSL for programmable rails. Production-ready; runs on a T4 GPU.

npx claude-code-templates@latest --skill ai-research/safety-alignment-nemo-guardrails
ai-research

tokenization-sentencepiece

15

Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.

npx claude-code-templates@latest --skill ai-research/tokenization-sentencepiece
ai-research

mechanistic-interpretability-saelens

13

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable features, analyzing superposition, or studying monosemantic representations in language models.

npx claude-code-templates@latest --skill ai-research/mechanistic-interpretability-saelens
ai-research

optimization-hqq

13

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.

npx claude-code-templates@latest --skill ai-research/optimization-hqq
ai-research

optimization-awq

12

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

npx claude-code-templates@latest --skill ai-research/optimization-awq
ai-research

post-training-grpo-rl-training

12

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.

npx claude-code-templates@latest --skill ai-research/post-training-grpo-rl-training
ai-research

safety-alignment-llamaguard

12

Meta's specialized 7B/8B moderation model for filtering LLM inputs and outputs. Six safety categories: violence/hate, sexual content, weapons, substances, self-harm, and criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, or SageMaker. Integrates with NeMo Guardrails.

npx claude-code-templates@latest --skill ai-research/safety-alignment-llamaguard
ai-research

model-architecture-rwkv

11

RNN+Transformer hybrid with O(n) inference: linear time, infinite context, no KV cache. Trains like GPT (parallel) and infers like an RNN (sequential). A Linux Foundation AI project, used in production in Windows, Office, and NeMo. Latest version is RWKV-7 (March 2025), with models up to 14B parameters.

npx claude-code-templates@latest --skill ai-research/model-architecture-rwkv
ai-research

post-training-openrlhf

11

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

npx claude-code-templates@latest --skill ai-research/post-training-openrlhf
ai-research

model-architecture-mamba

10

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2 (d_state=128, multi-head). Models 130M-2.8B on HuggingFace.

npx claude-code-templates@latest --skill ai-research/model-architecture-mamba
ai-research

multimodal-blip-2

10

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

npx claude-code-templates@latest --skill ai-research/multimodal-blip-2
ai-research

post-training-miles

10

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

npx claude-code-templates@latest --skill ai-research/post-training-miles
ai-research

post-training-simpo

10

Simple Preference Optimization for LLM alignment. A reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). Because it needs no reference model, it is more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO or PPO.

npx claude-code-templates@latest --skill ai-research/post-training-simpo
ai-research

post-training-verl

10

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

npx claude-code-templates@latest --skill ai-research/post-training-verl
ai-research

multimodal-llava

9

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

npx claude-code-templates@latest --skill ai-research/multimodal-llava
ai-research

post-training-slime

9

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

npx claude-code-templates@latest --skill ai-research/post-training-slime
ai-research

mechanistic-interpretability-pyvene

8

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

npx claude-code-templates@latest --skill ai-research/mechanistic-interpretability-pyvene
ai-research

model-architecture-torchtitan

8

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

npx claude-code-templates@latest --skill ai-research/model-architecture-torchtitan
ai-research

post-training-torchforge

8

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

npx claude-code-templates@latest --skill ai-research/post-training-torchforge
ai-research

llm-evaluation

7

"Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing."

npx claude-code-templates@latest --skill ai-research/llm-evaluation
ai-research

mechanistic-interpretability-nnsight

7

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

npx claude-code-templates@latest --skill ai-research/mechanistic-interpretability-nnsight
ai-research

ml-engineer

7

Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring.

npx claude-code-templates@latest --skill ai-research/ml-engineer
ai-research

model-architecture-litgpt

7

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations with no abstraction layers.

npx claude-code-templates@latest --skill ai-research/model-architecture-litgpt
ai-research

prompt-engineering-patterns

7

"Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability."

npx claude-code-templates@latest --skill ai-research/prompt-engineering-patterns
ai-research

pydantic-ai

6

"Build production-ready AI agents with PydanticAI — type-safe tool use, structured outputs, dependency injection, and multi-model support."

npx claude-code-templates@latest --skill ai-research/pydantic-ai
ai-research

llm-ops

1

"LLM Operations: RAG, embeddings, vector databases, fine-tuning, advanced prompt engineering, LLM costs, quality evals, and production AI architectures."

npx claude-code-templates@latest --skill ai-research/llm-ops