These AI infrastructure agent skills help Claude Code write, optimize, and debug high-performance systems code. Operations teams can hand routine infrastructure development work to the agent and spend less effort on it manually. The skills fit into Python-based workflows and work with Claude agents.
infra-skills provides expert-level agent capabilities for AI infrastructure engineers working on GPU optimization, distributed training, and model deployment. The collection includes TileLang GPU kernel development for NVIDIA/AMD/Ascend hardware, Megatron memory estimation for MoE and dense models, SLIME post-training framework integration for RL scaling, and technical visualization tools such as TikZ flowcharts and Material You presentations. Each skill packages domain knowledge, code examples, and best practices so that Claude agents can handle specialized infrastructure workflows, from writing high-performance GPU kernels to creating architecture diagrams and slide decks.
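To give a feel for what a Megatron memory estimate involves, here is a minimal sketch that covers model states only (weights, gradients, and Adam optimizer states) for a dense model. It assumes the per-parameter byte counts that the Megatron-LM documentation lists for bf16 parameters with fp32 gradients: 18 bytes without the distributed optimizer and 6 + 12/d bytes with it, where d is the data-parallel size. It also assumes parameters are split evenly across tensor- and pipeline-parallel ranks, and it ignores activations, embedding-layer imbalance, MoE expert parallelism, and framework overhead, so it is not the skill's own estimator. The function name `estimate_model_state_gib` is hypothetical.

```python
def estimate_model_state_gib(num_params: float, tp: int = 1, pp: int = 1, dp: int = 1,
                             distributed_optimizer: bool = True) -> float:
    """Rough per-GPU memory (GiB) for weights, grads, and Adam states of a dense model.

    Assumes bf16 parameters (2 bytes), fp32 gradients (4 bytes), and fp32 Adam
    states (master weights, momentum, variance: 12 bytes), i.e. 18 bytes/param.
    With the distributed optimizer, the 12 bytes of optimizer state are sharded
    across the dp data-parallel ranks, giving 6 + 12/dp bytes/param.
    """
    if min(tp, pp, dp) < 1:
        raise ValueError("parallel sizes must be >= 1")
    # Assumption: parameters are split evenly across tensor- and pipeline-parallel ranks.
    params_per_gpu = num_params / (tp * pp)
    bytes_per_param = 6 + 12 / dp if distributed_optimizer else 18
    return params_per_gpu * bytes_per_param / 2**30


# Example: a 7B-parameter dense model with TP=2, PP=1, DP=8.
print(f"{estimate_model_state_gib(7e9, tp=2, pp=1, dp=8):.1f} GiB")
```

For the 7B example this works out to about 24.4 GiB of model states per GPU, before any activation memory is counted.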
Install a skill by placing its directory in Claude Code's skills path, or ask Claude Code to install it directly from the GitHub repository. Each skill is self-contained and targets a specific AI infrastructure task; pick the one that matches your workflow (kernel development, memory estimation, training setup, or visualization).
Write and optimize GPU kernels using TileLang for matrix multiplication and attention mechanisms
Estimate GPU memory requirements for Megatron-based models with various parallelism strategies
Configure and implement RL training pipelines using the SLIME framework with custom reward models
Generate professional flowcharts and architecture diagrams for technical documentation
There is no one-step install command; check the GitHub repository for manual installation instructions.
Clone the repository by running git clone https://github.com/yzlnew/infra-skills in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
I need help automating [COMPANY]'s [INDUSTRY] infrastructure. Please generate a Python script to deploy and manage [SPECIFIC_INFRASTRUCTURE_COMPONENT] using [CLOUD_PROVIDER]. Ensure the script includes error handling and logging and follows best practices for [INDUSTRY] compliance.
# Infrastructure Automation Script for [COMPANY]
```python
import logging

import boto3

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class InfrastructureManager:
    """Deploys and monitors EC2 instances in a single AWS region."""

    def __init__(self, region='us-west-2'):
        self.ec2 = boto3.resource('ec2', region_name=region)
        self.logger = logger

    def deploy_ec2_instance(self, instance_type='t2.micro', key_name='my-key-pair',
                            security_group_ids=None, image_id='ami-12345678'):
        # Placeholder values: replace the AMI ID, key pair, and security group
        # with resources that exist in your account and region.
        if security_group_ids is None:
            security_group_ids = ['sg-123456']  # avoid a shared mutable default
        try:
            instance = self.ec2.create_instances(
                ImageId=image_id,
                InstanceType=instance_type,
                KeyName=key_name,
                SecurityGroupIds=security_group_ids,
                MinCount=1,
                MaxCount=1,
            )[0]
            self.logger.info(f'Deployed EC2 instance: {instance.id}')
            return instance
        except Exception as e:
            self.logger.error(f'Failed to deploy EC2 instance: {e}')
            raise

    def monitor_instance_status(self, instance_id):
        instance = self.ec2.Instance(instance_id)
        # instance.state is a dict such as {'Code': 16, 'Name': 'running'}
        state = instance.state['Name']
        self.logger.info(f'Instance {instance_id} status: {state}')
        return state
```