Best Practices

This guide provides recommendations for running effective and reliable performance tests with virtbench.

Important Notice:

Do not run these benchmarks directly in your production environment without thorough testing first.

Always test in a non-production environment to understand the impact and behavior.

Test results will vary significantly based on your underlying infrastructure, including hardware specifications, storage backend, network configuration, and cluster resources.

Use at your own risk.

General Testing Practices

1. Start Small, Scale Gradually

Begin with 5-10 VMs to validate your setup
Gradually increase to 50, 100, 200+ VMs
Identify bottlenecks at each scale
Understand your infrastructure limits before large-scale tests

2. Run Multiple Tests

Run each test at least 3 times for consistency
Average results across multiple runs
Identify and investigate outliers
Account for cluster variability

3. Save Results Consistently

Always use --save-results to track performance over time:

virtbench datasource-clone \
  --start 1 \
  --end 50 \
  --storage-class YOUR-STORAGE-CLASS \
  --save-results \
  --storage-version 3.2.0

4. Use Meaningful Test Names

Organize results with storage version and configuration details:

--storage-version "portworx-3.2.0"
--storage-version "ceph-rbd-17.2"

5. Monitor Cluster Resources

Watch cluster resources during tests:

# In a separate terminal
watch kubectl top nodes

# Check storage backend metrics
# (specific to your storage solution)

VM Creation Testing

1. Validate Cluster First

Always run cluster validation before testing:

virtbench validate-cluster --storage-class YOUR-STORAGE-CLASS

2. Use Appropriate Concurrency

Default concurrency (50) works for most scenarios
Increase for large-scale tests (100-200 VMs)
Decrease if experiencing resource contention

virtbench datasource-clone \
  --start 1 \
  --end 200 \
  --concurrency 200 \
  --storage-class YOUR-STORAGE-CLASS

3. Namespace Batch Creation

Create namespaces in batches for faster setup:

virtbench datasource-clone \
  --start 1 \
  --end 100 \
  --namespace-batch-size 50 \
  --storage-class YOUR-STORAGE-CLASS

4. Boot Storm Testing

Test both single-node and multi-node boot storms
Start with smaller VM counts (20-30)
Gradually increase to find capacity limits
Compare initial creation vs boot storm performance

Migration Testing

1. Verify VMs Before Migration

Ensure VMs are healthy before starting migration tests:

# Check VM status
kubectl get vm -n kubevirt-perf-test-1

# Verify network connectivity
kubectl exec -it ssh-test-pod -- ping <vm-ip>

2. Choose Appropriate Migration Scenario

Sequential: For baseline performance
Parallel: For stress testing
Evacuation: For node maintenance scenarios
Round-robin: For load balancing validation

3. Set Realistic Timeouts

Adjust timeouts based on VM size and network:

virtbench migration \
  --start 1 \
  --end 10 \
  --migration-timeout 600 \  # 10 minutes for large VMs
  --source-node worker-1

Chaos Benchmark Testing

1. Understand Your Goals

Stress Test Cluster: Run concurrent chaos operations
Test Specific Scenarios: Use --max-iterations to limit test duration
Skip Unsupported Features: Use --skip-resize, --skip-clone, or --skip-snapshot if needed

2. Start with Conservative Settings

virtbench chaos-benchmark \
  --storage-class YOUR-STORAGE-CLASS \
  --concurrency 2 \
  --vms 5 \
  --max-iterations 5

3. Monitor for Failures

Watch for resource exhaustion
Check storage backend health
Monitor node resources
Review logs for errors

Failure Recovery Testing

1. Test in Non-Production First

Validate FAR configuration in test environment
Understand recovery behavior before production use
Document expected recovery times

2. Use Appropriate Timeouts

Set timeouts based on your RTO requirements:

virtbench failure-recovery \
  --start 1 \
  --end 10 \
  --recovery-timeout 600  # 10 minutes

3. Clean Up FAR Resources

Always clean up FAR resources after testing:

virtbench failure-recovery \
  --start 1 \
  --end 10 \
  --cleanup \
  --cleanup-vms

Logging and Debugging

1. Use Appropriate Log Levels

INFO: Normal operation (default)
DEBUG: Detailed troubleshooting
WARNING: Important issues only
ERROR: Critical errors only

2. Save Logs to Files

virtbench datasource-clone \
  --start 1 \
  --end 50 \
  --storage-class YOUR-STORAGE-CLASS \
  --log-file test-$(date +%Y%m%d-%H%M%S).log

3. Review Logs After Tests

Check for errors and warnings
Identify performance bottlenecks
Validate test completion

Cleanup Practices

1. Always Clean Up After Tests

virtbench datasource-clone \
  --start 1 \
  --end 50 \
  --storage-class YOUR-STORAGE-CLASS \
  --cleanup

2. Use Dry Run First

Preview cleanup before executing:

virtbench datasource-clone \
  --start 1 \
  --end 50 \
  --storage-class YOUR-STORAGE-CLASS \
  --dry-run-cleanup

3. Clean Up on Failure

Ensure resources are cleaned up even if tests fail:

virtbench datasource-clone \
  --start 1 \
  --end 50 \
  --storage-class YOUR-STORAGE-CLASS \
  --cleanup-on-failure

Results Management

1. Organize Results by Version

Use --storage-version to organize results:

--storage-version "portworx-3.2.0"

2. Generate Dashboards Regularly

Create dashboards after each test run:

python3 dashboard/generate_dashboard.py --days 30

3. Archive Important Results

Save dashboard HTML files
Keep JSON/CSV results for historical comparison
Document test conditions and configurations

Performance Optimization

1. Tune for Your Environment

Adjust concurrency based on cluster size
Optimize namespace batch size
Configure appropriate timeouts

2. Minimize External Load

Run tests when cluster is not under load
Avoid running multiple tests simultaneously
Ensure storage backend is not saturated

3. Use Consistent Test Conditions

Same time of day
Same cluster state
Same resource availability