ift-ts:dst:ift:2026q1-dst-lab
Description
The DST lab is a set of machines that are mainly used together as one Kubernetes cluster. In this environment, we run experiments to test regressions, new features, and experimental changes on any decentralized system. In these experiments, we look for abnormal behavior and take measurements to study the performance and robustness of the system. The results can also be compared with theoretical results and used to analyze how the systems scale. At the same time, we will work on improving the lab’s resource and time allocation, while allowing dedicated machines to be provisioned for other teams when needed.
Task list
Lab deployment code
- fully qualified name: ift-ts:dst:ift:2026q1-dst-lab:lab-deployment-code
- owner: Mamoutou
- status: done
- start-date: 2026/01/05
- end-date: 2026/02/10
Description
Complete the deployment and configuration of the remaining lab components using Fleet, and deliver a one-click Ansible playbook that provisions and configures the new DST lab end to end in a repeatable, automated way. Also, update BW resources during the scheduling cycle to avoid race conditions.
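To illustrate the kind of race the BW update is meant to avoid, here is a minimal sketch, assuming "BW" refers to per-node bandwidth and that two pods can be scheduled at the same time. `BandwidthLedger`, `try_reserve`, and `schedule_concurrently` are hypothetical names for this example only and are not taken from the lab's scheduler. The point is that the availability check and the deduction happen in a single critical section, inside the scheduling decision, rather than being updated afterwards.

```python
import threading


class BandwidthLedger:
    """Tracks free bandwidth per node (hypothetical helper, not the lab's scheduler)."""

    def __init__(self, capacity_mbps):
        # capacity_mbps: mapping of node name -> schedulable bandwidth in Mbit/s
        self._free = dict(capacity_mbps)
        self._lock = threading.Lock()

    def try_reserve(self, node, mbps):
        """Check and deduct under one lock so two pods cannot claim the same headroom."""
        with self._lock:
            if self._free.get(node, 0) < mbps:
                return False
            self._free[node] -= mbps
            return True

    def free(self, node):
        with self._lock:
            return self._free[node]


def schedule_concurrently(ledger, node, requests):
    """Issue one reservation attempt per thread and return how many were accepted."""
    results = []
    results_lock = threading.Lock()

    def worker(mbps):
        accepted = ledger.try_reserve(node, mbps)
        with results_lock:
            results.append(accepted)

    threads = [threading.Thread(target=worker, args=(r,)) for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With a 1000 Mbit/s node and five concurrent 400 Mbit/s requests, exactly two reservations can succeed regardless of thread ordering, which is the property a check-then-update-later design does not guarantee.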
Deliverables
- Code:
- vacp2p/vaclab-2#3 Add Lab Components - Part 1
- Reports:
Analyze current stack
- fully qualified name: ift-ts:dst:ift:2026q1-dst-lab:analyze-current-stack
- owner: Mamoutou
- status: done
- start-date: 2026/01/01
- end-date: 2026/03/31
Description
Compare the current stack with the proposed new stacks. Assess whether the changes are helpful and improve the overall performance of the lab. Report the findings with analysis and benchmark results.
Deliverables
- Added external metal-01 into the cluster as a tainted worker node.
- Fixed Longhorn persistent volume creation by disabling multipathd across the nodes.
- vacp2p/vaclab-2#20 Increase Vmetrics Components Resources
- vacp2p/vaclab-2#15 Improve Cluster Configuration Part 1
- Increase authentik-server memory limits to prevent OOMKill
- Inspecting Network Flows with Hubble
- vacp2p/vaclab-2#11 Improve Runtime Settings
- vacp2p/vaclab-2#6 Add External DST Node
- status-im/infra-misc#457 vacdst: Open Required Ports for Kube-OVN CNI
- Reports:
- Notion: External node as K3S Master - System Components Traffic Share
- Google Slides: DST Kubernetes Cluster
- Recording: DST Kubernetes Cluster
- Notion: Vaclab 2.0 - Iperf Bandwidth Measurements
- Notion: Current Vaclab Scheduler Benchmark
- Notion: DST Explorer - Proposal For a Dynamic Public Lab Dashboard
Optimize data scraping
- fully qualified name: ift-ts:dst:ift:2026q1-dst-lab:optimize-data-scrapping
- owner: Mamoutou
- status: done
- start-date: 2026/02/02
- end-date: 2026/02/24
Description
Improve the lab's monitoring system. Adjust the scraping frequency and the set of metrics collected. Design a storage solution that keeps the data retrievable when needed, and reduce the space used by data older than a certain age.
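As a rough illustration of reducing the space taken by older data, the sketch below keeps recent samples at full resolution and replaces older ones with one averaged point per time bucket. It is a generic example, not the lab's storage design: the function name `downsample_old` and its parameters (`raw_window_s`, `bucket_s`) are assumptions made for this sketch, and a real monitoring backend would normally handle this through its own retention or downsampling settings.

```python
from collections import defaultdict


def downsample_old(samples, now, raw_window_s, bucket_s):
    """Keep samples newer than raw_window_s as-is; average older ones per bucket.

    samples: list of (unix_timestamp, value) tuples.
    Returns a list of (timestamp, value) tuples sorted by timestamp, where each
    compacted point carries the start time of its bucket.
    """
    cutoff = now - raw_window_s
    recent = [(ts, v) for ts, v in samples if ts >= cutoff]

    # Group every sample older than the cutoff by the start of its bucket.
    buckets = defaultdict(list)
    for ts, v in samples:
        if ts < cutoff:
            buckets[ts - ts % bucket_s].append(v)

    compacted = [(start, sum(vs) / len(vs)) for start, vs in buckets.items()]
    return sorted(compacted + recent)
```

For example, with samples scraped every 10 seconds, calling `downsample_old(samples, now, raw_window_s=7 * 86400, bucket_s=300)` would keep one week at full resolution and one averaged point per five minutes before that.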
Deliverables
- vacp2p/vaclab-2#5
- Notion: Vaclab 2.0 - Full Software Stack#getting Metrics and Logs
- Code:
- Reports:
Lab health monitoring
- fully qualified name: ift-ts:dst:ift:2026q1-dst-lab:lab-health-monitoring
- owner: Mamoutou
- status: done
- start-date: 2026/02/02
- end-date: 2026/03/31
Description
Design a set of metrics and dashboards for monitoring the health of the lab. The metrics should help detect abnormal behavior and potential issues. The goal is to be able to compare experiments run in different weeks with confidence that the results can be trusted, without having to repeat the same experiment.
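One simple way to decide whether the lab is in a comparable state from week to week is to check a fresh health measurement against a baseline built from earlier runs. The sketch below is an assumption about how such a check could look, not a description of the lab's dashboards or alerts: the function name `within_baseline` and the three-sigma default are illustrative choices.

```python
from statistics import mean, stdev


def within_baseline(history, current, max_sigma=3.0):
    """Return True if current is within max_sigma standard deviations of the history mean.

    history: earlier measurements of the same metric (for example, throughput in Gbit/s).
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline only matches an identical measurement.
        return current == mu
    return abs(current - mu) <= max_sigma * sigma
```

A measurement that falls outside the band suggests that the lab itself has changed, so experiment results from that period should be treated with caution before comparing them with earlier ones.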
Deliverables
- Published private Krew releases of the kubectl-health plugin for Linux, Windows, and macOS.
- Added intra-node benchmarking and tainted-node exclusion support to the health plugin.
- Created iperf3 DaemonSets/CronJobs for host and pod network health checks.
- Added Prometheus/VictoriaLogs ingestion plus a network health dashboard with throughput and RTT baselines.
- Code:
- Update Control Plane components Location #23
- Reports:
- Grafana: Node Hardware Cluster node temperature dashboard
- Grafana: LATEST INTRA NODE THROUGHPUT Intra-node network benchmark panels
- Grafana: Cluster CPU temperature High alert
- Notion: Vaclab 2.0 - Health Monitoring
- Notion: Vaclab 2.0 - Lightweight Health Monitoring
Fleet workshop
- fully qualified name: ift-ts:dst:ift:2026q1-dst-lab:fleet-workshop
- owner: Mamoutou
- status: done
- start-date: 2026/02/05
- end-date: 2026/02/17
Description
Prepare a workshop to teach Fleet basics to the infra team.
Deliverables
- Slides:
- Presentation:
- mamoutou-diarra/rancher-fleet-tuto Rancher Fleet Hands-On Tutorial