Expanded the objectives & requirements part of the TDC (@Frederico)
codex:cdx
Incorporated all the material about Codex from GitHub into the Codex Litepaper (@Frederico)
Reviewed the causal loop diagram for Codex (@Frederico)
Reviewed the stock and flow diagram for Codex (@Frederico)
waku:rln-membership
Prepare a summary of the RLN membership model including user journey mapping (@Martin)
Review the pricing of Farcaster and similar protocols (@Martin)
waku:general-incentives
Follow up with general research into Waku strategy based on the IFT strategy call (@Martin)
status:SNT-staking
Continue the review of the staking contract (@Martin)
Understand the severity of precision loss (due to Solidity's integer-arithmetic constraints) and the resulting discrepancy between the contract logic and the radCAD simulations (@Martin) - see the sketch after this list
Assist the SC team in further checks and definition of testing scenarios (@Martin)
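A minimal sketch of the precision-loss question, assuming the contract accrues rewards with truncating integer division (Solidity semantics) while the simulation uses exact arithmetic; all names and numbers here are illustrative, not taken from the actual staking contract or the radCAD model:

```python
# Illustrative only: compare Solidity-style truncating accrual against
# exact rational arithmetic to see how large the discrepancy can get.
from fractions import Fraction

SCALE = 10**18  # typical fixed-point scale for ERC-20-style token amounts

def reward_solidity(stake_wei: int, rate_num: int, rate_den: int, periods: int) -> int:
    """Accrue per period with truncating integer division, as Solidity would."""
    balance = stake_wei
    for _ in range(periods):
        balance += balance * rate_num // rate_den  # truncates every period
    return balance

def reward_exact(stake_wei: int, rate_num: int, rate_den: int, periods: int) -> Fraction:
    """Same accrual with exact rational arithmetic (what a simulation may do)."""
    balance = Fraction(stake_wei)
    for _ in range(periods):
        balance += balance * Fraction(rate_num, rate_den)
    return balance

stake = 123 * SCALE + 456789            # a non-round stake to expose truncation
sol = reward_solidity(stake, 3, 1000, 365)   # hypothetical 0.3% per period, 365 periods
exact = reward_exact(stake, 3, 1000, 365)
print("discrepancy (wei):", float(exact - sol))
```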
Worked on the plotting module in the Kubernetes framework
Modified the main YAML to add plotting options
Created a plotter class that groups all plotting functionality in one place
Structured the plotter so that several experiments can be grouped automatically in the same plot (a sketch follows below)
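A minimal sketch of what this plotting module could look like; the Plotter name, the YAML option keys, and the per-experiment metrics.csv layout are all assumptions rather than the actual framework code:

```python
# Hypothetical sketch: a plotter class that reads plotting options from the
# main YAML and overlays one metric from several experiments in one figure.
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd
import yaml

class Plotter:
    """Groups all plotting functionality; can overlay several experiments."""

    def __init__(self, config_path: str):
        with open(config_path) as f:
            # "plotting" key and its options are assumed, not the real schema
            self.opts = yaml.safe_load(f).get("plotting", {})

    def plot_experiments(self, experiment_dirs: list[str], metric: str, out: str):
        """Overlay one metric from several experiments in a single figure."""
        fig, ax = plt.subplots(figsize=self.opts.get("figsize", (8, 5)))
        for exp in experiment_dirs:
            df = pd.read_csv(Path(exp) / "metrics.csv")  # assumed file layout
            ax.plot(df["time"], df[metric], label=Path(exp).name)
        ax.set_xlabel("time")
        ax.set_ylabel(metric)
        ax.legend()
        fig.savefig(out)

# Usage: Plotter("main.yaml").plot_experiments(["exp-a", "exp-b"], "latency", "latency.png")
```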
Lots of calls with Wings to test the lab, launch simulations, discuss problems, and so on.
Deployed iBGP for Calico
Which got the IP addresses wrong at first; fixed by editing Node annotations
Later removed BGP due to numerous issues with it
Numerous, numerous Kubernetes tests and improvements
Tried Cilium briefly
Switched from Cilium to Calico
Reinstalled the entire cluster, as the Calico transition broke things (switching CNIs without a reinstall is a bad idea)
Scale testing revealed per-host Linux limits that prevent us from scaling beyond ~1400 waku nodes per physical host when running on bare metal (see the sketch below)
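The update doesn't name which Linux limits bite at this density; a small sketch that reads a few of the usual suspects for high per-host container counts (conntrack, open files, neighbour table, PIDs, inotify) - which of these is actually the bottleneck here is an open question:

```python
# Read a handful of kernel limits that commonly cap container density.
from pathlib import Path

SYSCTLS = {
    "max open files": "/proc/sys/fs/file-max",
    "conntrack entries": "/proc/sys/net/netfilter/nf_conntrack_max",
    "neighbour table (gc_thresh3)": "/proc/sys/net/ipv4/neigh/default/gc_thresh3",
    "max PIDs": "/proc/sys/kernel/pid_max",
    "inotify watches": "/proc/sys/fs/inotify/max_user_watches",
}

for name, path in SYSCTLS.items():
    p = Path(path)
    value = p.read_text().strip() if p.exists() else "n/a"
    print(f"{name:32s} {value}")
```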
Created a new architecture for running tests
Hybrid between bare metal and virtualised Kubernetes
Rook-Ceph (Storage) and Prometheus-Thanos (Metrics) stacks run on bare metal, as does all management
The rest runs in a KubeVirt-based deployment system
We deploy what we’re calling “opal fragments” (fractions of the Opal Kubernetes cluster) - Kubernetes workers dedicated solely to running nwaku deployments.
Can deploy 5000 nodes in < 8 minutes, with a stable mesh forming around 25 minutes into the deployment
Experimented with various opal fragment deployments - 56-node fragments seem to be the most stable configuration
Going much higher than this (especially with poor core allocation) causes instability in the CNI (Calico)
Which causes monitoring issues as nodes drop out of Prometheus monitoring
And can mess with the mesh
Instability decreases with a lower number of connections
Debugged CoreDNS issues - we believe we've found a bug in CoreDNS's interaction with headless Services: it returns NXDOMAIN for valid hostnames in roughly 1 in 5.5 to 6 queries (a measurement sketch follows)
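A sketch of how such a failure rate could be measured from inside the cluster; the hostname is a hypothetical pod name behind a headless Service, not one from our deployment:

```python
# Repeatedly resolve a headless-Service hostname and count failures.
import socket

HOST = "nwaku-0.nwaku.default.svc.cluster.local"  # hypothetical hostname
ATTEMPTS = 1000

failures = 0
for _ in range(ATTEMPTS):
    try:
        socket.getaddrinfo(HOST, None)
    except socket.gaierror:  # NXDOMAIN (and other resolution errors) land here
        failures += 1

if failures:
    print(f"{failures}/{ATTEMPTS} lookups failed (~1 in {ATTEMPTS / failures:.1f})")
else:
    print("no lookup failures observed")
```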
Ran repeated simulations to arrive at a stable simulation setup for testing
Built a new “accelerated bootstrap” mode for simulations
Merged PR-1027 and PR-1028; used TxTime sorting on SendPeerList, plus semaphores to limit simultaneous transmissions. This improves results in some cases but shows large fluctuations for other messages (a sketch of the semaphore idea follows)
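A minimal asyncio sketch of the semaphore idea, capping the number of simultaneous peer transmissions; the names (send_to_peer, MAX_CONCURRENT_TX) are illustrative and not taken from the PRs:

```python
# Cap concurrent sends with a semaphore so a large peer list doesn't
# trigger a burst of simultaneous transmissions.
import asyncio
import random

MAX_CONCURRENT_TX = 8  # illustrative limit
tx_sem = asyncio.Semaphore(MAX_CONCURRENT_TX)

async def send_to_peer(peer: str, message: bytes) -> None:
    async with tx_sem:  # at most MAX_CONCURRENT_TX sends in flight
        await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for the network write
        print(f"sent {len(message)} bytes to {peer}")

async def main() -> None:
    peers = [f"peer-{i}" for i in range(50)]
    await asyncio.gather(*(send_to_peer(p, b"payload") for p in peers))

asyncio.run(main())
```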
Configured the Shadow simulation for variable latency and bandwidth. Trying to build automated scripts for this (it requires adding edges among all peers and giving every node variable latency/bandwidth); the NetworkX package in Python can help write the network in GML format (a sketch follows)
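A sketch of that automation with NetworkX, assuming Shadow consumes a GML graph with per-node bandwidth and per-edge latency attributes (the attribute names follow Shadow's documented convention but should be double-checked against the version in use):

```python
# Build a full mesh with randomized per-node bandwidth and per-edge
# latency, then write it out as GML for the Shadow simulator.
import random
import networkx as nx

N_PEERS = 20
g = nx.complete_graph(N_PEERS)  # edges among all peers

for node in g.nodes:
    g.nodes[node]["host_bandwidth_up"] = f"{random.choice([10, 50, 100])} Mbit"
    g.nodes[node]["host_bandwidth_down"] = f"{random.choice([10, 50, 100])} Mbit"

for u, v in g.edges:
    g.edges[u, v]["latency"] = f"{random.randint(10, 200)} ms"
    g.edges[u, v]["packet_loss"] = 0.0

nx.write_gml(g, "topology.gml")
```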