#385 Custom Virtual Thread Schedulers, CPU Cache Optimization and Work Stealing
A conversation with Francesco Nigro (@forked_franz) about: breakdancing and basketball, including meeting Kobe Bryant in Italy during a dunk competition, using AI coding assistants like Claude Opus 4.5 and GitHub bots for infrastructure setup and CI/CD pipeline configuration, limitations of LLMs for novel, performance-sensitive algorithmic work where training data is scarce, branchless IPv4 parsing optimization as a Christmas coding challenge, CPU branch misprediction costs when parsing variable-length IP address octets, converting branching logic into mathematical operations using bit tricks for better CPU pipeline utilization (sketched below), LLMs excelling at generating enterprise code based on well-documented standards and conventions, providing minimal but precise documentation and annotations to improve LLM code generation quality, the Boundary Control Entity (BCE) architecture pattern and standards-based development, the core problem of thread handoff between event loops and ForkJoinPool worker threads in frameworks like Quarkus/Vert.x and Micronaut, mechanical sympathy implications of cross-core memory access when serialized data is allocated on one core and read by another, CPU cache coherency costs and last-level cache penalties when the event loop and the worker pool run on different cores, the custom virtual thread scheduler project (netty-virtual-thread-scheduler) enabling a single platform thread to handle both networking I/O and virtual thread execution, approximately 50% CPU savings demonstrated by Micronaut when using unified Netty-based scheduling, collaboration with the Oracle Loom team, including Viktor Klang and Alan Bateman, on a minimal scheduler API design, the scheduler API consisting of just two methods, onStart and onContinue, plus virtual thread task attachments (sketched below), work-stealing algorithms and their complexity, including heuristics similar to the Linux CFS scheduler, the importance of being declarative about thread affinity rather than relying on automatic, magical binding, in order to avoid issues with lazy class loading and background reaper threads, a thread-factory-based approach for creating virtual threads bound to specific platform threads, stream-based run queues with graceful shutdown semantics that fall back to ForkJoinPool for progress guarantees, thread-local Scoped Values as a hybrid between thread locals and scoped values for efficient context propagation (building blocks sketched below), performance problems with ThreadLocal, including lazy ThreadLocalMap allocation overhead on virtual threads and scalability issues with ThreadLocal.remove() and soft reference queues, the impact on reactive programming, where back pressure and stream composition still require higher-level abstractions beyond basic Java concurrency primitives (see the Flow sketch below), structured concurrency limitations for back pressure scenarios compared to reactive libraries, deterministic testing possibilities enabled by custom schedulers where execution order can be controlled, the poller mechanism for handling blocking I/O in virtual threads in a non-blocking way, observability improvements possible through virtual thread task attachments for monitoring state changes, cloud cost implications of inefficient thread scheduling and unnecessary CPU wake-up cycles, and the distinction between framework developers and application developers as different user personas with different abstraction needs
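
The branchless parsing topic can be illustrated with a minimal sketch. This is not the implementation discussed in the episode; the class, method, and bounds handling below are assumptions chosen only to show the idea: instead of validating each character with its own hard-to-predict branch, the validity check is folded into arithmetic and tested once at the end.

```java
// Illustrative sketch only: branchless parsing of a single IPv4 octet (1-3 ASCII digits).
// The point is the technique, not the exact code: per-character branches are replaced by
// arithmetic whose sign bit accumulates "badness", checked once after the loop.
final class BranchlessOctet {

    // Returns the octet value (0..255), or -1 if the input is not a valid octet.
    // Note: leading zeros ("01") are accepted here for brevity.
    static int parseOctet(byte[] in, int offset, int len) {
        int value = 0;
        int invalid = (len - 1) | (3 - len);      // sign bit set unless 1 <= len <= 3
        for (int i = 0; i < len; i++) {
            int d = (in[offset + i] & 0xFF) - '0';
            invalid |= d | (9 - d);               // negative iff d is outside 0..9
            value = value * 10 + d;
        }
        // a single, highly predictable branch instead of one per character
        return (invalid | (255 - value)) < 0 ? -1 : value;
    }

    public static void main(String[] args) {
        byte[] ip = "192.168.1.255".getBytes();
        System.out.println(parseOctet(ip, 0, 3));   // 192
        System.out.println(parseOctet(ip, 4, 3));   // 168
        System.out.println(parseOctet(ip, 10, 3));  // 255
        System.out.println(parseOctet(ip, 3, 1));   // -1 ('.' is not a digit)
    }
}
```

Variable-length octets are exactly where a conventional parser mispredicts, because the CPU cannot guess whether the next character ends the octet; folding validity into the accumulated sign bit keeps the hot loop branch-free apart from the loop bound itself.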
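
The two-method scheduler API is paraphrased below as a hypothetical sketch. These interface and class names, and the signatures, are assumptions based on the episode description (onStart, onContinue, and a task attachment); they are not a published JDK, Loom, or netty-virtual-thread-scheduler API. The Netty-backed implementation assumes Netty on the classpath and shows why a single platform thread can serve both selector I/O and the virtual threads mounted on it.

```java
import io.netty.channel.EventLoop;

// Hypothetical shape of the minimal scheduler API discussed in the episode.
// NOT a real API; names and signatures are illustrative assumptions only.
interface VirtualThreadScheduler {

    // The unit of work handed to the scheduler: one runnable continuation of a
    // virtual thread, with a slot a framework can use for context/observability.
    interface VirtualThreadTask extends Runnable {
        void attach(Object attachment);
        Object attachment();
    }

    // called once, when the virtual thread is started
    void onStart(VirtualThreadTask task);

    // called every time a parked virtual thread becomes runnable again
    void onContinue(VirtualThreadTask task);
}

// Binding continuations to a single Netty event loop: the same platform thread
// drives the selector and runs the virtual threads that consume the decoded data,
// so buffers stay in one core's caches instead of bouncing across cores.
final class EventLoopScheduler implements VirtualThreadScheduler {

    private final EventLoop eventLoop;

    EventLoopScheduler(EventLoop eventLoop) {
        this.eventLoop = eventLoop;
    }

    @Override
    public void onStart(VirtualThreadTask task) {
        eventLoop.execute(task);
    }

    @Override
    public void onContinue(VirtualThreadTask task) {
        // no work stealing, no cross-core handoff: resume where the data already is
        eventLoop.execute(task);
    }
}
```

The attachment slot is what enables the observability and deterministic-testing ideas mentioned above: a framework can tag each task and observe, or even order, its state transitions without touching application code.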
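
The ThreadLocal problems and the scoped-value alternative can be contrasted with standard APIs (ScopedValue is final in recent JDKs and a preview feature in earlier ones; the snippet assumes a JDK where it is available). The "thread-local Scoped Value" hybrid itself is not shown, only the two building blocks it combines.

```java
// Contrast of the two standard context-propagation mechanisms.
// ThreadLocal: a mutable slot backed by a lazily allocated ThreadLocalMap per thread;
//              on short-lived virtual threads the map allocation and the explicit
//              remove() become measurable costs.
// ScopedValue: an immutable binding that exists only for the duration of a scope,
//              so there is nothing to remove and nothing to leak.
public class ContextPropagation {

    static final ThreadLocal<String> TL_REQUEST_ID = new ThreadLocal<>();
    static final ScopedValue<String> REQUEST_ID = ScopedValue.newInstance();

    public static void main(String[] args) throws InterruptedException {
        Thread vt1 = Thread.ofVirtual().start(() ->
                // bound only while run() executes, automatically unbound afterwards
                ScopedValue.where(REQUEST_ID, "req-42").run(ContextPropagation::handleScoped));
        vt1.join();

        Thread vt2 = Thread.ofVirtual().start(() -> {
            TL_REQUEST_ID.set("req-43");          // forces the ThreadLocalMap allocation
            try {
                handleThreadLocal();
            } finally {
                TL_REQUEST_ID.remove();           // explicit cleanup required
            }
        });
        vt2.join();
    }

    static void handleScoped() {
        System.out.println("scoped value: " + REQUEST_ID.get());
    }

    static void handleThreadLocal() {
        System.out.println("thread local: " + TL_REQUEST_ID.get());
    }
}
```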
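
For the point that back pressure needs higher-level abstractions than raw threads, the standard java.util.concurrent.Flow contract is the smallest example: demand is signalled explicitly via Subscription.request(n), which neither virtual threads nor structured concurrency express on their own. This is a generic sketch, not code from the episode.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Minimal illustration of explicit back pressure with the standard Flow API:
// the subscriber pulls one item at a time, so a fast publisher cannot overwhelm it.
// Virtual threads make blocking cheap, but they do not model demand by themselves.
public class BackPressureDemo {

    public static void main(String[] args) throws Exception {
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    this.subscription = s;
                    s.request(1);                      // initial demand: exactly one item
                }

                @Override public void onNext(Integer item) {
                    System.out.println("processed " + item);
                    subscription.request(1);           // ask for the next item only when ready
                }

                @Override public void onError(Throwable t) { t.printStackTrace(); }
                @Override public void onComplete()          { System.out.println("done"); }
            });

            for (int i = 0; i < 5; i++) {
                publisher.submit(i);                   // blocks if the subscriber's buffer is full
            }
        }
        // give the asynchronous delivery a moment to drain before the JVM exits
        TimeUnit.MILLISECONDS.sleep(200);
    }
}
```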
Francesco Nigro on twitter: @forked_franz