When we hit the limit of what software tuning can deliver, we go back to the hardware. With a fixed set of machines and cores, the only things left to exploit are hyper-threading and binding a specific process to a particular core while keeping every other process off that core. We often encounter this scenario in low-latency trading systems, where even microseconds of added latency are unacceptable.
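To make the idea of binding a process to a core concrete, here is a minimal sketch assuming Linux and Python 3, where the standard library exposes the scheduler affinity calls as `os.sched_getaffinity` and `os.sched_setaffinity`. The helper name `pin_current_process` is hypothetical and chosen only for illustration.

```python
import os


def pin_current_process(core_id):
    """Restrict the calling process to a single CPU core (Linux only).

    Hypothetical helper for illustration; returns the resulting affinity set.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity calls are not available on this platform")

    # pid 0 means "the calling process".
    allowed = os.sched_getaffinity(0)
    if core_id not in allowed:
        raise ValueError(
            f"core {core_id} is not in the allowed set {sorted(allowed)}"
        )

    # From here on the scheduler may only run this process on core_id.
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick a core we are actually allowed to run on (this matters in containers).
    core = min(os.sched_getaffinity(0))
    print(pin_current_process(core))
```

Note that this only restricts where the pinned process may run. Keeping other processes off that core is a separate step, typically done at the system level (for example with the Linux `isolcpus` boot parameter or cpusets), which this sketch does not attempt.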
Here is a nice article that gives a sense of how to bind a process to a CPU core: