PageTable covers computing at the deepest possible level, short of turning to physics. The site covers topics such as the quirks of the first commercial microprocessor, the Intel 4004, fast disk copying on the C64, branch delay slots, and much, much more. The topics can be entertaining in themselves, as historical and nostalgic curiosities. However, the depth of the analysis turns them into lessons in how to do things efficiently.
On a 1 MHz CPU, wasting 10,000 cycles per second is a real performance loss: it eats 1% of the system's total capacity. On a multi-gigahertz system with out-of-order execution and a superscalar execution engine, the same waste hardly registers. Still, taking the extra time to optimize loops, calculate things intelligently, and not always reach for a dynamic list and a for-loop can make a big difference.
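A minimal Python sketch of both points, assuming a simple summation task as the illustration: the 1% figure is just wasted cycles divided by the clock rate, and "calculating things intelligently" can mean replacing a list-plus-loop with a closed-form formula. The function names are hypothetical, and the 3 GHz clock is an assumed example, not a figure from the text.

```python
def sum_with_list(n: int) -> int:
    """Build a dynamic list of 1..n, then loop over it: O(n) time and O(n) memory."""
    values = []
    for i in range(1, n + 1):
        values.append(i)
    total = 0
    for v in values:
        total += v
    return total


def sum_closed_form(n: int) -> int:
    """Gauss's formula n(n + 1) / 2: constant time, no list needed."""
    return n * (n + 1) // 2


def overhead_percent(wasted_cycles_per_s: int, clock_hz: int) -> float:
    """Share of the CPU's cycle budget lost to wasted work, in percent."""
    return 100.0 * wasted_cycles_per_s / clock_hz


if __name__ == "__main__":
    n = 10_000
    # Both approaches give the same answer; only the cost differs.
    assert sum_with_list(n) == sum_closed_form(n) == 50_005_000
    print(overhead_percent(10_000, 1_000_000))      # 1 MHz CPU: 1.0
    print(overhead_percent(10_000, 3_000_000_000))  # assumed 3 GHz CPU: a tiny fraction
```

The closed form does the same job with a handful of arithmetic operations instead of 10,000 appends and 10,000 additions, which is exactly the kind of saving that mattered on a 1 MHz machine.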
For a more modern context, here and here are lists of instruction set manuals and ABI (Application Binary Interface, covering calling conventions and the like) documents collected by Thiago Macieira. The lists cover IA-32, x86-64, IA-64, ARM (32 and 64 bits), MIPS (32 and 64 bits), POWER and SPARC. Not only do these documents carry loads of interesting information, they also serve as a reminder that even today's top-notch systems rely on the same basic mechanisms as their oldest and smallest relatives.