The black square at the centre of an Arduino Uno is an ATmega328P. It's a complete computer, smaller than your thumbnail, that costs about two dollars. Once you know what's inside it, you'll write very different sketches — because suddenly every line of code maps to a piece of silicon you can point to on a die diagram.

One CPU, three memories

The CPU is an 8-bit AVR core. It executes most instructions in a single clock cycle, runs at 16 MHz on the Uno, and has 32 general-purpose 8-bit registers. Compared to your laptop, this is a calculator. Compared to your microwave from 1985, it's a supercomputer.
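To put the clock rate in concrete terms, the arithmetic can be sketched in a few lines of C++. The names here (kClockHz, cycles_to_ns, cycles_in_us) are invented for illustration and are not part of any Arduino API.

```cpp
#include <cstdint>

// Uno clock frequency as stated above: 16 MHz.
constexpr uint32_t kClockHz = 16000000UL;

// Duration of `cycles` clock cycles, in nanoseconds.
constexpr double cycles_to_ns(uint32_t cycles) {
    return cycles * (1e9 / kClockHz);
}

// Number of clock cycles that fit in `us` microseconds.
constexpr uint32_t cycles_in_us(uint32_t us) {
    return (kClockHz / 1000000UL) * us;
}
```

A single-cycle instruction therefore takes 62.5 ns, and a one-millisecond pause leaves room for roughly 16,000 of them.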

It has three separate memories, which is the first surprise. Flash, 32 KB, is where your compiled sketch lives. It survives a power cycle. SRAM, 2 KB, holds variables and the stack while the sketch is running. It does not survive a power cycle. EEPROM, 1 KB, is small persistent storage you can write to from code — handy for config that has to survive a reboot.

The CPU reaches code and data through separate buses, an arrangement called a Harvard architecture. The literal in const char *msg = "hello"; is stored in Flash along with the rest of the sketch, but ordinary pointers address the data space, so the startup code copies it into SRAM before setup() runs. The string ends up occupying both memories. There's a real distinction in the silicon. This is also why F("hello") exists in the Arduino core: it keeps the string in Flash only, and print() streams it from there one byte at a time rather than taking a copy in precious SRAM.
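On AVR builds, initialised data such as string literals is stored in Flash and copied into SRAM by startup code before the sketch begins. A toy model of the two address spaces makes the consequence visible. Everything here (ToyChip, power_on, startup_copy, the 16-byte sizes) is a made-up stand-in for illustration, not how the toolchain is implemented.

```cpp
#include <cstdint>

// Toy stand-ins for the two address spaces (real sizes: 32 KB and 2 KB).
struct ToyChip {
    uint8_t flash[16];  // program memory: survives power-off
    uint8_t sram[16];   // data memory: empty at power-on in this model
};

// The chip right after reset: "hi" plus its terminating zero sits in
// Flash at address 2, and SRAM is all zeros.
constexpr ToyChip power_on() {
    return ToyChip{{0, 0, 'h', 'i', 0}, {}};
}

// Hypothetical model of the startup copy: `len` bytes of initial data move
// from Flash address `src` to SRAM address `dst` before the sketch starts.
constexpr ToyChip startup_copy(ToyChip chip, unsigned src, unsigned dst,
                               unsigned len) {
    for (unsigned i = 0; i < len; ++i) {
        chip.sram[dst + i] = chip.flash[src + i];
    }
    return chip;
}

// The same numeric address names a different byte on each bus.
constexpr uint8_t read_flash(const ToyChip& c, unsigned addr) {
    return c.flash[addr];
}
constexpr uint8_t read_sram(const ToyChip& c, unsigned addr) {
    return c.sram[addr];
}
```

After startup_copy(power_on(), 2, 0, 3) the string exists twice, once in each memory. A string wrapped in F() corresponds to skipping that copy and calling read_flash for each byte instead.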

The peripherals do the real work

The CPU's job is mostly to configure peripherals, then read their results. The peripherals are dedicated circuits that handle the tedious parts: precise timing, analog conversion, asynchronous serial framing, I²C and SPI bus protocols. Each peripheral has a small set of registers — memory-mapped, just like RAM — that the CPU reads and writes to control it.
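"Memory-mapped" can be made concrete with the ATmega328P's data address map from the datasheet. The sketch below classifies an address by region; the helper names (data_region, io_to_data_addr, kPortBData) are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Regions of the ATmega328P data address space, per the datasheet memory map.
enum class Region { CpuRegisters, Io, ExtendedIo, Sram, Unmapped };

// Classify a data-space address (illustrative helper, not a library call).
constexpr Region data_region(uint16_t addr) {
    return addr < 0x0020 ? Region::CpuRegisters  // r0..r31
         : addr < 0x0060 ? Region::Io            // 64 I/O registers
         : addr < 0x0100 ? Region::ExtendedIo    // 160 extended I/O registers
         : addr < 0x0900 ? Region::Sram          // 2 KB of SRAM
         : Region::Unmapped;
}

// I/O registers are numbered from 0 by the in/out instructions but sit
// 0x20 higher when addressed as ordinary memory.
constexpr uint16_t io_to_data_addr(uint8_t io_addr) {
    return static_cast<uint16_t>(io_addr + 0x20);
}

// PORTB is I/O register 0x05, so it appears in the data space at 0x25.
constexpr uint16_t kPortBData = io_to_data_addr(0x05);
```

Peripheral registers and variables share one address space and differ only in which range they fall in, which is why the same load and store instructions work on both.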

Take digitalWrite(13, HIGH). The Arduino library does a small dance to figure out which port pin 13 is on (PORTB, bit 5), then sets that bit in a register called PORTB. As far as the hardware is concerned, that's all that happens: one register write. If you're feeling fancy you can skip the library and do it directly:

PORTB |= (1 << 5);   // set bit 5: pin 13 HIGH
PORTB &= ~(1 << 5);  // clear bit 5: pin 13 LOW

Each of those lines compiles to a single machine instruction (sbi and cbi respectively). digitalWrite takes dozens, because the library has to look up which port the pin belongs to at runtime. For tight timing loops, register-level access can be 10× faster or more.
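The runtime lookup can be sketched as a function from Uno pin number to port and bit mask. This is a simplified model: the real core reads the mapping from tables rather than computing it, and Port, PinLocation, locate_pin and write_bit are names invented for this example.

```cpp
#include <cstdint>

enum class Port : uint8_t { B, C, D, None };

struct PinLocation {
    Port port;
    uint8_t mask;  // single bit set at the pin's position within the port
};

// Uno pin mapping: 0-7 are PORTD, 8-13 are PORTB, 14-19 (A0-A5) are PORTC.
constexpr PinLocation locate_pin(uint8_t pin) {
    return pin <= 7  ? PinLocation{Port::D, static_cast<uint8_t>(1u << pin)}
         : pin <= 13 ? PinLocation{Port::B, static_cast<uint8_t>(1u << (pin - 8))}
         : pin <= 19 ? PinLocation{Port::C, static_cast<uint8_t>(1u << (pin - 14))}
         : PinLocation{Port::None, 0};
}

// Pure-function version of the register update: returns the new register
// value with the masked bit set (high) or cleared (low).
constexpr uint8_t write_bit(uint8_t reg, uint8_t mask, bool high) {
    return high ? static_cast<uint8_t>(reg | mask)
                : static_cast<uint8_t>(reg & ~mask);
}
```

locate_pin(13) yields PORTB with mask 0x20, the same bit 5 that the direct register write hard-codes. When the pin number is only known at runtime, this lookup has to run on every call, which is where the extra instructions go.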

Why the simulator can exist at all

The ATmega328P is fully documented — Atmel published a datasheet with every register, every instruction, every timing diagram. Because the chip is small and the rules are known, you can write a software model of it. That's what avr8js (the engine behind SimDuino) does: it executes AVR machine code cycle by cycle, in JavaScript, and tracks the state of every register and pin.
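The core idea of such a simulator fits in a short sketch. The toy below is not avr8js code and does not use its API; it is a from-scratch C++ illustration that decodes only three instructions (nop, sbi, cbi) using their documented encodings and cycle counts. ToyCpu, step, run and kBlinkOnce are names invented here.

```cpp
#include <cstdint>

// Toy machine state: the 64 I/O registers, a program counter, a cycle count.
struct ToyCpu {
    uint8_t io[64];
    uint16_t pc;      // counted in 16-bit instruction words
    uint32_t cycles;
};

// Execute one instruction word. Returns false on any opcode this toy
// does not handle (which is nearly all of the real instruction set).
constexpr bool step(ToyCpu& cpu, const uint16_t* program) {
    const uint16_t op = program[cpu.pc];
    const uint8_t addr = (op >> 3) & 0x1F;  // AAAAA: I/O address 0..31
    const uint8_t bit = op & 0x07;          // bbb: bit number 0..7
    if (op == 0x0000) {                     // nop: 1 cycle
        cpu.cycles += 1;
    } else if ((op & 0xFF00) == 0x9A00) {   // sbi A, b: 2 cycles
        cpu.io[addr] |= (1u << bit);
        cpu.cycles += 2;
    } else if ((op & 0xFF00) == 0x9800) {   // cbi A, b: 2 cycles
        cpu.io[addr] &= ~(1u << bit);
        cpu.cycles += 2;
    } else {
        return false;
    }
    cpu.pc += 1;
    return true;
}

// Run up to `count` instructions from a zeroed machine state.
constexpr ToyCpu run(const uint16_t* program, unsigned count) {
    ToyCpu cpu{};
    for (unsigned i = 0; i < count; ++i) {
        if (!step(cpu, program)) break;
    }
    return cpu;
}

// sbi 0x05, 5 ; nop ; cbi 0x05, 5  (0x05 is PORTB's I/O address)
constexpr uint16_t kBlinkOnce[] = {0x9A2D, 0x0000, 0x982D};
```

Running one instruction of kBlinkOnce leaves io[0x05] at 0x20, meaning pin 13 is high; running all three clears it again after five cycles. A full simulator follows the same loop, with the rest of the instruction set and the peripherals added.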

This is the moment when a microcontroller stops feeling like magic. Once you've seen the registers, the peripherals, and the bus diagram, you understand that an Uno is a small box of well-understood rules. There are no secrets. The same chip that drives a 3D printer can drive your blink, and it's the same fetch-and-execute path through the CPU either way.