Even more about CPUs: What is CPU Caching?
Well, the slowest steps in the fetch-execute cycle are those that require accessing memory (Englander, 2003). CPU caching is a technique developed to minimize the impact that accessing memory has on the overall processing performance of a CPU.
The technique involves placing a small amount (or multiple amounts) of high-speed memory between the CPU and main memory. This memory, referred to as cache memory, holds copies of data stored in main memory. Because cache memory is generally located on the CPU itself, or in locations that are quicker to access than main memory, the technique improves processing performance by reducing the number of trips across the memory bus required to reach data in main memory. Briefly, this is how it works:
Cache memory differs from regular memory in that it is organized into blocks. Each block holds a relatively small amount of data (generally 8 or 16 bytes) copied from the most frequently used main memory locations. Each block also contains a tag: the address in main memory of the data the block currently holds.
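To make the block-and-tag organization concrete, here is a minimal sketch in Python. The block size, block count, and field names are illustrative assumptions chosen for the example, not details from the textbook; a real cache is hardware, not a list of objects.

    # Minimal model of cache memory organized into tagged blocks.
    BLOCK_SIZE = 16   # bytes per block (the text cites 8 or 16 bytes)
    NUM_BLOCKS = 4    # kept tiny so the example is easy to trace

    class CacheBlock:
        def __init__(self):
            self.valid = False             # does this block hold a live copy yet?
            self.tag = None                # identifies where in main memory the copy came from
            self.data = bytes(BLOCK_SIZE)  # the copied bytes themselves

    cache = [CacheBlock() for _ in range(NUM_BLOCKS)]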
For each step in the fetch-execute cycle that requires accessing memory, the CPU first checks whether the data exists in the cache before referencing main memory. It does this through a component called the cache controller, which examines the tags to see whether the requested address is already held in the cache.
If the data is already in the cache, the CPU uses it as though it were stored in main memory, thereby saving the performance cost of a trip across the memory bus. If the data is not in the cache, it is copied from main memory into the cache for potential later reuse.
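Continuing the sketch above, the hit/miss check might look like the following. The direct-mapped placement (each address can live in exactly one block) is an assumption made for simplicity, and main_memory is a hypothetical stand-in for RAM; a real cache controller performs this check in hardware.

    main_memory = bytearray(1024)  # hypothetical stand-in for RAM

    def read(address):
        tag = address // BLOCK_SIZE      # identifies the block-sized region of memory
        block = cache[tag % NUM_BLOCKS]  # direct-mapped: one possible slot per address

        if block.valid and block.tag == tag:
            # Cache hit: use the copy; no trip across the memory bus.
            return block.data[address % BLOCK_SIZE]

        # Cache miss: copy the whole block from main memory for later reuse.
        start = tag * BLOCK_SIZE
        block.data = bytes(main_memory[start:start + BLOCK_SIZE])
        block.tag = tag
        block.valid = True
        return block.data[address % BLOCK_SIZE]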
When multiple separate amounts of cache memory are implemented, they are referred to as levels of cache. The level closest to the cache controller, which is also generally the fastest to access, is referred to as L1 (for level 1). Subsequent levels are referred to as L2 (level 2) and L3 (level 3), although more than three levels are rarely implemented.
Unlike the L1 cache, which is almost always located on the CPU itself, subsequent levels of cache are generally, although not always, located on external chips. Although not as quick as accessing data from L1, accessing data from L2 cache (and L3 cache) is still faster than retrieving it from main memory.
Subsequent levels of cache (i.e., L2 and L3) work in the same manner as described above, except that on an L1 miss the cache controller checks L2 (and then L3, if present) before going out to main memory. One important note regarding higher levels of cache: each added level must be able to hold more data than the level before it; otherwise, the levels would quickly come to hold identical data, making the extra level redundant.
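A rough sketch of that multi-level lookup, reusing BLOCK_SIZE and main_memory from the earlier examples. Representing each level as a dictionary of tagged blocks, the growing capacities, and copying into every level on a miss are all simplifying assumptions for illustration, not the textbook's design.

    # Each level maps a block tag to its copied data; capacities grow per level.
    levels = [
        {"name": "L1", "capacity": 2,  "blocks": {}},
        {"name": "L2", "capacity": 8,  "blocks": {}},
        {"name": "L3", "capacity": 32, "blocks": {}},
    ]

    def fetch_block(tag):
        # Check each level in turn before resorting to main memory.
        for level in levels:
            if tag in level["blocks"]:
                return level["blocks"][tag], level["name"]

        # Miss at every level: copy from main memory into each cache level.
        data = bytes(main_memory[tag * BLOCK_SIZE:(tag + 1) * BLOCK_SIZE])
        for level in levels:
            if len(level["blocks"]) >= level["capacity"]:
                level["blocks"].pop(next(iter(level["blocks"])))  # evict the oldest entry
            level["blocks"][tag] = data
        return data, "main memory"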
References:
Englander, I. (2003). The Architecture of Computer Hardware and Systems Software (pp. 206-209). Hoboken, NJ: Wiley.