Performance Enhancement by Cache Memory and Registers
Necessity is the mother of invention. The need to perform fast calculations led to the invention of the computer, and from day one it has been in a continual process of improvement. This improvement puts fast computing in first place, alongside the desire for user-friendly interfaces, advanced applications, and so on. All the operations a computer carries out depend on the processor, so to enhance performance and compute quickly we need to examine the operations performed in the processor, i.e. the Central Processing Unit. This paper discusses the functional responsibilities of cache memory and registers, and elucidates their effect on system performance.
The processor is considered the brain of the computer; it processes inserted data into information in its readable form, i.e. binary. It is an integrated circuit driven by a quartz crystal: the interaction of the crystal with an electric current generates pulses, or peaks. An internal clock governs this pulse generation; the number of pulses per second gives the clock speed, and hence the clock frequency, which is a multiple of the system frequency. The circuit executes an instruction in an average number of clock cycles, from which the CPI, i.e. Cycles Per Instruction, is measured. A processor's power is the number of instructions it executes per second: the processor's frequency divided by the CPI gives the CPU power in units of MIPS, i.e. millions of instructions per second (Kioskia).
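The relation between frequency and CPI above can be sketched in a few lines. The 200 MHz clock and CPI of 2 below are illustrative values, not figures from the text:

```python
# CPU power (MIPS) = clock frequency (MHz) / cycles per instruction (CPI).
def cpu_power_mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cpi

# A hypothetical 200 MHz processor averaging 2 cycles per instruction:
print(cpu_power_mips(200.0, 2.0))  # 100.0 MIPS
```

Halving the CPI doubles the MIPS figure at the same clock, which is why cache and pipeline improvements matter as much as raw frequency.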
An instruction is an operation for the processor to perform, encoded in binary, and consists of an operation code (opcode) and an operand field. Instruction sizes vary, and instructions fall into the categories of memory access, arithmetic operations, logic operations, and control. Executing instructions requires small temporary storage local to the processor; these local storage spaces are called registers and are found in sizes of 8, 16, 32 or 64 bits. Cache memory, or buffer memory, is another small memory local to the processor that helps in fast processing of data. Because main memory is slower than the processor, an instruction stored in RAM must wait its turn to be executed; placing a cache between the processor and main memory shares this overhead, with frequently used data staged in the cache for processing. There are Level 1 (L1), Level 2 (L2) and Level 3 (L3) cache memories. L1 is incorporated directly into the processor and comprises an instruction cache and a data cache. L2, located on the chip with the processor, is faster than main memory but slower than L1. L3 is located on the motherboard (Kioskia).
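The opcode/operand split described above can be illustrated with a toy decoder. The 16-bit word with a 4-bit opcode and 12-bit operand is a hypothetical layout chosen for illustration, not a real instruction format:

```python
# Decode a hypothetical 16-bit instruction word: 4-bit opcode, 12-bit operand.
def decode(word: int) -> tuple[int, int]:
    opcode = (word >> 12) & 0xF   # top 4 bits select the operation
    operand = word & 0x0FFF       # bottom 12 bits address or name the data
    return opcode, operand

word = 0x3A7F                     # opcode 0x3, operand 0xA7F
print(decode(word))               # (3, 2687)
```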
Basically, the cache memory and its levels help compute instructions rapidly: latency is reduced when performing a process or transferring information between memories. The interfacing is hierarchical: L1 interfaces with L2, and L2 interfaces with L3 or with random access memory directly. This speeds up transfers and processing without disturbing the processor's other work (Kioskia).
The registers in a processor serve the fast execution of an instruction. The most significant is the accumulator register, which handles arithmetic operations. The status register shows the status of the operations being carried out, such as carry and overflow conditions. The instruction register holds the instruction under execution. The ordinal counter holds the address of the next instruction and is automatically advanced when execution completes. The buffer register temporarily holds data taken from main memory so that, during execution of a statement, the data does not have to be fetched all the way from main memory again.
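The interplay of the accumulator and the status register's carry and overflow bits can be sketched with an 8-bit addition. This is a minimal model assuming standard two's-complement flag semantics, not any particular processor:

```python
# 8-bit accumulator addition, returning the result plus carry/overflow status bits.
def acc_add(acc: int, value: int) -> tuple[int, bool, bool]:
    raw = acc + value
    result = raw & 0xFF
    carry = raw > 0xFF                                 # unsigned wrap-around
    # Signed overflow: both operands share a sign that the result does not.
    overflow = ((acc ^ result) & (value ^ result) & 0x80) != 0
    return result, carry, overflow

print(acc_add(0xF0, 0x20))   # (16, True, False): carry out, no signed overflow
print(acc_add(0x70, 0x20))   # (144, False, True): 112 + 32 overflows signed 8-bit
```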
Among the techniques used for performance enhancement is pipelining. Under this technique, instructions execute as a sequence of steps known as the instruction cycle, or fetch/execute cycle. Fetch refers to bringing a binary instruction from main memory into the instruction register to be decoded; execute refers to carrying out whatever operation the instruction has requested (Bramer).
Performance is enhanced when the CPU pre-fetches instructions from the sequence of instructions in main memory; this is pipelining. In a pipeline, as soon as an instruction completes one step of its execution, the next instruction is fetched at that very moment. This reduces execution time: if a program has five instructions, and each instruction takes five steps of one second each, sequential execution would take 25 seconds, but thanks to pre-fetching the same program takes only nine seconds (Bramer).
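The 25-second versus 9-second figures above follow from a simple formula: once the pipeline is full, one instruction completes per step, so the pipelined time is stages + instructions − 1 steps rather than stages × instructions:

```python
# Execution time for n instructions through an s-stage pipeline, one second per step.
def sequential_time(instructions: int, stages: int, step: float = 1.0) -> float:
    return instructions * stages * step

def pipelined_time(instructions: int, stages: int, step: float = 1.0) -> float:
    # First instruction takes all s stages; each later one finishes 1 step behind.
    return (stages + instructions - 1) * step

print(sequential_time(5, 5))  # 25.0 seconds without pipelining
print(pipelined_time(5, 5))   # 9.0 seconds with pipelining
```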
Elaborating further: when execution of an instruction starts, the instruction register fetches the machine-code operation word along with the following words, building a queue of upcoming instructions in binary machine format; the next instruction begins being decoded, the first step of the pipeline, as soon as the previous instruction leaves the decode slot. Processors find this a very useful technique, optimized yet simple and with little overhead, and this passing of instructions between registers is used by powerful processors, including those with longer, extended instruction sets. The steps of the pipeline include decoding, address evaluation, obtaining the operand to work on, and execution, with the stages of different instructions proceeding in parallel rather than one by one. Pipelining did raise certain problems, such as how to handle other kinds of instruction, e.g. branches, which were solved later on (Bramer).
Cache memory is known for speeding up the computation of instructions. The observable fact it exploits is the idea of locality of reference: some data fetched from main memory is kept for reuse, so that when a certain instruction executes, the data it needs is already in a location local to the processor. Disk caches extend the same idea; there are hardware disk caches and software disk caches. A software disk cache is maintained by a disk driver or the operating system to accomplish caching tasks. A hardware disk cache is self-regulating and free from the control of the CPU: it has its own random access memory and its own control circuits. Despite being costly, hardware disk caches are well suited to demanding workloads that need a sophisticated disk controller (Bramer).
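Locality of reference can be demonstrated with a toy cache model. This is a minimal sketch, assuming a small fully-associative cache with least-recently-used replacement; the capacity and access pattern are illustrative:

```python
from collections import OrderedDict

# A tiny fully-associative cache with LRU replacement.
class TinyCache:
    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.lines: OrderedDict[int, None] = OrderedDict()
        self.hits = self.misses = 0

    def access(self, address: int) -> bool:
        if address in self.lines:
            self.lines.move_to_end(address)   # refresh recency on a hit
            self.hits += 1
            return True
        self.misses += 1
        self.lines[address] = None            # simulate a fetch from main memory
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        return False

cache = TinyCache(capacity=4)
for addr in [0, 1, 2, 0, 1, 2, 0, 1]:         # a loop touching the same few words
    cache.access(addr)
print(cache.hits, cache.misses)               # 5 3
```

After the first pass through the loop's working set, every access hits the cache instead of going back to main memory, which is exactly the behavior locality of reference predicts.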
As the discussion concerns enhanced processor performance through caches and registers, another approach points toward the architecture of the processor itself. RISC, or Reduced Instruction Set Computer, architectures use large on-chip caches and an increased number of processor registers while reducing the instruction set by orders of magnitude. Data is decoded and processed by holding it in registers, and only two instructions hold the right to move data between memory and registers. This approach is simple in the sense that all memory traffic must pass through these two instructions and no others, which makes code easier to analyze and keeps the instruction set small. Instructions have a fixed length for execution in the pipeline; with the reduced instruction set, instructions can be examined and some executed in parallel, and the control unit is almost entirely hardwired (Bramer).
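The two memory instructions referred to above are conventionally called load and store. The following toy load-store machine is a sketch of that discipline, with hypothetical register and memory layouts; note that the arithmetic instruction never touches memory:

```python
# A toy load-store machine: only LOAD and STORE access memory; ADD is register-only.
memory = {100: 7, 101: 5}
regs = [0] * 4

def load(rd: int, addr: int) -> None:        # memory -> register
    regs[rd] = memory[addr]

def store(rs: int, addr: int) -> None:       # register -> memory
    memory[addr] = regs[rs]

def add(rd: int, ra: int, rb: int) -> None:  # arithmetic stays in registers
    regs[rd] = regs[ra] + regs[rb]

load(0, 100)        # r0 = mem[100]
load(1, 101)        # r1 = mem[101]
add(2, 0, 1)        # r2 = r0 + r1
store(2, 102)       # mem[102] = r2
print(memory[102])  # 12
```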
For a performance boost it is commonly assumed that the larger the cache, the better the performance. The first processors with L2 cache carried about 256 kilobytes or 512 kilobytes and showed considerable performance benefits over processors whose cache memory was embedded on the motherboard. There is now a huge variety of processors with small caches and large ones. For many years it made little difference whether the cache was bigger or smaller, though a processor with a smaller cache was obviously less expensive. The latest range of processors includes one offered in three different cache flavors; another processor shipped with a 256-kilobyte cache in its first version and 512 kilobytes later, and in some processors a 1-megabyte cache was introduced, with 2 megabytes in the second version (Schmid, Does Cache Size Really Boost Performance?: What Is The Impact Of Cache Size?).
To know whether the latest processors offer larger caches for performance's sake or only for marketing purposes, one should consider the performance parameters and their limits. Cache size is not the only performance factor to analyze, but it should be kept in consideration. The cache's purpose is to reduce accesses to main memory as much as possible, for instance by buffering data. For sizeable performance, 256 KB to 512 KB is enough, while the market sells systems with caches of up to 8 MB. One should therefore not chase ever-larger sizes, but rather consider whether the increase is worthwhile (Schmid).
Besides pipelining, pre-fetching, and enlarging the cache size (which can be an illusion of improvement), there are other ways to make memory access effectively faster. One speed-enhancing technique concerns backward jumps: the conditional branch that closes a loop jumps backward to the instruction at the top of the loop. Since a loop body normally repeats, treating a backward jump as taken is correct on every iteration except the last, where the loop's condition finally fails to be met and the loop breaks (Gillard).
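Under the reading above, the payoff of the backward-jump heuristic is easy to quantify: statically predicting every backward branch as taken mispredicts only once per loop execution. This is a sketch of that arithmetic, with the iteration count as an illustrative input:

```python
# Accuracy of "predict backward branches taken" for a single loop of n iterations.
def backward_taken_accuracy(iterations: int) -> float:
    correct = iterations - 1   # the branch is taken on all but the final test
    wrong = 1                  # mispredicted exactly once, when the loop exits
    return correct / (correct + wrong)

print(backward_taken_accuracy(100))  # 0.99
```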
Another common approach is to divide memory into self-regulating units called banks, where neighboring memory words are placed in different banks, so that multiple sets of data and address lines connect to those banks. This type of memory system is referred to as interleaved memory. Such a system makes it possible to access neighboring memory locations in different banks simultaneously, overlapping access times to reduce overall execution time. The number of banks is typically a power of two. Interleaving calls for a larger cache line so that data can be drawn from every bank at once (Gillard).
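With a power-of-two number of banks, the bank of an address is just its low address bits, so consecutive words land in consecutive banks. A minimal sketch, with four banks as an illustrative choice:

```python
# Map addresses to banks in a 4-way interleaved memory (power of two).
NUM_BANKS = 4

def bank_of(address: int) -> int:
    return address % NUM_BANKS   # the low address bits pick the bank

# Eight consecutive words cycle through all four banks, so each group of
# four neighbors can be accessed in parallel, one word per bank.
print([bank_of(a) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```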
Faster access to main memory affects performance greatly, and memory management and the memory hierarchy are among the most crucial issues in processor performance. The reason is the widening gap between memory speed and processor speed: measured in clock cycles, memory access time has been growing continually. The use of on-chip cache memory has fought back against this problem very effectively, and pipelining has also helped sort out the matter. The RISC load-store structure handles the issue well too, although it puts some overhead on instruction fetching, decoding, dependence-checking logic, and condition-checking operations (Postiff).
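How effectively an on-chip cache fights the processor-memory gap can be expressed with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. The cycle counts below are illustrative assumptions, not figures from the text:

```python
# Average memory access time in cycles: AMAT = hit_time + miss_rate * miss_penalty.
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty

# Hypothetical figures: 1-cycle cache hit, 5% misses, 100-cycle memory access.
print(amat(1.0, 0.05, 100.0))  # 6.0 cycles on average, versus 100 without a cache
```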
Registers also provide quick access by holding frequently used scalar values. Main memory, or RAM, is slower than registers by roughly two orders of magnitude; this is why by-value memory operations needed to be replaced with register-based operations, trimming down the number of memory operations in a program. Cache memory was incorporated to pass data between memory and registers effectively by acting as a buffer, diminishing the impact of slow memory speeds. Registers, despite complicating program code, also provide shorter access times and shorter instruction encodings, since they are accessed with short direct addresses. So caches and registers have both merits and demerits, but they are among the foundations that have helped processors function with speed, accuracy, and stability (Postiff).
Bramer, B. Processor Performance Enhancement Techniques. 24 September 1995. 13 March 2009 <http://www.cse.dmu.ac.uk/~cfi/Networks/WorkStations/Workstations4.htm>.
Gillard, P. Other methods for fast memory access. 24 November 1997. 13 March 2009 <http://www.cs.mun.ca/~paul/cs3725/material/web/notes/node7.html>.
Kioskia. Processor. 16 October 2008. 13 March 2009 <http://en.kioskea.net/contents/pc/processeur.php3>.
Postiff, M. A. Compiler and Microarchitecture Mechanisms for Exploiting Registers to Improve Memory Performance. Michigan, 2001.
Schmid, P. Does Cache Size Really Boost Performance?: What Is The Impact Of Cache Size? 24 October 2007. 13 March 2009 <http://www.tomshardware.com/reviews/cache-size-matter,1709.html>.
Schmid, P. Large Caches: Performance Or A Business Decision? 24 October 2007. 13 March 2009 <http://www.tomshardware.com/reviews/cache-size-matter,1709-2.html>.