Dynamic Branch Predictor (BTB + RAS + Gshare)

The dynamic branch predictor operates primarily in the instruction fetch stage, where it attempts to determine the next program counter (PC) before the current instruction has been fully decoded or executed. Its goal is to minimize pipeline invalidations by predicting things before the instruction is fully fetched:

Which part of the fetched word contains control-flow instructions.
Where these control-flow instructions would jump (if taken).
The likelihood of a conditional control-flow instruction to be taken.

When a new PC enters the fetch stage, the predictor first looks up the branch target buffer (BTB). The BTB is indexed using bits of the current PC and acts as a cache of previously seen branch instructions. If there is no matching entry (a BTB miss), the processor assumes the instruction is not a branch and continues fetching sequentially. If there is a BTB hit, the predictor proceeds with branch prediction.

At this point, the Gshare predictor is used to determine the likelihood of the branch to be taken. Gshare combines the current PC with the global history register (GHR) using an XOR operation to form an index into the pattern history table (PHT). The PHT contains 2-bit saturating counters that represent the likelihood of a branch being taken or not taken. Based on the value of the selected counter, the predictor decides whether the branch is predicted as taken or not taken.

If the branch is predicted as not taken, the processor continues fetching the next sequential instruction. If it is predicted as taken, the predictor must also determine the correct target address. For most branch and jump instructions, this target is obtained directly from the BTB entry. However, if the instruction is identified as a return function, the predictor instead uses the return address stack (RAS). The RAS maintains a stack of return addresses corresponding to previously executed function calls. When a return instruction is encountered, the top entry of the RAS is used as the predicted target address, allowing accurate prediction of return control flow.

Once the prediction decision is made, the fetch stage immediately updates the PC to the predicted next address, which can be either the sequential PC or the predicted target. This allows the pipeline to continue fetching instructions without waiting for the branch to be resolved in later stages.

As the instruction progresses through the pipeline and reaches the execution stage, the branch's actual outcome is determined. The processor is aware of whether the branch was taken, and what the correct target address is. This result is compared with the earlier prediction.

If the prediction is correct, execution continues without interruption. If the prediction is incorrect, the pipeline must be flushed and the correct PC inserted into the fetch stage. This introduces a performance penalty proportional to the pipeline depth.

After the control-flow instruction is resolved and committed, the predictor structures are updated to improve future accuracy. The Gshare predictor updates its PHT entry by incrementing or decrementing the corresponding 2-bit counter based on the actual outcome, and the GHR is shifted to include the latest branch result. The BTB is updated when the branch is taken, ensuring that future encounters with the same PC have a valid target entry. For function calls and returns, the RAS is updated in the fetch stage, directly from predictions; return addresses are pushed onto the stack for predicted calls and popped on predicted returns. In the event of a misprediction involving function calls or returns, the RAS may get out of synchronization with all previous calls.

Through this coordinated operation of BTB, Gshare, and RAS, the predictor enables the processor to make early, accurate decisions about control flow, significantly reducing pipeline invalidations and improving overall performances.