Hardware performance counters
Written by bcopos on May 26, 2015.
Most modern microprocessors have a set of special-purpose registers built to store the count of hardware events. Such counters can be used to fine-tune a program with respect to its performance.
Linux (starting with 2.6.31, I believe) has the Perf utility. Perf is a profiling tool which talks to the kernel's perf_events subsystem and programs these special registers to count hardware events as requested by the user. Perf is actually a great little tool which can measure a variety of events, such as the number of instructions executed, branch instructions, and cache misses. Such measurements can give a program tester a good idea of what the program is doing at a very low level.
While experimenting with this nifty utility, I realized that some performance counters, such as the number of instructions executed, can vary quite significantly from one execution of a program to another. This phenomenon has been noted before, and there are academic papers discussing the reliability of hardware performance counters (e.g. "Can Hardware Performance Counters be Trusted?" by Weaver and McKee and "Toward Accurate Performance Evaluation using Hardware Counters" by Mathur and Cook). However, in my experience, measurements of user-land events (e.g. user-land instructions) are quite reliable compared to total (user-land + kernel) events, and vary only by a couple of instructions between consecutive identical (same input) executions of a program. In the case of Perf, the user can restrict measurement to user-land events by appending ":u" to the event name (e.g. for measuring instructions, "perf stat -e instructions:u").
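To check the run-to-run variation yourself, something like the following Python sketch might help. It is only a sketch under a few assumptions: that perf is installed and you are allowed to use it, that "perf stat -x," writes its CSV report to stderr with the counter value in the first field and the event name in the third, and that the event name is printed back exactly as given (on some machines it may be decorated differently). The function names (perf_stat_command, parse_count, count_runs, spread) are made up for this example and are not part of Perf.

```python
import subprocess


def perf_stat_command(program_args, event="instructions:u"):
    """Build a 'perf stat' command line that prints CSV (-x,) for one event."""
    return ["perf", "stat", "-x,", "-e", event, "--"] + list(program_args)


def parse_count(csv_text, event="instructions:u"):
    """Extract the counter value for `event` from 'perf stat -x,' output.

    Assumed CSV layout: value, unit, event name, ... (one line per event).
    """
    for line in csv_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) >= 3 and fields[2] == event:
            # int() raises ValueError if perf printed e.g. "<not counted>".
            return int(fields[0])
    raise ValueError(f"event {event!r} not found in perf output")


def count_runs(program_args, runs=5, event="instructions:u"):
    """Run the program `runs` times under perf and return one count per run."""
    counts = []
    for _ in range(runs):
        result = subprocess.run(
            perf_stat_command(program_args, event),
            stdout=subprocess.DEVNULL,  # discard the program's own output
            stderr=subprocess.PIPE,     # perf stat reports on stderr
            text=True,
            check=True,
        )
        counts.append(parse_count(result.stderr, event))
    return counts


def spread(counts):
    """Difference between the largest and smallest observed count."""
    return max(counts) - min(counts)
```

For example, spread(count_runs(["./myprog", "input.txt"])) gives the spread of user-land instruction counts over five runs, and passing event="instructions" instead gives the user-land + kernel total for comparison (./myprog and input.txt are placeholders).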
Another nice feature of Perf is the ability to sample during the execution of a program. By default, Perf reports the counter values at the end of the execution of the tested program. However, if it is of interest to gather statistics while the program is running, Perf can report the hardware event count at a fixed interval T chosen by the user (with "perf stat", this is the -I option, which takes T in milliseconds). Take for example a program which, at the beginning of every execution, calls a procedure which does a random amount of work. Also, let's assume we are interested in measuring the number of instructions executed by the input handling function(s) of this program. If only the total counter value is printed at the end of each execution, it will be hard to determine how much of the work is done by the function(s) of interest. On the other hand, if the hardware event is sampled every interval T, we can execute the program, allow the random amount of work to finish, and then gather the number of instructions executed between any two consecutive inputs.
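Here is a rough Python sketch of how the interval output could be post-processed. It assumes the report comes from "perf stat -I <ms> -x," (captured from stderr or written to a file with -o), that each CSV line starts with a timestamp in seconds followed by the count for that interval and, two fields later, the event name, and that the counts are per-interval deltas rather than running totals. The function names are invented for this example. The resolution is also limited by the chosen interval, so the result is only an approximation of the work done between two inputs.

```python
def perf_interval_command(program_args, interval_ms=100, event="instructions:u"):
    """Build a 'perf stat' command that prints CSV counts every interval_ms."""
    return ["perf", "stat", "-I", str(interval_ms), "-x,", "-e", event,
            "--"] + list(program_args)


def parse_intervals(csv_text, event="instructions:u"):
    """Return a list of (timestamp_seconds, count) pairs for `event`.

    Assumed CSV layout: timestamp, value, unit, event name, ...
    """
    samples = []
    for line in csv_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) >= 4 and fields[3] == event:
            try:
                samples.append((float(fields[0]), int(fields[1])))
            except ValueError:
                continue  # skip lines such as "<not counted>"
    return samples


def count_between(samples, start, end):
    """Sum the counts of all intervals that ended in the window (start, end]."""
    return sum(count for timestamp, count in samples if start < timestamp <= end)
```

If the random start-up work is known to be finished by, say, 1.5 seconds and the next input arrives at 3.5 seconds, count_between(samples, 1.5, 3.5) estimates the instructions spent handling the input in between.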
Overall, these hardware performance counters and the Perf utility are pretty interesting. In the near future, I will discuss how such performance counters can be used for more than just performance analysis.