Benchmarks & Performance Improvements

The main reason to upgrade to a Pi 2 Model B from a classic Raspberry Pi Model B+ is the big boost in performance.

The Pi 2 has 4 processor cores in one chip (the B+ has only one), an ARMv7 core vs an ARMv6, and 1 GB of RAM vs 512 MB for the Model B and B+.

Those 3 improvements translate to pretty big performance increases!

OK, but how much faster is the Pi 2 vs the Model B+? While it strongly depends on what you're doing, you should see roughly a 75% improvement for single-core processes that just depend on the ARMv7 vs ARMv6 upgrade (as in the single-threaded Sysbench test below). For anything that can take advantage of multi-core processors, you can see up to a 7x increase in speed!

Using the Pi as a computer feels fast and 'desktop like' - not sluggish! Particularly for developers, compiling code on the Pi 2 is 4x faster and the extra RAM helps a lot too, so most programs can now be compiled directly on the Pi. We still recommend our Pi Kernel-O-Matic for cross-compiling kernels, since you need a lot of space & RAM.

Compared to other Single-Board-Computers

You can see how the Pi 2 compares to the Arduino Yun / Beaglebone Black / Intel Galileo by checking out this earlier comparison guide (we'll be updating the guide shortly to add the Pi 2 numbers!)

We provide nbench numbers below that you can compare to the other computers until we update that tutorial...

nbench on Pi 2 @ 900MHz

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          444.24  :      11.39  :       3.74
STRING SORT         :          36.251  :      16.20  :       2.51
BITFIELD            :      1.2604e+08  :      21.62  :       4.52
FP EMULATION        :          69.824  :      33.50  :       7.73
FOURIER             :          4728.6  :       5.38  :       3.02
ASSIGNMENT          :          6.7648  :      25.74  :       6.68
IDEA                :          1297.9  :      19.85  :       5.89
HUFFMAN             :           654.5  :      18.15  :       5.80
NEURAL NET          :          6.2233  :      10.00  :       4.21
LU DECOMPOSITION    :          228.32  :      11.83  :       8.54
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 19.909
FLOATING-POINT INDEX: 8.599
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU ARMv7 Processor rev 5 (v7l)
L2 Cache            :
OS                  : Linux 3.18.5-v7+
C compiler          : gcc version 4.6.3 (Debian 4.6.3-14+rpi1)
libc                : libc-2.13.so
MEMORY INDEX        : 4.228
INTEGER INDEX       : 5.607
FLOATING-POINT INDEX: 4.769
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38

nbench on Pi 2 @ 950MHz

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          481.57  :      12.35  :       4.06
STRING SORT         :            37.6  :      16.80  :       2.60
BITFIELD            :      1.1826e+08  :      20.29  :       4.24
FP EMULATION        :            87.4  :      41.94  :       9.68
FOURIER             :            5126  :       5.83  :       3.27
ASSIGNMENT          :          7.6138  :      28.97  :       7.51
IDEA                :          1450.7  :      22.19  :       6.59
HUFFMAN             :          705.88  :      19.57  :       6.25
NEURAL NET          :          6.3669  :      10.23  :       4.30
LU DECOMPOSITION    :          242.49  :      12.56  :       9.07
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 21.639
FLOATING-POINT INDEX: 9.082
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU ARMv7 Processor rev 5 (v7l)
L2 Cache            :
OS                  : Linux 3.18.1-v7+
C compiler          : gcc-4.7
libc                : libc-2.13.so
MEMORY INDEX        : 4.359
INTEGER INDEX       : 6.341
FLOATING-POINT INDEX: 5.037
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
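If you want to see how much the three Linux indexes moved between the two runs above, a few lines of Python will do the arithmetic. This is just a sketch using the index values copied from the two nbench listings; the dictionary names are made up for illustration. Keep in mind the two runs also used different kernels and compilers (gcc 4.6.3 vs gcc-4.7), so the difference is not purely down to the clock speed.

```python
# Linux index values copied from the two nbench listings above.
# The names below are illustrative only; they are not part of nbench.
INDEX_900MHZ = {"MEMORY INDEX": 4.228, "INTEGER INDEX": 5.607, "FLOATING-POINT INDEX": 4.769}
INDEX_950MHZ = {"MEMORY INDEX": 4.359, "INTEGER INDEX": 6.341, "FLOATING-POINT INDEX": 5.037}

# The clock itself went up by about 5.6% (900 MHz -> 950 MHz).
clock_gain = (950 / 900 - 1) * 100

# Percentage change of each index between the two runs.
gain = {
    name: (INDEX_950MHZ[name] / INDEX_900MHZ[name] - 1) * 100
    for name in INDEX_900MHZ
}

print(f"clock: +{clock_gain:.1f}%")
for name, percent in gain.items():
    print(f"{name}: +{percent:.1f}%")
```

The memory and floating-point indexes move by about as much as the clock or less, while the integer index moves by more, which is probably the compiler and kernel differences showing through.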

For comparison-geeks, note that if you overclock the Pi 2 to 900-1000 MHz it's essentially the same processing speed as a BeagleBone Black (also an ARMv7), but with improved floating-point capabilities. There are a lot of reasons to go with a BBB vs a Pi 2, so please note it's not that the Pi 2 is a 'replacement' for the BBB!

Sysbench tests (Compared to Pi B+)

Sysbench is a Linux program that can do raw computational tests. It's a pure-math test, but it will tell you the 'upper bound' for speed and is good for general comparison.
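To give a feel for the kind of work a 'pure-math' CPU test does, here is a rough Python sketch of one benchmark event: checking every number up to a maximum for primality by trial division. This is an assumption about the general shape of the workload, not Sysbench's actual source code, and the function name count_primes is made up for illustration.

```python
import math


def count_primes(max_prime: int) -> int:
    """Count the primes in [3, max_prime] by trial division (one 'event')."""
    count = 0
    for candidate in range(3, max_prime + 1):
        limit = math.isqrt(candidate)
        # A candidate is prime if nothing in [2, sqrt(candidate)] divides it.
        if all(candidate % divisor for divisor in range(2, limit + 1)):
            count += 1
    return count


# One event with the same limit shown in the Sysbench output (10000).
print(count_primes(10000))
```

Each event is independent of the others, which is why this sort of test can be split cleanly across threads when there are multiple cores to run them on.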

Running on a B+ @ 700MHz with one thread, we get:

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          523.7819s
    total number of events:              10000
    total time taken by event execution: 523.7231
    per-request statistics:
         min:                                 51.99ms
         avg:                                 52.37ms
         max:                                 54.81ms
         approx.  95 percentile:              53.54ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   523.7231/0.00

And for 4 threads:

Running the test with following options:
Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          523.1061s
    total number of events:              10000
    total time taken by event execution: 2091.9841
    per-request statistics:
         min:                                162.66ms
         avg:                                209.20ms
         max:                                252.29ms
         approx.  95 percentile:             232.33ms

Threads fairness:
    events (avg/stddev):           2500.0000/1.22
    execution time (avg/stddev):   522.9960/0.04

Note that both tests take about 523 seconds. Because the B+ is a single-core processor, there is no improvement from having 4 threads vs 1 (all 4 threads run on one processor).

In comparison, the Pi 2 at 900 MHz has for a single thread:

Running the test with following options:
Number of threads: 1

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          298.6816s
    total number of events:              10000
    total time taken by event execution: 298.6632
    per-request statistics:
         min:                                 29.64ms
         avg:                                 29.87ms
         max:                                 44.60ms
         approx.  95 percentile:              32.14ms

Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   298.6632/0.00

298 seconds vs 523 for a single thread, so even without taking advantage of multicore, the Pi 2 is 523/298 = about 1.75x as fast, a 75% increase. That's nearly double, just from having an ARMv7 at 900 MHz doing the computation.
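The single-thread comparison can be worked out in a few lines of Python. This is a minimal sketch using the 'total time' values copied from the two single-thread Sysbench summaries above; the variable names are illustrative, and the per-clock figure assumes the 700 MHz and 900 MHz clock speeds stated for each run.

```python
# 'total time' values (seconds) from the single-thread Sysbench runs above.
B_PLUS_SECONDS = 523.7819   # Model B+ @ 700 MHz
PI2_SECONDS = 298.6816      # Pi 2 @ 900 MHz

speedup = B_PLUS_SECONDS / PI2_SECONDS            # how many times as fast
percent_faster = (speedup - 1) * 100              # increase in throughput
time_saved = (1 - PI2_SECONDS / B_PLUS_SECONDS) * 100  # reduction in run time

# Rough normalization for the clock difference (assumes 700 vs 900 MHz).
per_clock_speedup = speedup / (900 / 700)

print(f"speedup: {speedup:.2f}x ({percent_faster:.0f}% faster)")
print(f"run time reduced by {time_saved:.0f}%")
print(f"speedup per clock: {per_clock_speedup:.2f}x")
```

Note that a 75% increase in speed corresponds to about a 43% shorter run time; the two percentages describe the same result from different directions.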

If running with 4 threads, one on each processor, we see another big improvement:

Number of threads: 4

Doing CPU performance benchmark

Threads started!
Done.

Maximum prime number checked in CPU test: 10000


Test execution summary:
    total time:                          76.1168s
    total number of events:              10000
    total time taken by event execution: 304.4156
    per-request statistics:
         min:                                 29.65ms
         avg:                                 30.44ms
         max:                                 63.32ms
         approx.  95 percentile:              34.97ms

Threads fairness:
    events (avg/stddev):           2500.0000/7.38
    execution time (avg/stddev):   76.1039/0.01

Because we could split the work over 4 cores, we sped up nearly 4x, to 76 seconds.

Compared to a Model B+, the Pi 2 is about 7x faster (523 seconds vs 76) when using multi-threaded/multi-core computation!
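Here is a small Python sketch of the arithmetic behind those multi-core numbers, using the 'total time' values from the four Sysbench runs above. The dictionary layout and the helper name speedup are just for illustration.

```python
# 'total time' values (seconds) from the four Sysbench runs above,
# keyed by (board, number of threads).
TOTAL_SECONDS = {
    ("B+", 1): 523.7819,
    ("B+", 4): 523.1061,
    ("Pi 2", 1): 298.6816,
    ("Pi 2", 4): 76.1168,
}


def speedup(slow, fast):
    """Return how many times faster the 'fast' run is than the 'slow' run."""
    return TOTAL_SECONDS[slow] / TOTAL_SECONDS[fast]


bplus_scaling = speedup(("B+", 1), ("B+", 4))      # single core: no gain
pi2_scaling = speedup(("Pi 2", 1), ("Pi 2", 4))    # gain from 4 cores
overall = speedup(("B+", 4), ("Pi 2", 4))          # Pi 2 vs B+, 4 threads
efficiency = pi2_scaling / 4                       # fraction of ideal 4x

print(f"B+   1 -> 4 threads: {bplus_scaling:.2f}x")
print(f"Pi 2 1 -> 4 threads: {pi2_scaling:.2f}x ({efficiency:.0%} of ideal)")
print(f"Pi 2 vs B+ with 4 threads: {overall:.2f}x")
```

The Pi 2 gets about 3.92x from its four cores (roughly 98% of the ideal 4x), and stacking that on top of the single-thread gain gives about 6.87x overall.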

Web performance (Compared to Pi B+)

When using the Pi 2 for desktop use such as running Scratch, Minecraft, or web browsing, it feels much faster. But that's pretty subjective and we wanted to have some Real Numbers for comparison, so we ran a few web-browser JavaScript tests.

JavaScript is fairly processor-intensive and runs a huge amount of the interactivity of websites, so speedy JavaScript will translate directly to speedy browsing!

The first test we did is called Octane. You can run it by visiting here; it runs in your web browser and does a series of tests.

On a B+, we actually couldn't get the test to finish without crashing, but before it crashed we got the following:

Compare to the Pi 2 which did at least finish and gave us these numbers:

Higher numbers are better in this case

You can tell that depending on the tests, the Pi 2 is at least 2x as fast, and in most cases is 4x as fast.

SunSpider (Compared to Pi B+)

Another test you can run is called SunSpider; it's also a JavaScript benchmark. Here are the results from running it on a B+:

        ============================================
        RESULTS (means and 95% confidence intervals)
        --------------------------------------------
        Total:                  9477.4ms +/- 0.4%
        --------------------------------------------

          3d:                   1657.4ms +/- 0.9%
            cube:                552.5ms +/- 0.5%
            morph:               316.1ms +/- 0.7%
            raytrace:            788.8ms +/- 1.8%

          access:                482.1ms +/- 0.6%
            binary-trees:         80.6ms +/- 1.6%
            fannkuch:            203.0ms +/- 1.3%
            nbody:               133.7ms +/- 0.8%
            nsieve:               64.8ms +/- 3.2%

          bitops:                225.2ms +/- 0.3%
            3bit-bits-in-byte:    20.1ms +/- 1.1%
            bits-in-byte:         38.5ms +/- 2.4%
            bitwise-and:          51.1ms +/- 1.5%
            nsieve-bits:         115.5ms +/- 0.7%

          controlflow:            74.0ms +/- 1.2%
            recursive:            74.0ms +/- 1.2%

          crypto:                647.4ms +/- 2.9%
            aes:                 337.7ms +/- 2.0%
            md5:                 171.7ms +/- 5.4%
            sha1:                138.0ms +/- 3.6%

          date:                 1503.9ms +/- 0.6%
            format-tofte:        784.9ms +/- 0.8%
            format-xparb:        719.0ms +/- 0.8%

          math:                  431.4ms +/- 1.7%
            cordic:              104.7ms +/- 2.0%
            partial-sums:        238.2ms +/- 2.4%
            spectral-norm:        88.5ms +/- 2.6%

          regexp:                174.8ms +/- 1.2%
            dna:                 174.8ms +/- 1.2%

          string:               4281.2ms +/- 0.5%
            base64:              208.6ms +/- 1.6%
            fasta:               466.5ms +/- 3.7%
            tagcloud:            711.4ms +/- 0.9%
            unpack-code:        2436.8ms +/- 0.4%
            validate-input:      457.9ms +/- 0.9%

And running on a Pi 2:

============================================
RESULTS (means and 95% confidence intervals)
--------------------------------------------
Total:                 2476.9ms +/- 0.7%
--------------------------------------------

  3d:                   499.8ms +/- 2.6%
    cube:               141.7ms +/- 1.4%
    morph:              150.5ms +/- 6.1%
    raytrace:           207.6ms +/- 2.1%

  access:               190.6ms +/- 1.0%
    binary-trees:        24.8ms +/- 1.2%
    fannkuch:            90.8ms +/- 1.0%
    nbody:               48.1ms +/- 2.2%
    nsieve:              26.9ms +/- 3.9%

  bitops:               100.6ms +/- 1.3%
    3bit-bits-in-byte:    8.4ms +/- 4.4%
    bits-in-byte:        19.4ms +/- 1.9%
    bitwise-and:         25.8ms +/- 1.2%
    nsieve-bits:         47.0ms +/- 2.0%

  controlflow:           25.6ms +/- 3.0%
    recursive:           25.6ms +/- 3.0%

  crypto:               194.5ms +/- 1.3%
    aes:                 99.6ms +/- 0.7%
    md5:                 52.3ms +/- 4.2%
    sha1:                42.6ms +/- 0.9%

  date:                 303.5ms +/- 1.2%
    format-tofte:       154.4ms +/- 0.8%
    format-xparb:       149.1ms +/- 1.7%

  math:                 141.5ms +/- 1.0%
    cordic:              39.1ms +/- 1.0%
    partial-sums:        69.9ms +/- 1.0%
    spectral-norm:       32.5ms +/- 1.6%

  regexp:                90.0ms +/- 0.7%
    dna:                 90.0ms +/- 0.7%

  string:               930.8ms +/- 0.7%
    base64:              57.6ms +/- 1.0%
    fasta:              141.2ms +/- 0.7%
    tagcloud:           160.7ms +/- 0.6%
    unpack-code:        460.6ms +/- 1.0%
    validate-input:     110.7ms +/- 1.0%

In this case, lower numbers are better. Again, you can see that every test is roughly 2x faster or better on a Pi 2 vs a B+, and the total run time is about 3.8x shorter (9477.4 ms vs 2476.9 ms)!
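To compare the two SunSpider runs category by category, you can divide the B+ time by the Pi 2 time for each group. The sketch below uses the category totals copied from the two result listings above; the dictionary and variable names are illustrative only.

```python
# Category totals (milliseconds) copied from the SunSpider results above.
B_PLUS_MS = {
    "3d": 1657.4, "access": 482.1, "bitops": 225.2, "controlflow": 74.0,
    "crypto": 647.4, "date": 1503.9, "math": 431.4, "regexp": 174.8,
    "string": 4281.2,
}
PI2_MS = {
    "3d": 499.8, "access": 190.6, "bitops": 100.6, "controlflow": 25.6,
    "crypto": 194.5, "date": 303.5, "math": 141.5, "regexp": 90.0,
    "string": 930.8,
}

# Lower times are better, so speedup = B+ time / Pi 2 time.
ratios = {name: B_PLUS_MS[name] / PI2_MS[name] for name in B_PLUS_MS}
total_ratio = sum(B_PLUS_MS.values()) / sum(PI2_MS.values())
smallest_gain = min(ratios, key=ratios.get)
largest_gain = max(ratios, key=ratios.get)

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:12s} {ratio:.2f}x")
print(f"{'total':12s} {total_ratio:.2f}x")
```

The smallest gain is in regexp (a little under 2x) and the largest is in date (close to 5x); the heavy string and date groups are what pull the total up to about 3.8x.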

Other tests!

OK, we'll be doing more tests, but one thing we did get going was playing around with emulators. Of course the Pi 2 is much speedier than the B+, and by overclocking to 900 MHz we could run pcsx (a PlayStation 1 emulator) and Crash Bandicoot at full speed with HDMI audio! Simply download, build and run as per this tutorial.

Last updated on 2015-05-04 at 04.27.56 PM Published on 2015-02-02 at 03.04.56 AM