New Android Benchmark: How many flops can your phone REALLY do?
There are many ways to benchmark an Android device, but the one I have always been most interested in is the Linpack for Android benchmark by GreeneComputing. The Linpack benchmarks have been used for many years by supercomputer builders to compare computational muscle, and they form the basis of the Top 500 list of supercomputers.
Linpack measures how quickly a machine can solve a dense n by n system of linear equations, which is a common task in scientific and engineering applications. The results of the benchmark are measured in flops, which stands for floating point operations per second. A typical desktop PC might achieve around 50 gigaflops (50 billion flops), whereas the most powerful computers on Earth are measured in petaflops (quadrillions of flops), with the current champion weighing in at 16 petaflops. That’s 16,000,000,000,000,000 floating point operations per second, which is a lot!
According to the Android Linpack benchmark, my Samsung Galaxy S2 is capable of 85 megaflops, which is pretty powerful compared to supercomputers of bygone eras but rather weedy by today’s standards. It turns out, however, that the Linpack for Android app is under-reporting what your phone is really capable of. As the authors say, ‘This test is more a reflection of the state of the Android Dalvik Virtual Machine than of the floating point performance of the underlying processor.’ It’s a nice way of comparing the speed of two phones, or different firmwares on the same phone, but it does not measure the true performance potential of your device. Put another way, it’s like measuring how hard you can punch while wearing huge, soft boxing gloves.
Rahul Garg, a PhD student at McGill University, thought that it was high time to take the gloves off!
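To make the idea concrete, here is a minimal sketch of a Linpack-style measurement in plain Python. It is not the code of Linpack for Android or of the official benchmark; the function names are my own, and the only thing borrowed from Linpack is the conventional operation count of 2/3·n³ + 2·n² for solving an n by n system. Being interpreted Python, it will report far lower numbers than native code on the same machine.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_mflops(n, seed=0):
    """Time one solve of a random n by n system and convert it to Mflops."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Conventional Linpack operation count for an n by n solve.
    flop_count = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flop_count / elapsed / 1e6, x


if __name__ == "__main__":
    mflops, _ = linpack_style_mflops(100)
    print(f"n=100: {mflops:.1f} Mflops")
```

The reported figure is simply the fixed operation count divided by the elapsed time, which is why the same hardware can score very differently depending on how the solver is executed.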
rgbench – a true high-performance benchmark for Android devices
Rahul has written a new benchmark app called RgbenchMM that aims to more accurately reflect the power of modern Android devices. It performs a different calculation to Linpack in that it measures the speed of matrix-matrix multiplication, another common operation in scientific computing.
The benchmark was written using the NDK (Native Development Kit), which means that it runs as native code directly on the processor rather than in the Dalvik virtual machine, thus avoiding Java overheads. Furthermore, Rahul has used HPC tricks such as tiling and loop unrolling to squeeze the very last drop of performance out of your phone’s processor. The code tests about 50 different variations, and the performance of the best version found for your device is then displayed.
When I ran the app on my Samsung Galaxy S2, I noted that it takes rather longer than Linpack for Android to execute (several minutes, in fact), which is probably due to the large number of variations it’s trying out to see which is the best. I received the following results:
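For readers who haven’t met tiling and loop unrolling before, the sketch below shows the shape of both tricks, plus the "try several variants and keep the best" idea. It is my own illustration in Python, not Rahul’s code: the function names, the tile sizes and the unroll factor of 2 are all assumptions, and in interpreted Python these tricks will not give the speed-ups they give in native code. The Mflops figure uses the standard 2·n³ operation count for multiplying two n by n matrices.

```python
import random
import time


def matmul_naive(a, b, n):
    """Reference triple loop, used only to check the tiled version."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, n, tile):
    """Blocked multiply: work on tile x tile sub-blocks so that, in native
    code, the data being reused stays in cache."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        i_end = min(ii + tile, n)
        for kk in range(0, n, tile):
            k_end = min(kk + tile, n)
            for jj in range(0, n, tile):
                j_end = min(jj + tile, n)
                for i in range(ii, i_end):
                    ci = c[i]
                    for k in range(kk, k_end):
                        aik = a[i][k]
                        bk = b[k]
                        j = jj
                        # Inner loop unrolled by 2: two updates per iteration.
                        while j + 1 < j_end:
                            ci[j] += aik * bk[j]
                            ci[j + 1] += aik * bk[j + 1]
                            j += 2
                        # Leftover element when the tile width is odd.
                        if j < j_end:
                            ci[j] += aik * bk[j]
    return c


def best_variant_mflops(n, tiles=(8, 16, 32)):
    """Time each tile size and return (best Mflops, tile size that gave it)."""
    rng = random.Random(0)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    best_mflops, best_tile = 0.0, None
    for tile in tiles:
        start = time.perf_counter()
        matmul_tiled(a, b, n, tile)
        elapsed = time.perf_counter() - start
        mflops = 2.0 * n**3 / elapsed / 1e6
        if mflops > best_mflops:
            best_mflops, best_tile = mflops, tile
    return best_mflops, best_tile


if __name__ == "__main__":
    mflops, tile = best_variant_mflops(64)
    print(f"best variant: tile={tile}, {mflops:.1f} Mflops")
```

Reporting only the best variant means the number reflects what the device can do with well-tuned code, which is presumably why the run takes a while.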
- 1 thread: 389 Mflops
- 2 threads: 960 Mflops
- 4 threads: 867 Mflops
Since my phone has a dual-core processor, I expected performance to be best for 2 threads, and that’s exactly what I got. Almost a gigaflop on a mobile phone is not bad going at all! For comparison, I get around 85 Mflops on Linpack for Android. Give it a try and see how your device compares.
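If you want to compare thread counts on your own device, the usual quantity to look at is the speed-up relative to the single-threaded result. Here is a small sketch using my Galaxy S2 figures from above; the function name `speedups` is just something I made up for illustration.

```python
def speedups(results_mflops):
    """Return {threads: speed-up relative to the 1-thread result}."""
    base = results_mflops[1]
    return {threads: mflops / base
            for threads, mflops in sorted(results_mflops.items())}


# RgbenchMM results reported above for the Samsung Galaxy S2 (Mflops).
galaxy_s2 = {1: 389.0, 2: 960.0, 4: 867.0}

for threads, factor in speedups(galaxy_s2).items():
    print(f"{threads} thread(s): {factor:.2f}x")
# 1 thread(s): 1.00x
# 2 thread(s): 2.47x
# 4 thread(s): 2.23x
```

The 4-thread run is slower than the 2-thread run, as you’d expect when there are more threads than cores. The 2-thread figure is more than double the 1-thread one, but I would not read too much into the exact ratio from a single run.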
Links
- RgbenchMM on Google Play
- Prelim Analysis of RgbenchMM – Some of the in-depth details of the benchmark, written by the app’s author.
- Supercomputers vs mobile phones
Ooh a digital gadget pissing contest, I’m game! Here are my results:
Linpack single thread: 47.018 Mflops
Linpack multi thread: 70.572 Mflops
RgbenchMM 1 thread: 395 Mflops
RgbenchMM 2 threads: 1025 Mflops
RgbenchMM 4 threads: 961 Mflops
Yay I got the Gflops barrier broken! Now I can reveal that I have a… Samsung Galaxy S2!
Interesting. There seems to be a lot of variation between runs on both benchmarks. My S2 gives 76 to 90 Mflops in Linpack multi-thread. I ran RgbenchMM several more times with 2 threads and got 960, 1027, 842 and 940. Happy that I broke the Gflop barrier too… once :)
My results for RgbenchMM on a 16GB Nexus 7:
1: 321.0 MF
2: 869.0 MF
4: 1519.0 MF
My results on a Samsung Galaxy S3 are quite similar to the Nexus 7 (and it’s almost the same size too! ;-) )
1: 336 MF
2: 884 MF
4: 1585 MF
(using RgbenchMM)
Well, I was quite surprised to see the results from my HTC Sensation XE, since it did not perform outstandingly well in other available benchmarks.
Here are the ratings from RgbenchMM:
1: 593 Mflops
2: 1203 Mflops
4: 1226 Mflops
Linpack gives:
Single thread 45.414 Mflops
Multi thread 77.228 Mflops
Samsung Galaxy Note:
Here are the ratings from RgbenchMM:
1: 496 Mflops
2: 1213 Mflops
4: 1106 Mflops
Results of my MP-MFLOPS.apk benchmark on a Nexus 7 indicate up to 643, 1193, 2355 and 2178 MFLOPS using 1, 2, 4 and 8 threads. Results and download options for this and 7 other MP benchmarks are at the following link:
http://www.roylongbottom.org.uk/android%20multithreading%20benchmarks.htm#anchor3
Single thread benchmarks, mainly using native code including Linpack, are available from:
http://www.roylongbottom.org.uk/android%20benchmarks.htm
All free. [Nexus 7 Linpack 100 DP – 56 MFLOPS Java, 151 MFLOPS native, SP 201 MFLOPS]. For real comparison with supercomputers, see my Livermore Loops benchmark.
My Sony Xperia SL LT26ii shows:
1: 617 mflops
2: 1270 mflops
3: 1285 mflops
HTC One :
1 : 882 mflops
2 : 1906 mflops
4 : 2800 mflops
Samsung Note II:
1: 517 MFlops
2: 1325 MFlops
4: 2564 MFlops
Not too shabby… Anyone got an S4 to test? :)
Samsung Galaxy S3 @ 1.6 GHz running ParanoidAndroid 3.65 [4.2.2] and Googy-Max-1.6.8 kernel.
1 : 514 MFLOPS
2 : 1297 MFLOPS
3 : 2642 MFLOPS
This is still only about 1/4 of the theoretical peak of a Cortex-A9, which does 1.5 DP FLOPS/clock: with 4 cores at 1.6 GHz that is 1.5 * 4 * 1.6 = 9.6 GFLOPS.
Compared to my PC, which does about 120 GFLOPS [2500k @ 4.5 GHz], it is pretty solid for a handheld mobile device.
Samsung Galaxy S4, 4 threads, Stock Jellybean 4.2:
3086 Mflops – definitely impressed
Samsung Galaxy Note 3, 4 threads, Stock Android 4.3:
4471 Mflops... which is a lot!
Samsung S21 FE 5G 8/256GB
matrices 250 – 250
step 0
iterations 5
1 thread – 1840 Mflop/s
2 threads – 1950 Mflop/s
4 threads – 1500 – 1800 Mflop/s
I don’t understand why these are so weak..
matrices 500 – 500
step 0
iterations 5
1 thread – 1780 Mflop/s
2 threads – 2530 Mflop/s
4 threads – 2800 – 3400 Mflop/s
Let’s go big
matrices 3000 – 3000
step 1
iterations 10
1 thread – 1712 Mflop/s
2 threads – 2425 Mflop/s
4 threads – 2787 Mflop/s
and
8 threads – 1175 Mflop/s
Tell me what is going on…