Comparing Performance - The ME6000 and M10000
Since we ran benchmarks for both the ME6000 and the M10000, it's only natural that we compare the two. The table below compares the test results for the ME6000 with the -Os optimization and the M10000 with the -O2 optimization.
| Mainboard | M10000 | ME6000 |
|---|---|---|
| Optimization Flag | -O2 | -Os |
| nbench Test (iterations/sec.) | | |
| Numeric Sort | 330.77 | 138.45 |
| String Sort | 27.04 | 10.98 |
| Bitfield | 73,031,333 | 25,789,667 |
| FP Emulation | 17.73 | 9.97 |
| Fourier | 1,814.33 | 1,148.70 |
| Assignment | 5.00 | 1.72 |
| IDEA | 525.50 | 255.40 |
| Huffman | 356.95 | 143.22 |
| Neural Net | 3.36 | 1.57 |
| LU Decomposition | 227.45 | 95.43 |
| Baseline (MSDOS) Pentium 90, 256 KB L2-cache, Watcom compiler 10.0 | | |
| Integer Index | 10.74 | 4.50 |
| Floating Point Index | 5.08 | 2.53 |
| Baseline (LINUX) AMD K6/233, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38 | | |
| Memory Index | 2.89 | 1.06 |
| Integer Index | 2.53 | 1.17 |
| Floating Point Index | 2.82 | 1.40 |
| lmbench Tests | | |
| Memory Bandwidth (MB/sec) | | |
| Read | 11,332.08 | 6,520.50 |
| Write | 11,258.97 | 3,523.05 |
| Read-Write | 6,746.23 | 2,287.17 |
| Copy | 6,404.34 | 3,295.68 |
| Proc Read | 3,529.57 | 1,903.59 |
| Proc Write | 2,105.25 | 991.85 |
| Proc Copy | 1,872.23 | 904.05 |
| mmap Read | 3,686.43 | 1,749.73 |
| mmap+o/c Read | 804.75 | 427.37 |
| file Read | 931.22 | 547.03 |
| file+o/c Read | 705.26 | 409.90 |
| I/O Bandwidth (MB/sec) | | |
| Pipe bandwidth | 160.98 | 84.31 |
| Sock stream bandwidth | 12,238.87 | 7,583.54 |
| TCP/IP connection (usec) | 134.67 | 136.27 |
| Context Switching (size=0k ovr=3.34) | | |
| 128 | 9.94 | 10.59 |
| 8 | 2.20 | 4.29 |
| File Creation (ops/sec for 1000 files) | | |
| 0k | 7,274.67 | 4,028.00 |
| 1k | 5,170.00 | 2,815.33 |
| 4k | 4,988.67 | 2,790.33 |
| 10k | 3,761.33 | 2,095.67 |
| CPU Operation Latency (nsec) | | |
| integer bit | 1.04 | 1.68 |
| integer add | 1.03 | 1.68 |
| integer mul | 7.11 | 14.80 |
| integer div | 54.71 | 93.40 |
| integer mod | 54.55 | 95.05 |
| int64 bit | 3.08 | 5.04 |
| int64 add | 1.03 | 1.69 |
| int64 mul | 36.14 | 56.68 |
| int64 div | 103.23 | 233.48 |
| int64 mod | 107.32 | 225.04 |
| float add | 7.07 | 9.41 |
| float mul | 8.10 | 12.34 |
| float div | 15.17 | 125.79 |
| double add | 7.08 | 9.41 |
| double mul | 8.09 | 12.36 |
| double div | 15.18 | 125.95 |
| float bogomflops | 32.37 | 173.72 |
| double bogomflops | 32.38 | 173.72 |
| Other Operations (usec) | | |
| Pagefaults on /root/test3 | 1.66 | 2.25 |
| Pipe latency | 8.17 | 12.83 |
| Procedure call | 0.02 | 0.07 |
| Process fork+exit | 155.90 | 312.41 |
| Process fork+execve | 4,662.83 | 6,581.00 |
| Process fork+/bin/sh -c | 6,756.67 | 9,503.33 |
| Select on 200 fd's | 19.21 | 51.37 |
| Select on 200 tcp fd's | 42.72 | 111.46 |
| Semaphore latency | 2.43 | 5.51 |
| Signal handler installation | 1.15 | 1.80 |
| Signal handler overhead | 4.27 | 3.98 |
| Protection fault | 0.56 | 1.40 |
| Simple syscall | 0.35 | 0.48 |
| Simple read | 0.83 | 0.97 |
| Simple write | 0.71 | 0.81 |
| Simple stat | 3.83 | 8.90 |
| Simple fstat | 1.02 | 1.54 |
| Simple open/close | 5.10 | 10.11 |
| Stream Operations | | |
| Copy latency (nsec) | 52.12 | 94.62 |
| Copy bandwidth (MB/sec) | 307.03 | 169.09 |
| Scale latency (nsec) | 51.79 | 68.40 |
| Scale bandwidth (MB/sec) | 308.95 | 233.91 |
| Sum latency (nsec) | 73.50 | 109.99 |
| Sum bandwidth (MB/sec) | 326.55 | 218.22 |
| Triad latency (nsec) | 73.13 | 127.53 |
| Triad bandwidth (MB/sec) | 328.18 | 188.19 |
| Benchmark Timing | | |
| Real Time | 06:38.96 | 06:48.91 |
| User Time | 04:42.52 | 05:07.33 |
| Sys Time | 00:18.34 | 00:59.72 |
| Kernel Size | 1,753,858 | 952,785 |

The Eden CPU of the ME6000 and the Nehemiah CPU of the M10000 differ in a few ways. First and most obvious, the Eden runs at just under 600 MHz while the Nehemiah runs at 1000 MHz. Further, the floating point unit (FPU) of the Eden runs at half the clock speed while the FPU of the Nehemiah runs at the full clock speed. Given these differences, we would expect non-floating point operations to take about 40% less time on the Nehemiah (roughly 1.67 times the throughput) and floating point operations to be a little over 3 times faster.
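The expectation above is simple clock arithmetic, and a small sketch makes the numbers explicit. It assumes performance scales linearly with clock speed and rounds the Eden up to 600 MHz; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope expectation from clock speeds alone.
# Assumption: performance scales linearly with clock frequency.
EDEN_MHZ = 600.0       # "just under 600 MHz" in the text, rounded up here
NEHEMIAH_MHZ = 1000.0


def expected_speedup(slow_mhz, fast_mhz):
    """Throughput ratio if performance scaled linearly with clock."""
    return fast_mhz / slow_mhz


integer_speedup = expected_speedup(EDEN_MHZ, NEHEMIAH_MHZ)
# The Eden FPU runs at half the core clock; the Nehemiah FPU at full clock.
fp_speedup = expected_speedup(EDEN_MHZ / 2, NEHEMIAH_MHZ)
# Fraction of run time saved for non-floating point work.
time_saved = 1 - 1 / integer_speedup

print(f"integer: {integer_speedup:.2f}x throughput, {time_saved:.0%} less time")
print(f"floating point: {fp_speedup:.2f}x throughput")
```

Note that "40% less time" and "1.67 times the throughput" describe the same ratio from two directions.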
The pure integer operations like bit and add do show the difference expected from clock speed, although integer mul is already about twice as fast on the Nehemiah. Looking at the rest of the table, I don't believe clock speed alone fully explains what we see. For example, numeric sort and string sort, both non-FPU tests, complete about 2.4 times as many iterations per second on the Nehemiah, well above what the clock speeds would predict.
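To see how far individual results sit from the clock-only expectation, the ratios can be computed directly from the table. The sketch below copies four rows from the table above; the dictionary layout and the `speedups` helper are illustrative only.

```python
# (M10000, ME6000) values copied from the comparison table.
throughput = {  # nbench, iterations/sec: higher is better
    "Numeric Sort": (330.77, 138.45),
    "String Sort": (27.04, 10.98),
}
latency_ns = {  # lmbench, nanoseconds: lower is better
    "integer add": (1.03, 1.68),
    "integer mul": (7.11, 14.80),
}

CLOCK_RATIO = 1000 / 600  # speedup expected from clock speed alone


def speedups():
    """Return how many times faster the M10000 is for each test."""
    result = {}
    for name, (m10000, me6000) in throughput.items():
        result[name] = m10000 / me6000   # more iterations is faster
    for name, (m10000, me6000) in latency_ns.items():
        result[name] = me6000 / m10000   # less time is faster
    return result


for name, ratio in speedups().items():
    print(f"{name:13s} {ratio:.2f}x (clock alone: {CLOCK_RATIO:.2f}x)")
```

Integer add lands close to the clock ratio, while the two sorts come out near 2.4x, which is the gap the text is pointing at.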
I don't know what accounts for these additional differences. I did note that the TLB size (reported by LMBench) is different for the Eden and Nehemiah CPUs. VIA documentation also mentions the difference in TLB between the processor cores. For the Eden, the reported TLB size varied between 28 and 36. For the Nehemiah it was consistently 64. The TLB is the “translation lookaside buffer” and caches the most recently used virtual-to-physical address translations. This is a little deep for me, but the article Optimizing the Idle Task and Other MMU Tricks discusses the impact that cache and TLB hit rates can have on overall system performance. I'll leave this as an exercise for the reader.
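As a starting point for that exercise, here is a toy model of why TLB size can matter. It is only a sketch under loud assumptions: it treats the TLB as a fully associative cache with least-recently-used replacement, which may not match either VIA core, and the workload (a loop over 48 pages) is hypothetical. The 32-entry case stands in for the 28 to 36 range reported for the Eden. It does not claim to explain the benchmark gap.

```python
from collections import OrderedDict


def tlb_hit_rate(entries, page_accesses):
    """Hit rate of a fully associative, LRU-replaced translation cache."""
    tlb = OrderedDict()  # page number -> dummy frame number
    hits = 0
    for page in page_accesses:
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)        # mark as most recently used
        else:
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict least recently used
            tlb[page] = page
    return hits / len(page_accesses)


# Hypothetical workload: loop 100 times over 48 distinct pages.
accesses = [page for _ in range(100) for page in range(48)]

small = tlb_hit_rate(32, accesses)  # stand-in for the Eden's 28 to 36
large = tlb_hit_rate(64, accesses)  # size reported for the Nehemiah
print(f"32 entries: {small:.1%} hits, 64 entries: {large:.1%} hits")
```

In this model a working set slightly larger than the smaller TLB misses on every access, while the larger TLB misses only on the first pass. Real workloads are far less extreme, but the cliff is the effect to look for.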
Tim
Copyright © by MagicITX. All Rights Reserved. Published on: 2005-02-02