13

This question is inspired by this article.

Currently I spend a lot of time in R analyzing data. Some of the scripts I run in R could benefit from parallel computing to save time. Let's say I could build a barebones "real" computer for ~$300, and for that same $300 I could get 6 Pis (including power cables and SD cards). Assuming I had a task that was well suited to parallelization, would I be better off with the Pis or the "real" computer?

Would the answer change if I ramped up the "real" computer's hardware to something costing around $1000? How would that fare against 20 Pis?

Ghanima
Dean MacGregor

4 Answers

10

If you want to analyse this, you have to step beyond saying 20 rpis vs. $1000 and decide for yourself what you can actually get and use for $1000. Note that this is not the shopping channel.

Let's say you can get a motherboard, power supply, 8 GB of RAM, a 6-core 3.4 GHz i7 processor, and some old hard drive (still faster than an SD card!) for ~$1000. The total cycles per second would be 6 * 3.4e9 = 20.4e9, vs. 20 * 0.7e9 = 14e9 for the pi supercomputer.
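The arithmetic above can be reproduced with a short Python sketch. The helper name `aggregate_cycles` is made up for illustration, and raw cycles per second say nothing about how much work each cycle does, so treat this only as the rough first-order comparison described above.

```python
def aggregate_cycles(units: int, cores_per_unit: int, clock_hz: float) -> float:
    """Total clock cycles per second across all cores (hypothetical helper)."""
    return units * cores_per_unit * clock_hz

# Figures from the answer: one 6-core 3.4 GHz desktop vs. 20 single-core 700 MHz Pis.
desktop = aggregate_cycles(units=1, cores_per_unit=6, clock_hz=3.4e9)
pi_cluster = aggregate_cycles(units=20, cores_per_unit=1, clock_hz=0.7e9)

print(f"desktop:    {desktop / 1e9:.1f}e9 cycles/s")
print(f"pi cluster: {pi_cluster / 1e9:.1f}e9 cycles/s")
print(f"ratio:      {desktop / pi_cluster:.2f}x in favour of the desktop")
```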

Now consider the fact that a multi-core processor is a multi-core processor, whereas the I/O betwixt pis (if this becomes a significant factor) is going to be orders of magnitude slower.

While I don't have a serious interest in (or knowledge of) pi supercomputers, I would assume they are for experimentation and educational purposes. It's definitely a cheap cluster, but if your goal is to crunch numbers fast (as opposed to experimenting and educating), there's no way a pi cluster is going to be more cost or energy effective than a normal PC.

goldilocks
  • Thanks, "The total cycles per second would be 6 * 3.4e9 = 20.4e9, vs. 20 * 0.7e9 = 14e9 for the pi supercomputer." is basically what I was after but didn't know what to look for. – Dean MacGregor Jan 27 '14 at 19:40
  • One Pi only uses 5 watts. This means that 20 pis only use 100 watts, which is less than the PC you describe, so it is more cost and energy efficient (but with less CPU :) ) – Thorbjørn Ravn Andersen Jan 27 '14 at 21:10
  • @ThorbjørnRavnAndersen from my experience, an 8-core i7 + SSD drive + 16 GB of RAM uses about 50-70 W depending on the processor load in a headless (sans the monitor) installation. – lenik Jan 27 '14 at 23:40
  • When you use RPi2's, the number of cycles would increase to 20 * 4 * 0.9e9 = 72e9, and with overclocking you can even reach 20 * 4 * 1.1e9 = 88e9 cycles per second. Now, with this increase in the pi's computational power, do you think that it would be worth building a RPi cluster? – Sirac Dec 31 '15 at 16:54
  • @Sirac That is a good point -- this was written before the Pi 2 came out. I personally still do not think it is pragmatically worthwhile because 1) The max 100 Mbps connections are a serious bottleneck, 2) I believe even given an equal number of cycles per second, an x86-64 processor may be 5-10 times faster than an ARMv7 processor. The difference is in the instruction set, and that an ARM processor is designed for low power (and low cost), whereas the Intel Core processors are more dedicated to speed... – goldilocks Dec 31 '15 at 17:10
  • ...I can compile a kernel on a big quad core desktop in ~5 minutes and still use the GUI responsively; I am guessing it would take an hour or so on a Pi 2. But I could be wrong -- perhaps this is an intuitive prejudice. Of course, there are reasons other than the pragmatic to do a Pi cluster, I am sure it would be an interesting project. No matter what, that ethernet bottleneck is going to make it much more limited in terms of use value. – goldilocks Dec 31 '15 at 17:10
  • 1
    @goldilocks 1) My knowledge about supercomputers is not good enough to argue about the Mbps speed. 2) Overall, an Intel CPU might be faster than a set of ARMv7s for the same price. I have to look up some stats to get a better view of this. 3) It takes hours to compile a kernel on a RPi2; I tried it and I hope not to do it again in the future.

    To summarize, a RPi might not be the perfect choice for a supercomputer, not even the RPi2. But it sure makes a good project, since the RPi is very cheap and you can easily combine several of them. I am thinking of a network simulation, because you can ...

    – Sirac Dec 31 '15 at 17:38
  • have several OS's for a very low price per machine. (You can see that I'm not an expert at this topic (yet)) – Sirac Dec 31 '15 at 17:39
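The figures traded in the comments above can be put side by side in a small Python sketch. Everything here is taken from the comments or labeled as an assumption: the 5-10x per-cycle penalty is goldilocks's stated guess rather than a measurement, and applying lenik's 50-70 W figure (given for an 8-core machine) to the 6-core desktop from the answer is an assumption made only for illustration.

```python
# Figures quoted in the answer and comments; none of these are measurements.
DESKTOP_CYCLES = 6 * 3.4e9       # 6-core 3.4 GHz i7 from the answer
PI1_CYCLES = 20 * 0.7e9          # 20 original Pis
PI2_STOCK = 20 * 4 * 0.9e9       # 20 Pi 2s, 4 cores each, stock clock
PI2_OVERCLOCKED = 20 * 4 * 1.1e9

def desktop_equivalent(cycles: float, per_cycle_penalty: float) -> float:
    """Scale ARM cycles down by an assumed per-cycle penalty vs. x86-64."""
    return cycles / per_cycle_penalty

print(f"Pi 2 cluster raw: {PI2_STOCK / 1e9:.0f}e9 stock, "
      f"{PI2_OVERCLOCKED / 1e9:.0f}e9 overclocked")
for penalty in (5, 10):  # the guessed 5-10x range
    eq = desktop_equivalent(PI2_STOCK, penalty)
    print(f"  with a {penalty}x penalty: {eq / 1e9:.1f}e9 "
          f"vs {DESKTOP_CYCLES / 1e9:.1f}e9 for the desktop")

# Raw cycles per watt: 20 Pis at 5 W each vs. a desktop assumed to draw 50-70 W.
pi_cycles_per_watt = PI1_CYCLES / (20 * 5)
desktop_cycles_per_watt = [DESKTOP_CYCLES / watts for watts in (70, 50)]
print(f"Pi cluster: {pi_cycles_per_watt / 1e9:.2f}e9 cycles/s per watt")
print(f"Desktop:    {desktop_cycles_per_watt[0] / 1e9:.2f}e9 to "
      f"{desktop_cycles_per_watt[1] / 1e9:.2f}e9 cycles/s per watt")
```

Under these assumptions the Pi 2 cluster leads on raw cycles but falls behind the desktop once the guessed per-cycle penalty is applied, which is the point made in the reply above.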
2

This has a somewhat complex answer - a key question you need to answer is "what kind of work are you asking the machine to do?"

The instruction sets across different machines (ARM vs Intel vs whoever else) as well as the quality of the compilers make a big difference in actual performance. If the work you're asking it to do has hardware acceleration on one machine but not another, that factor alone is going to make more difference than a significant change in the clock rate.

In the most general terms, I think the biggest bang-for-the-buck in terms of price/performance will come from a moderately-clocked multicore general CPU from AMD or Intel. If you're in a controlled environment where the ambient temperatures are low, you can likely overclock these chips a bit to get more performance.

The rasPi is definitely NOT designed for this sort of stuff, which should not take away from its high value in learning how things work, and even building a truly "distributed" system at a very affordable price. But if serious data and/or number crunching is what you need to do, the rasPi isn't likely to be the right choice.

ljwobker
1

Leaving aside the underpowered CPU on the pi, I cannot see how you are going to get data to the CPUs fast enough on dozens of pis to see performance gains worth the effort. Bus speed is every bit as important in clustered supercomputing as CPU speed is, and the pi is very much inadequate here.

Both the networking and disk access will be sharing the same 60 MB/s USB2 bus. The SD card has, at best, performance in the 20 MB/s range.

Low-end PC hardware with SATA at 150 MB/s and Ethernet on a 2 GB/s PCI bus offers orders of magnitude more bandwidth.
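To get a feel for what those bandwidth figures mean in practice, here is a minimal Python sketch of best-case transfer times. The 3000 MB data set size and the helper name `transfer_seconds` are assumptions chosen only for illustration, and the bandwidths are the peak figures quoted above, not measured throughput.

```python
# Peak bandwidth figures quoted in the answer, in MB/s (theoretical, not measured).
BANDWIDTH_MB_S = {
    "Pi SD card": 20,
    "Pi USB2 bus (shared by network and disk)": 60,
    "PC SATA": 150,
}

def transfer_seconds(size_mb: float, bandwidth_mb_s: float) -> float:
    """Best-case time to move size_mb megabytes at the given bandwidth."""
    return size_mb / bandwidth_mb_s

DATASET_MB = 3000  # hypothetical data set size, for illustration only

for name, bandwidth in BANDWIDTH_MB_S.items():
    print(f"{name}: {transfer_seconds(DATASET_MB, bandwidth):.0f} s")
```

Real transfers would be slower than these best-case numbers on every device, so the sketch only shows the relative gap.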

sal
  • 3
    This depends entirely on the application - some applications have very high ratios of "work" to "communication" and others require much more communication per amount of work done. – Chris Stratton Jan 28 '14 at 17:38
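One hedged way to picture the work-to-communication ratio mentioned in the comment above is a toy model in which the compute time divides evenly across nodes while a fixed communication cost is added on top. The model, the function name `speedup`, and all the numbers are assumptions for illustration, not measurements of any Pi cluster.

```python
def speedup(compute_s: float, comm_s: float, nodes: int) -> float:
    """Speedup over a single node in a toy model.

    Compute time splits evenly across nodes; comm_s is a fixed
    communication cost paid once per run (an assumed simplification).
    """
    parallel_time = compute_s / nodes + comm_s
    return compute_s / parallel_time

NODES = 20  # the 20-Pi cluster from the question

# High work-to-communication ratio: 1000 s of compute, 1 s of communication.
print(f"compute-heavy job:       {speedup(1000, 1, NODES):.1f}x")

# Low ratio: the same compute but 100 s of communication.
print(f"communication-heavy job: {speedup(1000, 100, NODES):.1f}x")
```

In this model the compute-heavy job gets close to the ideal 20x, while the communication-heavy job gets only a fraction of it, which is why the answer depends on the application.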
0

If you want to learn supercomputing between nodes, I would set up the Pi. If you want something inexpensive but powerful, buy a used Xeon-based multi-core Intel server or workstation and add one or more used Tesla cards, other CUDA GPU cards, or Intel Xeon Phi cards.