Infiniband at Home (10Gb networking on the cheap)

Would you like to have over 700MB/sec of throughput between your PCs at home for under €110? That's a full CD's worth of data every second! If you do, then read on…

–edit–

Since this article was originally written, I've found that the real-world throughput of InfiniBand between a Windows machine and an Ubuntu machine gives me a maximum of 135MB/sec, just under twice my 1Gbps Ethernet (75MB/sec). That's with a RAID array capable of 350MB/sec on the Linux side, feeding a Samba link to the Windows machine at 95% CPU. So it falls a long way short of the 700MB/sec that I thought might be possible. It's not achievable with IP over InfiniBand, and iSER isn't available on Windows, so no SRP targets (which use RDMA) could be used. A whole lot of research leading to brick walls and 135MB/sec max.

—end edit—

With the increasing amount of data that I have to manage on my computers at home, I started looking into a faster way of moving data around the place. I started with a RAID array in my PC, which gives me read/write speeds of 250MB/sec. Not being happy with that, I looked at creating a bigger external array, with more disks, for faster throughput. I happened to have a decent Linux box sitting there doing very little. It had a relatively recent motherboard and 8 SATA connectors. But no matter how fast I got the drives in that Linux box to go, I'd always be limited by the throughput of the 1Gb Ethernet network between the machines, which was giving me about 75MB/sec. So I researched several different ways of inter-PC communication that might break the 1Gbps barrier.

The first option I looked at was USB 3.0 (5Gbit/s). While that's very good for external hard drives, there didn't seem to be a decent solution out there for combining multiple drives to increase throughput. We are now starting to see RAID boxes appear with USB 3.0 interfaces, but they are still quite expensive. To connect my existing Linux box to my Windows desktop, I'd need a card with a USB 3.0 slave port so that the external array would look like one big drive and max out the 5Gbps bandwidth of a USB 3.0 link. However, these do not seem to exist, so I moved on to the next option.

Then I moved on to 10Gb Ethernet (10Gbit/s). One look at the prices and I immediately ruled it out: several hundred euro for a single adapter.

Next was Fibre Channel (2-8Gbit/s). Again, the pricing was prohibitive, especially for the higher-throughput cards. Even the 2Gbps cards were expensive, and would not give me much of a boost over 1Gbps Ethernet.

Then came InfiniBand (10-40Gbit/s). I came across this while looking through the List of Device Bit Rates page on Wikipedia. I had heard of it as an interconnect in cluster environments and high-end data centres, and I assumed that the price would be prohibitive. A 10G adapter would theoretically give up to a gigabyte per second of throughput between the machines. However, I wasn't ruling it out until I had a look at a few prices on eBay. To my surprise, there was a whole host of adapters available, ranging from several hundred dollars down to about fifty dollars. $50 for a 10Gig adapter? Surely this couldn't be right. I looked again, and spotted some dual-port Mellanox MHEA28-XTC cards at $35.99. This worked out at about €27 per adapter, plus €25 shipping. Incredible, if I could get it to work. I'd also read that it is possible to use a standard InfiniBand cable to directly connect two machines without a switch, saving me about €700 in switch costs. If I wanted to bring another machine into the InfiniBand fabric, though, I'd have to bear that cost. For the moment, two machines directly connected was all I needed.

With a bit more research, I found that drivers for the card were available for Windows 7 and Linux from OpenFabrics.org, so I ordered 2 cards from the U.S. and a cable from Hong Kong.

About 10 days later the adapters arrived. I installed one in the Windows 7 machine. Windows initially failed to find a driver, so I went to the OpenFabrics.org website and downloaded OFED_2-3_win7_x64.zip. After installation I had two new network connections available in Windows (the adapter was dual-port), ready for me to connect to the other machine.

Next I moved on to the Linux box. I won't even start on the hassle I had installing the card there. After days of research, driver installation, kernel re-compilation, driver re-compilation, etc., etc., I eventually tried swapping the slot that I had the card plugged into. Lo and behold, the f&*cking thing worked. So, my motherboard has two PCI-E x16 slots, and the InfiniBand adapter would work in one, but not in the other. Who would have thought. All I had to do then was assign an IP address to it. –EDIT– here's a quick HOWTO on getting the fabric up on Ubuntu 10.10. About 10 minutes should get it working – http://davidhunt.ie/wp/?p=375 –EDIT–
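For anyone following along, the Ubuntu side boils down to something like the following. This is a minimal sketch, assuming the IPoIB kernel module is available and that one machine on the back-to-back link runs opensm as the subnet manager (the interface name and addresses are just examples):

# load the IPoIB driver so the HCA appears as a network interface (ib0)
sudo modprobe ib_ipoib

# with no switch, one node on the link must run a subnet manager
sudo apt-get install opensm
sudo /etc/init.d/opensm start

# give the InfiniBand port a static IP and bring it up
sudo ip addr add 10.4.12.2/24 dev ib0
sudo ip link set ib0 up

# quick sanity check against the other machine
ping 10.4.12.1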

Without a cable (it still had not arrived from Hong Kong), all I could do was sit there and wait before I could test the setup. Would the machines be able to feed the cards fast enough to get a decent throughput? On some forums I'd seen throughput tests of 700MB/sec. Would I get anywhere close to that between a 3GHz dual-core Athlon and a 3GHz i7 950?

A few days later, the cable arrived. I plugged it into each machine and could immediately send pings between them, having previously assigned static IP addresses to the InfiniBand ports on each side. I wasn't able to run netperf at first, as it didn't see the cards as something it could put traffic through. So I upgraded the firmware on the cards, which several forums said would improve throughput and compatibility. I was then able to run netperf, with the following results:

root@raid:~# netperf -H 10.4.12.1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.12.1 (10.4.12.1) port 0 AF_INET : demo
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time     Throughput
bytes  bytes  bytes   secs.    10^6bits/sec
87380  16384  16384   10.00    7239.95

That’s over 7 gigabits/sec, or over 700MB/sec throughput between the two machines!

So, I now have an InfiniBand fabric working at home, with over 7 gigabits/sec of throughput between PCs. The stuff of high-end datacentres in my back room. The main thing is that you don't need a switch, so a PC-to-PC 10-gigabit link CAN be achieved for under €110! Here's the breakdown:

2 x Mellanox MHEA28-XTC InfiniBand HCAs @ $34.99 + shipping = $113 (€85)

1 x 3m Molex SFF-8470 InfiniBand cable incl. shipping = $29 (€22)

Total: $142 (€107)

The next step is to set up a RAID array with several drives and stripe them so they all work in parallel, and maybe build it in such a way that if one or two drives fail, it will still be recoverable (RAID 5/6). More to come on that soon.
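As a rough sketch of what that will look like on the Linux box (a minimal example assuming five drives at /dev/sdb through /dev/sdf; the device names and filesystem are just placeholders):

# stripe five drives together (RAID 0) for maximum throughput;
# --level=5 or --level=6 instead trades capacity for one or two drives' worth of redundancy
sudo mdadm --create /dev/md0 --level=0 --raid-devices=5 /dev/sd[b-f]

# format, mount, and check sequential write throughput
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /mnt/raid
dd if=/dev/zero of=/mnt/raid/test bs=1M count=4096 conv=fdatasync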

References:
http://hardforum.com/showthread.php?p=1036510049
http://www.zdnet.com/blog/storage/build-a-10-gbit-home-network-for-1100/284
http://www.gossamer-threads.com/lists/drbd/users/19594
http://www.mellanox.com
http://www.openfabrics.org

29 thoughts on “Infiniband at Home (10Gb networking on the cheap)”

  1. Some additional Technical Info:
    Updating the firmware was problematic in Ubuntu, but a breeze in Windows 7. On Windows, get the MFT (Mellanox Firmware Tools) from the Mellanox website. Binaries are installed by default in C:\Program Files\Mellanox\WinMFT. Open a command prompt as administrator (browse to C:\Windows\System32, right-click on cmd.exe and run as administrator), cd to the WinMFT dir, and then follow the instructions from the Mellanox website on how to find the adapter information, which firmware to download, and how to burn it to the adapter. Oh, and I had to use the -skip_is parameter with flint to get the firmware to go on, and it worked fine. I did both adapters this way.
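    For reference, the burn ended up looking roughly like this (the device name and firmware filename below are just examples; mst status lists the actual device):

      mst status
      flint -d mt25208_pci_cr0 query
      flint -d mt25208_pci_cr0 -i fw-25208-rel.bin -skip_is burn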

  2. And then came Thunderbolt. A newly announced interconnect from Intel, which is being released on the latest MacBook Pro, gives 10Gbps full-duplex transfers. Runs DisplayPort and PCI-Express protocols. Sounds like it's got a lot of potential… 🙂

  3. Thank you very much for this article, which quickly and simply answered my questions 🙂 ! Indeed, I wanted to set up such a network (with the same components) but couldn't believe the low costs…

    But I'm still wondering about something: according to your tests, 10Gb of bandwidth is available per port… So how would you optimally use the two ports: aggregate them (20Gb? how?), daisy-chain, etc.?

    Thank you again !

  4. I forgot to suggest: two computers, each linked at 10Gb to a “central” computer that uses both of its ports?

  5. XZed, I haven't experimented with port bonding yet, mainly because the disk subsystem in my RAID box is nowhere near maxing out one port, never mind two. I've seen HOWTOs for Linux, but Windows may be a bit more difficult.

    On your other question, that sounds like a good idea. Assign a different IP address to each port on your central machine, and then you get two machines with InfiniBand access. 🙂

  6. Epic. I followed the same path and came to the same conclusion a while back, but didn’t have a proper server at the time. I just upgraded to a box with a PCIe slot, quickly dismissed 802.3ad for bandwidth increase, determined that Infiniband was still affordable, then decided to Google “Infiniband at home” just for fun. Thanks for confirming what I thought was correct!

  7. Thank you very much daveh for your answer :).

    And I'm glad to see that a few of us have landed on this post after similar research :).

  8. Hi David!
    Maybe you can help me: I am just starting with InfiniBand. I want to send messages like “Hello World” between 2 PCs. I don't want to use MPI, because I need a relatively low-level program to control the InfiniBand port. I need to write a simple C program to connect both PCs. Why do I need to do this? Because later I will connect one of the PCs to a particular device which has an InfiniBand port. At the moment I don't know which kind of protocol that device uses. The only thing I know is that it can send and receive bytes. So the first step I want to accomplish is to connect 2 PCs with a simple C program, at device level.
    I'm trying to use a file descriptor, with open(), read() and write(), but my problem is that I don't know which device file to use (I am trying the different files I can see in /dev/infiniband/ – I am using CentOS 5.5). I couldn't get any positive result with this.
    Maybe I am going about it the wrong way, and for my purpose I need something other than open(), read() and write(). Thanks, Gustavo

  9. David, great article! I just ordered the same Mellanox cards from eBay. Btw, as these cards are dual-port, could I have my server connected to 2 client PCs at the same time?

  10. Lance, yes, just assign IP addresses on different subnets to each port (ib0 and ib1), and two client PCs are possible.
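    Something along these lines on the server side (the addresses are just examples):

      sudo ip addr add 10.4.12.1/24 dev ib0   # link to client PC 1
      sudo ip addr add 10.4.13.1/24 dev ib1   # link to client PC 2
      sudo ip link set ib0 up
      sudo ip link set ib1 up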

  11. @Gustavo, You need to know what protocol you’re going to use, otherwise the possibilities are endless. UDP, TCP, port numbers, etc.

  12. Dear Daveh,

    Strongly encouraged by your article, I finally got the same package that you advised.

    Only a small difference, but even better:

    MHEA28-XTC + MHGA28-1TC (20Gb/s!) + cable.

    Obviously, I'm conscious of “being bottlenecked to 10Gb/s” due to the MHEA28-XTC. But I don't care; my main goal was to at least achieve the same performance you encountered.

    Between 2 Windows 7 boxes, I reach an average of 3Gb/s :/ … Obviously, I ran tests similar to yours (independently of physical storage: card to card).

    As I knew I wasn't bottlenecked by physical storage (both ends are RAID arrays), I even ran real transfer tests: they gave the same value (~3Gb/s).

    Meanwhile, I read a lot about RDMA, SDP, WSD, SRP, iSER, etc… Indeed, RDMA seems to be the only way to achieve such speeds (in my case, I could reach 8Gb/s with RDMA transfer tests).

    I thought I had found the solution: set up SRP/iSER in order to transfer files and optimise bandwidth using iSCSI over RDMA. But it seems impossible to set up a simple iSCSI target/client scheme between 2 Windows computers (to be accurate, an iSCSI target doesn't exist for Windows (only an initiator); it's only available for Linux with the SCST program).

    But whatever, I couldn't see how you reached such bandwidth on IPoIB! I wanted to ask you about real transfer tests, but you can't really prove the IB bandwidth as your RAID storage “only” reaches 2Gb/s. Since then, I don't know if you've upgraded your storage.

    By the way, could you post your forum sources so I can read about their setups?

    Thank you.

    Sincerely,

    XZed

  13. XZed,
    I have not done much since my initial testing on this, except today I placed an order for 5 Seagate Barracuda 7200.12 drives at £35.99 each from Scan.co.uk. The next step is to set them up as a RAID 0 array (later I'll change to RAID 5), test the drive throughput, then test across InfiniBand IPoIB.
    The initial 7Gbps speed was using netperf, and I have not been able to replicate that since, as I'm now using the drivers from the stock Ubuntu distribution. Still, RAM disk to RAM disk is giving me about 3Gbps.
    NFS with RDMA may not be possible, as Windows Home Premium does not have NFS client capability, and even if it did, I'm not sure it supports NFS over RDMA. Still, I might look into iSER, etc.
    Rgds,
    Dave.

  14. Hello,

    Thank you for sharing your walkthrough :)…

    Well, I just hope you'll share your results as soon as you've set it up :).

    Thank you.

    Sincerely,

    XZed

  15. I set up iSCSI this evening.
    For Ubuntu target (Server)
    http://www.howtoforge.com/using-iscsi-on-ubuntu-10.04-initiator-and-target
    For Windows initiator (Client)
    http://www.windowsnetworking.com/articles_tutorials/Connecting-Windows-7-iSCSI-SAN.html
    Amazingly, the iSCSI initiator software is included in Windows 7 Home Premium, and not reserved for Pro or Ultimate like other useful features such as software RAID or the NFS client.
    Anyway, using the above tutorials, it was a breeze to set up iSCSI across the infiniband fabric.
    And, a file copy that previously used 75% CPU now uses only 25% CPU.
    Next is to wait for the 5 new drives to arrive so I can build a decent raid array and give this a good testing.
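    For reference, the target definition in ietd.conf ends up as something like this (the IQN and device path are just examples; Type=fileio goes through the page cache, blockio bypasses it):

      Target iqn.2011-03.ie.davidhunt:storage.raid
          Lun 0 Path=/dev/md0,Type=blockio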

  16. XZed,
    Thanks for the links, very useful. I’ll try that.
    I am using the target provided by the “iscsitarget” package (which I now realise is the wrong one for RDMA). I do indeed want to use RDMA, so the second link you provided has very important information about removing the iscsitarget package before installing iscsi-scst. Very good to know.
    The Windows OFED distro does mention SRP, but I don't think I have that option currently installed, and my Windows machine gives errors when I try to install/change/remove anything to do with OFED. I might have to re-install altogether to get the SRP functionality loaded onto my Windows box.
    My drives should arrive around 04/04, so I should have more updates then. 🙂

  17. Hello,

    Glad to have given you such a good link :).

    Indeed, I also had that effect (errors while trying to modify any previous OFED install) and resolved it by uninstalling/reinstalling the whole thing.

    Let’s wait for the drives 🙂

  18. OK, with the hard drive controllers I have in the Linux machine, the fastest I can get the drives going is 350MB/sec.
    I've now got iSCSI targets configured for both normal iSCSI and RDMA protocols. However, the brick wall I've just hit is that the iSCSI initiator configuration on Windows does not see the SRP targets, only the regular iSCSI targets. Seemingly the iSER initiator software is only available on Linux.
    Back to the drawing board.
    I'll just have to live with regular iSCSI.

  19. Some more research:
    Mellanox WinOF VPI for Windows version 2.1.3 was released in Feb 2011. This contains drivers for ConnectX based adapter cards and has Windows 7 support. Also has Beta support for Winsock Direct, Sockets Direct Protocol (SDP) and SCSI RDMA Protocol (SRP).
    So, to get SRP on Windows I'd need to update my HCA.

  20. Your throughput problems are most likely due to CIFS/Samba, which is horribly latency-sensitive, and Windows has a high-latency turnaround. You'd need to run SMB2 from a Vista client to Samba (assuming Samba supports SMB2 these days), or alternatively use an NFS stack on Windows.

  21. Great article, very nice read.

    Only 25% CPU load sounds very good. But I will wait a bit before jumping on the beyond-1Gb bandwagon, until we see a bit more about Intel's new Thunderbolt. That should be a bit more future-proof, and hopefully cheaper, compared to InfiniBand.

  22. I did a lot of testing with Infiniband in a commercial setting and found your comments about it as a useful starting point to understand infiniband in general.

    Initially for testing we started using DDR hardware, and that works really well; I was able to get good speed from it. However, when we finally got the FDR hardware we were initially disappointed, as it wasn't much quicker than DDR, and for small packets it was slower for us (using our own custom RDMA protocols). What I found was that memory speed starts coming into play, along with many other factors that actually require a faster PC (rather than a cheap home one). And also, as you found out, software design makes a huge difference. So if you do start experimenting with InfiniBand at home, DDR works well, but as you go faster, the PCs at each end need to be quicker too. Particularly the memory.

  23. Thanks for posting your research. I'm wondering, did you fiddle with the buffer sizes in Samba/NFS? It supposedly can make a difference…

