NFSP

WIP Mon Nov 4 17:55:12 CET 2002

The most up-to-date information is available on the NFSP page.

Some statements made here are no longer accurate. The documentation is currently being updated to reflect these changes. Expect a proper release soon, or check the source code...
XXX means I have to write more things here.

What is NFSP ?

NFSP aims at building a distributed NFS server. There are currently two main versions: a user-mode version and a kernel-mode version. A few sub-flavors may be enabled at compilation time by means of Makefile options.
NFSP may be viewed as a mix of NFS and PARL's PVFS (Parallel Virtual File System).
Our objectives are simple:

Current requirements

Some limitations of the code will probably cause problems when compiling or running it on different hardware or operating systems.
To summarize, NFSP has been tested on/with:
It may work with lower versions (but this was not tested), so feel free to let me know if it works on your system.


Compilation

A quick build method is available through a script: simply launch the script batchbuild.
You may also prefer to build each component more "manually":
However, you may copy only the files needed for each type of node:

Installation: server and I/O daemons side

Three entities have to be distinguished:
Among this set of nodes, exactly one dedicated node must act as the server: it is the only host your clients need to know (as with a plain NFS server).
The other nodes may be iods and/or clients, provided there is at least one iod.


On the server:

On the I/O servers (iod's):

There are two ways to start the I/O storage servers, depending on the clients' implementation.
If the clients check IP packets loosely (as in Linux 2.4.x), root privileges are not required and the following steps have to be done:
If the clients strictly enforce IP checking, root privileges are needed in order to enable UDP spoofing. The steps are otherwise nearly the same.
Instead of starting iodng as the user, start it as root with the -user ioduser flag. Once the application has its low-level sockets open, it changes its uid/gid to those of ioduser.

Back on the server:

As there are several NFSP flavors, there are several ways to launch them...

User-mode without VIOD support
If every step was successful, you now have a working NFS share available for your clients (see the part below to learn how to use it)!
If not, go to the troubleshooting section.

Installation on the clients

Well... It's quite simple: the only thing the clients require is kernel support for the NFS protocol.
As root on all of your clients, issue the following command (metasrv being the host holding the dir /META):
root@clientbox# mount -t nfs metasrv:/META /mnt/mymountpoint/ -o rsize=8192,wsize=4096,intr
If you use this NFS share often, you may also edit /etc/fstab to mount it automagically.
Note: DO NOT forget the options, since otherwise you may get severe data corruption (check the known bugs below) or bad performance.
And voilà it's done.
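
For the /etc/fstab route mentioned above, here is a minimal sketch that prints a matching entry. It assumes the example names used in this section (metasrv, /META, /mnt/mymountpoint); the helper name nfsp_fstab_line is made up for illustration and is not part of NFSP.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NFSP): build an /etc/fstab line
# for the NFSP share with the mount options recommended in this document.
nfsp_fstab_line() {
    server=$1       # host holding the metafile directory, e.g. metasrv
    export_dir=$2   # exported directory, e.g. /META
    mount_point=$3  # local mount point, e.g. /mnt/mymountpoint
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    printf '%s:%s\t%s\tnfs\trsize=8192,wsize=4096,intr\t0\t0\n' \
        "$server" "$export_dir" "$mount_point"
}

# Print the entry for the example names used in this section.
nfsp_fstab_line metasrv /META /mnt/mymountpoint
```

Review the printed line, then append it to /etc/fstab as root. The options field is the same as in the mount command above.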

Troubleshooting

Component Trouble/Fix
hint all it's still a work in progress so do expect crashes, tears and desolation for your valuable data :)
hint all most executables will display help messages if you use the -h flag (recommended)
hint all for performance purposes, you may increase the number of processes since blocking reads/writes are used (check -h)
hint all you may tweak applications by editing the hardcoded values in default.h file
hint all nfsp-specific #define's are at the beginning of iodng.h, so if you need to tweak the limits, edit them there
hint all location does not matter provided you specify it on the command line
hint all most applications have a man page (*.man) though these may be incomplete or inaccurate
ts iod paranoid setups may not allow IP spoofing so it may not work (check /proc/sys/net/ipv4/conf/*/rp_filter and /usr/src/linux/Documentation/filesystems/proc.txt)
ts iod the iod_ping utility may be used to "ping" an iod and to test if this host can do IP spoofing (check -s option)
ts iod the directory in which the iod lives must have at least mode 0700 and must belong to the uid/gid specified with the -u option (defaults to user nobody)
ts all most commands support a -F option (as in "Foreground") to tell the applications not to daemonize.
ts mount If you cannot mount your NFS share: check the files /etc/hosts.{allow,deny} for portmap and mountd, check that your metafile directory exists (/META above), test the iods with iod_ping, run in the foreground, enable debug mode (set SHOWINFO to 1 in dbg.h), check /var/log/syslog
BUG iod_ping spoof option (-s) may not work if you use "special" addresses to be spoofed (for instance 127.0.0.0/8 and 224.0.0.0/3) since it will interact strangely with routing tables. (You don't want to do that anyway, do you ? :)
BUG iod only two network interfaces are currently supported for the iods (lo and eth0). They are probed by means of ioctl calls to get their MTUs and correctly fragment the IP packets.
BUG all this has only been tested on Linux x86 with Linux x86 (32-bit) clients: there are issues with endianness and 64-bit architectures...
BUG nfsd you may create a special block device - well if you are root - (mknod foo b 12 23) but you won't be able to remove it on a mounted partition
BUG nfsd if you truncate() a file, then make it grow by writing something beyond its new limit, and then read between the old end offset and the new end offset, there will most likely be stale data (understand: "undefined behavior") and not zeros as expected.
BUG all if an iod breaks, the system will break and clients will hang. You may restart the iod and operations should work again (not much tested), but you should have mounted the NFS share with the intr (or soft) option (as stated above)
BUG nfsd the wsize mount option does not work for values over 4096 (the problem occurs when an access spanning two data blocks is required). As a workaround, always use: -o rsize=8192,wsize=4096 (the rsize value speeds up read accesses)
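
The truncate() bug listed in the table above can be checked with a small script. This is only a sketch: the function name check_truncate_gap and the test file name are made up for illustration, it relies on the truncate utility from GNU coreutils, and client-side caching might hide the problem on some setups.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NFSP): check that the gap left by
# truncate-then-grow reads back as zeros.
# Usage: check_truncate_gap DIRECTORY   (DIRECTORY should be on the mounted share)
check_truncate_gap() {
    dir=${1:-.}
    f="$dir/nfsp_trunc_test.$$"
    # 1. Create an 8-byte file full of 'A'.
    printf 'AAAAAAAA' > "$f" || return 2
    # 2. Shrink it to 4 bytes.
    truncate -s 4 "$f" || return 2
    # 3. Grow it again by writing one byte at offset 7, beyond the new end.
    printf 'Z' | dd of="$f" bs=1 seek=7 conv=notrunc 2>/dev/null || return 2
    # 4. Read bytes 4..6, which lie between the truncated end and the new byte.
    gap=$(od -An -tx1 -j4 -N3 "$f" | tr -d ' \n')
    rm -f "$f"
    if [ "$gap" = "000000" ]; then
        echo "ok: gap reads as zeros"
    else
        echo "stale data in gap: $gap"
        return 1
    fi
}
```

Run it as check_truncate_gap /mnt/mymountpoint on a client. On a local filesystem the gap reads as zeros; a "stale data" message on the NFSP share would match the bug described above.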

Todo

Contact

Please add the word '[nfsp]' somewhere in the subject of your mail, and do not forget to set a real subject if you want a quicker answer.
The NFSP team within the ID-IMAG laboratory gathers Yves Denneulin (general coordination), Adrien Lebre (tools and performance evaluations), Pierre Lombard (prototype) and Olivier Valentin (kernel port). To contact a specific member, feel free to use firstname.lastname@imag.fr or just mail me at pierre.lombard@imag.fr.

