nettee - a network "tee" program
Synopsis
Description
Options
Related Programs
Copyrights
License
Author
nettee [options]
nettee passes a data stream to one or more child nodes using a "daisy chain" method. On each node nettee may also direct the stream to a file or pipe. nettee allows large amounts of data to be quickly distributed to multiple nodes on a switched network at a rate limited only by the network bandwidth. The distribution tree is typically linear for each network switch but may branch when nodes contain multiple interfaces. For maximum throughput only one instance of nettee should utilize each network interface.
When nettee starts it waits for a connection from the single parent node before attempting to connect to its child node(s). Consequently nettee may be started on the nodes in any order (by a script, rsh, ssh, and so forth.) Typically only the root node will be set to log messages, so that the progress of the transfer may be monitored. Transmission errors are detected by comparing the total number of bytes read by each child node with the number of bytes transmitted to that child.
nettee may also be used to perform processing of the data streams on each node. When used in this manner the data may flow either up or down the distribution tree. Additionally, on each node data may be consumed, created, or redirected. These actions are carried out by a child process on each node, said child process being a program which understands a simple command language used to communicate back and forth with the nettee parent process. This method may be used, for instance, to merge N presorted data files into one large sorted data stream.
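The merge of N presorted inputs mentioned above is a standard k-way merge. The following is a minimal Python sketch of that operation alone; it does not implement the command language spoken between nettee and its child process, and the function name merge_sorted_streams is hypothetical.

```python
import heapq


def merge_sorted_streams(streams):
    """Lazily merge already-sorted iterables into one sorted stream.

    Hypothetical helper: each element of `streams` stands in for one
    presorted data source (for example, the lines of one file).
    """
    # heapq.merge keeps only one pending item per input in memory.
    return heapq.merge(*streams)


if __name__ == "__main__":
    sources = [["apple", "pear"], ["banana", "cherry"], ["zebra"]]
    for item in merge_sorted_streams(sources):
        print(item)
```

Because the merge is lazy, the inputs may be open file objects rather than in-memory lists.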
The distribution tree is formed by a group of nodes running nettee. The topology of this tree is constrained in that it must possess one root node, may branch at each node to no more than 8 child nodes, and may not contain any loops. The root node has no parent node and at least one child node. Internal nodes have both a parent and at least one child node. Tip nodes connect to a parent but no child nodes. The topology is specified through the use of the command line options -next and -root. The direction of data flow within the tree is defined by the value of the -flow option. For maximum throughput on a switched network generally each node should connect to as many child nodes as it possesses network interfaces. If a group of systems on a switched network contain only one network interface each the most efficient distribution tree is a simple linear chain. Specifying N child nodes on a single network interface will reduce the throughput to each of those nodes to 1/N of the interface’s maximum throughput. This will also occur if nettee is used on a shared (as opposed to switched) network.
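The topology rules above (one root, at most 8 child nodes per node, no loops) can be checked before launching anything. The sketch below assumes the planned tree is written down as a Python dict mapping each node name to its list of child nodes; that representation and the name validate_tree are illustrative assumptions, not part of nettee.

```python
from collections import deque

MAX_CHILDREN = 8  # branching limit stated in the text


def validate_tree(children, root):
    """Check a hypothetical {node: [child, ...]} map against the rules.

    Returns a list of problem descriptions; an empty list means the map
    describes a single rooted tree with at most 8 children per node.
    """
    problems = []
    if not children.get(root):
        problems.append(f"root {root} has no child node")
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if len(kids) > MAX_CHILDREN:
            problems.append(f"{node} has {len(kids)} children (max {MAX_CHILDREN})")
        for kid in kids:
            if kid in seen:
                # Reaching a node twice means a loop or a second parent.
                problems.append(f"{kid} is reached more than once")
            else:
                seen.add(kid)
                queue.append(kid)
    for node in sorted(set(children) - seen):
        problems.append(f"{node} is not reachable from the root")
    return problems


if __name__ == "__main__":
    chain = {"node1": ["node2"], "node2": ["node3"], "node3": []}
    print(validate_tree(chain, "node1"))
```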
By default severe errors cause the entire chain to abort. By utilizing the -colrf, -colwf, -conrf, or -conwf options nettee may be instructed to do its best to continue processing in the event of certain read or write failures of the data stream. Note that failures which occur while the distribution chain is forming are still fatal events. The -connf option allows the program to use a truncated distribution tree if errors are encountered during tree formation. In conjunction with the -next option supplied with alternate targets in each hostlist, -connf allows the program to build alternative distribution trees in order to connect around failed nodes. If the node above the failed node is allowed to emit messages and errors (for instance: -v 5) messages similar to these will be sent to the log destination (-log):
    Failures detected in child 0 [node34]: NWF
    Failures detected in child 1 [node35]: NONE
    Failures detected in chain: NWF
The first type of message describes the failures that were detected in the named child node, that is, those named in the -next option. The second message describes failures that were detected anywhere further on in the chain. The error codes currently defined are:
    NONE   no errors
    NWF    network write failure
    LWF    local write failure
    NRF    network read failure
    LRF    local read failure
    BBC    child returned incorrect byte count
    BSTAT  child returned unknown or bad status
    NNF    could not connect to (one or more) child nodes
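When the root node's log is post-processed by a script, the failure messages and error codes above can be decoded mechanically. The following Python sketch assumes the message layout shown in the sample lines; how several codes would be separated on one line is not stated in this page, so splitting on commas or whitespace is an assumption, and parse_failure_line is a hypothetical name.

```python
import re

# Error codes as listed in this page.
ERROR_CODES = {
    "NONE": "no errors",
    "NWF": "network write failure",
    "LWF": "local write failure",
    "NRF": "network read failure",
    "LRF": "local read failure",
    "BBC": "child returned incorrect byte count",
    "BSTAT": "child returned unknown or bad status",
    "NNF": "could not connect to (one or more) child nodes",
}

CHILD_RE = re.compile(r"^Failures detected in child (\d+) \[([^\]]+)\]: (.+)$")
CHAIN_RE = re.compile(r"^Failures detected in chain: (.+)$")


def parse_failure_line(line):
    """Return (scope, [(code, meaning), ...]) or None for other lines."""
    line = line.strip()
    match = CHILD_RE.match(line)
    if match:
        scope = ("child", int(match.group(1)), match.group(2))
        codes = match.group(3)
    else:
        match = CHAIN_RE.match(line)
        if not match:
            return None
        scope = ("chain", None, None)
        codes = match.group(1)
    # Assumption: multiple codes are separated by commas or whitespace.
    names = [c for c in re.split(r"[,\s]+", codes.strip()) if c]
    return scope, [(c, ERROR_CODES.get(c, "unknown code")) for c in names]


if __name__ == "__main__":
    print(parse_failure_line("Failures detected in child 0 [node34]: NWF"))
```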
nettee will normally emit an EXIT_SUCCESS status. (0 on Unix.) This is true even if the errors were detected and handled in the node itself or in a child node. nettee will emit an EXIT_FAILURE status if it was forced to close by an unhandled event such as a timeout, write failure, or unexpected socket closure.
Version 0.2 with the command line option -flow push may be employed in a distribution tree with older 0.1.x versions of nettee. Version 0.2 added the command line options -root, -buf_*, -conrf, and -conlrf, and dropped the option -in nettee.
-buf_size N Set the size in bytes of the transfer buffers. Increasing the size of the buffer may be useful in achieving maximum throughput when the nodes in the distribution tree are not always working at 100% capacity, for instance, when some other task is running. The default is 1048576.
-buf_minread N Set the minimum free storage in a buffer before more data will be read in. The default is 128.
-buf_minwrite N Set the minimum/maximum stored data in a buffer before data will be written out. If N is larger than 1 data may remain in the output buffer until more data is added to the buffer or it is flushed at program exit. For this reason such a value should not be combined with the -stm option. Additionally, if an output buffer fills to the point where less than N bytes remain free, the input(s) will not be read until such time as this buffer has emptied sufficiently. Otherwise, if an output buffer is very nearly full and is only emptying slowly, a very high rate of cycling may result, where only small numbers of bytes at a time are read and written. The default is 1.
-buf_maxread N
-buf_maxwrite N Set the maximum number of bytes to read/write from a buffer at once, before other IO operations are allowed to proceed. The -buf_maxread and -buf_maxwrite options may be used to force better interleaving of IO operations. If the buffers are large and these values are set to a very high value a fast data source/sink may completely fill/empty an entire buffer at once, never allowing time for other IO operations to run. For instance, if input is from a one gigabyte file, and both the buffer size and the maxread value are also one gigabyte, nettee will move data continuously from the input file to the input buffer until it fills, without pausing to do anything else. The default is 65536.
-cmd COMMAND Mandatory option when either -in socket or -out socket is used. COMMAND is the program or script that writes into, or reads from, the socket. Since only a single COMMAND may be specified, socket may not be applied to both -in and -out at the same time. Additionally, the -cmd and -process options may not be used together. When -cmd is used with -in socket a child process running COMMAND reads data from a disk or other device and writes the resulting data stream to stdout. When -cmd is used with -out socket a child process running COMMAND reads the data stream from stdin and writes the processed data to a disk or other device. Typically the COMMAND string invokes tar or some other archiving program. In some instances using sockets and -cmd will be faster than using the same command in a pipe due to the larger buffer size used for the socket. Run nettee -hexamples to see a usage example.
-colrf Continue on Local Read Failure. Normally the failure of a read of the data stream from a local input will be fatal and the entire distribution tree will collapse immediately. When -colrf is set and a local read failure occurs on a node that node will continue to relay data along the chain. The top node will emit an error message when this occurs so that a subsequent analysis with other tools may locate the node(s) which failed. There may be instances where this is useful but in general continuing on this type of failure will result in expected information not being present in the data stream.
-colwf Continue on Local Write Failure. Normally the failure of a write of the data stream to the local output will be fatal and the entire distribution tree will collapse immediately. (Typically this happens when data is written to disk and a partition fills or there is an ownership problem. A complete disk failure may initially present this way but often goes on to crash the node, resulting also in a network write failure.) When -colwf is set and a local write failure occurs on a node that node will continue to relay data along the tree. The node that failed will not have correctly processed the data stream locally but all other nodes will be unaffected by this failure. The top node will emit an error message when this occurs so that a subsequent analysis with other tools may locate the node(s) which failed.
-conrf Continue on Network Read Failure. Normally the failure of a read of the data stream from the previous node will be fatal and the entire distribution tree will collapse immediately. (Typically this happens when a node crashes while nettee is running.) When -conrf is set and a network read failure occurs on a node (indicating that the previous node has failed) the node will continue to process the data stream locally but will make no further attempts to transfer data from the previous node in the tree. This allows the data transfer to complete on a tree below a failed node. It is unlikely that very many instances will be found where this switch is actually useful since data from the tree on the other side of the failed node will be lost.
-conwf Continue on Network Write Failure. Normally the failure of a write of the data stream to the next node will be fatal and the entire distribution tree will collapse immediately. (Typically this happens when a node crashes while nettee is running.) When -conwf is set and a network write failure occurs on a node (indicating that the next node has failed) the node will continue to process the data stream locally but will make no further attempts to transfer data to the next node in the tree. This allows the data transfer to complete normally on a tree down to the node above a failed node. The top node will emit an error message when this occurs so that a subsequent analysis with other tools may locate the node(s) which failed.
-connf WAIT Continue on Next Node Failure. Give each node in a hostlist WAIT seconds to join the chain. After that each successive host in the hostlist is given WAIT seconds to join, and if none succeed, no data will be sent to any of those hosts. If -connf is not specified or the wait time is set to zero seconds, the program will wait forever for a connection to the first node in each hostlist.
-flow DIR Sets the direction of data flow in the distribution tree. DIR may be either push, where data flows away from the root node, or pull, where data flows towards the root node. The default is push. The pull option may only be used in conjunction with the -process option. Additionally -flow defines the default settings for -in and -out. For -flow push these are:
    root node:         -in -     -out none
    internal node(s):  -in none  -out -
    tip node(s):       -in none  -out -
For -flow pull these are:
    root node:         -in none  -out -
    internal node(s):  -in none  -out -
    tip node(s):       -in -     -out none
-h Print nettee help information.
-hexamples Print nettee examples.
-herrors Print nettee error status codes.
-hprocess Print verbose information on the -process option. Describes the interface between nettee and the CMD started by the -process option. See also the nettee_cmd(3) man page.
-i Print version, license, and copyright information.
-in SRC1(,SRC2,SRC3...) Reads local data from SRC which may have one of four values: none no local input; - reads from stdin; socket read the output of a command from a socket; filename reads from a file. More than one SRC may be used only if -process is also specified. When more than one SRC is employed only one may be from stdin and none may be from a socket or none. If no -in option is present the default value is determined by the -flow option.
-log LDST Errors and messages are written to LDST which may have one of two values: - writes to stderr or filename writes to a file. If no -log option is present the program writes messages to stderr.
-name STRING Specify the node name used in messages (<=127 characters). If not supplied the values of the environment variables MYHOSTNAME and HOSTNAME are first checked, and if those are not defined, the result of a gethostname() call is used.
-next HOSTLISTS Writes data to child node destination[s] hostlist1(,hostlist2(,hostlist3(...))) where the hostlist entries are separated by commas or spaces. A hostlist consists of either a single hostname, or a comma separated list of hostnames enclosed in square brackets. Example: node1,[node2,node3],[node4,node5,node6],node7. The bracketed form allows for automatic failover if unreachable nodes are encountered and if -connf is specified. The first hostname in the list is tried, then the next, and so on. There may be 1-8 hostlists. The number of hostlists controls the topology of the distribution chain. Use a linear distribution chain (a single hostlist) when all nodes share a single network switch. Use a forked distribution chain (multiple hostlists) when nodes are connected to two or more network switches. The End of Chain condition (no child write) is indicated by a HOSTS value of . , , or _EOC_ . An End of Chain condition is also indicated by the absence of a -next option. If End of Chain is indicated there may not be any other hostlists specified.
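The HOSTLISTS syntax described for -next can be illustrated with a small parser. This Python sketch follows only the rules stated above (commas or spaces as separators, square brackets for failover groups, 1-8 hostlists); it does not handle the End of Chain values, and parse_hostlists is a hypothetical name, not nettee's own parsing code.

```python
def parse_hostlists(spec):
    """Split a -next style argument into a list of hostlists.

    Each hostlist is a list of hostnames; a bare hostname becomes a
    one-element hostlist. Illustrative only.
    """
    hostlists = []
    current = None  # hosts collected inside an open bracket, else None
    token = ""

    def flush():
        # Store the hostname collected so far, if any.
        nonlocal token
        if token:
            if current is not None:
                current.append(token)
            else:
                hostlists.append([token])
            token = ""

    for ch in spec:
        if ch == "[":
            if current is not None:
                raise ValueError("nested '[' in hostlist")
            flush()
            current = []
        elif ch == "]":
            if current is None:
                raise ValueError("']' without matching '['")
            flush()
            if not current:
                raise ValueError("empty bracketed hostlist")
            hostlists.append(current)
            current = None
        elif ch in ", ":
            flush()
        else:
            token += ch
    if current is not None:
        raise ValueError("missing ']' in hostlist")
    flush()
    if not 1 <= len(hostlists) <= 8:
        raise ValueError("expected 1-8 hostlists")
    return hostlists


if __name__ == "__main__":
    print(parse_hostlists("node1,[node2,node3],[node4,node5,node6],node7"))
```

For the example string from the text this yields four hostlists, the second and third of which carry failover alternatives.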
-out DST Writes data locally to DST which may have one of four values: none writes nothing locally; - writes to stdout; socket writes the data stream to a command through a socket; filename writes to a file. If no -out option is present the default is set by the -flow option.
-p,-port PORT First of two consecutive ports used for communication. If no -port option is present the program uses the default value of 9997.
-process CMD Specifies the command to be executed instead of the normal copy operation. Input is from the source(s) specified by -in and from the data distribution tree. Output is to the local destination specified by -out and to the data distribution tree. Whenever two or more data sources exist on a node some form of merge is required and -process must be specified to supply that function. For -flow push the distribution tree may branch whether or not -process is set, but -process must be used if local input is to be accepted on nonroot nodes. For -flow pull only if -process is set may the distribution tree branch or local input be accepted on internal nodes. May not be combined with the -cmd option. See also -hcommand which provides more details.
-q Suppress "ignored signal" messages.
-root Defines the current node as the root node of the distribution tree. If this is not specified a node will wait for an incoming connection request from the node which will become its parent. This tells the root node that no such request will be forthcoming, and to begin connecting to the child node(s) specified by the -next option. The distribution tree always assembles down from the root node, even if -flow pull is specified.
-stm EOS Stream text through a nettee chain until the string EOS is encountered, then exit. This allows short text messages to traverse the chain without waiting for a buffer to fill. Since the text message can very rapidly traverse the nettee chain it can be piped into execinput (or any other program that will execute its stdin as commands) to produce essentially simultaneous execution on all target nodes. The EOS string is not passed through the data chain and its length is ignored. When used to start further nettee processes on the target nodes PORT values must be chosen to avoid interference. While this mode may be convenient for setting up Beowulf nodes it is exceedingly dangerous for general use since any command introduced into the command stream will execute on all chain nodes as if submitted by the owner of the nettee process on that node. Run nettee -hexamples to see a usage example.
-t WAIT Wait up to WAIT seconds for a connection from the parent node in the distribution tree or for data to be received. If neither of these events occurs, exit with an error. A value of 0 waits forever and will only exit on an end of data condition. If no -t is present the program uses a default WAIT value of 0. The -connf WAIT and -w options control timeouts for connections to child nodes.
-v VERBOSE VERBOSE is a bit mask which controls the types of warning and error messages which are sent to the -log destination. Bit values indicate:
    1   show error messages
    2   show command line settings
    4   show messages
    8   show periodic status messages during transfer
    16  prepend nodename to all messages
Use a VERBOSE value of 0 to eliminate all messages. If no -v is present the program uses a default VERBOSE value of 1.
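Since VERBOSE is a bit mask, a value such as 5 selects bits 1 and 4. The short Python sketch below decodes a value using the bit meanings listed above; the names VERBOSE_BITS and describe_verbose are hypothetical.

```python
# Bit meanings as listed for the -v option.
VERBOSE_BITS = {
    1: "show error messages",
    2: "show command line settings",
    4: "show messages",
    8: "show periodic status messages during transfer",
    16: "prepend nodename to all messages",
}


def describe_verbose(value):
    """Return the descriptions of every bit set in a VERBOSE value."""
    return [text for bit, text in VERBOSE_BITS.items() if value & bit]


if __name__ == "__main__":
    for text in describe_verbose(5):
        print(text)
```

Values are combined by addition of distinct bits, so 1 + 8 + 16 = 25 would request error messages, periodic status, and the nodename prefix.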
-w Wait for the next node to boot or attach to the network. If not specified and the next node is not reachable nettee will exit with an error no matter what the -t WAIT and -connf WAIT timeout values are.
Man pages: netcat(1) nettee_cmd(3)
nettee is derived from Felix Rauch’s dolly which is available here: http://www.cs.inf.ethz.ch/CoPs/patagonia/#dolly
The nettee home page is: http://saf.bio.caltech.edu/nettee.html
Copyright: 2008 David Mathog and Caltech. Copyright: Felix Rauch and ETH Zurich
Freely distributed under the second GNU General Public License (GPL 2).
David Mathog Biology Division, Caltech
| nettee 0.2.0 | nettee (1) | MAR 2008 |