This document provides an overview of the basic descriptions and operations necessary to run an RWhois server installation. The RWhois server package consists of the server process itself (rwhoisd), tools to enable and manage the native database (rwhois_indexer, repack), and a number of configuration files (rwhoisd.conf and rwhoisd.dir, for example).
The following offers a brief description of the programs and utilities found in the RWhois server release.
rwhoisd is the RWhois protocol server.
Summary: rwhoisd [-c config file] [-r] [-s] [-Vvq] [-di]
Summary: rwhois_indexer [-c config file] [-C class] [-A auth area] [-ivqn] [-s suffix|file list …]
At start up, the RWhois server reads various configuration files. They are categorized into the general (server) configuration files and authority area (database) configuration files.
A. General Configuration Files
General configuration files consist of the main configuration file, directive configuration file, extended directive configuration file, directive security files, and the RWhois parent file. In these configuration files, extra white space is ignored and lines beginning with the '#' character are treated as comments.
1. Main Configuration File (rwhoisd.conf)
The main configuration file is a "<tag>: <value>" delimited file with the following tags.
    root-dir: /home/databases/rwhois/sample.data
    bin-path: bin
    auth-area-file: rwhoisd.auth_area
    directive-file: rwhoisd.dir
    x-directive-file: rwhoisd.x.dir
    max-hits-default: 20
    max-hits-ceiling: 2000
    register-spool: register_spool
    punt-file: rwhoisd.root
    local-host: host.domain.com
    local-port: 4321
    security-allow: rwhoisd.allow
    security-deny: rwhoisd.deny
    deadman-time: 200
    server-type: daemon
    userid: guest
    chrooted: yes
    server-contact: email@example.com
    use-syslog: no
    default-log-file: rwhoisd.log
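As an illustration of the file format only (this is not the server's own parser), the following Python sketch reads "<tag>: <value>" lines, skipping blank lines and '#' comments. The function name parse_config is hypothetical, and folding tags to lowercase is an assumption made for convenient lookup.

```python
def parse_config(text):
    """Parse '<tag>: <value>' lines into a dict keyed by lowercase tag."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()  # extra white space is ignored
        # Blank lines and lines beginning with '#' carry no settings.
        if not line or line.startswith("#"):
            continue
        tag, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a '<tag>: <value>' line: {raw_line!r}")
        # Assumption: tags are compared without regard to case.
        settings[tag.strip().lower()] = value.strip()
    return settings


sample = """
# general server settings
root-dir:   /home/databases/rwhois/sample.data
local-port: 4321
max-hits-default: 20
"""
config = parse_config(sample)
print(config["root-dir"], config["local-port"])
```

Only the first ':' on a line is treated as the delimiter, so values that themselves contain colons are kept whole.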
2. Directive Configuration File (rwhoisd.dir)
The directive configuration file contains entries to enable or disable the RWhois directives.
    Soa yes
    Register no
3. Extended Directive Configuration File (rwhoisd.x.dir)
The extended directive configuration file is a "<tag>: <value>" delimited file with the following tags.
    Command: date
    Command-len: 4
    Program: /usr/bin/date
    ---
    command: pgp
    command-len: 3
    description: The PGP keyring gateway directive
    program: Xpgp
4. Directive Security Files (rwhoisd.allow/rwhoisd.deny)
The directive security files are (or may be) localized versions of Wietse Venema's TCP Wrapper configuration files. In general, entries in these files take the form
    <directive>: <security pattern>
where <directive> is a particular directive name without the leading '-' (e.g., 'xfer', 'register', 'X-pgp'), and the security pattern is a space-delimited list of IP addresses or domain names. See hosts_access(5) in the tcp_wrappers distribution.
    xfer: 198.41.0
    x-date: all
    register: all
    xfer: all
The RWhois punt (or parent) file contains a list of RWhois Uniform Resource Locators (URLs) that are referrals to a higher point in the RWhois information tree. At this point, the server does not arbitrate between the different punt referrals listed in this file, so all listed destinations should be equivalent.
B. Authority Area Configuration Files
An authority area is an identifier for an RWhois database containing data and its schema. It has a hierarchical structure that helps identify the position of the database in the global RWhois data information tree. An authority area has the structure of either a domain name or an IP address in quad-octet prefix/prefix length format.
The authority area configuration files consist of the authority area file, the Start of Authority (SOA) file, the schema file, and the attribute definition files.
1. Authority Area File (rwhoisd.auth_area)
The authority area file is a "<tag>: <value>" delimited file containing information about the authority areas for which the RWhois server is primary or secondary.
A primary (or master) RWhois server is where data is registered for an authority area; it answers authoritatively to queries for data in that authority area. There must be one and only one primary server for a particular authority area. An RWhois server may be primary for multiple authority areas. The authority area model is explained in more detail below.
A secondary (or slave) RWhois server is where data is replicated from a primary server for an authority area. It, like its primary server, answers authoritatively to queries for data in that authority area. There can be multiple secondary servers for a particular authority area, and an RWhois server may be secondary for multiple authority areas.
The authority area file contains the following tags.
    Type: master
    Name: a.com
    Data-Dir: a.com/data
    Schema-File: a.com/schema
    Soa-File: a.com/soa
    Slave: rwhois.internic.net 4321
    Slave: dmeister.rwhois.net 4321
    ---
    Type: master
    Name: 10.0.0.0/8
    Data-Dir: net-10.0.0.0-8/data
    Schema-File: net-10.0.0.0-8/schema
    Soa-File: net-10.0.0.0-8/soa
    Slave: rwhois.internic.net 4321
    Slave: dmeister.rwhois.net 4321
    ---
    Type: slave
    Name: b.com
    Data-Dir: b.com/data
    Schema-File: b.com/schema
    Soa-File: b.com/soa
    Master: rwhois.b.com 4321
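To make the record structure concrete, here is a minimal Python sketch that splits such a file into records on the "---" separator and keeps repeated tags (such as Slave) as lists. It is an illustration, not the server's code; parse_records is a hypothetical name, and lowercasing the tags is an assumption.

```python
def parse_records(text):
    """Return one dict per record, mapping lowercase tag -> list of values."""
    records, current = [], {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "---":  # record separator
            if current:
                records.append(current)
            current = {}
            continue
        tag, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"not a '<tag>: <value>' line: {raw_line!r}")
        # A tag may repeat within a record, so collect values in a list.
        current.setdefault(tag.strip().lower(), []).append(value.strip())
    if current:
        records.append(current)
    return records


sample = """
Type: master
Name: a.com
Slave: rwhois.internic.net 4321
Slave: dmeister.rwhois.net 4321
---
Type: slave
Name: b.com
Master: rwhois.b.com 4321
"""
areas = parse_records(sample)
for area in areas:
    print(area["type"][0], area["name"][0])
```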
2. SOA File
The SOA file is a "<tag>: <value>" delimited file with the following tags.
    Serial-Number: 19961008101010
    Refresh-Interval: 3600
    Increment-Interval: 1800
    Retry-Interval: 60
    Time-To-Live: 86400
    Primary-Server: rwhois.internic.net:4321
    Hostmaster: firstname.lastname@example.org
3. Schema File
The schema file is a "<tag>: <value>" delimited file with the following tags.
    name: contact
    alias: user
    attributedef: a.com/attribute_defs/contact.tmpl
    dbdir: a.com/contact
    description: user class
4. Attribute Definition File
The attribute definition file is a "<tag>: <value>" delimited file that describes the attributes of a particular class. It has the following tags.
The valid attribute types are TEXT, ID, and SEE-ALSO.
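Currently the valid index types include EXACT, SOUNDEX, and CIDR (the index file types described under "The Index File"), as well as ALL.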
    attribute: name
    attribute-alias: nm
    description: full name
    is-primary-key: TRUE
    is-required: TRUE
    is-repeatable: FALSE
    is-multi-line: FALSE
    is-hierarchical: FALSE
    index: ALL
    type: TEXT
    ---
    attribute: email
    attribute-alias: em
    format: re:[a-zA-Z0-9-._]+@[a-zA-Z0-9-.]
    description: rfc 822 email address
    is-primary-key: TRUE
    is-required: TRUE
    is-repeatable: FALSE
    is-multi-line: FALSE
    is-hierarchical: TRUE
    index: EXACT
    type: TEXT
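The following Python sketch shows how the is-required and is-repeatable flags might be applied when checking a data record. The semantics are assumed from the tag names (a required attribute must appear at least once, a non-repeatable one at most once); the function check_record and the in-memory ATTRIBUTE_DEFS table are hypothetical, and this is not the syntax check rwhois_indexer performs.

```python
# Hypothetical in-memory form of the two attribute definitions shown above.
ATTRIBUTE_DEFS = {
    "name": {"is-required": True, "is-repeatable": False},
    "email": {"is-required": True, "is-repeatable": False},
}


def check_record(record, attribute_defs):
    """Check a record given as (attribute, value) pairs; return a list of problems."""
    counts = {}
    for attribute, _value in record:
        key = attribute.lower()  # attribute names are matched without regard to case
        counts[key] = counts.get(key, 0) + 1

    problems = []
    for name, rules in attribute_defs.items():
        seen = counts.get(name, 0)
        if rules["is-required"] and seen == 0:
            problems.append(f"missing required attribute: {name}")
        if not rules["is-repeatable"] and seen > 1:
            problems.append(f"attribute is not repeatable: {name}")
    return problems


good = [("Name", "Doe, Jane"), ("Email", "email@example.com")]
bad = [("Name", "Doe, Jane"), ("Name", "Doe, J.")]
print(check_record(good, ATTRIBUTE_DEFS))
print(check_record(bad, ATTRIBUTE_DEFS))
```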
IV. The Native Database (MKDB)
The RWhois server uses its own database, named MKDB (Mark Kosters' Database). It is a fairly simple database designed to scale well as the amount of data grows; currently there are 1.9 million records in the RWhois root. It is also designed to be simple to understand, and it can be manipulated by hand.
MKDB's foundation is a series of sorted index files containing pointers to entries in data files. To support this, there are (currently) three different kinds of files that MKDB uses: data files, index files, and master file lists.
B. The Files
For rwhoisd, data is segregated by authority area and "class" into separate data directories, where it is then indexed. Data added via the protocol (using the "-register" directive) is automatically indexed. For initial database loads, or by-hand manipulation of the data, a command-line indexer (rwhois_indexer) is provided. For each class, there is a single master file list (typically called "local.db") and any number of index and data files.
C. The Master File List
The master file list is a list of all of the data and index files for a particular class. It exists primarily to define which index and data files are currently relevant to the database and to assign each file an index number. The file list also tracks a number of statistics (number of records, size in bytes) designed to help the search engine.
The format of the master file list is considered to be opaque, as it may change at any time. It is manipulated entirely by the indexing process. The following is a sample of the current format, with an explanation of the different fields. The master file list consists of "<tag>: <value>" pairs separated into records by the record separator ("---"). The current tags include the following.
Example (this is a.com/data/domain/local.db):
    Type: DATA
    File: a.com/data/domain/domain.txt
    File_No: 0
    Size: 581
    Num_Recs: 1
    Lock: OFF
    ---
    Type: EXACT
    File: a.com/data/domain/index-0.ndx
    File_No: 1
    Size: 33
    Num_Recs: 2
    Lock: OFF
Note that it is entirely possible for the index and data files to exist outside of the directory structure. The only file in MKDB that needs to be in a predictable place is the master file list itself.
D. The Data Files
Data files have a format similar to that of the configuration files in the RWhois server: they are "<tag>: <value>", where "<tag>" is an attribute name. The different records are separated with the record separator. Lines beginning with '#' are considered comments and ignored. Case and leading and trailing whitespace are also ignored. The data should conform to the class description in the attribute definition file, and it should contain (at least) the required attributes contained in the base class.
    ID: 222.a.com
    Auth-Area: a.com
    Name: Public, John Q.
    Email: email@example.com
    Type: I
    First-Name: John
    Last-Name: Public
    Phone: (847)-391-7926
    Fax: (847)-338-0340
    Organization: 777.a.com
    Created: 11961022
    Updated: 11961023
    Updated-By: firstname.lastname@example.org
    ---
    ID: 223.a.com
    Auth-Area: a.com
    Name: Doe, Jane
    Email: email@example.com
    Type: I
    First-Name: Jane
    Last-Name: Doe
    Phone: (847)-391-7943
    Fax: (847)-338-0340
    Organization: 777.a.com
    Created: 11961025
    Updated: 11961025
    Updated-By: firstname.lastname@example.org
Attributes can either be TEXT, ID, or SEE-ALSO types. Type ID attributes should contain the ID of the referenced RWhois object. Type SEE-ALSO attributes should be URLs.
When data records are deleted via the "-register" directive, they are not actually removed immediately. First, they are marked for deletion by replacing the first character of every line in the record with an underscore character ('_'). The process of actually removing deleted records from a file completely is known as a "purge" and is covered below.
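The marking step can be sketched in a few lines of Python. This is only an illustration of the convention described above, operating on a list of lines in memory; the function names mark_deleted and is_deleted are hypothetical.

```python
def mark_deleted(record_lines):
    """Replace the first character of every line in the record with '_'."""
    return ["_" + line[1:] if line else line for line in record_lines]


def is_deleted(record_lines):
    """A record counts as deleted when every non-empty line starts with '_'."""
    lines = [line for line in record_lines if line]
    return bool(lines) and all(line.startswith("_") for line in lines)


record = ["ID: 222.a.com", "Auth-Area: a.com", "Name: Public, John Q."]
marked = mark_deleted(record)
print(marked)
print(is_deleted(record), is_deleted(marked))
```

Because the marker overwrites a character instead of removing the line, each line keeps its length, so byte offsets into the data file are unchanged by the marking.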
The number of data files has no substantial impact on the performance of rwhoisd, although an extreme number of data files can slow down the "-xfer" directive.
E. The Index File
The index file format is very simple. It consists of a number of sorted index records, where each record contains a pointer to a location in a data file, a "deleted" flag, the "global" id of the attribute, and the key.
An index record has the following format.
    <file offset>:<data file no>:<deleted flag>:<global attribute id>:<key>
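For example, an index record with file offset 398, data file number 0, a cleared deleted flag, global attribute id 8, and key "EDWARD" indicates that the record containing the key "EDWARD" is 398 bytes into data file "0", that it is not deleted, and that it corresponds to the global attribute "8" (Last-Name). The key is always stored in uppercase letters.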
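A minimal Python sketch of reading one index record follows. The sample line "398:0:0:8:EDWARD" is constructed from the values in the text, and it assumes that an undeleted record carries the flag "0"; the actual flag encoding is not stated here. The names IndexRecord and parse_index_record are hypothetical.

```python
from typing import NamedTuple


class IndexRecord(NamedTuple):
    offset: int        # byte offset of the record within the data file
    data_file_no: int  # file number assigned in the master file list
    deleted: bool
    attribute_id: int  # global attribute id
    key: str


def parse_index_record(line):
    """Split one index line into its five colon-separated fields."""
    # Split at most four times so that a key containing ':' stays intact.
    offset, file_no, deleted, attribute_id, key = line.rstrip("\n").split(":", 4)
    # Assumption: a deleted flag of "0" means the record is not deleted.
    return IndexRecord(int(offset), int(file_no), deleted != "0",
                       int(attribute_id), key)


record = parse_index_record("398:0:0:8:EDWARD")
print(record)
```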
Each index file contains the indexed keys of one or more data files, and each data file should only have one corresponding index file. While it is certainly possible to index a single data file into multiple index files using the provided indexer, this will produce "false multiples" of records. That is, a query that should result in one record being found will instead result in multiple identical records being found.
There are three different types of index files: EXACT, SOUNDEX, and CIDR. They all share the common index file format. The only difference between them is how they are treated by the search engine. For instance, when searching a SOUNDEX index file, a transform (soundex) is performed on the search key first.
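To show what a soundex transform does to a search key, here is a Python sketch of the classic American Soundex algorithm. Which Soundex variant rwhoisd actually applies is not stated in this document, so treat the details below as an assumption.

```python
# Map each consonant to its Soundex digit; vowels, H, W, and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character classic Soundex code for a word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (encoded + "000")[:4]


for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
    print(name, soundex(name))
```

Names that sound alike, such as "Robert" and "Rupert", map to the same code, which is why the transform is applied to the search key before the index is consulted.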
There is no limit on the number of index files, but each additional index file slows the search. As the number of index files increases, the typical binary search will approach a linear search in performance.
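One way to picture this effect is the Python sketch below. It assumes that each index file is binary-searched on its own and that the files are visited one after another; the text above states only the overall effect, so this model, the in-memory lists of keys, and the names find_in_index and search_all are all illustrative assumptions.

```python
from bisect import bisect_left


def find_in_index(sorted_keys, key):
    """Binary-search one sorted index; return positions of all matching keys."""
    position = bisect_left(sorted_keys, key)
    hits = []
    while position < len(sorted_keys) and sorted_keys[position] == key:
        hits.append(position)
        position += 1
    return hits


def search_all(index_files, key):
    """Search every index file in turn: logarithmic per file, linear in files."""
    key = key.upper()  # keys are stored in uppercase
    results = []
    for file_no, sorted_keys in index_files.items():
        for position in find_in_index(sorted_keys, key):
            results.append((file_no, position))
    return results


index_files = {
    1: ["DOE", "EDWARD", "PUBLIC"],
    2: ["EDWARD", "SMITH"],
}
print(search_all(index_files, "edward"))
```

With many small index files, the outer loop dominates, which matches the observation that the search approaches linear behavior.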
The MKDB indexes are generated using a basic process.
1. The first step of the process is to identify the actual data file(s) to be indexed, and the authority area and class to which those data files belong.
Indexing can occur in one of two ways: as part of the "-register" directive, or "by hand" using the command line indexer. The indexing that occurs during "-register" directive processing is handled automatically and uses a subset of the functionality available in the command line indexer. For instance, the syntax checks are skipped, because the register directive has already performed them. The "-register" directive also adds data in a fast, incremental fashion. Each "-register" action, if it succeeds, produces a data file and an index file. If "-register" is used often, fairly severe fragmentation can ensue. In this case, the purge operation should be used to defragment the database; purging is discussed in the next section.
The command line indexer is probably the most convenient way to index data. In the most basic operation, it is used to index data initially. The most convenient way to do this is to place all of the data files in the appropriate data directories (as indicated by the "db-dir" attribute in the schema file) and name all of the files with a common suffix. Then, index all the files in a single step.
    % rwhois_indexer -i -s "suffix"
The "-i" option removes all previous index files, and the "-s" option indicates that all files ending in "suffix" should be indexed. In the sample database, all data files end in ".txt" but could end in any suffix except ".ndx", which is the suffix for the index files themselves.
Please see the rwhois_indexer man page for more details.
To date, purge operations have not been written. However, there are two levels of purging that can be performed: index purges and data purges. Index purges would simply remove index entries marked for deletion and would perhaps merge sort the index files together. This is a fairly safe and efficient operation. Data purges involve rewriting data files to remove deleted records. Once the data files are rewritten, the files must be reindexed, since the position of records within those files may have changed.
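Since no purge tool exists, the following Python sketch is purely hypothetical. It shows what the rewriting step of a data purge could look like for text held in memory, assuming records are separated by a line containing only "---" and that deleted records have every line marked with a leading underscore. The function name purge_deleted is invented for this example.

```python
SEPARATOR = "\n---\n"


def purge_deleted(text):
    """Return the data-file text without records marked for deletion."""
    kept = []
    for record in text.split(SEPARATOR):
        lines = [line for line in record.splitlines() if line.strip()]
        # A record is dropped only when every one of its lines is marked.
        if lines and all(line.startswith("_") for line in lines):
            continue
        kept.append(record)
    return SEPARATOR.join(kept)


data = (
    "ID: 222.a.com\nName: Public, John Q.\n"
    "---\n"
    "_D: 223.a.com\n_ame: Doe, Jane\n"
    "---\n"
    "ID: 224.a.com\nName: Roe, Richard\n"
)
print(purge_deleted(data))
```

Removing a record shifts the byte offsets of every record after it, which is why the rewritten files must be reindexed.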
V. Authority Areas
A more complete and accurate treatment of authority areas is given in the RWhois Version 1.5 specification. The treatment here is intended to provide some reasoning for the RWhois behavior and configuration options.
An authority area is an identifier for an RWhois database containing data and its schema. It has a hierarchical structure that helps identify the position of the database in the global RWhois data information tree. In the RWhois 1.5 protocol, an authority area has the structure of either a domain name or an IP address in quad-octet prefix/prefix length format. The hierarchical structure of an authority area helps route a query that cannot be resolved locally up or down the tree.
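The two forms of authority area, and the hierarchy between them, can be illustrated with a short Python sketch. It uses the standard ipaddress module for the prefix/prefix-length form and a simple suffix test for domain names; the function names and the exact containment rules are assumptions for illustration, not the server's logic.

```python
import ipaddress


def classify_authority_area(name):
    """Return ('network', ip_network) or ('domain', normalized name)."""
    if "/" in name:
        # strict=True rejects prefixes with host bits set, e.g. 10.1.0.0/8.
        return "network", ipaddress.ip_network(name, strict=True)
    return "domain", name.lower().rstrip(".")


def is_within(child, parent):
    """True if the child authority area lies at or below the parent in the tree."""
    child_kind, child_value = classify_authority_area(child)
    parent_kind, parent_value = classify_authority_area(parent)
    if child_kind != parent_kind:
        return False
    if child_kind == "network":
        return (child_value.version == parent_value.version
                and child_value.subnet_of(parent_value))
    return child_value == parent_value or child_value.endswith("." + parent_value)


print(is_within("10.1.0.0/16", "10.0.0.0/8"))
print(is_within("fddi.a.com", "a.com"))
print(is_within("b.com", "a.com"))
```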
B. Referral Model
There are two types of referrals. When a query is referred up the tree, it is called a punt referral. When a query is referred down the tree, it is called a link referral. The referral model for an RWhois server follows below.
1. Try to parse hierarchical value from the search value in each query term. For example, parse the domain name from an email address.
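As a sketch of this first step, the Python function below extracts a hierarchical value from a search value. Only the email case comes from the text above; treating a bare dotted value without spaces as already hierarchical is an assumption, and the name hierarchical_value is hypothetical.

```python
def hierarchical_value(search_value):
    """Return the hierarchical part of a search value, or None if there is none."""
    value = search_value.strip()
    if "@" in value:
        # For an email address, the hierarchical value is the domain name.
        return value.rsplit("@", 1)[1].lower()
    # Assumption: a dotted value without white space is itself a domain name.
    if "." in value and not any(c.isspace() for c in value):
        return value.lower()
    return None


print(hierarchical_value("jdoe@fddi.a.com"))
print(hierarchical_value("Doe"))
```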
C. Setting Up Referrals
To set up punt referrals, the RWhois parent file (rwhoisd.root) must have at least one entry for an RWhois server up the tree. In the sample data, it is a referral to the root RWhois server.
To set up link referrals, the RWhois protocol Version 1.5 defines the referral class. It has the following attributes.
    ID: 888.a.com
    Auth-Area: a.com
    Guardian: 444.a.com
    Referral: rwhois://rwhois.second.a.com:4321/auth-area=fddi.a.com
    Organization: 777.a.com
    Referred-Auth-Area: fddi.a.com
    Created: 19961022101010
    Updated: 19961023101010
    Updated-By: email@example.com
    ---
    ID: 822.214.171.124.0/8
    Auth-Area: 10.0.0.0/8
    Referral: rwhois://rwhois.third.a.com:4321/auth-area=10.1.0.0/16
    Referral: rwhois://rwhois.fourth.a.com:4321/auth-area=10.1.0.0/16
    Referred-Auth-Area: 10.1.0.0/16
    Created: 19961022101010
    Updated: 19961023101010
    Updated-By: firstname.lastname@example.org
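The Referral values in these records can be taken apart with Python's standard urllib.parse module, as in the sketch below. It assumes the URL shape seen in the examples (rwhois://host:port/auth-area=...); the function name parse_referral is hypothetical, and the authoritative URL syntax is defined by the RWhois specification, not by this example.

```python
from urllib.parse import urlsplit

AUTH_AREA_PREFIX = "/auth-area="


def parse_referral(url):
    """Split a referral URL into (host, port, authority area)."""
    parts = urlsplit(url)
    if parts.scheme != "rwhois":
        raise ValueError(f"not an rwhois URL: {url!r}")
    if not parts.path.startswith(AUTH_AREA_PREFIX):
        raise ValueError(f"no auth-area in URL: {url!r}")
    # The authority area may itself contain '/', as in 10.1.0.0/16.
    return parts.hostname, parts.port, parts.path[len(AUTH_AREA_PREFIX):]


print(parse_referral("rwhois://rwhois.second.a.com:4321/auth-area=fddi.a.com"))
print(parse_referral("rwhois://rwhois.third.a.com:4321/auth-area=10.1.0.0/16"))
```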
VI. Contacting the Authors
There is a mailing list for discussion of the RWhois protocol and software. Send a message to email@example.com with the word "subscribe" in the body to subscribe. There is a mailing list for RWhois developers as well: send a message to firstname.lastname@example.org with "subscribe" as the body.