# control-archive

Copyright 2002-2004, 2007-2014, 2016-2018 Russ Allbery <eagle@eyrie.org>.
Copyright 2001 Marco d'Itri.  Copyright 1996 UUNET Technologies, Inc..
This software is distributed under a BSD-style license.  Please see the
section [License](#license) below for more information.

## Blurb

This software generates an INN control.ctl configuration file from
hierarchy configuration fragments, verifies control messages using GnuPG
where possible, processes new control messages to update a newsgroup list,
archives new control messages, and exports the list of newsgroups in a
format suitable for synchronizing the newsgroup list of a Netnews news
server.  It is the software that maintains the control message and
newsgroup lists available from ftp.isc.org.

## Description

This package contains three major components:

* All of the configuration used to generate a `control.ctl` file for INN
  and the `PGPKEYS` and `README.html` files distributed with pgpcontrol,
  along with the script to generate those files.

* Software to process control messages, verify them against that
  authorization information, and maintain a control message archive and
  list of active newsgroups.  Software is also included to generate
  reports of recent changes to the list of active newsgroups.

* The documentation files included in the control message archive and
  newsgroup lists on ftp.isc.org.

Manual changes to the canonical newsgroup list are supported in a way that
generates the same log messages and uses the same locking structure so
that they can co-exist with automated changes and be included in the same
reports.

This is the software that generates the [active newsgroup
lists](ftp://ftp.isc.org/pub/usenet/CONFIG/) and [control message
archive](ftp://ftp.isc.org/pub/usenet/control/) hosted on ftp.isc.org, and
the source of the `control.ctl` file provided with INN.

For a web presentation of the information recorded here, as well as other
useful information about Usenet hierarchies, please see the [list of
Usenet managed hierarchies](http://usenet.trigofacile.com/hierarchies/).

## Requirements

Perl 5.6 or later plus the following additional Perl modules are required:

* Compress::Zlib (included in Perl 5.10 and later)
* Date::Parse (part of TimeDate)
* Net::NNTP (included in Perl 5.8 and later)
* Text::Template

[gzip](https://www.gnu.org/software/gzip/) and
[bzip2](http://www.bzip.org/) are required.  Both are generally available
with current operating systems, possibly as supplemental packages.

process-control expects to be fed file names and message IDs of control
messages on standard input and therefore needs to be run from a news
server or some other source of control messages.  A minimalist news server
like tinyleaf is suitable for this (I wrote tinyleaf, available as part of
[INN](https://www.eyrie.org/~eagle/software/inn/), for this purpose).

## Versioning

This package uses a three-part version number.  The first number will be
incremented for major changes, major new functionality, incompatible
changes to the configuration format (more than just adding new keys), or
similar disruptive changes.  For lesser changes, the second number will be
incremented for any change to the code or functioning of the software.  A
change to the third part of the version number indicates a release with
changes only to the configuration, PGP keys, and documentation files.

## Layout

The configuration data is in one file per hierarchy in the `config`
directory.  Each file has the format specified in FORMAT and is designed
to be readable by INN's new configuration parser in case this can be
further automated down the road.  The `config/special` directory contains
overrides, raw `control.ctl` fragments that should be used for particular
hierarchies instead of automatically-generated entries (usually for
special comments).  Eventually, the format should be extended to handle as
many of these cases as possible.

The `keys` directory contains the PGP public keys for every hierarchy that
has one.  The user IDs on these keys must match the signer expected by the
configuration data for the corresponding hierarchy.

The `forms` directory contains the basic file structure for the three
generated files.

The `scripts` directory contains all the software that generates the
configuration and documentation files, processes control messages, updates
the database, creates the newsgroup lists, and generates reports.  Most
scripts in that directory have POD documentation included at the end of
the script, viewable by running perldoc on the script.

The `templates` directory contains templates for the `control-summary`
script.  These are the templates I use myself.  Other installations should
customize them.

The `docs` directory contains the extra documentation files that are
distributed from ftp.isc.org in the control message archive and newsgroup
list directories, plus the DocKnot metadata for this package.

## Installation

This software is set up to run from `/srv/control`.  To use a different
location, edit the paths at the beginning of each of the scripts in the
`scripts` directory to use different paths.  By default, copying all the
files from the distribution into a `/srv/control` directory is almost all
that's needed.  An install rule is provided to do this.  To install the
software, run:

```sh
    make install
```

You will need write access to `/srv/control` or permission to create it.

`process-control` and `generate-files` need a GnuPG keyring containing all
of the honored hierarchy keys.  To generate this keyring, run `make
install` or:

```sh
    mkdir keyring
    gpg --homedir=keyring --allow-non-selfsigned-uid --import keys/*
```

from the top level of this distribution.  `process-control` also expects a
`control.ctl` file in `/srv/control/control.ctl`, which can be generated
from the files included here (after creating the keyring as described
above) by running `make install` or:

```sh
    scripts/generate-files
```

Both of these are done automatically as part of `make install`.
process-control expects `/srv/control/archive` to exist and archives
control messages there.  It expects `/srv/control/tmp` to exist and uses
it for temporary files for GnuPG control message verification.

To process incoming control messages, you need to run `process-control` on
each message.  `process-control` expects to receive, on standard input,
lines consisting of a path to a file, a space, and a message ID.  This
input format is designed to work with the tinyleaf server that comes with
INN 2.5 and later, but it should also work as a channel feed from
pre-storage-API versions of INN (1.x).  It will not work without
modification via a channel feed from a current version of INN, since it
doesn't understand the storage API and doesn't know how to retrieve
articles by tokens.  This could be easily added; I just haven't needed it.

If you're using tinyleaf, here is the setup process:

1. Create a directory that tinyleaf will use to store incoming articles
   temporarily, the archive directory, and the logs directory and install
   the software:

   ```sh
       make install
   ```

2. Run tinyleaf on some port, configuring it to use that directory and to
   run process-control.  A typical tinyleaf command line would be:

   ```sh
       tinyleaf /srv/control/spool /srv/control/scripts/process-control
   ```

   I run tinyleaf using systemd, but any inetd implementation should work
   equally well.

3. Set up a news feed to the system running tinyleaf that sends control
   messages of interest.  You should be careful not to send cancel control
   messages or you'll get a ton of junk in your logs.  The INN newsfeeds
   entry I use is:

   ```
       isc-control:control,control.*,!control.cancel:Tf,Wnm:
   ```

   combined with nntpsend to send the articles.

That should be all there is to it.  Watch the logs directory to see what
happens for incoming messages.

`scripts/process-control` just maintains a database file.  To export that
data in a format that's useful for other software, run
`scripts/export-control`.  This expects a `/srv/control/export` directory
into which it stores active and newsgroups files, a copy of the
`control.ctl` file, and all of the logs in a `LOGS` subdirectory.  This
export directory can then be made available on the web, copied to another
system, or whatever else is appropriate.  Generally,
`scripts/export-control` should be run periodically from cron.

Reports can be generated using `scripts/control-summary`.  This script
needs configuration before running; see the top of the script and its
included POD documentation.  There is a sample template in the `templates`
directory, and `scripts/weekly-report` shows a sample cron job for sending
out a regular report.

## Bootstrapping

This package is intended to provide all of the tools, configuration, and
information required to duplicate the ftp.isc.org control message archive
and newsgroup list service if you so desire.  To set up a similar service
based on that service, however, you will also want to bootstrap from the
existing data.  Here is the procedure for that:

1. Be sure that you're starting from the latest software and set of
   configuration files.  I will generally try to make a new release after
   committing a batch of changes, but I may not make a new release after
   every change.  See the sections below for information about the Git
   repository in which this package is maintained.  You can always clone
   that repository to get the latest configuration (and then merge or
   cherry-pick changes from my repository into your repository as you
   desire).

2. Download the current newsgroup list from:

       ftp://ftp.isc.org/pub/usenet/CONFIG/newsgroups.bz2

   and then bootstrap the database from it:

   ```sh
       bzip2 -dc newsgroups.bz2 | scripts/update-control bulkload
   ```

3. If you want the log information so that your reports will include
   changes made in the ftp.isc.org archive before you created your own,
   copy the contents of ftp://ftp.isc.org/pub/usenet/CONFIG/LOGS/ into
   `/srv/control/logs`.

4. If you want to start with the existing control message repository,
   download the contents of ftp://ftp.isc.org/pub/usenet/control/ into
   `/srv/control/archive`.  You can do this using a recursive download
   tool that understands FTP, such as wget, but please use the options
   that add delays and don't hammer the server to death.

After finishing those steps, you will have a copy of the ftp.isc.org
archive and can start processing control messages, possibly with different
configuration choices.  You can generate the files that are found in
ftp://ftp.isc.org/pub/usenet/CONFIG/ by running `scripts/export-control`
as described above.

## Maintenance

To add a new hierarchy, add a configuration fragment in the `config`
directory named after the hierarchy, following the format of the existing
files, and run `scripts/generate-files` to create a new `control.ctl`
file.  See the documentation in `scripts/generate-files` for details about
the supported configuration keys.

If the hierarchy uses PGP-signed control messages, also put the PGP key
into the `keys` directory in a file named after the hierarchy.  Then, run:

```sh
    gpg --homedir=keyring --import keys/<hierarchy>
```

to add the new key to the working keyring.

The first user ID on the key must match the signer expected by the
configuration data for the corresponding hierarchy.  If a hierarchy
administrator sets that up wrong (usually by putting additional key IDs on
the key), this can be corrected by importing the key into a keyring with
GnuPG, using `gpg --edit-key` to remove the offending user ID, and
exporting the key again with `gpg --export --ascii`.

When adding a new hierarchy, it's often useful to bootstrap the newsgroup
list by importing the current checkgroups.  To do this, obtain the
checkgroups as a text file (containing only the groups without any news
headers) and run:

```sh
    scripts/update-control checkgroups <hierarchy> < <checkgroups>
```

where <hierarchy> is the hierarchy the checkgroups is for and
<checkgroups> is the path to the checkgroups file.

## Support

The [control-archive web
page](https://www.eyrie.org/~eagle/software/control-archive/) will always
have the current version of this package, the current documentation, and
pointers to any additional resources.

Configuration updates should be sent to usenet-config@isc.org.

For bug tracking, use the [issue tracker on
GitHub](https://github.com/rra/control-archive/issues).  Please be aware
that I tend to be extremely busy and work projects often take priority.
I'll save your report and get to it as soon as I can, but it may take me a
couple of months.

## Source Repository

control-archive is maintained using Git.  You can access the current
source on [GitHub](https://github.com/rra/control-archive) or by cloning
the repository at:

https://git.eyrie.org/git/usenet/control-archive.git

or [view the repository on the
web](https://git.eyrie.org/?p=usenet/control.archive.git).

The eyrie.org repository is the canonical one, maintained by the author,
but using GitHub is probably more convenient for most purposes.  Pull
requests are gratefully reviewed and normally accepted.

## License

The control-archive package as a whole is covered by the following
copyright statement and license:

> Copyright 2002-2004, 2007-2014, 2016-2018
>     Russ Allbery <eagle@eyrie.org>
>
> Copyright 2001
>     Marco d'Itri
>
> Copyright 1996
>     UUNET Technologies, Inc.
>
> Permission is hereby granted, free of charge, to any person obtaining a
> copy of this software and associated documentation files (the "Software"),
> to deal in the Software without restriction, including without limitation
> the rights to use, copy, modify, merge, publish, distribute, sublicense,
> and/or sell copies of the Software, and to permit persons to whom the
> Software is furnished to do so, subject to the following conditions:
>
> The above copyright notice and this permission notice shall be included in
> all copies or substantial portions of the Software.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS IN THE SOFTWARE.
>
> This product includes software developed by UUNET Technologies, Inc.

Some files in this distribution are individually released under different
licenses, all of which are compatible with the above general package
license but which may require preservation of additional notices.  All
required notices, and detailed information about the licensing of each
file, are recorded in the LICENSE file.

Files covered by a license with an assigned SPDX License Identifier
include SPDX-License-Identifier tags to enable automated processing of
license information.  See https://spdx.org/licenses/ for more information.

For any copyright range specified by files in this package as YYYY-ZZZZ,
the range specifies every single year in that closed interval.
