PeDALS Architecture

As much as possible, the system has been designed to be modular. A repository could use pieces of the system, or adopt the system while changing out a particular component. For example, an agency that wanted to run services on Windows boxes instead of Linux could do that. Partners are encouraged to keep systems as nearly identical as possible during the pilot.

The various pieces of the architecture are shown below, and each is described in more detail in the sections that follow.

1. FTP Drop Box
2. BizTalk Middleware Server
3. Catalog Database Server
4. Manifest Server
5a. LOCKSS Cluster at Your Site
5b. LOCKSS Cluster at Partner Sites
6. LOCKSS Administration Server
7. Web Portal

1. FTP Drop Box

The Office of Origin deposits records on this box. Alternatively, the repository may deposit records that it received through some other means, especially one-time transfers. Each office of origin has its own directory, and each series from the office has a separate subdirectory.

Technical skills: ability to apply patches, create user accounts, and create subdirectories. All of these are straightforward using a graphical interface and not too complicated from a command-line prompt.

Security: Ideally, offices of origin would be able to FTP through a firewall. This should be fairly secure because transfers would come from a known IP range on a known port, and it might be possible to open the firewall only at scheduled times. Transfers require basic authentication, which can be configured to require secure userids and passwords. Another option would be for the office of origin to send the records on physical media and have the repository load them.

The partners would benefit from a basic knowledge of the Linux operating system and FTP services, but the project staff will provide instructions for a standard configuration.
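The directory convention above (one directory per office of origin, one subdirectory per series) is what lets downstream processing tell which office and series a file belongs to. The sketch below assumes a hypothetical listing of paths relative to the drop-box root, laid out as `<office>/<series>/<file>`; the function name and the office and series names are illustrative only, not part of the PeDALS configuration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_deposits(relative_paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group drop-box files as {(office, series): [paths within the series]}.

    Paths are relative to the (hypothetical) drop-box root and follow the
    layout <office>/<series>/<file>. Files that are not inside a series
    subdirectory are skipped so they can be reported separately.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for rel in relative_paths:
        parts = PurePosixPath(rel).parts
        if len(parts) < 3:
            continue  # e.g. "office/stray.txt": no series subdirectory
        office, series = parts[0], parts[1]
        groups[(office, series)].append(str(PurePosixPath(*parts[2:])))
    return dict(groups)
```

For example, `group_deposits(["dot/minutes/2009-01.pdf", "dot/stray.txt"])` returns `{("dot", "minutes"): ["2009-01.pdf"]}`; the stray file at the office level is left out rather than guessed into a series.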

2. BizTalk Middleware Server

The middleware server hosts the rules and oversees their execution. It ties together all the other pieces through the processes listed below. This server can be entirely behind the primary firewall; repositories may want to install a secondary firewall between it and the FTP Drop Box. This server will be a fairly high-powered machine running Windows.

  • Listens to the FTP drop box. When it discovers records, it transfers the records off the drop box and initiates processing.

  • Aids in validation of records, ensuring that all records sent by the office of origin (and only those records) were received without corruption.

  • Creates an entry for each record in the accessions register database to support administration of the archives.

  • Transforms the records from a submission information package (SIP) into an archival information package (AIP), including transforming and enhancing metadata and adding signatures.

  • Deposits the AIP on the LOCKSS Manifest Server and publishes the list of AIPs ready for ingest.

  • Validates the ingest of AIPs on the LOCKSS servers and deletes validated AIPs from the LOCKSS Manifest Server.

  • For non-restricted records, creates an entry in the public Web Portal search database.

  • For non-restricted records, creates a dissemination information package (DIP) and deposits it on the Web Portal Server.
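The validation step ("all records sent by the office of origin, and only those records, were received without corruption") amounts to three checks: nothing missing, nothing extra, and nothing altered. The sketch below shows that logic in Python rather than as BizTalk rules, and it assumes that the office of origin supplies a manifest mapping each file name to a SHA-256 digest. That manifest format is a hypothetical choice for illustration, not the PeDALS one.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    missing: list[str] = field(default_factory=list)     # listed in manifest, not received
    unexpected: list[str] = field(default_factory=list)  # received, not listed in manifest
    corrupted: list[str] = field(default_factory=list)   # received, but digest differs

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.corrupted)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_transfer(manifest: dict[str, str], received: dict[str, bytes]) -> ValidationReport:
    """Compare the sender's (hypothetical) digest manifest against received files."""
    report = ValidationReport()
    for name in sorted(manifest):
        if name not in received:
            report.missing.append(name)
        elif sha256_hex(received[name]) != manifest[name]:
            report.corrupted.append(name)
    # "Only those records": anything received but not listed is flagged too.
    report.unexpected = sorted(set(received) - set(manifest))
    return report
```

Keeping the three failure kinds separate lets the middleware tell the office of origin exactly what to resend, instead of rejecting the whole transfer with a single yes/no answer.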

Partners' IT staff should be familiar with initial configuration of a Windows server; project staff will provide guidance on optimal configuration. Project staff will guide partners' technical staff in the installation of BizTalk software. Partners should plan to learn about BizTalk middleware to ensure that the system can be maintained after the grant.

3. Catalog Database Server

The Catalog Database hosts a listing of all records ingested into the system, including core metadata to support administration, discovery, and preservation. The information will be stored in an MS SQL Server database on a fairly powerful Windows server. The server will be protected by firewalls from public access. Partners' IT staff should be familiar with initial configuration of a Windows server; project staff will provide guidance on optimal configuration. Project staff will guide partners' technical staff in the installation of MS SQL Server software. Partners should plan to learn about SQL Server to ensure that the system can be maintained after the grant.
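To make "core metadata to support administration, discovery, and preservation" concrete, here is a minimal sketch of a catalog table. It uses Python's built-in `sqlite3` as a stand-in for MS SQL Server, and every table and column name is a hypothetical example rather than the PeDALS schema. The `restricted` flag mirrors the middleware rule that only non-restricted records go to the public Web Portal.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS catalog (
    aip_id      TEXT PRIMARY KEY,          -- hypothetical AIP identifier
    office      TEXT NOT NULL,             -- office of origin (administration)
    series      TEXT NOT NULL,             -- records series (administration)
    title       TEXT NOT NULL,             -- discovery metadata
    sha256      TEXT NOT NULL,             -- fixity value (preservation)
    restricted  INTEGER NOT NULL DEFAULT 0 -- 1 = keep off the public Web Portal
);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register(conn, aip_id, office, series, title, sha256, restricted=False):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO catalog VALUES (?, ?, ?, ?, ?, ?)",
            (aip_id, office, series, title, sha256, int(restricted)),
        )


def public_records(conn):
    """Records eligible for the public Web Portal search database."""
    rows = conn.execute(
        "SELECT aip_id, title FROM catalog WHERE restricted = 0 ORDER BY aip_id"
    )
    return rows.fetchall()
```

On SQL Server the same table would use T-SQL types (for example `NVARCHAR` and `BIT`), but the parameterized inserts and the restricted/non-restricted split carry over unchanged.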

4. Manifest Server

LOCKSS software collects content by pulling it from a website. The Manifest Server is a temporary storage area with a simple website containing lists of records ready for ingest. This system will run the Linux operating system and a very simple Apache configuration. Each server in the LOCKSS Dark Archives Cluster must be able to reach this box through the firewalls. Firewalls can be configured to accept connections only from specific IP addresses, and the web server will be configured to require userid and password authentication. The partners would benefit from a basic knowledge of the Linux operating system and the Apache web server, but the project staff will provide instructions for a standard configuration.
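Because LOCKSS pulls content, the Manifest Server only has to publish a page of links that the cluster's crawlers can follow. Below is a minimal sketch of generating such a page. The permission sentence is the wording LOCKSS manifest pages commonly carry, but check it against the LOCKSS documentation for the version in use. The page title and AIP file names are hypothetical.

```python
from html import escape
from urllib.parse import quote

# Permission statement a LOCKSS manifest page is expected to carry; verify the
# exact wording against the LOCKSS documentation for the version in use.
PERMISSION = "LOCKSS system has permission to collect, preserve, and serve this Archival Unit."


def manifest_page(title: str, aip_files: list[str]) -> str:
    """Render a minimal HTML manifest page linking each AIP awaiting ingest."""
    items = "\n".join(
        # quote() percent-encodes spaces etc. for the link target;
        # escape() makes the visible link text HTML-safe.
        f'    <li><a href="{quote(name)}">{escape(name)}</a></li>'
        for name in sorted(aip_files)
    )
    return (
        f"<html>\n<head><title>{escape(title)}</title></head>\n<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        f"  <ul>\n{items}\n  </ul>\n"
        f"  <p>{PERMISSION}</p>\n"
        "</body>\n</html>\n"
    )
```

Regenerating the page after each deposit and after each validated deletion keeps it an accurate list of what is still waiting for ingest.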

5. LOCKSS Dark Archives Cluster

A LOCKSS Dark Archives Cluster consists of seven nodes. Each server replicates the same content, so each record is stored seven times. Servers in the cluster can be geographically dispersed to protect the records against catastrophe at a single location.
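Storing seven copies is useful because the copies can check one another: a copy that disagrees with the majority is probably damaged. The sketch below is a deliberately simplified illustration of that idea, using plain SHA-256 comparison and a strict majority vote. It is not the LOCKSS polling protocol, which uses nonces and sampling so that peers cannot simply replay stored hashes. Node names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Optional


def audit_replicas(copies: dict[str, bytes]) -> tuple[Optional[str], list[str]]:
    """Return (majority digest, nodes whose copy disagrees with it).

    If no digest is held by a strict majority of nodes, return (None, []) so
    that no node is "repaired" from an unreliable source.
    """
    if not copies:
        return None, []
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, []
    damaged = sorted(node for node, d in digests.items() if d != winner)
    return winner, damaged
```

With seven nodes, a strict majority still exists as long as no more than three copies are damaged, which is one way to see why dispersing the nodes geographically matters: a single disaster should never take out a majority.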

Each partner will have its own cluster. The pilot assumes that each partner will place one server from its cluster at each of the other partners, and that each partner will host one server from each of the other partners' clusters. Firewalls will need to allow the boxes to talk to the LOCKSS Administration Server, to the Manifest Server, and to each other.
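The placement rule and the firewall requirements can be worked out mechanically. In the sketch below, one node of a partner's seven-node cluster goes to each other partner. It assumes that any remaining nodes stay at the owning partner's site (the text does not say where they live). It then lists every pair of endpoints the firewalls must allow. Partner names and server names are hypothetical.

```python
from itertools import combinations

CLUSTER_SIZE = 7  # nodes per LOCKSS Dark Archives Cluster, per the text above


def place_cluster(owner: str, partners: list[str], size: int = CLUSTER_SIZE) -> list[str]:
    """Return the hosting site of each node in `owner`'s cluster.

    One node goes to every other partner; remaining nodes stay at the owner's
    site (an assumption, not stated by the pilot plan).
    """
    others = [p for p in partners if p != owner]
    if len(others) > size:
        raise ValueError("more partners than nodes; one node per partner is impossible")
    return others + [owner] * (size - len(others))


def required_connections(sites: list[str], admin: str, manifest: str) -> set[tuple[str, str]]:
    """Undirected endpoint pairs the firewalls must allow for one cluster."""
    nodes = [f"{site}-node{i}" for i, site in enumerate(sites, start=1)]
    pairs = set(combinations(nodes, 2))      # every node talks to every other node
    pairs |= {(n, admin) for n in nodes}     # ... to the LOCKSS Administration Server
    pairs |= {(n, manifest) for n in nodes}  # ... and to the Manifest Server
    return pairs
```

For seven nodes this comes to 21 node-to-node pairs plus 7 pairs each for the Administration and Manifest Servers, 35 in all. That count is a useful check when writing firewall rules at every hosting site.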

The servers in the cluster are inexpensive boxes running LOCKSS software. LOCKSS is an application that runs on top of OpenBSD. Partners will not need to be familiar with OpenBSD, and installing the LOCKSS software is very simple. Membership in the LOCKSS Alliance provides regular updates of the LOCKSS software, which is also easy to install.

6. LOCKSS Administration Server

This server hosts databases that the LOCKSS cluster uses to manage content, including the title database and plug-in repository. As noted above, firewalls should be configured to allow access from each server in the LOCKSS Dark Archives Cluster. The LOCKSS Administration Server runs on an inexpensive Linux box.

7. Web Portal

For the pilot, this server includes three key components: copies of records accessible to the public (DIPs), a SQL database containing the core metadata about each record, and a web server. In a production environment, these services will almost certainly need to be separated. The records could be stored on attached storage, and the web server and database would likely reside on different servers. Partners will need to provide backup of this server.