This guide gives a quick rundown of the system architecture and its salient points. We'll work front to back, starting with the Rails app before heading to the backend. If you're vaguely familiar with web application frameworks you should be able to follow this; if not, you may wish to read the Wikipedia article on MVC before we proceed.

Front End#

The front end is built entirely in Rails and has two major components to worry about: the application and the daemon.

Rails Application: NewPred#

The application is called NewPred and can be downloaded from our code repository. The principal components you need to know about are the migrations, the models, the controllers and the views.


Migrations#

In Rails, migrations are instruction sets for building and editing database tables. ALL database tables must be accompanied by a model, which is named after the table you've created. There are command-line tools for creating model files and their accompanying migrations, or you can follow the naming convention by hand.


Models#

Ruby files in the Models directory hold all the methods associated with the correspondingly named table in your underlying database (as per the migrations you created). Note how the declarations at the top of any model describe the relationships between the tables created by your migrations. Within the application you can essentially retrieve rows of any database table as objects: the columns in the database table become the data fields of your object, and the functions in the named model can be thought of as its methods. By far the most significant of these in the NewPred application is the job.rb migration/model. The jobs table is the heart of the entire NewPred application; it describes the incoming jobs which we are going to process for users. Importantly, it also handles data verification and validation. This can be found in two locations: at the top there are a number of validates_x and requires_x statements, and at the bottom of the file there is a validate function. Any data to be inserted into the jobs table must pass all these validations, otherwise the app will return an error.
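As a rough illustration, the kind of checks job.rb performs can be sketched in plain Ruby. The real model uses Rails validates_* macros plus a custom validate function; the field names and rules below are invented for the example:

```ruby
# Plain-Ruby sketch of job-style validation. The real job.rb uses Rails
# validates_* macros plus a custom validate method; these rules are invented.
class JobRecord
  attr_reader :errors

  def initialize(sequence:, email: nil)
    @sequence = sequence
    @email = email
    @errors = []
  end

  # Mirrors the idea of the validate function at the bottom of job.rb:
  # collect errors, and the record is only valid when none are found.
  def valid?
    @errors.clear
    @errors << "sequence can't be blank" if @sequence.to_s.strip.empty?
    unless @sequence.to_s.upcase.match?(/\A[ACDEFGHIKLMNPQRSTVWY]*\z/)
      @errors << "sequence contains non-amino-acid characters"
    end
    @errors << "email looks malformed" if @email && !@email.include?("@")
    @errors.empty?
  end
end
```

Any submission that fails checks like these is rejected before it ever reaches the jobs table.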

Really importantly, the NewPred application does not function like a typical Rails application. The underlying database is used solely as an object persistence layer. While users can indeed insert new jobs into a single table, they get essentially no ability to interact with that data at any later point. This is unlike what goes on at Twitter or Facebook, where users have a fair amount of ability to interact with, update and edit the database/model. For NewPred, the data is stored purely so the application can retrieve and use it at a later point. Possibly a bit of overkill to use a database, but it's not as if we're going to rebuild the whole thing in Clojure any time soon.


Controllers#

Controllers run the code that brokers user interaction, handles routing and translates incoming data into a clean format that the model can insert into the database. There is not much to say for NewPred except that we really only have five main controllers (OK, maybe six if you count the CASP stuff). It is worth noting that when users submit new jobs, the appropriate controller handles dispatching the data to the backend processing nodes.

psipred: the controller for the main PSIPRED sequence analysis homepage
psipredtest: the controller for any beta sequence-based service
structure: the controller for the main structure analysis homepage
structuretest: the controller for the beta structure-based service
simple modeller: the controller for the old MODELLER API that CATH wanted and never used
bio_serf: the controller for the CASP entry servers
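In outline, a submission controller's create flow looks something like the following. This is plain Ruby with invented names; the real controllers are Rails ActionControllers and the backend dispatch is an XML-RPC call:

```ruby
# Hypothetical outline of a submission controller's create flow: clean the
# incoming params, persist the job, then dispatch it to a backend node.
def create_job(params, jobs_table, backend)
  sequence = params[:sequence].to_s.strip.upcase   # translate to a clean format
  raise ArgumentError, "no sequence supplied" if sequence.empty?

  job = { sequence: sequence, email: params[:email], status: 0 }
  jobs_table << job   # stands in for the model saving a row to the jobs table
  backend.call(job)   # stands in for the XML-RPC dispatch to the backend
  job
end
```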


Views#

These are just the pages we show to the user: the index page and the results pages. They could do with a little refactoring and tidying; the results, submission and ongoing pages could all be removed in favour of a single results page that handles all three cases. Other than that it should be fairly obvious what is going on. The code is written in Rails's take on the whole TemplateToolkit idea; mostly it's quite readable, but it can become a bit of a hideous mess in places.

The Daemon: The runner#

Technically this is just part of the Rails app, but it runs as its own service so it is worth dealing with separately. It runs as a daemon which polls the jobs table/model looking for as-yet-unfinished jobs, those with states that are not 3 or 4. It then, in turn, sends an XML-RPC request to the relevant backend node asking for any generated data or files which need to be stored in the database. Anything produced that needs to be passed or shown to the user is received and inserted into the request_results table. The code the runner executes can be found in the job.rb model. The method where the action starts is Job.poll_all_loop; you'll note that it just loops over two function calls with a 15-second pause between each. Control flow follows this path: poll_all_loop -> poll_all_status -> poll_status -> a number of functions

poll_all_loop: manages the infinite loop which makes this function a daemon; calls poll_all_status
poll_all_status: makes a list of all jobs that are in a running state and calls poll_status
poll_status: takes the list of jobs and makes an RPC request to each backend node for any data; email_handler and job_status_handler are the key functions
job_status_handler: takes the parsed RPC data and, given the item number, inserts it into the request_results table
email_handler: if a job has finished, emails the user, if they left an email address
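The shape of that control flow, with the database and RPC layers stubbed out (the method names follow the text; everything else here is an assumption):

```ruby
# Sketch of the runner's polling cycle. States 3 and 4 mean a job is finished,
# so anything else counts as still running. The RPC request is stubbed out.
FINISHED_STATES = [3, 4].freeze

class RunnerJob
  attr_reader :id
  attr_accessor :status

  def initialize(id, status)
    @id = id
    @status = status
  end

  # In the real runner this sends an XML-RPC request to the backend node;
  # here we just pretend the backend reported "job finished".
  def poll_status
    @status = 4
  end

  def self.poll_all_status(jobs)
    jobs.reject { |j| FINISHED_STATES.include?(j.status) }.each(&:poll_status)
  end

  # The real Job.poll_all_loop loops forever with a 15-second pause;
  # the iteration cap here exists only so the sketch can terminate.
  def self.poll_all_loop(jobs, interval: 15, iterations: 1)
    iterations.times do
      poll_all_status(jobs)
      sleep interval
    end
  end
end
```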

job_status_handler is worth dwelling on. It lists a large number of status codes which refer to the different types of data the backend can produce for its different analyses; these are all defined in the backend's org.ucl.shared.jobStatus class. A small note is given with each response type. Note that it is here that we insert the cached PSI-BLAST results into the check_point_caches table.
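The dispatch inside job_status_handler can be pictured like this; the numeric codes below are placeholders, not the real org.ucl.shared.jobStatus values:

```ruby
# Toy version of job_status_handler's dispatch on the item number. In the real
# runner the codes come from org.ucl.shared.jobStatus and the rows go into the
# request_results and check_point_caches tables; here they go into hashes.
def job_status_handler(item_number, payload, request_results, check_point_caches)
  case item_number
  when 1 then request_results[:result_file] = payload   # placeholder code
  when 2 then check_point_caches[:psiblast] = payload   # cached PSI-BLAST data
  else raise ArgumentError, "unknown status code #{item_number}"
  end
end
```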


Back End#

The backend is a Java XML-RPC server which receives XML-RPC requests carrying job data from the controllers at the front end, and returns data, when finished, in response to requests from the runner.



org.ucl.newpredserver#

The main loop for the application can be found in org.ucl.newpredserver. This is a fairly straightforward class that initialises an XML-RPC servlet and then sits and listens for XML-RPC requests. Principally it initialises and brings up the npsimpleserver class.


npsimpleserver#

This is the central class that initialises the server's behaviour. Note that the class constructor creates a pool of worker threads in which every analysis process will run. Right now we only have two analysis processes (a few more if you count the experimental CASP ones). The principal XML-RPC action/method this class exposes is addJob. This method receives the complete job configuration stored in the frontend, merges it with any default config it knows of, then creates a new instance of the job type that was asked for (structJob or seqJob) and submits it to the pool of workers.

org.ucl.newpredserver.seqJob and org.ucl.newpredserver.structJob#

These are the two main job-running classes, one for sequence jobs and one for structure analysis jobs. They are fairly self-explanatory: they parse the configuration data that came from the frontend, create instances of the appropriate classes in org.ucl.servertasks, set whichever set of analyses they are going to perform, then run each of the analyses in turn.
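In Ruby terms (the real classes are Java), the pattern is: parse the config, pick the analyses, then run each in order. The analysis names and config keys below are invented; the real tasks live in org.ucl.servertasks:

```ruby
# Ruby rendering of the seqJob/structJob pattern: a job is configured with a
# list of analyses and runs each one in turn over the submitted data.
class SequenceJob
  ANALYSES = {
    "psipred"  => ->(seq) { "psipred(#{seq})" },   # stand-ins for servertasks
    "disopred" => ->(seq) { "disopred(#{seq})" },
  }.freeze

  def initialize(config)
    @sequence = config.fetch("sequence")
    @selected = config.fetch("analyses")   # which analyses to perform
  end

  # Run each requested analysis in turn, collecting its output.
  def run
    @selected.map { |name| ANALYSES.fetch(name).call(@sequence) }
  end
end
```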

Logical Flow of the Application#

So here is the overall logical flow of the application:

1) As admin, we add all the backend configuration information to the configurations interface available on the frontend. Somewhat strangely, the backend NewPredServer keeps hold of no information about where any data, databases or executables are
2) We make the service live for users
3) Users submit sequence (or structure) data via the website
4) The appropriate controller (psipred for sequences) establishes whichever settings need to be set, saves the incoming data to the jobs table, saves any setting overrides to the overrides table, then calls the submit-to-backend function
5) The submit function reads all the config options out of the database for the appropriate job type (seqJob for sequences) and the user submission data from the jobs table, then formats a new XML-RPC query to submit to the backend's addJob action
6) The backend receives the addJob request and sends it to the npsimpleserver class; this checks that the job type exists (seqJob) and, if so, adds the job and all the incoming data and settings to the job queue
7) The seqJob class then unpacks the data and runs each of the executables it needs in turn by calling the appropriate shared tasks
8) Meanwhile the runner continually polls the backend asking for any calculated data that may be available
9) Each polled backend server replies with any data file that may be ready by formatting an XML-RPC message that contains the data
10) The runner takes the XML-RPC response, unpacks the data, looks up the type data in job_status_handler and, if it can handle it, inserts it into the database
11) Eventually a runner request for a given job will be passed the "job finished" XML response
12) The runner handles this like any other job_status_handler item but also sets the job status in the jobs table to complete (status 4)
13) The user can now surf to the results page; if the status is 4, the results will be rendered by the results method in the controller, and if the job is not finished they will see an ongoing message
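Step 5's XML-RPC query can be sketched with Ruby's bundled REXML library. The parameter layout (a single struct of settings) and the field names are assumptions; the real frontend serialises rather more data:

```ruby
require "rexml/document"  # bundled with Ruby

# Build an XML-RPC methodCall for the backend's addJob action. This shows
# only the message shape: a methodName plus one struct parameter of settings.
def build_add_job_call(job_type, settings)
  doc = REXML::Document.new
  call = doc.add_element("methodCall")
  call.add_element("methodName").text = "addJob"
  struct = call.add_element("params").add_element("param")
               .add_element("value").add_element("struct")
  settings.merge("job_type" => job_type).each do |name, value|
    member = struct.add_element("member")
    member.add_element("name").text = name.to_s
    member.add_element("value").add_element("string").text = value.to_s
  end
  doc.to_s
end
```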

« This page (revision-1) was last changed on 12-Feb-2013 18:17 by UnknownAuthor