[NTLUG:Discuss] Distributed processing
Greg Edwards
greg at nas-inet.com
Wed Jul 18 21:34:28 CDT 2001
>
> On Tue, 17 Jul 2001 20:07:03 CDT, the world broke into rejoicing as
> Greg Edwards <greg at nas-inet.com> said:
> > I've been searching for tools that will do distributed processing at the
> > function level and haven't had much luck. There are plenty of
> > distributed load managers and parallel processing managers (such as
> > Beowulf). The load managers work at the program level and parallel
> > process managers at the calculation level. Neither of these solutions
> > answer my needs and multi-threaded has too many drawbacks for a solution
> > here. What I need is a distribution manager that will pass the load at
> > the procedural level.
> >
cbbrowne at hex.net wrote:
>
> This sounds just like a "message queueing" application.
>
> The entry application reaches the point where it needs data; it then
> submits a request for data, throwing it into the "Data Request Queue."
>
> It proceeds with something else.
>
Chris Cox <cjcox at acm.org> wrote:
>
> This sounds like a typical distributed transaction processing
> scenario. Nowadays, you would use a Web Application Server...
> like Weblogic (maybe WebSphere). I am somewhat familiar with
> what Weblogic can do and it supports the idea of application ...
Stephen Davidson <NorthGorky at earthlink.net> wrote:
>
> Orion will be supporting Clustering/Load Balancing in a release due out
> in August/September. The project I am on is going with Orion for that
> reason. They won't be ready for production until late September, which
> means that they can live with this timing.
The concept is along the lines of a distributed transaction environment,
but the web server would not be the controlling point of the process. I
think the overall environment will not be conducive to a "message queue"
solution, given the diversity of specialties each host will be able to
support and the overhead of queue controls. This is also not a typical
load balancing problem.
A little expansion on the environment. Consider that this site handles
maybe a dozen different interactive web applications (not an e-tailer):
say a card game (hearts, for example), data warehouse applications,
educational applications, multi-player adventure games, etc.
The web server (Apache) would accept a page request and pass it into the
system request monitor (I'm calling it an application farm for lack of a
better term). The web server would do nothing more than accept requests
and serve pages. The CGI layer (PHP, Java, Python, or another scripting
language) would do nothing more than accept the POST, collect the input,
and pass the request on; once the system returns the results, it would
request that the page be generated and pass the (now static) page back
to the user. The page may be generated on the web server or transferred
back from the application farm (maybe by a page generation server).
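Something like this sketch, in Python, is the kind of thin front end I
mean (the function names and the request format here are just
illustrative assumptions, not a fixed design):

```python
# Sketch of the thin front end: the web tier only collects the POSTed
# input, packages it as a request for the farm, and turns the farm's
# (already-computed) results into a static page. The actual network
# hop to the farm is not shown.
import json

def build_farm_request(app_name, form_fields):
    """Package a page request for the system request monitor."""
    return json.dumps({"app": app_name, "input": form_fields})

def render_static_page(results):
    """Turn the farm's results into a static HTML page."""
    rows = "".join(f"<li>{k}: {v}</li>" for k, v in results.items())
    return f"<html><body><ul>{rows}</ul></body></html>"

# The CGI layer would do no more than:
#   req = build_farm_request("hearts", {"player": "greg", "move": "QS"})
#   results = send_to_farm(req)          # network hop, not shown here
#   page = render_static_page(results)
```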
Nothing special so far.
Once inside the farm, the system request monitor would determine which
application is involved and pass the data along. The application would
begin processing the data. The applications would work on parts of the
data, some concurrently, some in parallel, and some monolithically.
Rather than being a set of functions and procedures inside a single
application, they would be independent processes using IPC (message
queues, shared memory, network transfers) to pass data and return
results.
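As a rough in-process sketch of that monitor (threads and queue.Queue
stand in here for independent processes and real IPC channels; all the
names are my own illustration):

```python
# Minimal sketch of the request monitor: each "function" lives behind
# its own queue (standing in for a message queue / IPC channel to an
# independent process), and the monitor routes work to the right queue
# and collects the result.
import queue
import threading

class RequestMonitor:
    def __init__(self):
        self.channels = {}   # function name -> that worker's inbox

    def register(self, name, func):
        """Start an independent worker for one function."""
        inbox = queue.Queue()
        def worker():
            while True:
                payload, reply = inbox.get()
                if payload is None:          # shutdown signal
                    break
                reply.put(func(payload))
        threading.Thread(target=worker, daemon=True).start()
        self.channels[name] = inbox

    def call(self, name, payload):
        """Route a request to the worker and wait for the result."""
        reply = queue.Queue()
        self.channels[name].put((payload, reply))
        return reply.get()
```

In the real farm the inbox would be a network or SysV message queue and
the worker a separate process, possibly on another host.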
Each host would be assigned a set of functions that it would support,
not a set of applications. This does not fit the load balancing or
clustering model very well. Some hosts could act as database servers,
some as a Beowulf cluster, some as game tree processors, some as search
engines (these are just examples). Not every host would support every
request that could be generated. This makes a strict message queue
solution a poor fit, since you don't want to send queued work to hosts
that are non-players in the logic thread.
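A sketch of that function-level routing (the host names and the simple
round-robin pick are assumptions for illustration only):

```python
# Hosts advertise the set of functions they support, and the monitor
# only ever offers a request to a host in that function's set -- there
# is no broadcast queue that non-players have to ignore.
from itertools import cycle

class FunctionRegistry:
    def __init__(self):
        self.hosts = {}      # function name -> list of capable hosts
        self.rotors = {}     # function name -> round-robin iterator

    def advertise(self, host, functions):
        """A host declares which functions it supports."""
        for fn in functions:
            self.hosts.setdefault(fn, []).append(host)
            self.rotors[fn] = cycle(self.hosts[fn])

    def pick_host(self, fn):
        """Choose a capable host; hosts without fn never see the work."""
        if fn not in self.hosts:
            raise LookupError(f"no host supports {fn}")
        return next(self.rotors[fn])
```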
Expanding (scaling) the farm would be a simple matter of adding another
system to the network and activating the desired functions on that
host. As a particular application sees heavier traffic, more hosts
supporting the functions it needs can take on the load, while
applications currently serving fewer users can reduce their number of
active hosts. Dropping a host out of the farm (for maintenance) would
simply be a matter of blocking new requests until all active requests
have completed and then shutting down the server.
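The drain-for-maintenance part might look like this (the state and
method names are illustrative only):

```python
# A host stops accepting new requests, finishes what it has, and then
# reports that it is safe to shut down.
class FarmHost:
    def __init__(self, name):
        self.name = name
        self.draining = False
        self.active = 0

    def accept(self):
        """Admit a request unless the host is draining."""
        if self.draining:
            return False
        self.active += 1
        return True

    def finish(self):
        """A request completed."""
        self.active -= 1

    def drain(self):
        """Block new requests; return True once idle (safe to stop)."""
        self.draining = True
        return self.active == 0
```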
This is not a new problem; it has been solved many times. I spent many
years doing embedded real-time systems, where this kind of problem was
constant and was being solved by many different methods, especially in
multi-card systems (like switches) where each card has its own special
abilities and functions.
Now the kicker :) I'm building this on sweat equity!! High-cost
solutions are not in the picture because I simply don't have the cash.
What I do have is the knowledge to build it, but that takes lots of
time. It'd be nice if I could find it in the free software world, but I
haven't yet. Even if I had to fork over a couple hundred I'd consider
it.
I know a lot of the details about why I'd want to make the farm this
distributed and (as some would say) so highly decoupled are not
included, but this is long-winded enough. I really don't want to
reinvent the wheel, but I'm not sure this wheel exists in the free or
low-cost software tool worlds. Then again, maybe I'm trying to work too
hard :)
Any comments, help, opinions, "you're nuts" :), or whatever would be
helpful.
Thanks,
--
Greg Edwards
New Age Software, Inc.
http://www.nas-inet.com