Sunday, September 9, 2007

Volunteer computing using BOINC

Introduction and a Brief History of Volunteer Computing

Scientists have developed accurate mathematical models of the physical universe, and computers programmed with these models can approximate reality at many levels of scale: an atomic nucleus, a protein molecule, the Earth's biosphere, or the entire universe. Using these programs, we can predict the future, validate or disprove theories, and operate "virtual laboratories" that investigate chemical reactions without test tubes. In general, greater computing power allows a closer approximation of reality. This has spurred the development of computers that are as fast as possible. One way to speed up a computation is to "parallelize" it: to divide it into pieces that can be worked on by separate processors at the same time.

In the 1990s two important things happened. First, because of Moore's Law, PCs became very fast, as fast as supercomputers only a few years older. Second, the Internet expanded to the consumer market. Suddenly there were millions of fast computers, connected by a network. The idea of using these computers as a parallel supercomputer occurred to many people independently.

In 1999 the SETI@home project was launched, with the goal of detecting radio signals emitted by intelligent civilizations outside Earth. SETI@home acts as a "screensaver", running only when the PC is idle and providing a graphical view of the work being done. Its appeal extended beyond hobbyists; it attracted millions of participants from all around the world. It inspired a number of other academic projects, as well as several companies that sought to commercialize the public computing paradigm.

Volunteer Computing Now

BOINC (Berkeley Open Infrastructure for Network Computing) is a software system that makes it easy for scientists to create and operate public-resource computing projects. It supports diverse applications, including those with large storage or communication requirements. PC owners can participate in multiple BOINC projects, and can specify how their resources are allocated among these projects. It is being used for applications in physics, molecular biology, medicine, chemistry, astronomy, climate dynamics, mathematics, and the study of games. There are currently about 40 BOINC-based projects and about 400,000 volunteer computers delivering an average of over 400 TeraFLOPS.

GOALS

BOINC’s general goal is to advance the public-resource computing paradigm: to encourage the creation of many projects, and to encourage a large fraction of the world’s computer owners to participate in one or more projects. Specific goals include:

Reduce the barriers to entry to public-resource computing.

BOINC allows a research scientist with moderate computer skills to create and operate a large public-resource computing project with about a week of initial work and an hour per week of maintenance. The server for a BOINC-based project can consist of a single machine configured with common open-source software (Linux, Apache, PHP, MySQL, Python).

Share resources among autonomous projects.

BOINC-based projects are autonomous. Projects are not centrally authorized or registered. Each project operates its own servers and stands completely on its own. Nevertheless, PC owners can seamlessly participate in multiple projects, and can assign to each project a resource share that determines how scarce resources (such as CPU and disk space) are divided among projects. If most participants register with multiple projects, then overall resource utilization is improved: while one project is down for repairs, other projects temporarily inherit its computing power. On a particular computer, the CPU might work for one project while the network is transferring files for another.
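The resource-share idea can be illustrated with a small Python sketch. This is not BOINC's actual scheduler; the function name `allocate_by_resource_share`, the project names, and the simple proportional rule are assumptions made for illustration only. It shows how one resource might be divided in proportion to each project's share, and how the remaining projects inherit capacity when one project has no work.

```python
def allocate_by_resource_share(shares, available=None):
    """Split one resource in proportion to each project's resource share.

    shares:    dict of project name -> non-negative resource share
    available: set of projects currently able to supply work
               (None means all of them)
    Returns a dict of project name -> fraction of the resource.
    Hypothetical helper for illustration, not part of BOINC.
    """
    if available is None:
        available = set(shares)
    # Only projects that are reachable and have a positive share compete.
    active = {name: share for name, share in shares.items()
              if name in available and share > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: share / total for name, share in active.items()}


# Hypothetical projects and shares chosen by a participant.
shares = {"project_a": 100, "project_b": 50, "project_c": 50}

print(allocate_by_resource_share(shares))

# project_b is down for repairs, so the others inherit its portion.
print(allocate_by_resource_share(shares, available={"project_a", "project_c"}))
```

With all three projects available, project_a receives half of the resource; when project_b is unavailable, the same shares are renormalized over the two remaining projects.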

Support diverse applications.

BOINC accommodates a wide range of applications; it provides flexible and scalable mechanisms for distributing data, and its scheduling algorithms intelligently match requirements with resources. Existing applications in common languages (C, C++, FORTRAN) can run as BOINC applications with little or no modification. An application can consist of several files (e.g. multiple programs and a coordinating script). New versions of applications can be deployed with no participant involvement.

Reward participants.

Public-resource computing projects must provide incentives in order to attract and retain participants. The primary incentive for many participants is credit: a numeric measure of how much computation they have contributed. BOINC provides a credit accounting system that reflects usage of multiple resource types (CPU, network, disk), is common across multiple projects, and is highly resistant to cheating (attempts to gain undeserved credit). BOINC also makes it easy for projects to add visualization graphics to their applications, which can provide screensaver graphics.

DESIGN AND STRUCTURE OF BOINC

BOINC is designed to be a free framework for anyone wishing to start a distributed computing project. BOINC consists of a server system and client software that communicate with each other to distribute, process, and return work units.

Technological innovation

The most recent versions of the BOINC client and server have incorporated BitTorrent file-sharing technology into the application distribution subsystem. The application distribution subsystem is separate from the work unit distribution subsystem at the server end, but not at the client end. With BitTorrent fully deployed on clients and servers by late 2007, great savings are expected in the telecommunication costs of the current server user base.

Server structure

A major part of BOINC is the backend server. The server can be run on one or many machines, which allows BOINC to scale easily to projects of any size. BOINC servers run on Linux-based computers and use Apache, PHP, and MySQL as the basis for their web and database systems.

BOINC does no scientific work itself. Rather, it is the infrastructure which downloads distributed applications and input data (work units), manages scheduling of multiple BOINC projects on the same CPU, and provides a user interface to the integrated system.

Scientific computations are run on participants' computers and results are analyzed after they are uploaded from the user PC to a science investigator's database and validated by the backend server. The validation process involves running all tasks on multiple contributor PCs and comparing the results.
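The redundancy-based validation described above can be sketched in Python as follows. This is a simplified illustration rather than BOINC's validator: the function name `validate_by_redundancy`, the `min_quorum` parameter, and the host identifiers are all hypothetical, and results are compared by exact equality.

```python
from collections import Counter


def validate_by_redundancy(results, min_quorum=2):
    """Return the agreed result if at least min_quorum hosts match, else None.

    results: dict of host id -> returned result (must be hashable)
    Hypothetical helper for illustration, not part of BOINC.
    """
    if not results:
        return None
    # Count how many hosts returned each distinct result.
    value, votes = Counter(results.values()).most_common(1)[0]
    # Accept the most common result only if enough hosts agree on it.
    return value if votes >= min_quorum else None


# Three volunteer PCs ran the same task; one returned a different answer.
returned = {"host_1": "a3f9", "host_2": "a3f9", "host_3": "77c0"}
print(validate_by_redundancy(returned))

# Two hosts disagree, so no result is accepted yet.
print(validate_by_redundancy({"host_1": "a3f9", "host_2": "77c0"}))
```

A project whose results contain floating-point values would likely need a tolerance-based comparison instead of exact equality, since different hardware can produce slightly different numbers.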

Other features provided by these servers include:

  • homogeneous redundancy (sending work units only to computers of the same platform, e.g. Win XP SP2 only)
  • work unit trickling (sending information to the server before the work unit completes)
  • locality scheduling (sending work units to computers that already have the necessary files and creating work on demand)
  • work distribution based on host parameters (work units requiring 512 MB of RAM, for example, will only be sent to hosts having at least that much RAM)
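Two of these features, work distribution based on host parameters and homogeneous redundancy, amount to filtering the set of hosts before a work unit is sent. The Python sketch below shows one way such a filter could look. The `Host` and `WorkUnit` classes, their fields, and the `eligible_hosts` function are hypothetical names invented for this example and do not reflect BOINC's database schema or scheduler code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    platform: str
    ram_mb: int


@dataclass
class WorkUnit:
    name: str
    min_ram_mb: int
    # For homogeneous redundancy: once set, every copy of this work unit
    # must go to a host of the same platform. None means no restriction.
    required_platform: Optional[str] = None


def eligible_hosts(work_unit: WorkUnit, hosts: List[Host]) -> List[Host]:
    """Return the hosts that satisfy the work unit's RAM and platform limits."""
    return [
        host for host in hosts
        if host.ram_mb >= work_unit.min_ram_mb
        and work_unit.required_platform in (None, host.platform)
    ]


hosts = [
    Host("pc_1", "Win XP SP2", 1024),
    Host("pc_2", "Win XP SP2", 256),
    Host("pc_3", "Linux", 2048),
]

# A work unit that needs 512 MB of RAM and has no platform restriction.
wu = WorkUnit("wu_001", min_ram_mb=512)
print([h.name for h in eligible_hosts(wu, hosts)])

# The same work unit restricted to one platform.
wu.required_platform = "Win XP SP2"
print([h.name for h in eligible_hosts(wu, hosts)])
```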

Client structure

BOINC on the client is structured into a number of separate applications. These intercommunicate using the BOINC remote procedure call (RPC) mechanism.

These component applications are:

  • The program boinc (or boinc.exe) is the core client.

The core client is a process which takes care of communications between the client and the server. The core client also downloads science applications, provides a unified logging mechanism, makes sure science application binaries are up-to-date, and schedules CPU resources between science applications (if several are installed).

Although the core client is capable of downloading new science applications, it does not update itself. BOINC's authors felt that self-updating posed an unacceptable security risk, in addition to the usual risks of automatic update procedures. On Unix, the core client is generally run as a daemon (or occasionally as a cron job). On Windows, BOINC was initially an ordinary application rather than a Windows service; versions 5.2.13 and higher of the BOINC client for Windows add a "Service Installation" option during installation. Depending on how the BOINC client software was installed, it either runs in the background like a daemon or starts when an individual user logs in (and stops when the user logs out). The software version management and work-unit handling provided by the core client greatly simplify the coding of science applications.

  • One or several science applications. Science applications perform the core scientific computation. There is a specific science application for each of the distributed computation projects which use the BOINC framework. Science applications use the BOINC daemon to upload and download work units, and to exchange statistics with the server.

  • boincmgr (or boincmgr.exe), a GUI which communicates with the core client over RPC (remote procedure call). By default a core client only allows connections from the same computer, but it can be configured to allow connections from other computers (optionally using password authentication); this mechanism allows one person to manage a farm of BOINC installations from a single workstation. A drawback of RPC mechanisms is that they are often regarded as security risks, because they can be a route by which attackers intrude upon targeted computers (even if the client is configured to accept connections only from the same computer).

The GUI is written using the cross-platform wxWidgets toolkit, providing the same user experience on different platforms. Users can connect to BOINC core clients, instruct those clients to install new science applications, monitor the progress of ongoing calculations, and view the BOINC system message logs.

  • The BOINC screensaver. This provides a framework whereby science applications can display graphics in the user's screensaver window. BOINC screensavers are coded using the BOINC graphics API, OpenGL, and the GLUT toolkit. Typically BOINC screensavers show animated graphics detailing the work underway, perhaps showing graphs or charts or other data visualisation graphics.

Some science applications do not provide screensaver functionality (or stop providing screensaver images when they are idle). In this circumstance the BOINC screensaver shows a small BOINC logo which bounces around the screen.

On Mac OS X, BOINC can dynamically use spare processor capacity while you work, varying how much processor time it receives based on how intensively the computer is being used.

A BOINC network is similar to a hacker's or spammer's botnet. In BOINC's case, however, the software is meant to be installed and operated with the consent of the computer's owner. Since BOINC has features that can render it invisible to the typical user, there is a risk that unauthorized and difficult-to-detect installations may occur. Such installations would aid the accumulation of BOINC credit by hobbyists who compete with others for status within the BOINC-credit subculture.

PROJECTS USING BOINC FRAMEWORK

The BOINC platform is currently the most popular volunteer-based distributed computing platform; SETI@home is among its best-known projects.

Performance of BOINC projects:

  • over 1,021,000 participants
  • over 1,980,000 computers
  • over 550 TeraFLOPS (more than the BlueGene supercomputer)
  • over 12 Petabytes of free disk space
  • SETI@home: 2.7 million years of computer time (2006)

FUTURE OF VOLUNTEER COMPUTING

The majority of the world's computing power is no longer in supercomputer centers and institutional machine rooms. Instead, it is now distributed across the hundreds of millions of personal computers all over the world. This change is critical to scientists whose research requires extreme computing power. The number of Internet-connected PCs is growing rapidly, and is projected to reach 1 billion by 2015. Together, these PCs could provide many PetaFLOPS of computing power. The public-resource approach applies to storage as well as computing. If 100 million computer users each provide 10 Gigabytes of storage, the total (one Exabyte, or 10^18 bytes) would exceed the capacity of any centralized storage system.
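The storage estimate can be checked with a few lines of Python. The variable names are illustrative, and the calculation assumes decimal units (one Gigabyte taken as 10^9 bytes).

```python
# Assumption: decimal units, so 1 Gigabyte = 10**9 bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18

users = 100_000_000          # 100 million computer users
gigabytes_per_user = 10      # each provides 10 Gigabytes

total_bytes = users * gigabytes_per_user * BYTES_PER_GIGABYTE
total_exabytes = total_bytes / BYTES_PER_EXABYTE

print(total_bytes)      # total contributed storage in bytes
print(total_exabytes)   # the same total expressed in Exabytes
```

The product is 10^8 users times 10^10 bytes each, which gives 10^18 bytes, or one Exabyte.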
