Cyberinfrastructure for Phylogenetic Research (CIPRes)


Operational Start Date: 2003 End Date: 2008

  Community Infrastructure Development  

  Distributed Research Center  

  Evolutionary biology, Phylogeny  

The goal of CIPRes is to establish the parameters of the cyberinfrastructure to reconstruct the evolutionary history, or the "Tree of Life," of all species on the planet. The project is a collaboration between biologists and computer scientists and has an almost equal number of researchers from each domain. Many of the biologists involved in CIPRes are accomplished software developers who created programs that are widely used in their scientific community. One of the aims of CIPRes is to bring together these dispersed efforts and to build on them by employing the latest advances in algorithm development and optimization. CIPRes is committed to providing open-source software.

CIPRes faces significant computational challenges. Current algorithms are able to process a few hundred species, but the number of described species is over a million, and biologists estimate that 10-20 million species actually exist. Lack of data is another challenge in the effort to reconstruct evolutionary history. Even though most species have yet to be described, technology for genotyping and sequencing is moving quickly, and it is possible to conceive of the day when much more data will be available. Until that time comes, however, CIPRes must simulate data in order to develop a cyberinfrastructure that can one day build a Tree of Life that includes millions of species.
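To give a rough sense of why scaling from hundreds to millions of species is hard, the sketch below counts the distinct unrooted, fully resolved tree shapes that exist for a given number of species. It relies on the standard combinatorial result (2n - 5)!! for n labeled taxa, which is an assumption brought in here for illustration and is not taken from the CIPRes project itself; the function name is likewise hypothetical.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Count unrooted, fully resolved tree topologies on n labeled taxa.

    Assumes the standard double-factorial formula (2n - 5)!! for n >= 3;
    for fewer than three taxa there is only one possible topology.
    """
    if n_taxa < 3:
        return 1
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    # Report how many decimal digits the count has as the number of taxa grows.
    for n in (5, 10, 100, 500):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} taxa: tree count has {digits} digit(s)")
```

Because the count grows faster than exponentially, exhaustive search is out of reach even for modest numbers of species, which is one reason heuristic and optimization-based methods matter at the scale described above.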

CIPRes is a widely distributed project organized into five main groups. The goal of the simulation group is to constantly refine models of evolution and to use these models to provide the simulated datasets needed to benchmark project efforts. The algorithm group devotes its efforts to long-term thinking about how to scale up reconstruction processes and solve difficult optimization problems. Members of the software architecture group guide the overall software development for the entire project, and a team of professional programmers implements the vision of the software architects. The outreach group's mission is to help people of all ages understand the concept of evolution.

An Executive Committee oversees the general activities of CIPRes. This eight-member committee meets monthly and is composed of the Principal Investigators (PIs) from the five lead institutions and the leaders of the database, simulation, and outreach groups. The PI at the University of New Mexico serves as the Director; he receives administrative and accounting support from a part-time project manager. Day-to-day activities are the responsibility of the focus group leaders, who organize their own meetings and report each month to the Executive Committee. An Advisory Committee helps CIPRes assess its short- and long-term goals and overall direction.


  CIPRes is in the process of acquiring a large machine. The exact specifications of the machine will not be known until a contract is awarded; however, it is expected to have 100-200 processors and at least 300GB of memory. It will also have a database server. The machine's main purpose is to test CIPRes models and algorithms, but it will also be available to the entire community, so that researchers not part of CIPRes can benchmark their own approaches, and so that biologists not interested in software development or installation can employ a web server to analyze their data using the CIPRes platform. The machine should be installed in the first quarter of 2005 and open to community access by summer 2005.  

  Project documents, focus group email archives, presentations, slide templates, and design documents are available on the CIPRes intranet.  

  Most of the biologists in the project are also competent computer programmers, so they are able to contribute to the software development. Computer scientists are accustomed to thinking in terms of models, so they can help propose models of evolution for use in computation. Each domain is able to contribute to the other, which makes CIPRes a highly interdisciplinary project.

CIPRes participants are widely distributed, so they communicate primarily by phone conferences and email. Project-wide meetings are held twice each year.

CIPRes is supported by the NSF Information Technology Research program; it is a large ITR project. It was funded through NSF's collaborative funding mechanism, which allows investigators from two or more institutions to collaborate on a unified research project. The University of New Mexico is the lead institution. The other institutions that submitted proposals are Florida State University, University of California-Berkeley, University of California-San Diego, and University of Texas at Austin. There is a Principal Investigator at each of these locations. Subcontracts were made to eight other participating institutions.

More institutions are likely to join in the future because some of the students who were instrumental in developing parts of the project are graduating and moving to other universities. The project also has collaborators from Europe, New Zealand, and Singapore. Foreign participants do not receive funds except for travel to meetings.


  The primary communication technologies used are email and phone. The Executive Committee (EC) tried videoconferencing but found that it was expensive and that the technology did not work well with eight people. Most members of the EC have direct access to the Grid, so they also tried to conference using the Grid, but it is not yet mature enough to support this kind of activity.  

  A key deliverable of the CIPRes project is platform-independent software that can be downloaded and installed on a PC or a large supercomputer. A current version of the software is available at:  
