 
 
 
An alliance to advance the understanding of collaboratories
Science of Collaboratories
   
   


 
   
 

Name of Collaboratory :

 

Cyberinfrastructure for Phylogenetic Research (CIPRes)

 
 

URL :

  http://www.phylo.org  
 

Collaboratory Status :

 
Operational
Start Date : 2003
End Date : 2008
Info Last Updated : Wed, Dec 8 2004 10:01pm PST
 
 

Primary Collaboratory Function :

  Community Infrastructure Development  
 

Secondary Collaboratory Functions :

  Distributed Research Center  
 

Domain(s) :

  Evolutionary biology, Phylogeny  
 

Brief Description of the Collaboratory :

 

The goal of CIPRes is to establish the parameters of the cyberinfrastructure needed to reconstruct the evolutionary history, or the "Tree of Life," of all species on the planet. The project is a collaboration between biologists and computer scientists, with nearly equal numbers of researchers from each domain. Many of the biologists involved in CIPRes are accomplished software developers who created programs that are widely used in their scientific community. One of the aims of CIPRes is to bring together these dispersed efforts and to build on them by employing the latest advances in algorithm development and optimization. CIPRes is committed to providing open-source software.

CIPRes faces significant computational challenges. Current algorithms are able to process a few hundred species, but the number of described species is over a million, and biologists estimate that 10-20 million species actually exist. Lack of data is another challenge in the effort to reconstruct evolutionary history. Even though most species have yet to be described, technology for genotyping and sequencing is moving quickly, and it is possible to conceive of the day when much more data will be available. Until that time comes, however, CIPRes must simulate data in order to develop a cyberinfrastructure that can one day build a Tree of Life that includes millions of species.
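
As a rough illustration of why this scale is so demanding (this sketch is not taken from CIPRes itself), consider the number of candidate trees. A standard combinatorial result gives the number of distinct unrooted, fully resolved tree topologies on n labeled species as (2n-5)!!, the product of the odd numbers from 3 up to 2n-5. The Python sketch below uses a hypothetical function name and simply evaluates that formula to show how fast the search space grows.

```python
def unrooted_binary_tree_count(n_taxa: int) -> int:
    """Count unrooted binary tree topologies on n_taxa labeled leaves.

    Uses the standard formula (2n - 5)!! for n >= 3. The function name
    is illustrative only and is not part of any CIPRes software.
    """
    if n_taxa < 3:
        # With fewer than three leaves there is only one topology.
        return 1
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (5, 10, 50, 300):
        digits = len(str(unrooted_binary_tree_count(n)))
        print(f"{n} species: a tree count with {digits} decimal digit(s)")
```

Exhaustive search over such a space is out of reach even for a few hundred species, which is one reason heuristic and approximate reconstruction methods are needed at larger scales.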

CIPRes is a highly distributed project and is organized into five main groups. The goal of the simulation group is to constantly refine models of evolution and to use these models to provide the simulated datasets needed to benchmark project efforts. The algorithm group devotes its efforts to long-term thinking about how to scale up reconstruction processes and solve difficult optimization problems. Members of the software architecture group guide the overall software development for the entire project, and a team of professional programmers implements the vision of the software architects. The outreach group's mission is to help people of all ages understand the concept of evolution.

An Executive Committee oversees the general activities of CIPRes. This eight-member committee meets monthly and comprises the Principal Investigators (PIs) from the five lead institutions and the leaders of the database, simulation, and outreach groups. The PI at the University of New Mexico serves as the Director; he receives administrative and accounting support from a part-time project manager. Day-to-day activities are the responsibility of the focus group leaders, who organize their own meetings and report each month to the Executive Committee. An Advisory Committee helps CIPRes assess its short- and long-term goals and overall direction.


 
 

Access to Instruments :

  CIPRes is in the process of acquiring a large machine. The exact specifications of the machine will not be known until a contract is awarded; however, it is expected to have 100-200 processors and at least 300 GB of memory. It will also have a database server. The machine's main purpose is to test CIPRes models and algorithms, but it will also be available to the entire community. Researchers who are not part of CIPRes will be able to benchmark their own approaches, and biologists who are not interested in software development or installation will be able to use a web server to analyze their data on the CIPRes platform. The machine should be installed in the first quarter of 2005 and open to community access by summer 2005.  
 

Access to Information Resources :

  Project documents, focus group email archives, presentations, slide templates, and design documents are available on the CIPRes intranet.  
 

Access to People as Resources :

  Most of the biologists in the project are also competent computer programmers, so they are able to contribute to the software development. Computer scientists are accustomed to thinking in terms of models, so they can help propose models of evolution that are suitable for computation. Each domain is able to contribute to the other, which makes CIPRes a very interdisciplinary project.

CIPRes participants are widely distributed, so they communicate primarily by phone conferences and email. Project-wide meetings are held twice each year.
 
 

Funding Agency or Sponsor :

   
 
 

Notes on Funding Agencies/Sponsors:
CIPRes is a large project supported by the NSF Information Technology Research (ITR) program. It was funded through NSF's collaborative funding mechanism, which allows investigators from two or more institutions to collaborate on a unified research project. The University of New Mexico is the lead institution. The other institutions that submitted proposals are Florida State University, University of California-Berkeley, University of California-San Diego, and University of Texas at Austin. There is a Principal Investigator at each of these locations. Subcontracts were awarded to eight other participating institutions.

 
 
 
Organizations with Funded Participants:
 
Organization name | Approx # of participants | Description of organization's role(s)
American Museum of Natural History (AMNH) | 1 | Outreach/Education
University of Pennsylvania
   Department of Computer and Information Science (UPenn) | 1 | Modeling
   Biology Department (UPenn) | 1 | Modeling
University of California, Berkeley
   Jepson Herbarium (UC-Berkeley) | 1 | Outreach/Education; Databases
   Computer Science Division (UC-Berkeley) | 7 | Algorithms
Florida State University
   Department of Biological Science (FSU) | 2 | Software development
University of New Mexico (UNM)
   Department of Computer Science (UNM) | 3 | Algorithms
University of Texas at Austin
   School of Biological Sciences (UT-Austin) | 3 | Algorithms; Modeling
   Department of Computer Sciences (UT-Austin) | 8 | Algorithms; Databases; Software development
Yale University
   Peabody Museum (Yale) | 1 | Outreach/Education
   Department of Ecology & Evolutionary Biology (Yale) | 1 | Modeling; Databases
University of Connecticut (UConn)
   Department of Ecology and Evolutionary Biology (UConn) (EEB) | 1 | Software development
University of Arizona
   Department of Entomology (U-Arizona) | 1 | Software development; Databases; Outreach/Education
North Carolina State University
   Department of Statistics (NCSU) | 1 | Modeling
University of British Columbia
   Department of Zoology (UBC) | 1 | Software development; Databases
University of California, San Diego (UCSD)
   San Diego Supercomputer Center (SDSC) | 9 | Algorithms; Databases; Software development
State University of New York (SUNY)
   University at Buffalo | 1 | Databases
 
TOTAL PARTICIPANTS: 43
 

Notes on Participants/Organizations:
More institutions are likely to join in the future because some of the students who were instrumental in developing parts of the project are graduating and moving to other universities. The project also has collaborators from Europe, New Zealand, and Singapore. Foreign participants do not receive funds except for travel to meetings.

   
     
 
 

Communications Technology Used :

  The primary communication technologies used are email and phone. The Executive Committee (EC) tried to use videoconferencing, but they found that it was expensive and that the technology did not work well with eight people. Most members of the EC have direct access to the Grid, so they also tried to conference using the Grid, but it is not yet mature enough to support this kind of activity.  
 

Technical Capabilities :

  Management of technical resources: Access control/login facilities
  Asynchronous object sharing: Common file space
  Asynchronous conversation: Threaded discussion, Email
  Synchronous conversation: Audio
 
  Key Articles :    
 

Project-reported performance data :

  A key deliverable of the CIPRes project is platform-independent software that can be downloaded and installed on a PC or a large supercomputer. A current version of the software is available at: http://www.phylo.org/software.html  
 
         
    


University of California, Irvine


School of Information University of Michigan