Studienarbeiten (Study Projects): International, Interactive, Interdisciplinary!

      The Virtual Study Project Agency currently has the following projects ``up for grabs'':

        Sorry, no projects are currently available.

      If you're interested, please contact Georg Fuellen by email (fuellen@techfak.uni-bielefeld.de), by phone (2903), or in person (M3-114).

      These project proposals are intended to be ``Studienarbeiten'' (in accordance with the Bielefeld Curriculum in ``Naturwissenschaftliche Informatik'' -- NWI), organized by the Study Project Agency. If you are not a Bielefeld student and found this page on the WWW, please inquire; we will probably be able to put you in contact with the right people.



      Old, outdated proposals:

      VIWB Project Proposal

      The Virtual Institute of Experimental (Wet) Biology proposes the following project:

      Project VIWB-1: TRP Multiple Alignment

      Project description

      TRP (Transient Receptor Potential) is an important family of proteins whose significance keeps growing. Its members play roles in taste receptors and light receptors, among others. The multiple alignment of the TRP family is far from trivial: the family shows low levels of homology, and that homology lies in patterns and signatures rather than in the overall sequence similarity found in traditional protein families. The number of known family members is growing rapidly. The required output is a family tree.

      More about TRP:
      Store-operated Ca2+ entry, a mode of Ca2+ influx activated by depletion of Ca2+ from the internal stores, has been detected in a wide variety of cell types and may be the primary mechanism for Ca2+ entry in nonexcitable cells. TRP forms a supramolecular complex, proposed to be critical for feedback regulation and/or activation, that includes rhodopsin, phospholipase C, protein kinase C, calmodulin, and the PDZ domain-containing protein INAD. INAD seems to be a scaffolding protein that links TRP with several of the other proteins in the complex. In the Drosophila eye, another member of the family, TRP-like, is expressed. It is suggested to form a heteromultimer with TRP whose conductance characteristics are distinct from those of TRP or TRP-like homomultimers. A family of proteins related to TRP is conserved from Caenorhabditis elegans to humans, and recent evidence indicates that at least some of these proteins are store-operated channels (SOCs). The human TRP-related proteins may mediate many of the store-operated conductances that have been identified previously in a plethora of human cells. Two new members of the family were shown to be involved in the pain pathway (VR1 in rats) and in olfaction, mechanosensation, and olfactory adaptation (OSM-9 in C. elegans).

      Requirements:

      This is a feasible project for an MSc student, and an interesting, nontrivial one that could grow into something larger (alignment by elements vs. total alignment, functional vs. evolutionary development of the family, structure-function relationships within the family, and informational profiles of its proteins). The infrastructure of the experimental group is already in place, should one wish to develop the project into a full-blown PhD.

      Further reading:

      About TRP
      Harteneck et al. Trends in Neurosciences Vol 23(April), pages 159-166, 2000.



      All of the following project suggestions are probably outdated, but reactivation may be possible.

      EBI - SWISS-PROT Software Project Proposals

      The EBI - SWISS-PROT group proposes the following three software projects to the Virtual Study Project Agency.

      Project descriptions are still quite rough and preliminary. A more detailed specification will be given (or worked out together with the interested students) before the actual work begins.

      All project results, if successful, will be used to improve the content and quality of the TREMBL database. Output quality, reliability, handling of noisy input, and speed are therefore crucial issues.

      Initial project coordination and additional biological training will take place during a one-week stay at the European Bioinformatics Institute in Hinxton, Cambridge, UK. A similar stay at the end of the project will focus on project presentation, evaluation, and possibly the preparation of a publication.


      Project EBI-1: Protein Hunt

      Project description

      The DDBJ/EMBL/GenBank nucleotide sequence databases contain tens of thousands of sequence entries that encode a protein but whose coding region, for one reason or another, is not annotated as a CDS feature. As a result, the translation of the nucleotide sequence into the protein sequence is missing.

      The goal of the project is to filter out as many of these entries as possible and to provide a proper translation for them by using a combination of standard tools and additional scripts. After further postprocessing by SWISS-PROT these entries will then be included in TREMBL, the computer-annotated supplement of SWISS-PROT.
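
      A first filtering pass of this kind could be sketched as follows, in Python rather than the Perl or C the project calls for. The sketch assumes the EMBL flat-file layout (two-letter line codes, feature keys starting in column 6 of `FT` lines, `//` as entry terminator). The function name, the accession numbers, and the toy records are hypothetical and made up for illustration; a real pass would also have to decide whether an entry encodes a protein at all.

```python
def entries_without_cds(flatfile_text):
    """Yield (accession, description) for entries that carry no CDS feature."""
    accession, description, has_cds = None, [], False
    for line in flatfile_text.splitlines():
        code = line[:2]
        if code == "AC" and accession is None:
            # The first accession number on the first AC line is the primary one.
            accession = line[2:].strip().split(";")[0].strip()
        elif code == "DE":
            description.append(line[2:].strip())
        elif code == "FT" and line[5:20].strip() == "CDS":
            # Feature keys occupy columns 6-20; qualifier lines leave them blank.
            has_cds = True
        elif code == "//":
            if accession is not None and not has_cds:
                yield accession, " ".join(description)
            accession, description, has_cds = None, [], False


# Two made-up entries: the first has a CDS feature, the second does not.
SAMPLE = """\
ID   ZZ000001; SV 1; linear; mRNA; STD; SYN; 9 BP.
AC   ZZ000001;
DE   Hypothetical example mRNA for protein A
FT   source          1..9
FT   CDS             1..9
FT                   /product="protein A"
SQ   Sequence 9 BP;
     atggcctaa                                                                 9
//
ID   ZZ000002; SV 1; linear; mRNA; STD; SYN; 9 BP.
AC   ZZ000002;
DE   Hypothetical example mRNA for protein B
FT   source          1..9
SQ   Sequence 9 BP;
     atggcctaa                                                                 9
//
"""

for acc, desc in entries_without_cds(SAMPLE):
    print(acc, desc)
```

      Entries flagged this way would then be handed to the translation and reliability-checking steps.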

      Requirements

      The project requires the use of standard bioinformatics tools, a good knowledge of gene-protein relations, and some experience in Perl and C.

      Example

      The DDBJ/EMBL/GenBank entry with accession number M77015 clearly encodes a protein, which is even named in the DE line, but the protein sequence is missing.

      The corresponding SWISS-PROT entry is P26150. For thousands of DDBJ/EMBL/GenBank entries, however, no corresponding SWISS-PROT or TREMBL entry exists. The goal of the project is to enable the automatic generation of most of these entries.

      Further Reading

      • Apweiler R., Gateau A., Contrino S., Martin M.J., Junker V., O'Donovan C., Lang F., Mitaritonna N., Kappus S., and Bairoch A. 1997. Protein Sequence Annotation in the Genome Era: The Annotation Concept of SWISS-PROT + TREMBL. Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology.
      • Bairoch A., and Apweiler R. 1997. The SWISS-PROT protein sequence data bank and its supplement TREMBL. Nucleic Acids Res. 25:31-36.
      • Stoesser G., Sterk P., Tuli M.A., Stoehr P.J., and Cameron G.N. 1997. The EMBL Nucleotide Sequence Database. Nucleic Acids Res. 25:7-13.

      Programs to be used (among others)

      • Sequence Retrieval System V5.x to filter out relevant entries
      • The Protein Machine for translation of open reading frames
      • Fasta3 to check translation results for reliability


      Project EBI-2: Feature Propagation

      Project description

      Different regions in the amino acid chain of a protein are responsible for different properties of the protein. These position-dependent properties are a subset of the features of the protein. By aligning protein sequences derived from new nucleotide sequences to proteins with known features, the features of the new proteins can be deduced from those of known proteins.

      As the alignments rarely produce exact matches, determining the exact location of the features in the new proteins is a nontrivial task that is to be resolved as far as possible in this project. Once the feature locations have been determined, the information is to be presented in TREMBL format.
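
      The core coordinate-mapping step can be illustrated with a minimal Python sketch. It assumes a single gapped pairwise alignment (two rows of equal length with `-` as gap character) and 1-based, inclusive feature positions. The function name and the toy sequences are hypothetical. Features that fall into gaps or low-similarity regions, which are the hard part of the project, are only handled in the crudest way here.

```python
def map_feature(aligned_ref, aligned_new, start, end):
    """Map a feature range on the reference protein onto the new protein.

    start and end are 1-based, inclusive positions in the ungapped reference.
    Returns (new_start, new_end) in ungapped coordinates of the new protein,
    or None if the feature is aligned only to gaps in the new protein.
    """
    if len(aligned_ref) != len(aligned_new):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = new_pos = 0
    hits = []
    for r, n in zip(aligned_ref, aligned_new):
        if r != "-":
            ref_pos += 1
        if n != "-":
            new_pos += 1
        # Only columns where both rows hold a residue carry the feature over.
        if r != "-" and n != "-" and start <= ref_pos <= end:
            hits.append(new_pos)
    return (hits[0], hits[-1]) if hits else None


# Made-up alignment: the new protein has one insertion and a two-residue deletion.
REF = "MKT-AYIAKQR"
NEW = "MRTGAY--KQR"

print(map_feature(REF, NEW, 4, 8))  # feature covering reference residues 4 to 8
```

      In this toy case the feature shrinks on the new protein because two of its residues are deleted there; whether such a feature should still be propagated is a design decision the sketch leaves open.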

      Requirements

      A good knowledge of multiple alignment, some knowledge of protein biochemistry, and knowledge of C or Perl are highly desirable.

      Example

      The image below visualizes the task to be solved. The new proteins 1 and 2 have regions with high homology to the reference protein. By multiple alignment and subsequent processing, the locations of these features in the new proteins are to be determined.



      Further reading

      As above, plus:

      • The BioComputing Hypertext Coursebook, Chapters 1-3


      Project EBI-3: The Protein Machine

      Project Description

      The Protein Machine is a tool to translate nucleotide sequences into protein sequences. The goal of the project is to enhance the functionality of this tool in several respects. Instead of a simple direct translation, it should offer an option for a six-frame translation (translation in all three possible reading frames of the sequence and of its reverse complement), and it should rank the most probable translations based on sequence length and initiator methionine as well as on automatic FASTA homology searches.
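
      A six-frame translation with a simple length-based ranking can be sketched in Python as below. This is an illustration only, not the Protein Machine's actual code: it assumes the standard genetic code, marks codons containing ambiguous bases as `X`, and ranks frames solely by the longest stretch that starts with a methionine and ends at a stop codon. The FASTA homology step is left out, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_met_orf(protein):
    """Longest stretch starting with M and running to the next stop or the end."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


def rank_frames(frames):
    """Order frame labels by the length of their longest M-initiated ORF."""
    return sorted(frames, key=lambda f: len(longest_met_orf(frames[f])),
                  reverse=True)


frames = six_frame_translation("ATGGCCTAA")
print(frames)
print(rank_frames(frames)[0])
```

      A command-line wrapper and a web front end would both call the same `six_frame_translation` function, which keeps the two modes of operation consistent.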

      It should be possible to run the program from the command line as well as in interactive mode from a web interface. The interactive mode should also include a graphical visualisation of the results.

      Requirements

      A good knowledge of C or Perl and knowledge of nucleic acid translation and HTML/HTTP are highly desirable. The current version of the program relies on the SRS package, a C library for sequence retrieval and sequence manipulation.

      Further reading

      As above.




      BioInformatics Services, Rockville, MD 20854 U.S.A., suggests the following project:

      Project BIS-1: Computational cDNA Libraries

      Project description

      Given a genome database, how do we assemble a list of proteins appropriate for construction of cellular level mechanistic models of particular mammalian cell types? From the perspective of bioinformatics, the most useful and practical definition of a cell type is a list of the proteins it expresses. This definition presupposes the central dogma of molecular cell biology: DNA is transcribed into RNA, which is translated into protein.

      There are several possible approaches to identifying the proteins that are expressed in a particular cell type. One is the EST approach. It relies on Soares-normalized cDNA libraries obtained from tissues of interest. This approach has two major problems for cell biologists and cell physiologists. First, the tissue samples almost always contain multiple cell types and the normalization procedure guarantees that mRNAs from every cell type will be recorded. Second, even when the original cDNA library is obtained from cultured cells, there is considerable uncertainty as to whether the same genes are expressed in vitro as are expressed in vivo. Moreover, genes expressed in one set of culture conditions may not be expressed in another.

      Our project aims to overcome these difficulties by taking advantage of the rapidly increasing information on cell-specific promoter elements. In effect, we propose to construct Computational cDNA (CcDNA) libraries.

      This approach has the great advantage that it will be easily generalized to other cell types once promoters have been identified, but we propose to begin with the vascular smooth muscle cell because of its tremendous physiological and pathophysiological significance. The vascular smooth muscle cell is essential for regulation of blood flow to all tissues and organs, as well as control of arterial and venous blood pressure. Aberrant behavior of this cell type is a key feature of atherosclerotic heart disease, hypertension and stroke. In the industrialized world, these diseases account for more deaths and disabilities than any other human affliction.

      During the past few years a group of MADS-box transcription factors has been shown to control the expression of muscle-specific genes. In particular, the four members of the myocyte enhancer factor-2 (MEF2) family are expressed in developing cardiac, skeletal and smooth muscle cells. Very recently, Eric Olson's laboratory has identified several potential partners for the MEF2 family that may direct the specific program of vascular smooth muscle differentiation.

      We propose to develop a WWW interface to the worldwide genome databases that permits the user to assemble a list of candidate genes containing user-specified upstream promoters or combinations of promoters that are known or hypothesized to control cell-specific expression in vascular smooth muscle. It may also be useful to include promoters that are known to be activated in all cell types so as to construct a full computational cDNA (CcDNA) library.
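
      The search behind such an interface might look roughly like the following Python sketch. It assumes that upstream regions have already been extracted into a mapping from gene name to sequence, and that promoter elements are given as IUPAC motifs. The gene names and sequences are made up. The motif `YTAWWWWTAR` is the consensus commonly quoted for MEF2 binding sites and serves here purely as an illustrative input. Only the given strand is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif such as 'YTAWWWWTAR' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def candidate_genes(upstream_regions, motifs, require_all=True):
    """Return the sorted names of genes whose upstream region matches the motifs.

    With require_all=True every motif must occur; otherwise one is enough.
    """
    patterns = [motif_to_regex(m) for m in motifs]
    combine = all if require_all else any
    return sorted(
        gene for gene, seq in upstream_regions.items()
        if combine(p.search(seq.upper()) for p in patterns)
    )


# Made-up upstream regions for three hypothetical genes.
UPSTREAM = {
    "geneA": "GGCTATTTTTAGCC",
    "geneB": "GGCCGGCCGGCC",
    "geneC": "ttctaaaaatagaa",
}

print(candidate_genes(UPSTREAM, ["YTAWWWWTAR"]))
```

      The `require_all` switch corresponds to the "promoters or combinations of promoters" wording: a combination is a list of motifs that must all be present upstream of the same gene.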

      Upon completion of the project we can carry out two tests of its effectiveness. First, we can search the appropriate subsets of dbEST to determine if a significant number of our CcDNAs are known to be expressed in tissues containing vascular smooth muscle. Second, we can compare our list to a list compiled from the joint experience of a large group of investigators working in the fields of vascular smooth muscle cell physiology and cell biology.

      Requirements

      A working knowledge of elementary molecular cell biology is required. A good knowledge of C or Perl and knowledge of HTML are highly desirable.

      Further Reading

      Firulli AB, Olson EN. Modular regulation of muscle gene transcription: a mechanism for muscle cell diversity. Trends Genet 1997 Sep;13(9):364-369.



      The ``Lehrstuhl für spezielle Zoologie'' of the Ruhr-Universität Bochum suggests the following:

      Project LSZ-1: Intelligent Formatting Software for Biosequence Management.

      Project description

      We need a tool that can interactively read in the specification of new formats (provided their syntax is simple), and can then recognize these formats and convert them to standard formats. The user should be able to specify the delimiters between species name and sequence, between sequences, etc. More details are still to be specified as of Sep 22, 1997. Please inquire.
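
      Since the details are still open, the following Python sketch only illustrates the delimiter idea. It assumes a format that is fully described by two user-supplied delimiters, and uses FASTA as the standard output format. The class, the function names, and the sample input are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FormatSpec:
    record_delimiter: str  # separates one sequence record from the next
    name_delimiter: str    # separates the species name from its sequence


def parse_records(text, spec):
    """Split text into (name, sequence) pairs according to the format spec."""
    records = []
    for chunk in text.split(spec.record_delimiter):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, sep, sequence = chunk.partition(spec.name_delimiter)
        if not sep:
            raise ValueError(f"no name delimiter in record: {chunk!r}")
        # Remove any whitespace or line breaks inside the sequence.
        records.append((name.strip(), "".join(sequence.split())))
    return records


def to_fasta(records, width=60):
    """Render (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up input: records end with ';', names and sequences are split by '|'.
SPEC = FormatSpec(record_delimiter=";", name_delimiter="|")
TEXT = "Homo sapiens|ACGT ACGT;Mus musculus|TTGA\nCC;"

print(to_fasta(parse_records(TEXT, SPEC)))
```

      An interactive front end would only need to collect the two delimiters from the user and build a `FormatSpec` from them; recognizing a previously specified format automatically is not covered by this sketch.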

      Requirements

      Still to be specified as of Sep 22, 1997. Please inquire.



      Edited by Georg Fuellen.