Protein Queries

  1. All the proteins of E. coli.

    [x^?name : x <- ecoli^^proteins]

    This can be read as follows: return the name of every x such that x is an object of the class proteins in the database ecoli. The database name must be a BioVelo identifier, so it cannot contain spaces or special characters.

    The colon ':' separates the head of the query, here x^?name, from the rest of the query, which consists of qualifiers: generators, filters, or binders. The left arrow <- introduces a generator: on its left is a variable (or tuple) and on its right is a list. A filter is a boolean condition, such as x^dna-footprint-size < 10 in query 3, and a binder, written with :=, names an intermediate value, as k does in query 9. The double caret ^^ is an operator that creates a list of all objects of a given database and class; in this example, the database name is ecoli and the class name is proteins. Consult the BioVelo documentation for more information on the syntax and semantics of the query language.

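  2. Give the number of proteins in E. coli. Notice that the following query does not start with a square bracket: the # operator returns the number of elements in a list, so the result is a single number rather than a list.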
    #ecoli^^proteins
    

  3. All the proteins of E. coli for which the dna-footprint-size is smaller than 10.
    [x^?name : x <- ecoli^^proteins,  
         x^dna-footprint-size < 10]
    

    This can be read as follows: return the name of every x such that x is an object of the proteins class in database ecoli and the dna-footprint-size attribute of x is smaller than 10.

  4. All the proteins of E. coli whose molecular weight is between 10 and 20 kilodaltons and whose isoelectric point (attribute PI) is between 3 and 5. Note that attribute PI is a list of integers, not an integer, and that the attribute molecular-weight-kd is likewise a list of numbers rather than a single number. The query below therefore applies a subquery to each list and checks that at least one value satisfies the stated bounds (a count greater than 0). The bounds are strict, so values of exactly 10, 20, 3, or 5 do not qualify.
    [(x^?name, x^pi, x^molecular-weight-kd) : 
        x <- ecoli^^proteins, 
        #[w : w <- x^molecular-weight-kd, w < 20, w > 10] > 0, 
        #[y : y <- x^pi, y < 5, y > 3] > 0
    ]
    

  5. Find all E. coli proteins whose comment attribute contains both "2006" and "No information" (the latter may also appear as "no information"). Note that the attribute comment is a list of strings, not a string. The operators instring and instringci handle a list of strings by testing each string in it: if at least one contains the test argument as a substring, the operator returns true. instringci ignores case, which is why it is used for "No information".
    [ p^?name : p <- ecoli^^proteins, 
      "2006" instring p^comment &  
      "No information" instringci p^comment]
    
    The conjunction & can equivalently be written as a comma separating the two filters:
    [ p^?name : p <- ecoli^^proteins, 
      "2006" instring p^comment, 
      "No information" instringci p^comment]
    
    To print each protein's comment next to its name in the output table:
    [ (p^?name, p^comment) : p <- ecoli^^proteins, 
      "2006" instring p^comment, 
      "No information" instringci p^comment]
    
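Queries 1 to 5 can be mimicked with ordinary Python list comprehensions, which may help when reading the BioVelo notation. The following is a minimal sketch under stated assumptions, not BioVelo's implementation: the `DATABASES` dictionary and its protein records are invented, `objects(db, cls)` is a hypothetical stand-in for `db^^cls`, and `instring`/`instringci` are Python functions written to follow the behaviour described for query 5. A comma between BioVelo filters is rendered as `and`.

```python
# Toy model of BioVelo list-comprehension queries. All data and helper names
# are hypothetical; they only mimic the semantics described in the text.

DATABASES = {
    "ecoli": {
        "proteins": [
            {"name": "P1", "dna-footprint-size": 8, "pi": [4],
             "molecular-weight-kd": [15.2],
             "comment": ["Updated 2006. No information on cofactors."]},
            {"name": "P2", "dna-footprint-size": 12, "pi": [7],
             "molecular-weight-kd": [42.0],
             "comment": ["Reviewed in 2005."]},
            {"name": "P3", "dna-footprint-size": 5, "pi": [3, 6],
             "molecular-weight-kd": [9.5, 11.0],
             "comment": ["no information, 2006"]},
        ]
    }
}


def objects(db, cls):
    """Stand-in for db^^cls: every object of class cls in database db."""
    return DATABASES[db][cls]


def instring(sub, strings):
    """True if at least one string in the list contains sub (case-sensitive)."""
    return any(sub in s for s in strings)


def instringci(sub, strings):
    """Case-insensitive variant of instring."""
    return any(sub.lower() in s.lower() for s in strings)


proteins = objects("ecoli", "proteins")

# Query 1: [x^?name : x <- ecoli^^proteins]  (one generator, no filters)
q1 = [x["name"] for x in proteins]

# Query 2: #ecoli^^proteins  (# yields a number, not a list)
q2 = len(proteins)

# Query 3: the filter keeps only objects for which the condition holds
q3 = [x["name"] for x in proteins if x["dna-footprint-size"] < 10]

# Query 4: "at least one value in range" as a counted subquery, strict bounds
q4 = [x["name"] for x in proteins
      if len([w for w in x["molecular-weight-kd"] if 10 < w < 20]) > 0
      and len([y for y in x["pi"] if 3 < y < 5]) > 0]

# Query 5: both substring tests must hold (comma or & in BioVelo)
q5 = [p["name"] for p in proteins
      if instring("2006", p["comment"])
      and instringci("No information", p["comment"])]

print(q1, q2, q3, q4, q5)
```

With this data, P3 fails query 4 because neither of its pi values (3 and 6) lies strictly between 3 and 5, and it passes query 5 only because the case-insensitive test accepts "no information".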

Gene Queries

  6. Find all genes of E. coli that have more than four names. The names include synonyms, which correspond to other genes that produce the same protein. Return the list of names of each gene found.
    [ x^?names : x <- ecoli^^genes, #x^names > 4]
    

  7. Find all genes of E. coli whose names (attribute names) contain the string 'tr'. Return the list of names of each gene found.
    [ x^?names : x <- ecoli^^Genes, 
        (#[y : y <-x^names, "tr" instring y]) > 0]
    

  8. Find all genes of E. coli whose left-end position is less than 153000. Also return the left-end and right-end positions of each gene.
    [(x^?name, x^left-end-position, x^right-end-position) :
      x <- ecoli^^Genes, 
      x^left-end-position < 153000]
    

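  9. Find all genes of E. coli that have fewer than 200 nucleotides. Also return the left-end and right-end positions and the number of nucleotides of each gene. The binder k computes the gene length as the absolute difference of the two positions plus 1, since both end positions are counted.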
    [(x^?name, x^left-end-position, x^right-end-position, k) :
      x <- ecoli^^genes, 
      k := abs(x^left-end-position - x^right-end-position) + 1,
      k < 200]
    

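  10. Find all the genes in E. coli that have no reported location. Also return their comment and product attributes to learn more about such genes. Note: when the arithmetic operator '-' is applied to a numeric attribute that has no value, the missing value is interpreted as 0. The filter 0 = x^left-end-position - x^right-end-position therefore holds when both positions are missing; it would also hold for a gene whose two positions were equal.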
    [(x^?name, x^left-end-position, x^right-end-position, 
      x^comment, x^product) :
         x <- ecoli^^genes, 
         0 = x^left-end-position - x^right-end-position]
    

  11. Return the list of E. coli genes that are part of more than one transcription unit. Also return, for each gene, the list of transcription units containing it.
    [(x^?name, e) : 
            x <- ecoli^^genes, 
            e := [c^?name: c <- x^component-of, 
                     c isa transcription-units],
            #e > 1]
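The binders and the missing-value rule used in queries 9 to 11 can be sketched in Python as well. In this hypothetical model the gene records are invented, a missing position is `None`, and the helper `num` applies the rule that a missing value counts as 0 under `-`. A binder `k := expr` is written as the one-element generator `for k in [expr]`, a common way to express a local binding inside a comprehension, and `isa` is approximated by comparing class names, which ignores any class hierarchy.

```python
# Toy model of binders and missing values in BioVelo gene queries.
# Gene records, class names, and the helper num are hypothetical.

GENES = [
    {"name": "g1", "left-end-position": 100, "right-end-position": 249,
     "component-of": [{"name": "TU1", "class": "transcription-units"}]},
    {"name": "g2", "left-end-position": 500, "right-end-position": 1999,
     "component-of": [{"name": "TU1", "class": "transcription-units"},
                      {"name": "TU2", "class": "transcription-units"},
                      {"name": "X1", "class": "other-class"}]},
    {"name": "g3", "left-end-position": None, "right-end-position": None,
     "component-of": []},
]


def num(value):
    """Missing numeric values (None) are read as 0, as described for '-'."""
    return 0 if value is None else value


# Query 9: the binder k := abs(left - right) + 1 becomes "for k in [...]"
q9 = [(x["name"], x["left-end-position"], x["right-end-position"], k)
      for x in GENES
      for k in [abs(num(x["left-end-position"]) - num(x["right-end-position"])) + 1]
      if k < 200]

# Query 10: 0 = left - right holds when both positions are missing
q10 = [x["name"] for x in GENES
       if 0 == num(x["left-end-position"]) - num(x["right-end-position"])]

# Query 11: the binder e holds a subquery result; the filter tests its length
q11 = [(x["name"], e)
       for x in GENES
       for e in [[c["name"] for c in x["component-of"]
                  if c["class"] == "transcription-units"]]
       if len(e) > 1]

print(q9, q10, q11)
```

Under this model gene g3, which has no location, also passes query 9 with k = 1; if the missing-as-zero rule applies there too, genes without a reported location may appear among the short genes.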
    

Reaction Queries

  12. Find the number of reactions in E. coli that have exactly one reactant.
    #[r^?name : r <- ecoli^^reactions, 1 = #r^left ]
    

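  13. Find all reactions in E. coli that have at least one reactant on the left side that is also present as a product on the right side (transport reactions, for example). In a second column, return a compound that occurs on both sides. Because c ranges over every compound on the left side, a reaction with several such compounds appears once per compound.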
    [ (r^?name, c^?name) : 
       r <- ecoli^^reactions, 
       c <- r^left, 
       c in r^right ]
    

  14. Find all the reactions in E. coli that have more than three reactants, and return the number of reactants in the second column. As of May 2006, there were eight such reactions in E. coli.
    [(r^?name, #r^left) : 
       r <- ecoli^^reactions, 
       3 < #r^left ]
    

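  15. Find all the reactions in E. coli that are catalyzed by more than one enzyme. The query counts the enzymatic reactions attached to each reaction, each of which pairs the reaction with one enzyme, and lists their enzymes in the same way as query 19.
    [(x^?name, [y^enzyme : y <- x^enzymatic-reaction], #x^enzymatic-reaction) : 
         x <- ecoli^^reactions,  
         1 < #x^enzymatic-reaction]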
    

  16. Find all E. coli reactions, other than binding reactions, that are in more than one pathway. Output three columns per row: the reaction itself, the list of pathways the reaction occurs in, and the number of pathways.
    [(x^?name, x^in-pathway, #x^in-pathway) : x <- ecoli^^reactions,  
        ! (x isa binding-reactions), 1 < #x^in-pathway]
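A similar Python sketch of queries 12 to 14, using invented reactions and compounds (the names `R1` to `R3` and `A` to `E` are hypothetical), shows how the two generators of query 13 yield one row per compound found on both sides.

```python
# Toy model of the reaction queries; all reactions and compounds are hypothetical.

REACTIONS = [
    {"name": "R1", "left": ["A"], "right": ["B"]},
    {"name": "R2", "left": ["A", "B"], "right": ["A", "B", "C"]},
    {"name": "R3", "left": ["A", "B", "C", "D"], "right": ["E"]},
]

# Query 12: count the reactions with exactly one left-side compound
q12 = len([r["name"] for r in REACTIONS if 1 == len(r["left"])])

# Query 13: two generators, so one row per (reaction, shared compound)
q13 = [(r["name"], c) for r in REACTIONS for c in r["left"] if c in r["right"]]

# Query 14: more than three left-side compounds, with the count as a column
q14 = [(r["name"], len(r["left"])) for r in REACTIONS if 3 < len(r["left"])]

print(q12, q13, q14)
```

R2 has two compounds on both sides, so it appears on two rows of q13.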
    

Pathway Queries

  17. Find all base pathways of E. coli (pathways that are not superpathways) that pertain to small-molecule metabolism. The first filter excludes superpathways and the second excludes signaling pathways.
    [x^?name : x <- ecoli^^pathways, 
      !(x isa super-pathways), 
      !(x isa signaling-pathways) ]
    

  18. For each pathway in E. coli, generate all possible pairings of the reactions in that pathway. These are not biological pairings but an all-by-all pairing of the pathway's reactions; the filter r1 != r2 only excludes pairing a reaction with itself. Each row of the output table shows the pathway name followed by one reaction pair. Reciprocal pairs are both shown (e.g. pathway, A, B and pathway, B, A), so a pathway with n distinct reactions yields n(n-1) rows.
    [(p^?name, r1^?name, r2^?name) : 
      p <- ecoli^^pathways, r1 <- p^reaction-list, 
      r2 <- p^reaction-list, r1 != r2]
    

  19. Find all enzymes of E. coli that catalyze two reactions in the same pathway.
    [([y^enzyme:y<-es], p^?name, r1^?name, r2^?name) : 
      p <- ecoli^^pathways, 
      r1 <- p^reaction-list, 
      r2 <- p^reaction-list, 
      r1 != r2, 
      es := (r1^enzymatic-reaction ** r2^enzymatic-reaction),
      #es > 0]
    
    Note that es is a set of enzymatic reactions, not a set of enzymes. The short subquery [y^enzyme:y<-es] in the head of the main query therefore lists the enzyme of each enzymatic reaction in es.
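Queries 17 and 18 can be modelled the same way. In this hypothetical sketch the pathway records and class names (including `base-pathways`) are invented, and `isa` is approximated by comparing class names. It also checks that the r1 != r2 filter produces exactly the ordered pairs returned by Python's `itertools.permutations`, which holds as long as a reaction list contains no duplicates.

```python
# Toy model of the pathway queries; pathway and reaction names are hypothetical.
from itertools import permutations

PATHWAYS = [
    {"name": "PWY-A", "class": "base-pathways", "reaction-list": ["R1", "R2", "R3"]},
    {"name": "PWY-B", "class": "super-pathways", "reaction-list": ["R1", "R4"]},
    {"name": "PWY-C", "class": "signaling-pathways", "reaction-list": ["R5"]},
]

# Query 17: two negated class tests (isa approximated by name equality)
q17 = [p["name"] for p in PATHWAYS
       if not p["class"] == "super-pathways"
       and not p["class"] == "signaling-pathways"]

# Query 18: all ordered pairs of distinct reactions, reciprocal pairs included
q18 = [(p["name"], r1, r2)
       for p in PATHWAYS
       for r1 in p["reaction-list"]
       for r2 in p["reaction-list"]
       if r1 != r2]

# Same rows via itertools.permutations, which also emits ordered pairs
via_permutations = [(p["name"], r1, r2)
                    for p in PATHWAYS
                    for r1, r2 in permutations(p["reaction-list"], 2)]

print(q17, len(q18), q18 == via_permutations)
```

PWY-A, with three reactions, contributes six rows and PWY-B contributes two; PWY-C, with a single reaction, contributes none.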

Transcription Unit Queries

  20. Return a list of all the transcription units in E. coli, each with its promoters.
    [(t^?name, [c^?name:c <- t^components, c isa promoters]) :   
      t <- ecoli^^transcription-units]
    

  21. Return the list of transcription units in E. coli that have more than one gene. Display each transcription unit's name together with the names and lengths of its genes.
    [(z^?name, GL) :  
      z <- ecoli^^transcription-units, 
      G := [x : x <- z^components, x isa genes], 
      #G > 1,
      GL := [(g^?name,k): g <- G, 
             k := abs(g^left-end-position - g^right-end-position)+1]
    ]
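Query 21 chains a binder, a filter on the bound value, and a second binder whose subquery has its own binder. The hypothetical Python sketch below mirrors that order over invented transcription units and genes; as before, `isa` is approximated by a class-name comparison and the binders are written as one-element generators. Because Python comprehension clauses run left to right, the filter on the number of genes is evaluated before the gene lengths are computed.

```python
# Toy model of the transcription-unit queries; all names and positions are hypothetical.

G1 = {"name": "g1", "class": "genes", "left-end-position": 100, "right-end-position": 249}
G2 = {"name": "g2", "class": "genes", "left-end-position": 500, "right-end-position": 1999}
P1 = {"name": "p1", "class": "promoters"}
P2 = {"name": "p2", "class": "promoters"}

TRANSCRIPTION_UNITS = [
    {"name": "TU1", "components": [G1, G2, P1]},
    {"name": "TU2", "components": [G2, P2]},
]

# Query 20: a subquery in the head filters each unit's components by class
q20 = [(t["name"], [c["name"] for c in t["components"] if c["class"] == "promoters"])
       for t in TRANSCRIPTION_UNITS]

# Query 21: binder G, a filter on #G, then binder GL whose subquery binds k
q21 = [(z["name"], gl)
       for z in TRANSCRIPTION_UNITS
       for genes_in_unit in [[x for x in z["components"] if x["class"] == "genes"]]
       if len(genes_in_unit) > 1
       for gl in [[(g["name"], k)
                   for g in genes_in_unit
                   for k in [abs(g["left-end-position"] - g["right-end-position"]) + 1]]]]

print(q20, q21)
```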
    

Queries That Generate Tables

  22. Find all transcription units of E. coli, showing all their components in one column and, in three further columns, the partition of these components into genes, DNA binding sites, and promoters.
    [(t^?name, t^components, 
     [c: c<-t^components, c isa genes], 
     [c: c<-t^components, c isa DNA-Binding-Sites],
     [c: c<-t^components, c isa promoters]) :
      t <- ecoli^^transcription-units] 
    

  23. Generate two tables: the first lists all reactions of E. coli with exactly two products (compounds on the right side), the second those with exactly three. The outer parentheses group the two lists into a pair, one list per table.
    ([r^?name : r <- ecoli^^reactions, 2 = #r^right ],
     [r^?name : r <- ecoli^^reactions, 3 = #r^right ]
    )
    

  24. Generate a table of two columns in which each row holds a reaction in the first column and, in the second, the genes involved in that reaction according to the "reaction-to-genes" relation. The table contains all reactions found in E. coli. Note: this is a computationally expensive query.
    [ (r^?name, reaction-to-genes r) : r <- ecoli^^reactions]
    

  25. Generate a table of two columns in which each row holds a pathway in the first column and, in the second, the genes involved in that pathway according to the "pathway-to-genes" relation. The table contains all pathways found in E. coli. Note: this is a computationally expensive query.
    [ (p^?name, pathway-to-genes p) : p <- ecoli^^pathways]
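The table-producing queries 22 and 23 can be sketched in Python with invented data (all names are hypothetical). The helper `of_class` approximates `isa` by class-name equality, and the result of query 23 is modelled as a tuple of two lists, one per table.

```python
# Toy model of the table-producing queries; all names are hypothetical.

TRANSCRIPTION_UNITS = [
    {"name": "TU1", "components": [
        {"name": "g1", "class": "genes"},
        {"name": "bs1", "class": "dna-binding-sites"},
        {"name": "p1", "class": "promoters"},
        {"name": "g2", "class": "genes"},
    ]},
]

REACTIONS = [
    {"name": "R1", "right": ["A", "B"]},
    {"name": "R2", "right": ["A", "B", "C"]},
    {"name": "R3", "right": ["C", "D"]},
]


def of_class(components, cls):
    """Names of the components whose class is cls (isa approximated by equality)."""
    return [c["name"] for c in components if c["class"] == cls]


# Query 22: one row per unit; the last three columns split the components by class
q22 = [(t["name"], [c["name"] for c in t["components"]],
        of_class(t["components"], "genes"),
        of_class(t["components"], "dna-binding-sites"),
        of_class(t["components"], "promoters"))
       for t in TRANSCRIPTION_UNITS]

# Query 23: a pair of lists, i.e. two separate result tables
q23 = ([r["name"] for r in REACTIONS if 2 == len(r["right"])],
       [r["name"] for r in REACTIONS if 3 == len(r["right"])])

print(q22, q23)
```

The three class columns of query 22 cover all components only when every component is a gene, a DNA binding site, or a promoter.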
    

Queries Across Multiple Databases

  26. Find the number of databases accessible from the current server.
    #dbs
    

  27. List the databases accessible from the current server (in most cases, each is associated with one organism).
    dbs
    

  28. Generate a table containing the number of proteins, genes, pathways, and reactions in each accessible database. Here the database operand of ^^ is the variable x, which is bound in turn to each database in dbs.
    [(x, #x^^proteins, #x^^genes, #x^^pathways, #x^^reactions) : 
      x <- dbs]
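Finally, a hypothetical Python sketch of queries 26 and 28, in which `dbs` is modelled as a list of database names and `objects(db, cls)` again stands in for `db^^cls`. The database names and contents are invented.

```python
# Toy model of cross-database queries; database names and contents are hypothetical.

DATABASES = {
    "db1": {"proteins": ["p1", "p2"], "genes": ["g1", "g2", "g3"],
            "pathways": ["pwy1"], "reactions": ["r1", "r2"]},
    "db2": {"proteins": ["p1"], "genes": ["g1"], "pathways": [], "reactions": ["r1"]},
}

# dbs: the list of accessible database names (dict order is insertion order)
dbs = list(DATABASES)


def objects(db, cls):
    """Stand-in for db^^cls, where db may be a variable bound to a database name."""
    return DATABASES[db][cls]


# Query 26: #dbs
q26 = len(dbs)

# Query 28: one row of counts per database
q28 = [(x, len(objects(x, "proteins")), len(objects(x, "genes")),
        len(objects(x, "pathways")), len(objects(x, "reactions")))
       for x in dbs]

print(q26, q28)
```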