dictyBase Help: Glossary Terms
- Accession number
- This refers to the unique GenBank identifier
a sequence has been assigned. This number can be used to search dictyBase
for a specific sequence.
- Alignment
- A presentation of two compared sequences that shows
the regions of greatest statistical similarity.
- AmiGO
- AmiGO is a browser for GO. With AmiGO the user can search for a GO term
and view all gene products annotated to it, or search for a gene product and view all its
associations. It is also possible to browse the ontologies to view relationships between
terms as well as the number of gene products annotated to a given term.
- Annotation
- A statement generated from the reading of a paper
abstract. An annotation reflects the results and techniques discussed in
the abstract.
- Anonymous FTP
- A method of sharing files on the Internet.
FTP client software is included in most networking software packages.
Anonymous FTP simply means that a computer allows anyone using FTP software
to access a special directory of files on its disk drive. This service is
called Anonymous FTP because the user name used is "anonymous." When asked
for a password, simply enter your e-mail address.
- Associate
- In the Colleague
class of information, "Associate" refers to coworkers or collaborators.
- Author
- An author of a paper or personal communication included
in dictyBase. The user may use the "*" wildcard
character (e.g., Fisher*) to achieve the best results.
- Biological process
- One of the three categories used by the Gene Ontology project, biological
process describes broad biological goals, such as mitosis or purine
metabolism.
- Bit Score
- The bit score is derived from the raw alignment score in which the
statistical properties of the scoring system used have been taken into account. Because bit scores
have been normalized with respect to the scoring system, they can be used to compare alignment scores
from different searches.
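The normalization described above can be sketched in Python. This is a minimal illustration of the standard Karlin-Altschul relations, S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), not dictyBase's own code. The lambda and K values in the usage note are placeholders: the real values depend on the scoring matrix and gap penalties and are printed in BLAST output.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score.

    lam and k are the Karlin-Altschul parameters of the scoring system.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_length, database_length):
    """Number of alignments with at least this bit score expected by chance."""
    return query_length * database_length * 2.0 ** (-bits)
```

For example, `bit_score(100, lam=0.267, k=0.041)` converts a raw score of 100 under one assumed parameter set. Because the result no longer depends on the scoring system, it can be compared with bit scores from searches that used a different matrix.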
- BLAST
- Basic Local Alignment Search Tool is a search algorithm
developed by Altschul et al. (1990). It is a very fast search algorithm
that is used by the blastn, blastp, and blastx programs to search
protein or DNA databases. BLAST
is best used for sequence similarity searching, rather than for motif searching.
- blastn
- A BLAST
program that compares a nucleotide query sequence against a nucleotide sequence
database. The user must enter a NUCLEOTIDE sequence and select a DNA database
(dictyBase Coding, dictyBase Genomic, GenBank) to search.
- blastp
- A BLAST
program that compares an amino acid query sequence against a protein sequence
database. The user must submit an AMINO ACID sequence and select a PROTEIN
database (dictyBase Protein, SwissProt) for the search.
- blastx
- A BLAST
program that compares the six-frame conceptual translation products of a
nucleotide query sequence (both strands) against a protein sequence database.
The user must enter a NUCLEOTIDE sequence and select a PROTEIN database
for the search.
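A six-frame conceptual translation, as used by blastx, can be sketched as follows. This is an illustrative Python example that assumes the standard genetic code; the function names are hypothetical and are not part of BLAST. Codons containing ambiguous bases are translated as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three frames on each strand, keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Calling `six_frame_translations("ATGGCCTAA")` returns six protein strings, one per frame. In blastx each of these is compared against the protein database.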
- BLOSUM80
- An alternative scoring matrix for BLAST searches.
- BLOSUM45
- An alternative scoring matrix for BLAST searches.
- BLOSUM62
- A scoring matrix that is used as the default in
blastp, blastx, tblastx, and tblastn BLAST
searches.
- cds
- In a GenBank DNA sequence entry, "cds" stands
for coding sequence. A coding sequence is a subsequence of a DNA sequence
that is surmised to encode a gene. A coding sequence begins with an "ATG"
and ends with a stop codon. In the cases of spliced genes, all exons and
introns should be within the same cds.
- Cellular Component
- One of the three categories used by the Gene Ontology project, cellular component
encompasses subcellular structures, locations, and macromolecular complexes.
Examples include nucleus, membrane, and ribosome.
- Clustal W
- Clustal W is an alignment program for DNA and proteins
with improved sensitivity for the alignment of divergent protein sequences.
Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving
the sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. Nucleic
Acids Res. 22:4673-80. [Clustal W]
- Codon Adaptation Index (CAI)
- Codon adaptation index is a measurement of the relative
adaptiveness of the codon usage of a gene towards the codon usage of highly
expressed genes. The relative adaptiveness (w) of each codon is the ratio
of the usage of that codon to that of the most abundant codon for the same
amino acid. The CAI is defined as the geometric mean of these relative
adaptiveness values. Non-synonymous codons and termination codons (dependent
on genetic code) are excluded.
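The calculation can be sketched in Python as below. This is a minimal illustration assuming the standard genetic code and a caller-supplied table of codon counts from highly expressed genes. The function names are hypothetical. Codons with a relative adaptiveness of zero are skipped here, which is a simplifying assumption; implementations often substitute a small value instead.

```python
import math
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_adaptiveness(reference_counts):
    """w per codon: its count divided by the count of the most used synonym."""
    by_amino_acid = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        by_amino_acid[aa][codon] = reference_counts.get(codon, 0)
    w = {}
    for aa, counts in by_amino_acid.items():
        if aa == "*" or len(counts) == 1:
            continue  # exclude stop codons and single-codon amino acids
        top = max(counts.values())
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon, count in counts.items():
            w[codon] = count / top
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's codons that have a positive w."""
    gene = gene.upper()
    codons = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))
```

With reference counts `{"AAA": 10, "AAG": 5, "GAA": 8, "GAG": 8}`, the codon AAG receives w = 0.5 and the other three receive w = 1.0.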
- Codon Bias Index (CBI)
- Codon bias index is another measure of directional codon
bias; it measures the extent to which a gene uses a subset of optimal codons.
CBI is similar to Fop, with
expected usage used as a scaling factor. In a gene with extreme codon bias,
CBI will equal 1.0; in a gene with random codon usage, CBI will equal 0.0.
Note that it is possible for the number of optimal codons to be less than
expected by random chance. This results in a negative value for CBI.
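A minimal Python sketch of the index in its commonly cited form, CBI = (N_opt - N_rand) / (N_tot - N_rand). The parameter names are assumptions chosen for this illustration. N_rand is the number of optimal codons expected if codon usage were random.

```python
def codon_bias_index(n_optimal, n_total, n_random):
    """Scale the optimal-codon count so random usage gives 0 and full bias gives 1.

    n_optimal: optimal codons observed in the gene
    n_total:   codons considered in the gene
    n_random:  optimal codons expected under random codon usage
    """
    if n_total == n_random:
        raise ValueError("n_total must differ from n_random")
    return (n_optimal - n_random) / (n_total - n_random)
```

For a gene of 100 scored codons where 40 optimal codons are expected by chance, observing 100, 40, or 30 optimal codons gives a CBI of 1.0, 0.0, or a negative value, respectively.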
- Colleagues
- Colleagues
is a searchable list of Dictyostelium researchers with their addresses (e-mail
and postal) and phone numbers. Colleague information may also include research
interests, web pages, and links to other Colleague entries for lab members,
lab heads, or collaborators.
[Colleague Help]
- Contig
- A stretch of genomic DNA assembled
from raw sequence data. The contig lengths vary and may span many genes or only part of a gene.
When enough overlapping contigs become available they are assembled into whole chromosome sequences.
- Curator
- A keeper of the Dictyostelium Genome Database
information, responsible for collecting and compiling data about Dictyostelium
genetic loci and DNA sequences and providing online assistance to users of
the database. The dictyBase Staff
page lists all current Dictyostelium curators.
- Curated Model
- A gene model that has been entered by a dictyBase curator
after reviewing all available evidence such as ESTs, GenBank records, or sequence similarity.
- DDBJ
- DNA DataBase of Japan. DDBJ is a repository of DNA sequences.
DDBJ is produced in collaboration with GenBank
and EMBL.
- Description
- A brief description of the role that the gene
plays in the cell, or a general description of the gene product.
- dictyBase
- An online informatics resource for Dictyostelium.
The database includes
a variety of genomic and biological information. dictyBase is funded by
the National Institutes of Health. dictyBase is located in the Center for Genetic
Medicine at Northwestern University. The dictyBase Homepage is located
at http://dictybase.org/.
- dictyBase ID
- A unique identifying number within dictyBase
which is specific for a single feature.
- DictyDB
- DictyDB, an object oriented database for storing genomic
data for Dictyostelium discoideum, was
developed at UCSD by Doug Smith. The software (ACEDB) was originally created by Richard Durbin
and Jean Thierry-Mieg for the Caenorhabditis elegans genome project, and has been used to set
up many other genome and general biological information databases. The data from
GenBank and DictyDB were used
in the initial population of dictyBase.
- DUST
- A program for filtering low complexity regions from nucleic acid sequences.
DUST filtering is performed by default in blastn searches.
- EC_number
- The number assigned by the Enzyme Commission for
the particular enzyme coded for by the gene. Next to this information are
external links to the gene-specific information in the Enzyme and Kyoto information databases.
- EMBL
- European Molecular Biology Laboratory. The EMBL Nucleotide
Sequence database is a comprehensive database of DNA and RNA sequences.
The database is produced in collaboration with GenBank and the DNA Database of Japan (DDBJ).
- Entrez
- The Entrez Search System was
developed by NCBI. Entrez allows
you to retrieve molecular biology data and bibliographic citations from integrated
nucleotide (GenBank, DDBJ, EMBL),
protein (Swiss-Prot, PIR, PRF, PDB),
and bibliographic (PubMed) databases.
Within dictyBase database pages, external links are provided to one or more
of these databases.
- EST
- 'Expressed Sequence Tags' provided by the Japanese cDNA Project and
published in GenBank. All ESTs in dictyBase
have a locus page.
- E-Value
- In a BLAST
search, the E-Value (Expectation Value) is the number of different alignments with
scores equivalent to or better than the observed alignment score that are expected to occur
in a database search by chance. The lower the E-Value, the more significant the score.
- Evidence Code
- Every GO annotation must indicate the type of evidence
that supports it; the evidence codes correspond to broad categories of experimental
or other support. The evidence code indicates how annotation to a particular term is
supported.
- Expect Threshold
- The Expect threshold ("E") is a BLAST parameter that reflects the number
of matches expected to be found by chance. If the E-value
of a match is greater than the Expect threshold, the match will not be reported.
The E threshold default is set to 10. Decreasing the E threshold will increase
the stringency of the search: fewer matches will be reported. On the other
hand, increasing the E threshold will decrease the stringency of the search
and result in more matches being reported.
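The effect of the threshold can be illustrated with a short Python sketch. The hit names and E-values in the usage note are invented for the example, and the function is hypothetical rather than part of BLAST.

```python
def report_hits(hits, expect_threshold=10.0):
    """Keep (name, e_value) hits at or below the threshold, best first."""
    kept = [hit for hit in hits if hit[1] <= expect_threshold]
    return sorted(kept, key=lambda hit: hit[1])
```

Given `hits = [("hitA", 2e-30), ("hitB", 0.5), ("hitC", 12.0)]`, the default threshold of 10 reports hitA and hitB, while `expect_threshold=1e-5` reports only hitA.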
- FASTA
- Program used to search simultaneously both protein
and DNA sequence databases (Pearson and Lipman, 1988). FASTA uses a fast
search to initially identify sequences with a high degree of similarity to
the query sequence and then conducts a second comparison on the selected sequences.
FASTA is slower than BLAST, but is
more sensitive and sometimes yields different results.
- FASTA File
- A FASTA file is a simple text format primarily used to store genetic
sequence information. FASTA files are easily created in a text editor. Each record consists
of a header line beginning with a '>', holding a name or identifier and any additional
information about the sequence. The following lines contain the DNA or protein sequence.
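A minimal Python sketch of reading this format is shown below. The function name and the identifier in the usage note are made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(">seq1 example gene\nATGGCC\nTAA\n")` returns `[("seq1 example gene", "ATGGCCTAA")]`.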
- Feature
- A feature is defined as any gene
or other genetic element that resides on a chromosomal sequence.
One or more features can be associated with
a gene. Features include mRNAs, tRNAs, ESTs, and ORFs.
- Filter Options
- Filtering masks portions of a query sequence
that have low compositional complexity (such as short internal repeats or
poly-A sequences) to reduce the frequency of statistically significant but
biologically uninteresting BLAST results.
- Frequency of Optimal Codons (Fop)
- This index is the ratio of optimal codons to synonymous
codons (genetic code dependent). Fop values for the
original index are always between 0 (where no optimal codons are used) and
1 (where only optimal codons are used). When calculating the modified Fop index, negative values are adjusted to zero.
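A minimal Python sketch of the original index follows. It assumes the standard genetic code, so Met (ATG), Trp (TGG), and the stop codons are excluded because they have no synonymous alternatives. The set of optimal codons is organism specific and must be supplied by the caller; the set in the usage note is purely illustrative.

```python
NON_SYNONYMOUS = frozenset({"ATG", "TGG", "TAA", "TAG", "TGA"})


def fop(gene, optimal_codons, excluded=NON_SYNONYMOUS):
    """Fraction of the gene's synonymous codons that are optimal codons."""
    gene = gene.upper()
    codons = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
    considered = [codon for codon in codons if codon not in excluded]
    if not considered:
        raise ValueError("gene contains no synonymous codons")
    optimal = sum(codon in optimal_codons for codon in considered)
    return optimal / len(considered)
```

With the illustrative optimal set `{"AAA", "GAA"}`, the gene `ATGAAAAAGGAATAA` has three scored codons (AAA, AAG, GAA), two of which are optimal, so Fop is 2/3.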
- Genome Browser
- The generic genome browser developed by
GMOD (Generic Model Organism database) is employed by
dictyBase to display gene maps, browse the chromosomes, align genes or gene models with
ESTs or contigs, etc.
[Genome browser Help]
- GCG
- The Genetics Computer
Group is a private company involved in the development of sequence analysis
software.
- GenBank
- GenBank
is the DNA sequence database sponsored by the US National Institutes of Health.
GenBank is produced in collaboration with EMBL
and DDBJ.
- Gene Name
- See Standard Name.
- Gene Page
- The information contained in the gene page
comprises the "heart" of dictyBase, containing information, both internally-
and externally-linked, about the queried gene. All information
about a given gene is contained under the standard name. If a given locus
has been referred to by synonyms, these names
are included in the gene page.
- Gene Ontology (GO)
- The Gene Ontology (GO)
project was established
to provide a common language to describe aspects of a gene product's biology.
The use of a consistent vocabulary allows genes from different species to
be compared based on their GO annotations. For each of three categories of
biological information--molecular function, biological process, and cellular
component--a set of terms has been selected and organized. Each set of terms
uses a controlled vocabulary, and parent-child relationships between terms
are defined. This combination of a controlled vocabulary with defined relationships
between items is referred to as an ontology. Within an ontology, a child
may be a "part of" or an example ("instance") of its parent. There are three
independently organized controlled vocabularies, or gene ontologies, one
for molecular function, one
for biological process, and one for
cellular component. Many-to-many
parent-child relationships are allowed in the ontologies. A gene may be annotated
to any level in an ontology, and to more than one item within an ontology. The browser for
GO is AmiGO.
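The structure described above, where a term may have several parents and an annotation to a child term also applies to its ancestors, can be sketched in Python. The term and gene names below are placeholders, not real GO terms or dictyBase genes.

```python
# Hypothetical mini-ontology: each term maps to its (relationship, parent) pairs.
PARENTS = {
    "term_D": [("is_a", "term_B"), ("part_of", "term_C")],
    "term_B": [("is_a", "term_A")],
    "term_C": [("is_a", "term_A")],
    "term_A": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_annotated_to(term, annotations, parents=PARENTS):
    """Genes annotated to the term itself or to any of its descendants."""
    return sorted(
        gene
        for gene, annotated_term in annotations
        if annotated_term == term or term in ancestors(annotated_term, parents)
    )
```

Here `term_D` has two parents, so a gene annotated to `term_D` is found when searching for `term_B`, `term_C`, or `term_A`. This mirrors how a browser can report the number of gene products annotated to a term.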
- Gene Prediction
- A gene prediction is an automatically predicted gene model.
The gene predictions in dictyBase come from the Sequencing Center Consortium.
- Gene Product
- The name of the protein or RNA product
(and its function, if relevant) that is coded for by the gene.
- Gene Summary Paragraphs
- This is
a summary of published biological information for a gene and its product
which is designed to familiarize both Dictyostelium and non-Dictyostelium
researchers with the general facts and important subtleties regarding a locus.
dictyBase curators compose Gene Summary Paragraphs using natural language
and a controlled vocabulary based on the Gene
Ontology (GO). Gene Summary Paragraphs contain links to references and GO Annotations.
- GO
- See Gene Ontology.
- High Scoring Segment Pairs (HSPs)
- In a BLAST search,
an HSP is two sequence fragments (one from the query sequence and the other
from a database sequence) that show a locally maximal alignment for which
the alignment exceeds a pre-defined cutoff score.
- Hydropathicity of Protein (GRAVY score)
- This index is the general average hydropathicity
(GRAVY) score for the hypothetical translated gene product. It is calculated
as the arithmetic mean of the hydropathic indices of each amino
acid (Kyte and Doolittle 1982).
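A minimal Python sketch of the calculation, using the Kyte and Doolittle hydropathy values. Characters that are not one of the 20 standard amino acids (for example a trailing "*") are ignored, which is an assumption made for this example.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy over the standard residues of a protein."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Positive scores indicate a hydrophobic protein overall and negative scores a hydrophilic one. For example, `gravy("AILV")` is about 3.575 and `gravy("RK")` is about -4.2.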
- Keyword
- A keyword is a word identified as particularly informative
about an object. In a sequence, a keyword often relates to the identity
of a gene or the function of the gene product. References often have a list
of keywords that are Medline MeSH terms. Keywords are good to use in text
searches.
- Kyoto
- An external link (found, if available, on the locus page)
to the Kyoto
Encyclopedia of Genes and Genomes. The link goes directly to the information
for that specific enzyme.
- Literature Topics
- Literature Topics are a guide to the literature
for a given locus and
are derived from journal abstracts. dictyBase performs a search through all
PubMed literature (dating back to
1966) for all papers mentioning that locus and any aliases. dictyBase curators
read the abstracts of those papers and assign the papers to one or more Topics
that describe the kind of biological information contained in the abstracts.
The Literature Topics are thus designed to help the user easily find the
papers relevant to a given locus. Please note, however, that since only abstracts
are read, the Literature Topics are not a complete description of the information
contained in the papers. [Literature
Topics Help]
- Low Complexity Region
- Regions of biased composition including homopolymeric
runs, short-period repeats, and more subtle overrepresentation of some residues.
The SEG program is used to mask or filter LCRs in amino acid queries.
The DUST program is used to mask or filter LCRs in nucleic acid queries.
- LTR
- Long Terminal Repeat
- Medline
- Medline is the National Library of Medicine's database
of biomedical papers; it contains all citation information for each paper,
as well as abstracts for most of the papers.
- Molecular Function
- One of the three categories used by the Gene Ontology project, molecular
function describes the tasks performed by individual gene products; examples
are transcription factor and DNA binding.
- motif
- A meaningful pattern of nucleotides or amino acids
that is shared by two or more molecules.
- NCBI
- The National
Center for Biotechnology Information (NCBI) is part of the National Library
of Medicine (NLM) in the National Institutes of Health (NIH). Its mission
is to develop new information technologies to aid in the understanding of
fundamental molecular and genetic processes that control health and disease.
NCBI developed and maintains the Entrez Search
System and PubMed database.
- ORF
- An ORF (Open Reading Frame) corresponds to a stretch
of DNA that could potentially be translated into a polypeptide; i.e., it
begins with an ATG "start" codon and terminates with one of the 3 "stop" codons.
In dictyBase, currently all feature
types are ORFs. There will be different features in the future.
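The definition above (an ATG start codon followed in frame by a stop codon) can be sketched as a simple scan in Python. This illustration searches the three forward-strand frames only and is not the gene-finding method used by dictyBase.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive; min_codons counts the codons
    before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs
```

To cover both strands, the same function can be run on the reverse complement of the sequence. Raising `min_codons` filters out very short ORFs that are likely to occur by chance.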
- Orthologs
- Sequences from different species that perform the
same biological function and are likely to have evolved from a common ancestral
sequence. See Paralogs.
- P(N)
- In the results of a BLAST search, the lowest P-value given to
any set of HSPs found in a database is listed in the "P(N)" column.
- PAM30
- Sequence alignment matrix that allows 30 accepted
point mutations per 100 amino acids. A higher PAM is more suitable for comparing
distantly related sequences, while a lower PAM is suitable for comparing
closely related sequences (Schwartz and Dayhoff, 1978).
- PAM70
- Sequence alignment matrix that allows 70 accepted
point mutations per 100 amino acids. A higher PAM (such as PAM250) is more suitable
for comparing distantly related sequences, while a lower PAM is suitable for comparing
more closely related sequences (Schwartz and Dayhoff, 1978).
- PAM250-Gonnet
- Sequence alignment matrix that allows 250 accepted
point mutations per 100 amino acids using scoring tables recalculated since
the creation of PAM250 (Gonnet et al., 1992). PAM250-Gonnet is better than
PAM250 for comparing distantly related sequences.
- Paralogs
- Sequences that perform different biological functions
in the same species that likely arose by duplication and divergence from
a common ancestral sequence. See orthologs.
- PDB
- The Protein Data
Bank (PDB) is an archive of experimentally determined three-dimensional
structures of biological macromolecules, based at the Brookhaven National
Laboratory.
- Phenotype
- In the locus page, "phenotype"
refers to the observable traits of strains that carry a mutation at that locus.
- PIR
- PIR is a protein database. The PIR database has two sites:
PIR-US in the United States, and
PIR-JP based in Japan.
- Primary Feature
- A primary feature is the best available gene sequence at any given time.
When a gene has not been curated, the primary feature is the gene prediction
from the Sequencing Center.
If a curator makes a curated model for a gene,
the curated model becomes the primary feature.
A protein-coding gene has several primary features: coding sequence, genomic sequence, and protein sequence.
- Protein Info
- This is a general category within
the Locus page that contains information pertaining to the protein produced
by the gene.
- PubMed
- PubMed
is a database of bibliographic information developed by NCBI.
- Query Sequence
- A sequence, either amino acid or nucleotide,
chosen by the user to use in a BLAST search. A query sequence can
be typed or pasted into the query window on the search form. BLAST
searches require a minimum query sequence length of 15 nucleotides or amino
acids.
- RAW Format
- A format in which the nucleotide sequence appears without
headers or comments. RAW format must be used when performing a D. discoideum
search in BLAST or FASTA.
- Reference
- Within dictyBase, a "reference" is most often a published
article in a scientific journal or book; however some references are unpublished
results, GenBank records, or personal communications to dictyBase. A comprehensive list of
references may be obtained for a given locus
within its literature topics section.
- Related Sequences
- A feature of Entrez that finds related nucleotide (GenBank)
or protein (GenPept) sequences using similarity searches.
- Research Interest
- In the Colleague class of information, Research_interest
refers to the broad areas of study the colleague is pursuing. Examples might
be: protein translocation, DNA replication, or cytoskeleton.
- Reserved Gene Name
- Gene names that are soon to be published
can be reserved by sending an e-mail to dictyBase.
- SEG
- A program for filtering low complexity regions in amino acid sequences.
Residues that have been masked are represented as "X" in an alignment. SEG filtering is performed
by default in blastp, blastx,
tblastx, and tblastn searches.
- Standard Name
- Following the guidelines for the Dictyostelium
Genetic Nomenclature, standard gene names are recommended to follow the Demerec
Nomenclature (four letters: three lower case, one upper case, e.g., dagA, myoB). If
a Demerec name is not suitable, modifications are acceptable (e.g., act15).
All information in the database concerning this
gene will be listed within the standard name's locus window. Any other names
that have been used for this gene are listed as aliases within the
standard name locus page.
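The Demerec pattern described above (three lower-case letters followed by one upper-case letter) can be checked with a short regular expression. This Python sketch only tests the letter pattern and is not an official dictyBase validator.

```python
import re

# Three lower-case letters followed by one upper-case letter, e.g. dagA.
DEMEREC_PATTERN = re.compile(r"[a-z]{3}[A-Z]")


def is_demerec_name(name):
    """True if the whole name matches the Demerec letter pattern."""
    return DEMEREC_PATTERN.fullmatch(name) is not None
```

`is_demerec_name("dagA")` and `is_demerec_name("myoB")` return True, while `is_demerec_name("act15")` returns False because act15 is one of the accepted modifications rather than a strict Demerec name.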
- Stock Center
- The Dictyostelium
Stock Center stores all
available D. discoideum mutants, which can be ordered through dictyBase.
The Stock Center is located in the Dept. of Anatomy & Cell Biology at Columbia
University.
- Synonym
- An alternative to the standard name that has been agreed on by
curators and researchers. The synonym field is
searched in every dictyBase search.
- tblastn
- A BLAST program
that compares a protein query sequence against a nucleotide sequence dataset
dynamically translated in all six reading frames (both strands). The user
must enter an AMINO ACID sequence and select one of the NUCLEOTIDE datasets
(i.e., genoSc or GenBank) for the search.
- tblastx
- A BLAST program
that compares the six-frame translations of a nucleotide sequence to the
six-frame translations of a nucleotide sequence dataset. The user must enter
a NUCLEOTIDE sequence and select one of the NUCLEOTIDE datasets (i.e., genoSc
or GenBank) for the search.
- Topic
- Biological information ascertained by dictyBase curators
from abstracts for a given gene name is categorized under pre-determined
"topic" tags. These topics comprise the literature topics guide to literature in dictyBase.
- UniProt
- UniProt (Universal Protein Resource)
is the most comprehensive catalog of information on proteins. It is a central repository of protein
sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
- Wildcard Character
- dictyBase uses an asterisk "*" as a wildcard
symbol. In a search, the wildcard character shows where any text can be
tolerated. For example, searching for the locus "cdc*" will produce all
cdc genes. Searching for the Author "Johns*" will produce all authors whose
last name begins with those letters. Since the database requires exact matches
to its format for searches to be productive, wise use of the "*" wildcard
character is needed for many types of searches.
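Wildcard matching of this kind can be illustrated with Python's standard `fnmatch` module. This is only an analogy for how "*" behaves: `fnmatchcase` also gives special meaning to "?" and "[...]", and whether dictyBase does the same is not stated here. The lists of names are examples.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, names):
    """Return the names that match the pattern, where '*' matches any text."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, `wildcard_search("Johns*", ["Johnson", "Johnston", "Jones"])` returns `["Johnson", "Johnston"]`. Searching for the exact string "Johns" without the wildcard would match none of them.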
- Word Size
- The Word Size (W) is a BLAST parameter that determines the minimum
length of a match. The query sequence is split up into
every possible 'word' of the selected size. BLAST first searches for a perfect match of at least the
word length. Once a match is found, BLAST tries to extend it into an HSP.
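The word-splitting and exact-match seeding step can be sketched in Python as follows. This is a simplified illustration of the idea, not BLAST's implementation: the extension step is omitted, and protein searches in BLAST also consider similar, not only identical, words.

```python
def words(seq, word_size):
    """Every overlapping word of the given size, with its start position."""
    return [
        (i, seq[i:i + word_size]) for i in range(len(seq) - word_size + 1)
    ]


def word_hits(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = {}
    for pos, word in words(subject, word_size):
        index.setdefault(word, []).append(pos)
    return [
        (query_pos, subject_pos)
        for query_pos, word in words(query, word_size)
        for subject_pos in index.get(word, [])
    ]
```

With a word size of 3, the query `ACGT` yields the words ACG and CGT, and `word_hits("ACGT", "TTACGTT", 3)` returns `[(0, 2), (1, 3)]`. A larger word size produces fewer seed hits, which speeds up the search at the cost of sensitivity.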