Molecular biology Glossary


3' end/5' end- A nucleic acid strand is inherently directional, and the "5 prime end" has a free hydroxyl (or phosphate) on a 5' carbon and the "3 prime end" has a free hydroxyl (or phosphate) on a 3' carbon (carbon atoms in the sugar ring are numbered from 1' to 5'). That's simple enough for an RNA strand or for single-stranded (ss) DNA. However, for double-stranded (ds) DNA it's not so obvious - each strand has a 5' end and a 3' end, and the 5' end of one strand is paired with the 3' end of the other strand (it is "antiparallel"). One would talk about the 5' end of ds DNA only if there was some reason to emphasize one strand over the other - for example if one strand is the sense strand of a gene. In that case, the orientation of the sense strand establishes the direction.

3' flanking region- A region of DNA which is NOT copied into the mature mRNA, but which is present adjacent to the 3' end of the gene. It was originally thought that the 3' flanking DNA was not transcribed at all, but it was later discovered to be transcribed into RNA and then quickly removed during processing of the primary transcript to form the mature mRNA. The 3' flanking region often contains sequences which affect the formation of the 3' end of the message. It may also contain enhancers or other sites to which proteins may bind.

3' untranslated region- A region of the DNA which IS transcribed into mRNA and becomes the 3' end of the message, but which does not contain protein coding sequence. Everything between the stop codon and the poly(A) tail is considered to be 3' untranslated. The 3' untranslated region may affect the translation efficiency of the mRNA or the stability of the mRNA. It also has sequences which are required for the addition of the poly(A) tail to the message (including one known as the "hexanucleotide", AAUAAA).

5' flanking region- A region of DNA which is NOT transcribed into RNA, but rather is adjacent to the 5' end of the gene. The 5'-flanking region contains the promoter, and may also contain enhancers or other protein binding sites.

5' untranslated region- A region of a gene which IS transcribed into mRNA, becoming the 5' end of the message, but which does not contain protein coding sequence. The 5'-untranslated region is the portion of the DNA starting from the cap site and extending to the base just before the ATG translation initiation codon. While not itself translated, this region may have sequences which alter the translation efficiency of the mRNA, or which affect the stability of the mRNA.

Ablation experiment- An experiment designed to produce an animal deficient in one or a few cell types, in order to study cell lineage or cell function. The idea is to make a transgenic mouse with a toxin gene (often diphtheria toxin) under control of a specialized promoter which activates only in the target cell type. When embryo development progresses to the point where it starts to form the target tissue, the toxin gene is activated, and that specific tissue dies. Other tissues are unaffected.

Acrylamide gels- A polymer gel used for electrophoresis of DNA or protein to measure their sizes (in daltons for proteins, or in base pairs for DNA). See "Gel Electrophoresis". Acrylamide gels are especially useful for high resolution separations of DNA in the range of tens to hundreds of nucleotides in length.

Agarose gel electrophoresis - a method for separating nucleic acids (DNA or RNA) within a gel made of agarose in a suitable buffer under the influence of an electrical field. It is suitable for separation of large fragments of nucleic acid; separation is based primarily upon the size of the nucleic acid.

Agarose gels- A polysaccharide gel used to measure the size of nucleic acids (in bases or base pairs). See "Gel Electrophoresis". This is the gel of choice for DNA or RNA in the range of thousands of bases in length, or even up to 1 megabase if you are using pulsed field gel electrophoresis.

Allele - one of several alternate forms of a gene occupying a given locus on a chromosome or plasmid.

Amino acids - the 20 basic building blocks of proteins, consisting of the basic formula NH2-CHR-COOH, where "R" is the side chain which defines the amino acid.

Amino terminus - refers to the NH2 end of a peptide chain (by custom drawn at the left of a protein sequence)

Amp resistance- Ampicillin resistance. See "Antibiotic resistance".

Amplification - refers to the production of additional copies of a chromosomal sequence, found either as intrachromosomal or extrachromosomal DNA. Also refers to the in vitro process in the polymerase chain reaction.

Amplimer - region of DNA sequence which is amplified during a PCR reaction and which is defined by a pair of PCR primers (these primer pairs are sometimes called amplimers).

Anchor sequence - a hydrophobic amino acid sequence which fixes a segment of a newly synthesized, translocating protein within the lipid bilayer membrane of the endoplasmic reticulum.

Anneal- Generally synonymous with "hybridize".

Antibiotic resistance- Plasmids generally contain genes which confer on the host bacterium the ability to survive a given antibiotic. If the plasmid pBR322 is present in a host, that host will not be killed by (moderate levels of) ampicillin or tetracycline. By using plasmids containing antibiotic resistance genes, the researcher can kill off all the bacteria which have not taken up his plasmid, thus ensuring that the plasmid will be propagated as the surviving cells divide.

Antisense strand (or primer) - refers to the RNA or DNA strand of a duplex molecule which is complementary to that encoding a polypeptide. More specifically, the DNA strand which serves as template for the synthesis of RNA and which is complementary to it. "Antisense oligonucleotides" hybridize to mRNA, and are used to prime cDNA synthesis.

Anti-sense strand- See discussion under "Sense strand".

AP-1 site- The binding site on DNA at which the transcription "factor" AP-1 binds, thereby altering the rate of transcription for the adjacent gene. AP-1 is actually a complex between c-fos protein and c-jun protein, or sometimes is just c-jun dimers. The AP-1 site consensus sequence is (C/G)TGACT(C/A)A. Also known as the TPA-response element (TRE). [TPA is a phorbol ester, tetradecanoyl phorbol acetate, which is a chemical tumor promoter]

Assembled epitope - see conformational epitope.

ATG or AUG- The codon for methionine; the translation initiation codon. Usually, protein translation can only start at a methionine codon (although this codon may be found elsewhere within the protein sequence as well). In eukaryotic DNA, the sequence is ATG; in RNA it is AUG. Usually, the first AUG in the mRNA is the point at which translation starts, and an open reading frame follows - i.e. the nucleotides taken three at a time will code for the amino acids of the protein, and a stop codon will be found only when the protein coding region is complete.
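
The "first AUG, then read three at a time until a stop codon" rule can be sketched in a few lines of Python. This is only an illustration: the function name first_orf and the example sequence are made up for this sketch, and it ignores the exceptions implied by the word "usually" above (translation does not always start at the first AUG).

```python
# Stop codons of the standard genetic code, written as RNA.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def first_orf(mrna):
    """Return the open reading frame that starts at the first AUG.

    The returned string runs from the AUG through the first in-frame
    stop codon, inclusive. Returns None if there is no AUG or no
    in-frame stop codon. DNA input (with T) is accepted and converted.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the message one codon (three nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None

# Hypothetical message: 5' untranslated bases, an ORF, then 3' bases.
print(first_orf("GGCACCAUGGCUUGGAAAUAAGCC"))
```

Here the six leading bases and the three trailing bases play the role of (very short) 5' and 3' untranslated regions.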

Autoradiography - a process to detect radioactively labeled molecules (which usually have been separated in an SDS-PAGE or agarose gel) based on their ability to create an image on photographic or X-ray film. This process does not result in a linear relationship between the intensity of the signal and the amount of radioactivity unless special steps are taken. There is now increasing use of phosphorimagers and other modern devices to detect and quantitate radioactive molecules which have been separated in gels.

Avidin - a glycoprotein which binds to biotin with very high affinity (Kd = 10^-15 M).

BAC- Bacterial Artificial Chromosome — a cloning vector capable of carrying between 100 and 300 kilobases of target sequence. They are propagated as a mini-chromosome in a bacterial host. The size of the typical BAC is ideal for use as an intermediate in large-scale genome sequencing projects. Entire genomes can be cloned into BAC libraries, and entire BAC clones can be shotgun-sequenced fairly rapidly.

Back mutation - a mutation which reverses the effect of a point or frame-shift mutation that had altered a gene; thus it restores the wild-type phenotype (see revertant).

Bacteriophage - a virus that infects bacteria; often simply called a phage. The phages which are most often used in molecular biology are the E. coli viruses lambda, M13 and T7.

Bacteriophage lambda- A virus which infects E. coli, and which is often used in molecular genetics experiments as a vector, or cloning vehicle. Recombinant phages can be made in which certain non-essential lambda DNA is removed and replaced with the DNA of interest. The phage can accommodate a DNA "insert" of about 15-20 kb. Replication of that virus will thus replicate the investigator's DNA. One would use phage lambda rather than a plasmid if the desired piece of DNA is rather large.

Band shift assay- see Gel shift assay.

Base - the purine or pyrimidine component of a nucleotide; often used to refer to a nucleotide residue within a nucleic acid chain.

Base pair - one pair of complementary nucleotides within a duplex strand of a nucleic acid. Under Watson-Crick rules, these pairs consist of one pyrimidine and one purine - i.e., C-G, A-T (DNA) or A-U (RNA). However, "noncanonical" base pairs (e.g., G-U) are common in RNA secondary structure.
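
As a small illustration of the Watson-Crick rules for DNA, the Python sketch below builds the reverse complement of a strand and checks whether two strands, both written 5' to 3', could pair perfectly in antiparallel orientation. The function names are invented for this sketch, and noncanonical pairs such as G-U are deliberately not handled.

```python
# Watson-Crick partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))

def can_form_duplex(strand_a, strand_b):
    """True if two 5'->3' strands pair perfectly when antiparallel."""
    return strand_b.upper() == reverse_complement(strand_a)

print(reverse_complement("ATGC"))
print(can_form_duplex("GAATTC", "GAATTC"))
```

The second call shows that a palindromic site such as GAATTC is its own reverse complement.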

Binding site- A place on cellular DNA to which a protein (such as a transcription factor) can bind. Typically, binding sites might be found in the vicinity of genes, and would be involved in activating transcription of that gene (promoter elements), in enhancing the transcription of that gene (enhancer elements), or in reducing the transcription of that gene (silencers). NOTE that whether the protein in fact performs these functions may depend on some condition, such as the presence of a hormone, or the tissue in which the gene is being examined. Binding sites could also be involved in the regulation of chromosome structure or of DNA replication.

Biotin - a coenzyme which is essential for carboxylation reactions (see avidin).

Blotting- A technique for detecting one RNA within a mixture of RNAs (a Northern blot) or one type of DNA within a mixture of DNAs (a Southern blot). A blot can prove whether that one species of RNA or DNA is present, how much is there, and its approximate size. Basically, blotting involves gel electrophoresis, transfer to a blotting membrane (typically nitrocellulose or activated nylon), and incubating with a radioactive probe. Exposing the membrane to X-ray film produces darkening at a spot correlating with the position of the DNA or RNA of interest. The darker the spot, the more nucleic acid was present there.

Blunt end - a terminus of a duplex DNA molecule which ends precisely at a base pair, with no overhang (unpaired nucleotide) in either strand. Some but not all restriction endonucleases leave blunt ends after cleaving DNA. Blunt-ended DNA can be ligated nonspecifically to other blunt-ended DNA molecules (compare with sticky end).

Box - refers to a short nucleic acid consensus sequence or motif that is universal within kingdoms of organisms. Examples of DNA boxes are the Pribnow box (TATAAT) for RNA polymerase, the Hogness box (TATA) that has a similar function in eukaryotic organisms, and the homeo box. RNA boxes have also been described, such as Pilipenko's box-A motif that may be involved in ribosome binding in some viral RNAs.

BP- Abbreviation for base pair(s). Double stranded DNA is usually measured in bp rather than nucleotides (nt).

C terminus - see carboxyl terminus.

Cap- All eukaryotes have at the 5' end of their messages a structure called a "cap", consisting of a 7-methylguanosine in 5'-5' triphosphate linkage with the first nucleotide of the mRNA. It is added post-transcriptionally, and is not encoded in the DNA.

Cap site- Two usages- In eukaryotes, the cap site is the position in the gene at which transcription starts, and really should be called the "transcription initiation site". The first nucleotide is transcribed from this site to start the nascent RNA chain. That nucleotide becomes the 5' end of the chain, and thus the nucleotide to which the cap structure is attached (see "Cap"). In bacteria, the CAP site (note the capital letters) is a site on the DNA to which a protein factor (the Catabolite Activator Protein) binds.

Carboxyl terminus - refers to the COOH end of a peptide chain (by custom drawn at the right of a protein sequence)

CAT assay - An enzyme assay. CAT stands for chloramphenicol acetyl transferase, a bacterial enzyme which inactivates chloramphenicol by acetylating it. CAT assays are often performed to test the function of a promoter. The gene coding for CAT is linked onto a promoter (transcription control region) from another gene, and the construct is "transfected" into cultured cells. The amount of CAT enzyme produced is taken to indicate the transcriptional activity of the promoter (relative to other promoters which must be tested in parallel). It is easier to perform a CAT assay than it is to do a Northern blot, so CAT assays were a common method for testing the effects of sequence changes on promoter function. CAT has been largely supplanted as a reporter gene by luciferase.

CCAAT box- (CAT box, CAAT box, other variants) A sequence found in the 5' flanking region of certain genes which is necessary for efficient expression. A transcription factor (CCAAT-binding protein, CBP) binds to this site.

cDNA - complementary DNA. A DNA molecule which was originally copied from an RNA molecule by reverse transcription. The term "cDNA" is commonly used to describe double-stranded DNA which originated from a single-stranded RNA molecule, even though only one strand of the DNA is truly complementary to the RNA.

cDNA clone- "complementary DNA"; a piece of DNA copied from an mRNA. The term "clone" indicates that this cDNA has been spliced into a plasmid or other vector in order to propagate it. A cDNA clone may contain DNA copies of such typical mRNA regions as coding sequence, 5'-untranslated region, 3' untranslated region or poly(A) tail. No introns will be present, nor any promoter sequences (or other 5' or 3' flanking regions). A "full-length" cDNA clone is one which contains all of the mRNA sequence from nucleotide #1 through to the poly(A) tail.

cDNA library- A collection of cDNA fragments, each of which has been cloned into a separate vector molecule. In practice, this is usually just a mixture of bacteria, where each bacterium carries a different plasmid. Inserted into the plasmids (one per plasmid) are thousands of different pieces of cDNA (each typically 500-5000 bp) copied from some source of mRNA, for example, total liver mRNA. The basic idea is that if you have a large enough number of different liver-derived cDNAs carried in those bacteria, there is a 99% probability that a cDNA copy of any given liver mRNA exists somewhere in the tube. The real trick is to find the one you want out of that mess - a process called screening (see "Screening").
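
How large is "large enough"? A common back-of-the-envelope estimate (not given in the entry above, and resting on the assumption that every clone is an independent random draw in proportion to mRNA abundance) is N = ln(1 - P) / ln(1 - f), where P is the desired probability and f is the fraction of the mRNA population made up by the message of interest. The Python sketch below uses that formula; the function names and the example abundance of 1 in 100,000 are assumptions for illustration only.

```python
import math

def clones_needed(abundance, probability=0.99):
    """Clones required to include a given mRNA with the given probability.

    abundance   -- fraction of all mRNA molecules that are the target (0-1)
    probability -- desired chance that at least one clone carries it (0-1)
    Assumes independent, abundance-proportional sampling of clones.
    """
    if not 0 < abundance < 1 or not 0 < probability < 1:
        raise ValueError("abundance and probability must lie between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - abundance))

def probability_present(n_clones, abundance):
    """Chance that at least one of n_clones carries the target cDNA."""
    return 1 - (1 - abundance) ** n_clones

# Hypothetical rare message: 1 molecule in 100,000.
n = clones_needed(1e-5)
print(n, probability_present(n, 1e-5))
```

Real libraries fall short of this ideal (for instance, some cDNAs clone poorly), so the figure is best read as a lower bound.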

Chain terminator - see dideoxynucleotide.

Chaperone proteins - a series of proteins present in the endoplasmic reticulum which guide the proper folding of secreted proteins through a complex series of binding and release reactions.

Chromosome walking- A technique for cloning everything in the genome around a known piece of DNA (the starting probe). You screen a genomic library for all clones hybridizing with the probe, and then figure out which one extends furthest into the surrounding DNA. The most distal piece of this most distal clone is then used as a probe, so that ever more distal regions can be cloned. This has been used to move as much as 200 kb away from a given starting point (an immense undertaking). Typically used to "walk" from a starting point towards some nearby gene in order to clone that gene. Also used to obtain the remainder of a gene when you have isolated a part of it.

Cis - as used in molecular biology, an interaction between two sites which are located within the same molecule. However, a cis-acting protein can either be one which acts only on the molecule of DNA from which it was expressed, or a protein which acts on itself (e.g., self-proteolysis).

Cistron - a nucleic acid segment corresponding to a polypeptide chain, including the relevant translational start (initiation) and stop (termination) codons.

Clone - The term "clone" can refer to a large number of cells, viruses, or molecules which are identical and which are derived from a single ancestral cell, virus or molecule. The term can also be used to refer to the process of isolating single cells or viruses and letting them proliferate (as in a hybridoma clone, which is a "biological clone"), or the process of isolating and replicating a piece of DNA by recombinant DNA techniques ("molecular clone"). The use of the word as a verb is acceptable for the former meaning, but not necessarily the latter meaning.

Clone (verb) - To "clone" something is to produce copies of it. To clone a piece of DNA, one would insert it into some type of vector (say, a plasmid) and put the resultant construct into a host (usually a bacterium) so that the plasmid and insert replicate with the host. An individual bacterium is isolated and grown and the plasmid containing the "cloned" DNA is re-isolated from the bacteria, at which point there will be many millions of copies of the DNA - essentially an unlimited supply. Actually, an investigator wishing to clone some gene or cDNA rarely has that DNA in a purified form, so practically speaking, to "clone" something involves screening a cDNA or genomic library for the desired clone. See also "Probe" for a description of how one might start a cloning project, and "Screening" for how the probe is used.

Coding sequence- The portion of a gene or an mRNA which actually codes for a protein. Introns are not coding sequences; nor are the 5' or 3' untranslated regions (or the flanking regions, for that matter - they are not even transcribed into mRNA). The coding sequence in a cDNA or mature mRNA includes everything from the AUG (or ATG) initiation codon through to the stop codon, inclusive.

Coding strand- an ambiguous term intended to refer to one specific strand in a double-stranded gene. See "Sense strand".

Codon bias - the tendency for an organism or virus to use certain codons more than others to encode a particular amino acid. An important determinant of codon bias is the guanine-cytosine (GC) content of the genome. An organism that has a relatively low G+C content of 30% will be less likely to have a G or C at the third position of a codon (wobble position) than an A or T to specify an amino acid that can be represented by more than one codon.
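
One simple way to look at this is to compare the overall G+C fraction of a coding sequence with the G+C fraction at third codon positions only. The Python sketch below does that; the function names and the short test sequence are invented for illustration, and the sequence is assumed to start in frame at the first base.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(coding_seq):
    """G+C fraction at the third (wobble) position of each codon.

    Assumes coding_seq starts in frame; trailing bases that do not
    complete a codon are ignored.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return gc_fraction(seq[2:usable:3])

# Hypothetical four-codon coding sequence: ATG GCC GGC TAA
example = "ATGGCCGGCTAA"
print(gc_fraction(example), gc3_fraction(example))
```

A meaningful estimate of codon bias needs many genes, not a single short sequence like this one.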

Codon- In an mRNA, a codon is a sequence of three nucleotides which codes for the incorporation of a specific amino acid into the growing protein. The sequence of codons in the mRNA unambiguously defines the primary structure of the final protein. Of course, the codons in the mRNA were also present in the genomic DNA, but the sequence may be interrupted by introns.
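
To make the idea concrete, the Python sketch below builds the standard genetic code as a lookup table and reads an mRNA three nucleotides at a time. It assumes the standard code (some organisms and organelles use variants), assumes the sequence starts in frame, and uses one-letter amino acid symbols with "*" for stop; the names CODON_TABLE and translate are made up for this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest; base order U, C, A, G). "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an in-frame mRNA (or DNA) sequence up to the first stop."""
    mrna = mrna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCUUGGAAAUAA"))
```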

Competent - bacterial cells which are capable of accepting foreign extra-chromosomal DNA. There are a variety of processes by which cells may be made competent.

Complementary - see base pair.

Conformational epitope - an epitope which is dependent upon folding of a protein; amino acid residues present in the antibody binding site are often located at sites in the primary sequence of the protein which are at some distance from each other. The vast majority of B-cell (antibody binding) epitopes are conformational.

Consensus sequence- A ‘nominal’ linear sequence of nucleotides inferred from multiple, imperfect examples. Multiple lanes of shotgun sequence can be merged to show a consensus sequence. The term also refers to the optimal sequence of nucleotides recognized by some factor. A DNA binding site for a protein may vary substantially, but one can infer the consensus sequence for the binding site by comparing numerous examples. For example, the (fictitious) transcription factor ZQ1 usually binds to the sequences AAAGTT, AAGGTT or AAGATT. The consensus sequence for that factor is said to be AARRTT (where R is any purine, i.e. A or G). ZQ1 may also be able to weakly bind to ACAGTT (which differs by one base from the consensus).
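
The ZQ1 example can be reproduced with a short Python sketch that collapses each column of aligned sites into an IUPAC ambiguity code. This is a deliberately naive approach (every observed base counts equally, with no frequency threshold), the sites are assumed to be already aligned and of equal length, and the function names are invented for illustration.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they allow.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Reverse lookup: code letter -> set of bases it stands for.
ALLOWED = {code: bases for bases, code in IUPAC.items()}

def consensus(sites):
    """Collapse equal-length, aligned DNA sites into one IUPAC string."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    columns = zip(*(site.upper() for site in sites))
    return "".join(IUPAC[frozenset(column)] for column in columns)

def mismatches(site, consensus_seq):
    """Count positions where site is not allowed by the consensus."""
    return sum(
        base not in ALLOWED[code]
        for base, code in zip(site.upper(), consensus_seq)
    )

zq1_sites = ["AAAGTT", "AAGGTT", "AAGATT"]
print(consensus(zq1_sites))
print(mismatches("ACAGTT", consensus(zq1_sites)))
```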

Conservative substitution - a nucleotide mutation which alters the amino acid sequence of the protein, but which causes the substitution of one amino acid with another which has a side chain with similar charge/polarity characteristics (see amino acid). The size of the side chain may also be an important consideration. Conservative mutations are generally considered unlikely to profoundly alter the structure or function of a protein, but there are many exceptions (see nonconservative substitution).

Conserved - similar in structure or function.

Contig- Several uses, all nouns. The term comes from a shortening of the word ‘contiguous’. A ‘contig’ may refer to a map showing placement of a set of clones that completely, contiguously cover some segment of DNA in which you are interested. Also called the ‘minimal tiling path’. More often, the term ‘contig’ is used to refer to the final product of a shotgun sequencing project. When individual lanes of sequence information are merged to infer the sequence of the larger DNA piece, the product consensus sequence is called a ‘contig’. In other words, a contig is a series of two or more individual DNA sequence determinations that overlap. In a sequencing project the contigs get larger and larger until the gaps between the contigs are filled in.

Cosmid- a genetically-engineered plasmid containing bacteriophage lambda packaging signals and potentially very large pieces of inserted foreign DNA (up to 50 kb). These are plasmids carrying a phage lambda cos site (which allows packaging into lambda capsids), an origin of replication and an antibiotic resistance gene. A plasmid of 40 kb is very difficult to put into bacteria, but can replicate once there. Cosmids, however, have a cos site, and thus can be packaged into lambda phage heads (a reaction which can be performed in vitro) to allow efficient introduction into bacteria (you'll have to look up the cos site elsewhere).

Database search - once an open reading frame or a partial amino acid sequence has been determined, the investigator compares the sequence with others in the databases using a computer and a search algorithm. This is usually done in a protein database such as PIR or Swiss-Prot. Nucleic acid sequences are in the GenBank and EMBL databases. The search algorithms most commonly used are BLAST and FASTA.

Degeneracy - refers to the fact that multiple different codons in mRNA can specify the same amino acid in an encoded protein.
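
Degeneracy is easy to see by counting how many codons map to each amino acid. The Python sketch below does so for the standard genetic code (an assumption; variant codes exist), using one-letter amino acid symbols and "*" for stop. The names used are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid (or stop, "*").
DEGENERACY = Counter(CODON_TABLE.values())

def synonymous_codons(amino_acid):
    """List every codon that specifies the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]

print(DEGENERACY["L"], DEGENERACY["M"])
print(synonymous_codons("K"))
```

Leucine (L) has six codons while methionine (M) has only one, so a change in the third base of a codon often leaves the encoded amino acid unchanged.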

Denaturation - with respect to nucleic acids, refers to the conversion from double-stranded to the single-stranded state, often achieved by heating or alkaline conditions. This is also called "melting" DNA. With respect to proteins, refers to the disruption of tertiary and secondary structure, often achieved by heat, detergents, chaotropes, and sulfhydryl-reducing agents.

Denaturing gel - an agarose or acrylamide gel run under conditions which destroy secondary or tertiary protein or RNA structure. For protein, this usually means the inclusion of 2-mercaptoethanol (2-ME, which reduces disulfide bonds between cysteine residues) and SDS and/or urea in an acrylamide gel. For RNA, this usually means the inclusion of formaldehyde or glyoxal to destroy higher ordered RNA structures. In DNA sequencing gels, urea is included to denature dsDNA to ssDNA strands. In denaturing gels, macromolecules tend to be separated on the basis of size and (to some extent) charge, while shape and oligomerization of molecules are not important. Contrast with native gel.

Deoxyribonuclease (DNase) - an enzyme which specifically catalyzes the hydrolysis of DNA.

Deoxyribonucleotide - nucleotides which are the building blocks of DNA and which lack the 2' hydroxyl moiety present in the ribonucleotides of RNA.

Dideoxy sequencing - enzymatic determination of DNA or RNA sequence by the method of Sanger and colleagues, based on the incorporation of chain terminating dideoxynucleotides in a growing nucleic acid strand copied by DNA polymerase or reverse transcriptase from a DNA or RNA template. Separate reactions include dideoxynucleotides containing A, C, G, or T bases. The reaction products represent a collection of new, labeled DNA strands of varying lengths, all terminating with a dideoxynucleotide at the 3' end (at the site of a complementary base in the template nucleic acid), and are separated in a polyacrylamide/urea gel to generate a sequence "ladder". This method is more commonly used than "Maxam-Gilbert" (chemical) sequencing.

Dideoxyribonucleotide - a nucleotide which lacks both 3' and 2' hydroxyl groups. Such dideoxynucleotides can be added to a growing nucleic acid chain, but do not then present a 3'-OH group which can support further propagation of the nucleic acid chain. Thus such compounds are also called "chain terminators", and are useful in DNA and RNA sequencing reactions (see deoxyribonucleotide).

Direct repeats - identical or related sequences present in two or more copies in the same orientation in the same molecule of DNA; they are not necessarily adjacent.

DNA ligase - an enzyme (usually from the T4 bacteriophage) which catalyzes formation of a phosphodiester bond between two adjacent nucleotides from double-stranded DNA fragments. RNA ligases also exist, but are rarely used in molecular biology.

DNA polymerase - a polymerase which synthesizes DNA (see polymerase).

DNase- Deoxyribonuclease, a class of enzymes which digest DNA. The most common is DNase I, an endonuclease which digests both single and double-stranded DNA.

Dot blot- A technique for measuring the amount of one specific DNA or RNA in a complex mixture. The samples are spotted onto a hybridization membrane (such as nitrocellulose or activated nylon, etc.), fixed and hybridized with a radioactive probe. The extent of labeling (as determined by autoradiography and densitometry) is proportional to the concentration of the target molecule in the sample. Standards provide a means of calibrating the results.

Downstream - identifies sequences proceeding farther in the direction of expression; for example, the coding region is downstream from the initiation codon, toward the 3' end of an mRNA molecule. Sometimes used to refer to a position within a protein sequence, in which case downstream is toward the carboxyl end which is synthesized after the amino end during translation.

Downstream- See "Upstream/Downstream".

ds - "double-stranded"

Duplex - a nucleic acid molecule in which two strands are base paired with each other.

E. coli- A common Gram-negative bacterium useful for cloning experiments. Present in human intestinal tract. Hundreds of strains of E. coli exist. One strain, K-12, has been completely sequenced.

Electrophoresis- See "Gel electrophoresis".

Electroporation - a method for introducing foreign nucleic acid into bacterial or eukaryotic cells that uses a brief, high voltage DC charge which renders the cells permeable to the nucleic acid. Also useful for introducing synthetic peptides into eukaryotic cells.

End labeling - the technique of adding a radioactively labeled group to one end (5' or 3' end) of a DNA strand.

Endonuclease - an enzyme which cleaves bonds within a nucleic acid chain; endonucleases may be specific for RNA or for single-stranded or double-stranded DNA. A restriction enzyme is a type of endonuclease.

Endonuclease- An enzyme which digests nucleic acids starting in the middle of the strand (as opposed to an exonuclease, which must start at an end). Examples include the restriction enzymes, DNase I and RNase A.

Enhancer- An enhancer is a nucleotide sequence to which transcription factor(s) bind, and which increases the transcription of a gene. It is NOT part of a promoter; the basic difference being that an enhancer can be moved around anywhere in the general vicinity of the gene (within several thousand nucleotides on either side or even within an intron), and it will still function. It can even be clipped out and spliced back in backwards, and will still operate. A promoter, on the other hand, is position- and orientation-dependent. Some enhancers are "conditional" - in other words, they enhance transcription only under certain conditions, for example in the presence of a hormone.

Epitope - as related to protein antigens, B-cell epitopes consist of the amino acid residues of a protein molecule which interact directly through noncovalent bonds with the amino acid residues of a particular antibody molecule (complementarity determining region). The average epitope probably involves about 15-20 contact amino acid residues, but one or two of these may be critical to the epitope's specificity and the avidity of the antibody-antigen reaction. B-cell epitopes may be either linear or conformational in nature. T-cell epitopes represent the small, processed peptides which bind to MHC class I and II molecules on the surface of antigen-presenting cells, where they are recognized by T cells.

ERE- Estrogen Response Element. A binding site in a promoter to which the activated estrogen receptor can bind. The estrogen receptor is essentially a transcription factor which is activated only in the presence of estrogens. The activated receptor will bind to an ERE, and transcription of the adjacent gene will be altered. See also "Response element".

Ethidium bromide - intercalates within the structure of nucleic acids in such a way that they fluoresce under UV light. Ethidium bromide staining is commonly used to visualize RNA or DNA in agarose gels placed on UV light boxes. Proper precautions are required, because the ethidium bromide is highly mutagenic and the UV light damaging to the eyes. Ethidium bromide is also included in cesium chloride gradients during ultracentrifugation, to separate supercoiled circular DNA from linear and relaxed circular DNA.

Evolutionary clock - defined by the rate at which mutations accumulate within a given gene.

Evolutionary Footprinting- One can infer which portions of a gene are important by comparing the sequence of that gene with its cognates from other species. A plot showing the regions of high conservation will presumably reflect the regions that are functional in all the test species. In theory, the more species involved in the comparison, the more stringent the result can be (i.e. the more the conserved regions will reflect truly important sequences). Care must be taken, however, to use species in which the function of the gene has not diverged excessively, or the outcome will be uninformative.

Exon- Those portions of a genomic DNA sequence which WILL be represented in the final, mature mRNA. The term "exon" can also be used for the equivalent segments in the final RNA. Exons may include coding sequences, the 5' untranslated region or the 3' untranslated region.

Exonuclease - an enzyme which hydrolyzes DNA beginning at one end of a strand, releasing nucleotides one at a time (thus, there are 3' or 5' exonucleases)

Expression - usually used to refer to the entire process of producing a protein from a gene, which includes transcription, translation, post-translational modification and possibly transport reactions.

Expression clone- This is a clone (plasmid in a bacterium, or maybe a lambda phage in bacteria) which is designed to produce a protein from the DNA insert. Mammalian genes do not function in bacteria, so to get bacterial expression from your mammalian cDNA, you would place its coding region (i.e. no introns) immediately adjacent to bacterial transcription/translation control sequences. That artificial construct (the "expression clone") will produce a pseudo-mammalian protein if put back into bacteria. Often, that protein can be recognized by antibodies raised against the authentic mammalian protein, and vice versa.

Expression- To "express" a gene is to cause it to function. A gene which encodes a protein will, when expressed, be transcribed and translated to produce that protein. A gene which encodes an RNA rather than a protein (for example, a rRNA gene) will produce that RNA when expressed.

Expression vector - a plasmid or phage designed for production of a polypeptide from inserted foreign DNA under specific controls. Often an inducer is used. The vector always provides a promoter and often the transcriptional start site, ribosomal binding sequence, and initiation codon. In some cases the product is a fusion protein.

Footprinting - a technique for identifying the site on a DNA (or RNA) molecule which is bound by some protein, by virtue of the protection afforded to phosphodiester bonds in this region against attack by nucleases or nucleolytic compounds.

Frameshift mutation - a mutation (deletion or insertion, never a simple substitution) of one or more nucleotides but never a multiple of 3 nucleotides, which shortens or lengthens a trinucleotide sequence representing a codon; the result is a shift from one reading frame to another reading frame. The amino acid sequence of the protein downstream of the mutation is completely altered, and may even be much shorter or longer due to a change in the location of the first termination (stop) codon.
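The effect of a frameshift is easiest to see by translating a short sequence before and after an insertion. The sketch below is only an illustration: the sequences are invented, translate is a hypothetical helper, and the codon table is the standard genetic code built from the usual TCAG ordering.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate coding-strand DNA from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


wild_type = "ATGGCCAAAGGTTGGTAA"
one_base_insert = wild_type[:3] + "T" + wild_type[3:]      # shifts the frame
three_base_insert = wild_type[:3] + "TTT" + wild_type[3:]  # keeps the frame

print(translate(wild_type))          # MAKGW
print(translate(one_base_insert))    # MCQRLV
print(translate(three_base_insert))  # MFAKGW
```

The single-base insertion changes every downstream amino acid and also removes the original stop codon from the reading frame, whereas the three-base insertion adds one residue and leaves the rest of the protein intact.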

Fusion protein - a product of recombinant DNA in which the foreign gene product is juxtaposed ("fused") to either the carboxyl-terminal or amino-terminal portion of a polypeptide encoded by the vector itself. Use of fusion proteins often facilitates expression of otherwise lethal products and the purification of recombinant proteins.

Gel electrophoresis- A method to analyze the size of DNA (or RNA) fragments. In the presence of an electric field, larger fragments of DNA move through a gel slower than smaller ones. If a sample contains fragments at four different discrete sizes, those four size classes will, when subjected to electrophoresis, all migrate in groups, producing four migrating "bands". Usually, these are visualized by soaking the gel in a dye (ethidium bromide) which makes the DNA fluoresce under UV light.

Gel shift - a method by which the interaction of a nucleic acid (DNA or RNA) with a protein is detected. The mobility of the nucleic acid is monitored in an agarose gel in the presence and absence of the protein; if the protein binds to the nucleic acid, the complex migrates more slowly in the gel (hence "gel shift"). A "supershift" allows determination of the specific protein, by virtue of a second shift in mobility that accompanies binding of a specific antibody to the nucleic acid-protein complex.

Gel shift assay- (aka gel mobility shift assay (GMSA), band shift assay (BSA), electrophoretic mobility shift assay (EMSA)) A method by which one can determine whether a particular protein preparation contains factors which bind to a particular DNA fragment. When a radiolabeled DNA fragment is run on a gel, it shows a characteristic mobility. If it is first incubated with a cellular extract of proteins (or with purified protein), any protein-DNA complexes will migrate slower than the naked DNA - a shifted band.

Gene - A unit of DNA which performs one function. Usually, this is equated with the production of one RNA or one protein. A gene contains coding regions, introns, untranslated regions and control regions. Or, the genomic nucleotide sequence that codes for a particular polypeptide chain, including relevant transcriptional control sequences and introns (if a eukaryote). However, the term is often loosely used to refer to only the relevant coding sequence.

Gene conversion - the alteration of all or part of a gene by a homologous donor DNA that is itself not altered in the process.

Genetic marker- A known site on the chromosome. It might for example be the site of a locus with some recognizable phenotype, or it may be the site of a polymorphism that can be experimentally discerned. See 'Microsatellite', 'SNP', 'Genotyping'.

Genome- The complete set of genetic information. Or the total DNA contained in each cell of an organism. Mammalian genomic DNA (including that of humans) contains 6 x 10^9 base pairs of DNA per diploid cell. Early estimates put the number of genes on the order of a hundred thousand; sequencing of the human genome has since lowered the figure to roughly 20,000 protein-coding genes. Each gene includes coding regions, 5' and 3' untranslated regions, introns, and 5' and 3' flanking DNA. Also present in the genome are structural segments such as telomeric and centromeric DNAs and replication origins, and intergenic DNA.

Genomic blot- A type of Southern blot specifically used to analyze a mixture of DNA fragments derived from total genomic DNA. Because genomic DNA is very complicated, when it has been digested with restriction enzymes, it produces a complex set of fragments ranging from tens of bp to tens of thousands of bp. However, any specific gene will be reproducibly found on only one or a few specific fragments. A million identical cells will produce a million identical restriction fragments for any given gene, so probing a genomic Southern with a gene-specific probe will produce a pattern of perhaps one or just a few bands.

Genomic clone- A piece of DNA taken from the genome of a cell or animal, and spliced into a bacteriophage or other cloning vector. A genomic clone may contain coding regions, exons, introns, 5' flanking regions, 5' untranslated regions, 3' flanking regions, 3' untranslated regions, or it may contain none of these...it may only contain intergenic DNA (usually not a desired outcome of a cloning experiment!).

Genomic library- Similar in concept to a cDNA library, but differs in three major ways - 1) the library carries pieces of genomic DNA (and so contains introns and flanking regions, as well as coding and untranslated regions); 2) you need bacteriophage λ or cosmids, rather than plasmids, because... 3) the inserts are usually 5-15 kb long (in a λ library) or 20-40 kb (in a cosmid library). Therefore, a genomic library is most commonly a tube containing a mixture of λ phages. Enough different phages must be present in the library so that any given piece of DNA from the source genome has a 99% probability of being present.
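The "99% probability" requirement can be turned into a number of clones with the formula usually attributed to Clarke and Carbon, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert. The sketch below is a back-of-the-envelope calculation under idealized assumptions (random, equally clonable fragments); the function name clones_needed and the haploid genome size of 3 x 10^9 bp are choices made for this example.

```python
import math


def clones_needed(insert_bp, genome_bp, probability=0.99):
    """Clones required so that a given sequence is present with `probability`."""
    f = insert_bp / genome_bp  # fraction of the genome in one insert
    # log1p(-f) is ln(1 - f), computed accurately for very small f
    return math.ceil(math.log(1 - probability) / math.log1p(-f))


# 15 kb inserts in a lambda library, mammalian haploid genome (assumed 3e9 bp)
print(clones_needed(15_000, 3_000_000_000))  # roughly 9 x 10^5 phages
# 40 kb inserts in a cosmid library need fewer clones
print(clones_needed(40_000, 3_000_000_000))
```

Larger inserts reduce the number of clones that must be screened, which is one reason cosmids are attractive for big genomes.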

Genotype - the genetic constitution of an organism; determined by its nucleic acid sequence. As applied to viruses, the term implies a group of evolutionarily related viruses possessing a defined degree of nucleotide sequence relatedness.

Glycoprotein - a glycosylated protein.

Glycosylation - the covalent addition of sugar moieties to N or O atoms present in the side chains of certain amino acids of certain proteins, generally occurring within the Golgi apparatus during secretion of a protein. Glycosylation sites are only partially predictable by current computer searches for relevant motifs in protein sequence. Glycosylation may have profound but very unpredictable effects on the folding, stability, and antigenicity of secreted proteins. Glycosylation is a property of eukaryotic cells, and differs among different cell types (i.e., it may be very different in yeast or insect cells used for protein expression, when compared with Chinese hamster ovary (CHO) cells).

Golgi apparatus - a membranous, vesicular structure which is in continuity with the endoplasmic reticulum of eukaryotic cells and generally in close proximity to the nucleus. The Golgi plays an important role in the posttranslational processing and transport of secreted proteins.

GRE- Glucocorticoid Response Element- A binding site in a promoter to which the activated glucocorticoid receptor can bind. The glucocorticoid receptor is essentially a transcription factor which is activated only in the presence of glucocorticoids. The activated receptor will bind to a GRE, and transcription of the adjacent gene will be altered. See also "Response element".

Hairpin - a helical (duplex) region formed by base pairing between adjacent (inverted) complementary sequences within a single strand of RNA or DNA.

Helix-loop-helix- A protein structural motif characteristic of certain DNA-binding proteins.

Heteroduplex DNA - generated by base pairing between complementary single strands derived from different parental duplex molecules; heteroduplex DNA molecules occur during genetic recombination in vivo and during hybridization of different but related DNA strands in vitro. Since the sequences of the two strands in a heteroduplex differ, the molecule is not perfectly base-paired; the melting temperature of a heteroduplex DNA is dependent upon the number of mismatched base pairs.

hnRNA- Heterogeneous nuclear RNA; refers collectively to the variety of RNAs found in the nucleus, including primary transcripts, partially processed RNAs and snRNA. The term hnRNA is often used just for the unprocessed primary transcripts, however.

Homologous recombination - the exchange of sequence between two related but different DNA (or RNA) molecules, with the result that a new "chimeric" molecule is created. Several mechanisms may result in recombination, but an essential requirement is the existence of a region of homology in the recombination partners. In DNA recombination, breakage of single strands of DNA in the two recombination partners is followed by joining of strands present in opposing molecules, and may involve specific enzymes. Recombination of RNA molecules may occur by other mechanisms.

Homology - indicates similarity between two different nucleotide or amino acid sequences, often with potential evolutionary significance. It is probably better to use more quantitative and descriptive terms such as nucleotide "identity" or, in the case of proteins, amino acid "identity" or "relatedness" (the latter refers to the presence of amino acid residues with similar polarity/charge characteristics at the same position within a protein).

Host strain (bacterial)- The bacterium used to harbor a plasmid. Typical host strains include HB101 (general purpose E. coli strain), DH5α (ditto), JM101 and JM109 (suitable for growing M13 phages), XL1-Blue (general-purpose, good for blue/white lacZ screening). Note that the host strain is available in a form with no plasmids (hence you can put one of your own into it), or it may have plasmids present (especially if you put them there). Hundreds, perhaps thousands, of host strains are available.

Hybridization- The reaction by which the pairing of complementary strands of nucleic acid occurs. DNA is usually double-stranded, and when the strands are separated they will re-hybridize under the appropriate conditions. Hybrids can form between DNA-DNA, DNA-RNA or RNA-RNA. They can form between a short strand and a long strand containing a region complementary to the short one. Imperfect hybrids can also form, but the more imperfect they are, the less stable they will be (and the less likely to form). To "anneal" two strands is the same as to "hybridize" them.

Hybridoma - a clone of plasmacytoma cells which secrete a monoclonal antibody; usually produced by fusion of peripheral or splenic plasma cells taken from an immunized mouse with an immortalized murine plasmacytoma cell line (fusion partner), followed by cloning and selection of appropriate antibody-producing cells.

Hydrophilicity plot - a computer plot which examines the relative summed hydrophobicity/hydrophilicity of adjacent amino acid sidechains (usually within a moving window of about 6 amino acid residues) along the primary sequence of a polypeptide chain. Values for the contribution of the sidechains of each of the 20 common amino acids to hydrophobicity/hydrophilicity have been developed by Hopp & Woods, and Kyte & Doolittle, and these plots are often named after these workers. Generally, hydrophobic regions of proteins are considered likely to be in the interior of the native protein, while hydrophilic domains are likely to be exposed on the surface and thus possibly antigenic sites (epitopes). At best, these are crude predictions.
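The moving-window calculation behind such a plot can be sketched in a few lines. The values below are the commonly quoted Kyte & Doolittle hydropathy indices (positive = hydrophobic) and should be checked against the original publication before serious use; the function name hydropathy_profile and the test peptide are made up for this example.

```python
# Kyte & Doolittle hydropathy indices as commonly tabulated
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=6):
    """Mean hydropathy of each window of `window` residues along the chain."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# Invented peptide: a hydrophobic stretch followed by a charged stretch
profile = hydropathy_profile("LLIVLLKKDEKK")
print(profile[0] > 0, profile[-1] < 0)  # True True
```

Plotting the returned list against residue position gives the familiar trace; a hydrophilicity plot in the Hopp & Woods style uses the same windowing with a different table and the opposite sign convention.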

Immunoblot - see Western blot.

Immunoprecipitation - a process whereby a particular protein of interest is isolated by the addition of a specific antibody, followed by centrifugation to pellet the resulting immune complexes. Often, staphylococcal protein A or streptococcal protein G, bound to Sepharose or some other type of macroscopic particle, is added to the reaction mix to increase the size and ease collection of the complexes. Usually, the precipitated protein is subsequently examined by SDS-PAGE.

In vitro translation - see reticulocyte lysate.

Inducer - a small molecule, such as IPTG, that triggers gene transcription by binding to a regulator protein, such as the lac repressor (the product of the lacI gene).

Initiation codon - the codon at which translation of a polypeptide chain is initiated. This is usually the first AUG triplet in the mRNA molecule from the 5' end, where the ribosome binds to the cap and begins to scan in a 3' direction. However, the surrounding sequence context is important and may lead to the first AUG being bypassed by the scanning ribosome in favor of an alternative, downstream AUG. Also called a "start codon". Occasionally other codons may serve as initiation codons, e.g. UUG.

Insert - foreign DNA placed within a vector molecule.

Insertion sequence - a small bacterial transposon carrying only the genetic functions involved in transposition. There are usually inverted repeats at the ends of the insertion sequence.

Intergenic- Between two genes; e.g. intergenic DNA is the DNA found between two genes. The term is often used to mean non-functional DNA (or at least DNA with no known importance to the two genes flanking it). Alternatively, one might speak of the "intergenic distance" between two genes as the number of base pairs from the polyA site of the first gene to the cap site of the second. This usage might therefore include the promoter region of the second gene.

Intron - intervening sequences in eukaryotic genes which do not encode protein but which are transcribed into RNA. They are removed from the pre-mRNA during nuclear splicing reactions and are not present in the mature mRNA. Note that although the 3' flanking region is often transcribed, it is removed by endonucleolytic cleavage and not by splicing. It is not an intron.

Inverted repeats - two copies of the same or related sequence of DNA repeated in opposite orientation on the same molecule (contrast with direct repeats). Adjacent inverted repeats constitute a palindrome.

KB- abbreviation for kilobase, one thousand bases.

Kilobase - unit of 1000 nucleotide bases, either RNA or DNA.

Kinase- A kinase is in general an enzyme that catalyzes the transfer of a phosphate group from ATP to something else. In molecular biology, it has acquired the more specific verbal usage for the transfer onto DNA of a radiolabeled phosphate group. This would be done in order to use the resultant "hot" DNA as a probe.

Klenow fragment - the large fragment of E. coli DNA polymerase I which lacks 5' -> 3' exonuclease activity. Very useful for sequencing reactions, which proceed in a 5' -> 3' fashion (addition of nucleotides to templated free 3' ends of primers).

Knock-out - the excision or inactivation of a gene within an intact organism or even animal (e.g., "knock-out mice"), usually carried out by a method involving homologous recombination.

Knock-out experiment- A technique for deleting, mutating or otherwise inactivating a gene in a mouse. This laborious method involves transfecting a crippled gene into cultured embryonic stem cells, searching through the thousands of resulting clones for one in which the crippled gene exactly replaced the normal one (by homologous recombination), and inserting that cell back into a mouse blastocyst. The resulting mouse will be chimaeric but, if you are lucky (and if you've gotten this far, you obviously are), its germ cells will carry the deleted gene. A few rounds of careful breeding can then produce progeny in which both copies of the gene are inactivated.

Lambda- see Bacteriophage Lambda.

Leucine zipper- A motif found in certain proteins in which Leu residues are evenly spaced through an α-helical region, such that they would end up on the same face of the helix. Dimers can form between two such proteins. The Leu zipper is important in the function of transcription factors such as Fos and Jun and related proteins.

Library- A library might be either a genomic library, or a cDNA library. In either case, the library is just a tube carrying a mixture of thousands of different clones - bacteria or λ phages. Each clone carries an "insert" - the cloned DNA. See "cDNA library" and "Genomic library".

Ligase- An enzyme, T4 DNA ligase, which can link pieces of DNA together. The pieces must have compatible ends (both of them blunt, or else mutually compatible sticky ends), and the ligation reaction requires ATP.

Ligation- The process of splicing two pieces of DNA together. In practice, a pool of DNA fragments is treated with ligase (see "Ligase") in the presence of ATP, and all possible splicing products are produced, including circularized forms and end-to-end ligation of 2, 3 or more pieces. Usually, only some of these products are useful, and the investigator must have some way of selecting the desirable ones.

Linear epitope - an epitope formed by a series of amino acids which are adjacent to each other within the primary structure of the protein. Such epitopes can be successfully modelled by synthetic peptides, but comprise only a small proportion of all epitopes. The minimal epitope size is about 5 amino acid residues. Also called a sequential epitope.

Linkage - the tendency of genes to be inherited together as a result of their relatively close proximity on the same chromosome, or location on the same plasmid.

Linker - a short oligodeoxyribonucleotide, usually representing a specific restriction endonuclease recognition sequence, which may be ligated onto the termini of a DNA molecule to facilitate cloning. Following the ligation reaction, the product is digested with the endonuclease, generating a DNA fragment with the desired sticky or blunt ends.

Lipofectin - a commercially marketed liposome suspension which is mixed with DNA or RNA to facilitate uptake of the nucleic acid by eukaryotic cells (see transfection).

M13- A bacteriophage which infects certain strains of E. coli. The salient feature of this phage is that it packages only a single strand of DNA into its capsid. If the investigator has inserted some heterologous DNA into the M13 genome, copious quantities of single-stranded DNA can subsequently be isolated from the phage capsids. M13 is often used to generate templates for DNA sequencing.

Marker- Anything which is used to identify something else. See "Genetic marker" and "Molecular weight size marker".

Messenger RNA- see mRNA.

Melting - the dissociation of a duplex nucleic acid molecule into single strands, usually by increasing temperature. See denaturation.

Microsatellite- A microsatellite is a simple sequence repeat (SSR). It might be a homopolymer ('...TTTTTTT...'), a dinucleotide repeat ('....CACACACACACACA.....'), trinucleotide repeat ('....AGTAGTAGTAGTAGT...') etc. Due to polymerase slip (a.k.a. polymerase chatter), during DNA replication there is a slight chance these repeat sequences may become altered; copies of the repeat unit can be created or removed. Consequently, the exact number of repeat units may differ between unrelated individuals. Considering all the known microsatellite markers, no two individuals are identical. This is the basis for forensic DNA identification and for testing of familial relationships (e.g. paternity testing).
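Counting repeat units is simple enough to sketch with a regular expression. In the example below, the function name find_microsatellites, the thresholds (units of 1-3 nt, at least 5 copies) and the two "alleles" are assumptions chosen for illustration; real genotyping measures PCR fragment lengths rather than scanning raw sequence like this.

```python
import re


def find_microsatellites(seq, max_unit=3, min_repeats=5):
    """Return (start, repeat_unit, copy_number) for each simple sequence repeat."""
    # Lazy unit match so the shortest repeating unit is reported
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Two hypothetical alleles of the same dinucleotide-repeat marker
allele_1 = "GG" + "CA" * 6 + "TT"
allele_2 = "GG" + "CA" * 8 + "TT"
print(find_microsatellites(allele_1))  # [(2, 'CA', 6)]
print(find_microsatellites(allele_2))  # [(2, 'CA', 8)]
```

The differing copy numbers (6 versus 8) at the same locus are exactly the kind of variation that distinguishes individuals.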

Missense mutation - a nucleotide mutation which results in a change in the amino acid sequence of the encoded protein (contrast with silent mutation).

Mobility shift - see gel shift.

Molecular weight size marker- a piece of DNA of known size, or a mixture of pieces with known size, used on electrophoresis gels to determine the size of unknown DNAs by comparison.
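The "comparison" is usually done by assuming that, over the useful range of the gel, migration distance is roughly linear in the logarithm of fragment size. The sketch below interpolates between the two markers that flank the unknown band; the function name estimate_size and the marker sizes and distances are invented for the example, and the log-linear relationship is only an approximation.

```python
import math


def estimate_size(distance, markers):
    """Estimate fragment size (bp) from migration distance.

    `markers` is a list of (size_bp, distance) pairs for the size standard.
    """
    points = sorted(markers, key=lambda m: m[1])  # order by distance migrated
    for (size1, dist1), (size2, dist2) in zip(points, points[1:]):
        if dist1 <= distance <= dist2:
            fraction = (distance - dist1) / (dist2 - dist1)
            # interpolate linearly on a log10(size) scale
            log_size = math.log10(size1) + fraction * (math.log10(size2) - math.log10(size1))
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the markers")


# Hypothetical ladder: (size in bp, distance migrated in mm)
ladder = [(10_000, 10.0), (1_000, 30.0), (100, 50.0)]
print(round(estimate_size(20.0, ladder)))  # 3162
```

A band that runs halfway between the 10 kb and 1 kb markers is therefore about 3.2 kb, not the 5.5 kb that a linear reading would suggest.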

Monoclonal antibody - an antibody with very specific and often unique binding specificity which is secreted by a biologically cloned line of plasmacytoma cells in the absence of other related antibodies with different binding specificities. Differs from polyclonal antibodies, which are mixed populations of antibody molecules such as may be present in a serum specimen, within which many different individual antibodies have different binding specificities.

Motif - a recurring pattern of short sequence of DNA, RNA, or protein, that usually serves as a recognition site or active site. The same motif can be found in a variety of types of organisms.

mRNA- "messenger RNA" or sometimes just "message"; an RNA which contains sequences coding for a protein. The term mRNA is used only for a mature transcript with polyA tail and with all introns removed, rather than the primary transcript in the nucleus. As such, an mRNA will have a 5' untranslated region, a coding region, a 3' untranslated region and (almost always) a poly(A) tail. Typically about 2% of the total cellular RNA is mRNA.

Multicistronic message - an mRNA transcript with more than one cistron and thus encoding more than one polypeptide. These generally do not occur in eukaryotic organisms, due to differences in the mechanism of translation initiation.

Multicopy plasmids - present in bacteria at amounts greater than one per chromosome. Vectors for cloning DNA are usually multicopy; there are sometimes advantages in using a single copy plasmid.

Multiple cloning site - an artificially constructed region within a vector molecule which contains a number of closely spaced recognition sequences for restriction endonucleases. This serves as a convenient site into which foreign DNA may be inserted.

N terminus - see amino terminus.

Native gel - an electrophoresis gel run under conditions which do not denature proteins (i.e., in the absence of SDS, urea, 2-mercaptoethanol, etc.).

Nested PCR - a very sensitive method for amplification of DNA, which takes part of the product of a single PCR reaction (after 30-35 cycles), and subjects it to a new round of PCR using a different set of PCR primers which are nested within the region flanked by the original primer pair (see polymerase chain reaction).

Nick - in duplex DNA, this refers to the absence of a phosphodiester bond between two adjacent nucleotides on one strand.

Nick translation- A method for incorporating radioactive isotopes (typically 32P) into a piece of DNA. The DNA is randomly nicked by DNase I, and then starting from those nicks DNA polymerase I digests and then replaces a stretch of DNA. Radiolabeled precursor nucleotide triphosphates can thus be incorporated.

Non-coding strand- Anti-sense strand. See "Sense strand" for a discussion of sense strand vs. anti-sense strand.

Nonconservative substitution - a mutation which results in the substitution of one amino acid within a polypeptide chain with an amino acid belonging to a different polarity/charge group (see amino acids, conservative mutation)

Nonsense codon - see stop codon.

Nonsense mutation - a change in the sequence of a nucleic acid that causes a nonsense (stop or termination) codon to replace a codon representing an amino acid.

Nontranslated RNA (NTR) - the segments located at the 5' and 3' ends of an mRNA molecule which do not encode any part of the encoded protein; may contain important translational control elements.

Northern blot- A technique for analyzing mixtures of RNA, whereby the presence and rough size of one particular type of RNA (usually an mRNA) can be ascertained. See "Blotting" for more information. After Dr. E. M. Southern invented the Southern blot, it was adapted to RNA and named the "Northern" blot.

nt- Abbreviation for nucleotide; i.e. the monomeric unit from which DNA or RNA are built. One can express the size of a nucleic acid strand in terms of the number of nucleotides in its chain; hence ‘nt’ can be a measure of chain length.

Nuclear run-on- A method used to estimate the relative rate of transcription of a given gene, as opposed to the steady-state level of the mRNA transcript (which is influenced not just by transcription rates, but by the stability of the RNA). This technique is based on the assumption that a highly-transcribed gene should have more molecules of RNA polymerase bound to it than will the same gene in a less-active state. If properly prepared, isolated nuclei will continue to transcribe genes and incorporate 32P into RNA, but only in those transcripts that were in progress at the time the nuclei were isolated. Once the polymerase molecules complete the transcript they have in progress, they should not be able to re-initiate transcription. If that is true, then the amount of radiolabel incorporated into a specific type of mRNA is theoretically proportional to the number of RNA polymerase complexes present on that gene at the time of isolation. A very difficult technique, rarely applied appropriately from what I understand.

Nuclease- An enzyme which degrades nucleic acids. A nuclease can be DNA-specific (a DNase), RNA-specific (RNase) or non-specific. It may act only on single stranded nucleic acids, or only on double-stranded nucleic acids, or it may be non-specific with respect to strandedness. A nuclease may degrade only from an end (an exonuclease), or may be able to start in the middle of a strand (an endonuclease). To further complicate matters, many enzymes have multiple functions; for example, Bal31 has a 3'-exonuclease activity on double-stranded DNA, and an endonuclease activity specific for single-stranded DNA or RNA.

Nuclease protection assay- See "RNase protection assay".

Nucleoside - the composite sugar and purine or pyrimidine base which are present in nucleotides, which are the basic building blocks of DNA and RNA. Compare with nucleotide. Nucleoside = base + sugar.

Nucleotide - the composite phosphate, sugar, and purine or pyrimidine base which are the basic building blocks of the nucleic acids DNA and RNA. The five nucleotides are adenylic acid, guanylic acid (contain purine bases), and cytidylic acid, thymidylic acid, and uridylic acid (contain pyrimidine bases). Nucleotide = base + sugar + phosphate.

Oligodeoxyribonucleotide - a short, single-stranded DNA molecule, generally 15-50 nucleotides in length, which may be used as a primer or a hybridization probe. Oligodeoxyribonucleotides are synthesized chemically under automated conditions.

Oligonucleotide - see oligodeoxyribonucleotide.

Oncogene- A gene in a tumor virus or in cancerous cells which, when transferred into other cells, can cause transformation (note that only certain cells are susceptible to transformation by any one oncogene). Functional oncogenes are not present in normal cells. A normal cell has many "proto-oncogenes" which serve normal functions, and which under the right circumstances can be activated to become oncogenes. The prefix "v-" indicates that a gene is derived from a virus, and is generally an oncogene (like v-src, v-ras, v-myb, etc). See also "Transformation (with respect to cultured cells)".

Open reading frame - a region within a reading frame of an mRNA molecule that potentially encodes a polypeptide, and which does not contain a translational stop codon (see reading frame).

Open reading frame- Any region of DNA or RNA where a protein could be encoded. In other words, there must be a string of nucleotides (possibly starting with a Met codon) in which one of the three reading frames has no stop codons. See "Reading frame" for a simple example.
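A minimal ORF search just walks each of the three forward reading frames looking for a Met codon followed by a run of codons without a stop. The sketch below is one simple way to do it, not a substitute for a real gene finder: the function name open_reading_frames, the min_codons threshold and the test sequence are assumptions, only the given strand is scanned, and ORFs that run off the end of the sequence are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(dna, min_codons=3):
    """Return (frame, start, end) for each ATG...stop stretch on the given strand.

    `end` is the index just past the stop codon, so dna[start:end] is the ORF.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first Met codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # keep scanning for further ORFs in this frame
    return orfs


sequence = "CCATGGCTAAAGGTTGACC"
for frame, start, end in open_reading_frames(sequence):
    print(frame, sequence[start:end])  # 2 ATGGCTAAAGGTTGA
```

Note that frame 1 of this sequence contains a TAA triplet, but because no Met codon precedes it in that frame, nothing is reported there.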

Operator - the site on DNA at which a repressor protein binds to prevent transcription from initiating at the adjacent promoter.

Operon - a complete unit of bacterial gene expression and regulation, including the structural gene or genes, regulator gene(s), and control elements in DNA recognized by regulator gene product(s).

Origin - a site within a DNA sequence of a chromosome, plasmid, or non-integrated virus at which replication of the DNA is initiated.

Origin of replication- Nucleotide sequences present in a plasmid which are necessary for that plasmid to replicate in the bacterial host. (Abbr. "ori")

Overhang - a terminus of a duplex DNA molecule which has one or more unpaired nucleotides in one of the two strands (hence either a 3' or 5' overhang). Cleavage of DNA with many restriction endonucleases leaves such overhangs (see sticky end).

Package - in recombinant DNA procedures, refers to the step of incorporation of cosmid or other lambda vector DNA with an insert into a phage head for transduction of DNA into host.

Palindromic sequence - a nucleotide sequence which is the same when read in either direction, usually consisting of adjacent inverted repeats.
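For double-stranded DNA, "the same when read in either direction" means that reading one strand 5' to 3' gives the same sequence as reading the complementary strand 5' to 3'; in other words, the sequence equals its own reverse complement. A minimal check, with hypothetical helper names, might look like this:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the duplex reads the same 5' to 3' on both strands."""
    return seq == reverse_complement(seq)


print(is_palindromic("GAATTC"))  # True  (the EcoRI recognition site)
print(is_palindromic("GGATCC"))  # True  (the BamHI recognition site)
print(is_palindromic("GAATTG"))  # False
```

Many restriction endonuclease recognition sequences are palindromic in this sense.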

pBR322- A common plasmid. Along with the obligatory origin of replication, this plasmid has genes which make the E. coli host resistant to ampicillin and tetracycline. It also has several restriction sites (BamHI, PstI, EcoRI, HindIII etc.) into which DNA fragments could be spliced in order to clone them.

PCR- see Polymerase Chain Reaction.

Peptide - a molecule formed by peptide bonds covalently linking two or more amino acids. Short peptides (generally less than 60 amino acid residues, and usually only half that length) can be chemically synthesized by one of several different methods; larger peptides (more correctly, polypeptides) are usually expressed from recombinant DNA.

Peptide bond - a covalent bond between two amino acids, in which the carboxyl group of one amino acid (X1--COOH) and the amino group of an adjacent amino acid (NH2--X2) react to form X1-CO-NH-X2 plus H2O.

Phage - see bacteriophage.

Phagemid- A type of plasmid which carries within its sequence a bacteriophage replication origin. When the host bacterium is infected with "helper" phage, the phagemid is replicated along with the phage DNA and packaged into phage capsids.

Phenotype - the appearance or other characteristics of an organism resulting from the interaction of its genetic constitution with the environment.

Phosphatase, alkaline - an enzyme which catalyzes the hydrolysis of phosphomonoesters of the 5' nucleotides. Used to dephosphorylate (remove phosphate groups from) the 5' ends of DNA or RNA molecules, to facilitate 5' end-labeling with 32P added back by T4 polynucleotide kinase; or to dephosphorylate the 5' ends of DNA molecules to prevent unwanted ligation reactions during cloning.

Phosphodiester bond - the covalent bond between the 3' hydroxyl in the sugar ring of one nucleotide and the 5' phosphate group of the sugar ring of the adjacent nucleotide residue within a nucleic acid.

Phosphorylation - the addition of a phosphate monoester to a macromolecule, catalyzed by a specific kinase enzyme. With respect to proteins, certain amino acid side chains (serine, threonine, tyrosine) are subject to phosphorylation catalyzed by protein kinases; altering the phosphorylation status of a protein may have dramatic effects on its biologic properties, and is a common cellular control mechanism. With respect to DNA, 5' ends must be phosphorylated for ligation.

Plasmid- A circular piece of DNA present in bacteria or isolated from bacteria. Escherichia coli, the usual bacterium in molecular genetics experiments, has a large circular genome, but it will also replicate smaller circular DNAs as long as they have an "origin of replication". Plasmids may also have other DNA inserted by the investigator. A bacterium carrying a plasmid and replicating a million-fold will produce a million identical copies of that plasmid. Common plasmids are pBR322, pGEM, pUC18.

Point mutation - a single nucleotide substitution within a gene; there may be several point mutations within a single gene. Point mutations do not lead to a shift in reading frames, thus each one alters at most a single codon: it may be silent, cause a single amino acid substitution, or create a stop codon (see missense mutation, nonsense mutation, frameshift mutation).
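Because a point mutation touches only one codon, its consequence can be read straight off the genetic code. The sketch below classifies a substitution as silent, missense or nonsense; classify_substitution is a hypothetical helper, the codon table is the standard genetic code, and the example codon is arbitrary.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change at `position` (0, 1 or 2) of a sense codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", 2, "G"))  # silent   (Glu -> Glu)
print(classify_substitution("GAA", 0, "A"))  # missense (Glu -> Lys)
print(classify_substitution("GAA", 0, "T"))  # nonsense (Glu -> stop)
```

The helper assumes the original codon encodes an amino acid; a change within a stop codon would need separate handling.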

PolyA tail- After an mRNA is transcribed from a gene, the cell adds a stretch of A residues (typically 50-200) to its 3' end. It is thought that the presence of this "polyA tail" increases the stability of the mRNA (possibly by protecting it from nucleases). Note that not all mRNAs have a polyA tail; the histone mRNAs in particular do not.

Poly-A tract - a lengthy adenylic acid polymer (RNA) which is covalently linked to the 3' end of newly synthesized mRNA molecules in the nucleus (see PolyA tail).

Polymerase chain reaction (PCR) - a DNA amplification reaction involving multiple (30 or more) cycles of primer annealing, extension, and denaturation, usually using a heat-stable DNA polymerase such as Taq polymerase. Paired primers are used, which are complementary to opposing strands of the DNA and which flank the area to be amplified. Under optimal conditions, a single DNA sequence can be amplified a million-fold.

Polyacrylamide gel (PAGE) - used to separate proteins and smaller DNA fragments and oligonucleotides by electrophoresis. When run under conditions which denature proteins (i.e., in the presence of 2-mercaptoethanol, SDS, and possibly urea), molecules are separated primarily on the basis of size.

Polyclonal antibody - see monoclonal antibody.

Polymerase - an enzyme which catalyzes the addition of a nucleotide to a nucleic acid molecule. There are a wide variety of RNA and DNA polymerases which have a wide range of specific activities and which operate optimally under different conditions. In general, all polymerases require templates upon which to build a new strand of DNA or RNA; however, DNA polymerases also require a primer to initiate the new strand, while RNA polymerases start synthesis at a specific promoter sequence.

Polymerase- An enzyme which links individual nucleotides together into a long strand, using another strand as a template. There are two general types of polymerase: DNA polymerases (which synthesize DNA) and RNA polymerases (which make RNA). Within these two classes, there are numerous sub-types of polymerase, depending on what type of nucleic acid can function as template and what type of nucleic acid is formed. A DNA-dependent DNA polymerase will copy one DNA strand starting from a primer, and the product will be the complementary DNA strand. A DNA-dependent RNA polymerase will use DNA as a template to synthesize an RNA strand.

Polymerase chain reaction- A technique for replicating a specific piece of DNA in-vitro, even in the presence of excess non-specific DNA. Primers are added (which initiate the copying of each strand) along with nucleotides and Taq polymerase. By cycling the temperature, the target DNA is repetitively denatured and copied. A single copy of the target DNA, even if mixed in with other undesirable DNA, can be amplified to obtain billions of replicates. PCR can be used to amplify RNA sequences if they are first converted to DNA via reverse transcriptase. This two-phase procedure is known as ‘RT-PCR’.
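The "million-fold" and "billions" figures follow from simple doubling arithmetic. The sketch below is an idealized model only: the function name `pcr_copies` and the `efficiency` parameter are made up for illustration, and real reactions fall short of perfect doubling, especially in later cycles.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR.

    efficiency = 1.0 means every template is copied in every cycle
    (perfect doubling); 0.0 means nothing is copied at all.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# Copy number from a single starting molecule, assuming perfect doubling.
for n in (10, 20, 30):
    print(n, "cycles:", f"{pcr_copies(1, n):.3g}", "copies")
```

Under this ideal assumption one molecule becomes about a million copies after 20 cycles and about a billion after 30.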

Polymorphism - variation within a DNA or RNA sequence.

Polynucleotide kinase - enzyme which catalyzes the transfer of the terminal phosphate of ATP to 5' hydroxyl termini of polynucleotides, either DNA or RNA. Usually derived from T4 bacteriophage.

Polypeptide - see peptide.

Post-transcriptional regulation- Any process occurring after transcription which affects the amount of protein a gene produces. Includes RNA processing efficiency, RNA stability, translation efficiency, protein stability. For example, the rapid degradation of an mRNA will reduce the amount of protein arising from it. Increasing the rate at which an mRNA is translated will increase the amount of protein product.

Post-translational modification - modifications made to a polypeptide molecule after its initial synthesis; these include proteolytic cleavages, phosphorylation, glycosylation, carboxylation, addition of fatty acid moieties, etc.

Post-translational processing- The reactions which alter a protein's covalent structure, such as phosphorylation, glycosylation or proteolytic cleavage.

Post-translational regulation- Any process which affects the amount of protein produced from a gene, and which occurs AFTER translation in the grand scheme of genetic expression. Actually, this is often just a buzz-word for regulation of the stability of the protein. The more stable a protein is, the more it will accumulate.

PRE- Progesterone Response Element- A binding site in a promoter to which the activated progesterone receptor can bind. The progesterone receptor is essentially a transcription factor which is activated only in the presence of progesterone. The activated receptor will bind to a PRE, and transcription of the adjacent gene will be altered. See also "Response element".

Pre-mRNA - an RNA molecule which is transcribed from chromosomal DNA in the nucleus of eukaryotic cells, and subsequently processed through splicing reactions to generate the mRNA which directs protein synthesis in the cytoplasm.

Primary structure - refers to the sequence of amino acid residues or nucleotides within protein or nucleic acid molecules, respectively (also see secondary and tertiary structure).

Primary transcript- When a gene is transcribed in the nucleus, the initial product is the primary transcript, an RNA containing copies of all exons and introns. This primary transcript is then processed by the cell to remove the introns, to cleave off unwanted 3' sequence, and to polyadenylate the 3' end. The mature message thus formed is then exported to the cytoplasm for translation.

Primer - an oligonucleotide which is complementary to a specific region within a DNA or RNA molecule, and which is used to prime (initiate) synthesis of a new strand of complementary DNA at that specific site, in a reaction or series of reactions catalyzed by a DNA polymerase. The newly synthesized DNA strand will contain the primer at its 5' end. Typically, primers are chemically synthesized oligonucleotides 15-50 nucleotides in length, selected on the basis of a known sequence. However, "random primers" (shorter oligonucleotides, about 6 nucleotides in length, and comprising all possible sequences) may be used to prime DNA synthesis from DNA or RNA of unknown sequence.

Primer- A small oligonucleotide (anywhere from 6 to 50 nt long) used to prime DNA synthesis. The DNA polymerases are only able to extend a pre-existing strand along a template; they are not able to take a naked single strand and produce a complementary copy of it de-novo. A primer which sticks to the template is therefore used to initiate the replication. Primers are necessary for DNA sequencing and PCR.
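A primer "sticks" to the template where the template carries the primer's reverse complement. As a rough sketch (exact matching only, no mismatches or melting temperature; the sequences and the function names `reverse_complement` and `annealing_site` are invented for illustration), the binding position can be located like this:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def annealing_site(template: str, primer: str) -> int:
    """0-based index on the template (written 5' to 3') where the primer
    can base-pair perfectly, or -1 if there is no exact site."""
    return template.find(reverse_complement(primer))


template = "GGATCCATGGCCTAAGCTT"   # made-up template strand
primer = "CTTAGGC"                 # made-up primer, written 5' to 3'
print(annealing_site(template, primer))
```

The polymerase would then extend the primer from its 3' end, copying the template toward the template's 5' end.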

Primer extension- This is a method used to figure out how far upstream from a fixed site the start of an mRNA is. For example, perhaps you have isolated a cDNA clone, but you don't think that the clone has all of the 5' untranslated region. To find out how much is missing, you would first sequence the part you have, and figure out which strand is the coding strand (usually the coding strand will have a large open reading frame). Next, you ask the DNA Synthesis Facility to make an oligonucleotide complementary to the 5'-most region of the coding strand (and thus complementary to the mRNA). This "primer" is hybridized to mRNA (say, a mixture of mRNA containing the one in which you are interested), and reverse transcriptase is added to copy the mRNA from the primer out to the 5' end.

Probe- A fragment of DNA or RNA which is labeled in some way (often incorporating 32P or 35S), and which is used to hybridize with the nucleic acid in which you are interested. For example, if you want to quantitate the levels of alpha subunit mRNA in a preparation of pituitary RNA, you might make a radiolabeled RNA in-vitro which is complementary to the mRNA, and then use it to probe a Northern blot of the pit RNA. A probe can be radiolabeled, or tagged with another functional group such as biotin. A probe can be cloned DNA, or might be a synthetic DNA strand. As an example of the latter, perhaps you have isolated a protein for which you wish to obtain a cDNA or genomic clone. You might (pay to) microsequence a portion of the protein, deduce the nucleic acid sequence, (pay to) synthesize an oligonucleotide carrying that sequence, radiolabel it and use it as a probe to screen a cDNA library or genomic library. A better way is to call up someone who already has the clone.

Processing - with respect to proteins, generally used to refer to proteolytic post-translational modifications of a polypeptide. In the case of RNA, processing may involve the addition of a 5' cap and 3' poly-A tracts as well as splicing reactions in the nucleus.

Processivity - the extent to which an RNA or DNA polymerase adheres to a template before dissociating, which determines the average length (in kilobases) of the newly synthesized nucleic acid strands. Also applies to the action of exonucleases in digesting from the ends to the middle of a nucleic acid.

Promoter- The first few hundred nucleotides of DNA "upstream" (on the 5' side) of a gene, which control the transcription of that gene. The promoter is part of the 5' flanking DNA, i.e. it is not transcribed into RNA, but without the promoter, the gene is not functional. Note that the definition is a bit hazy as far as the size of the region encompassed, but the "promoter" of a gene starts with the nucleotide immediately upstream from the cap site, and includes binding sites for one or more transcription factors which can not work if moved farther away from the gene.

Proto-oncogene - a cellular oncogene-like sequence which is thought to play a role in controlling normal cellular growth and differentiation.

Proto-oncogene- A gene present in a normal cell which carries out a normal cellular function, but which can become an oncogene under certain circumstances. The prefix "c-" indicates a cellular gene, and is generally used for proto-oncogenes (examples- c-myb, c-myc, c-fos, c-jun, etc).

Pseudogene - inactive but stable components of the genome which are derived by duplication and mutation of an ancestral, active gene. Pseudogenes can serve as the donor sequence in gene conversion events.

Pseudoknot - a feature of rna tertiary structure; best visualized as two overlapping stem-loops in which the loop of the first stem-loop participates as half of the stem in the second stem-loop.

Pseudorevertant - a mutant virus or organism which has recovered a wildtype phenotype due to a second-site mutation (potentially located in a different region of the genome, or involving a different polypeptide) which has eliminated the effect of the initial mutation.

Pulsed field gel electrophoresis- (PFGE) A gel technique which allows size-separation of very large fragments of DNA, in the range of hundreds of kb to thousands of kb. As in other gel electrophoresis techniques, populations of molecules migrate through the gel at a speed related to their size, producing discrete bands. In normal electrophoresis, DNA fragments greater than a certain size limit all migrate at the same rate through the gel. In PFGE, the electrophoretic voltage is applied alternately along two perpendicular axes, which forces even the larger DNA fragments to separate by size.

Pulsed-field gel electrophoresis (PFGE) - separation of large (>50 kb) pieces of DNA, including complete chromosomes and genomes, by rapidly alternating the direction of electrophoretic migration in agarose gels.

Purine bases - adenine (a) or guanine (g) (see nucleotide).

Pyrimidine bases - cytosine (c), thymine (t) or uracil (u) (see nucleotide).

Random primed synthesis- If you have a DNA clone and you want to produce radioactive copies of it, one way is to denature it (separate the strands), then hybridize to that template a mixture of all possible 6-mer oligonucleotides. Those oligos will act as primers for the synthesis of labeled strands by DNA polymerase (in the presence of radiolabeled precursors).
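"All possible 6-mer oligonucleotides" is a finite and fairly small set: four choices at each of six positions. A minimal sketch (the function name `all_kmers` is made up for illustration) that enumerates them:

```python
from itertools import product


def all_kmers(k: int) -> list[str]:
    """Every possible DNA sequence of length k, in alphabetical order."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


hexamers = all_kmers(6)
print(len(hexamers))   # 4 ** 6 = 4096 distinct hexamers
```

With 4096 different hexamers in the mixture, some primer can be expected to match at many points along any sizeable template, which is why no knowledge of the template sequence is needed.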

Reading frame- When mRNA is translated by the cell, the nucleotides are read three at a time. By starting at different positions, the groupings of three that are produced can be entirely different, so any sequence can be read in three reading frames. Not only is an entirely different amino acid sequence specified by the different reading frames, but a frame that is interrupted by stop codons is not an open reading frame.
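As a minimal sketch of the idea, the three reading frames of a short DNA sequence can be listed as follows. The sequence and the function names `translate` and `reading_frames` are made up for illustration; only the standard genetic code is assumed, with amino acids in one-letter code and stop codons shown as `*`.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate complete codons only; leftover bases at the end are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def reading_frames(seq: str) -> list[str]:
    """Translations starting at the first, second and third nucleotide."""
    return [translate(seq[offset:]) for offset in range(3)]


dna = "ATGCTAAGTGACCTT"   # made-up sequence
for number, protein in enumerate(reading_frames(dna), start=1):
    print("frame", number, protein)
```

In this made-up sequence the first frame reads MLSDL and is open, while the second (C*VT) and third (AK*P) each run into a stop codon.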

Recognition sequence - a specific palindromic sequence within a double-stranded DNA molecule which is recognized by a restriction endonuclease, and at which the restriction endonuclease specifically cleaves the DNA molecule.
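"Palindromic" here means that the site reads the same 5' to 3' on both strands, i.e. the sequence equals its own reverse complement. A short sketch (the function names are invented for illustration; GAATTC and GGATCC are the well-known EcoRI and BamHI sites):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if both strands read identically in the 5' to 3' direction."""
    return site == reverse_complement(site)


for site in ("GAATTC", "GGATCC", "GAATTA"):
    print(site, is_palindromic(site))
```

The first two sites pass the test; the last one, a made-up variant, does not.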

Recombination - see homologous recombination.

Recombination-repair - a mode of filling a gap in one strand of duplex DNA by retrieving a homologous single strand from another duplex. Usually the underlying mechanism behind homologous recombination and gene conversion.

Relaxed DNA - see supercoil.

Repetitive DNA- A surprising portion of any genome consists not of genes or structural elements, but of frequently repeated simple sequences. These may be short repeats just a few nt long, like CACACA etc. They can also range up to a few hundred nt long. Examples of the latter include Alu repeats, LINEs, SINEs. The function of these elements is often unknown. In shorter repeats like di- and tri-nucleotide repeats, the number of repeating units can occasionally change during evolution and descent. They are thus useful markers for familial relationships and have been used in paternity testing, forensic science and in the identification of human remains.

Replication - the copying of a nucleic acid molecule into a new nucleic acid molecule of similar type (i.e., DNA --> DNA, or RNA --> RNA).

Reporter gene - the use of a functional enzyme, such as beta-galactosidase, luciferase, or chloramphenicol acetyltransferase, downstream of a gene, promoter, or translational control element of interest, to more easily identify successful introduction of the gene into a host and to measure transcription and/or translation.

Repression - inhibition of transcription (or translation) by the binding of a repressor protein to a specific site on DNA (or mRNA).

Residue - as applied to proteins, what remains of an amino acid after its incorporation into a peptide chain, with subsequent loss of a water molecule (see peptide bond).

Response element- By definition, a "response element" is a portion of a gene which must be present in order for that gene to respond to some hormone or other stimulus. Response elements are binding sites for transcription factors. Certain transcription factors are activated by stimuli such as hormones or heat shock. A gene may respond to the presence of that hormone because the gene has in its promoter region a binding site for hormone-activated transcription factor. Example- the glucocorticoid response element (GRE).

Restriction endonuclease - a bacterial enzyme which recognizes a specific palindromic sequence (recognition sequence) within a double-stranded DNA molecule and then catalyzes the cleavage of both strands at that site. Also called a restriction enzyme. Restriction endonucleases may generate either blunt or sticky ends at the site of cleavage.

Restriction enzyme- A class of enzymes ("restriction endonucleases") generally isolated from bacteria, which are able to recognize and cut specific sequences ("restriction sites") in DNA.

Restriction fragment length polymorphism (RFLP) - variations in the lengths of fragments of DNA generated by digestion of different DNAs with a specific restriction endonuclease, reflecting genetic variation (polymorphism) in the DNAs.

Restriction fragment length polymorphism- See "RFLP".

Restriction fragment- The piece of DNA released after restriction digestion of plasmids or genomic DNA. See "Restriction enzyme". One can digest a plasmid and isolate one particular restriction fragment (actually a set of identical fragments). The term also describes the fragments detected on a genomic blot which carry the gene of interest.

Restriction map - a linear array of sites on a particular DNA which are cleaved by various selected restriction endonucleases.

Restriction site - see recognition sequence.

Restriction- To "restrict" DNA means to cut it with a restriction enzyme. See "Restriction Enzyme".

Reticulocyte lysate - a lysate of rabbit reticulocytes which has been extensively digested with micrococcal nuclease to destroy the reticulocyte mRNAs. With the addition of an exogenous, usually synthetic, mRNA, amino acids and a source of energy (ATP), the translational machinery of the reticulocyte (ribosomes, eukaryotic translation factors, etc.) will permit in vitro translation of the added mRNA with production of a new polypeptide. This is only one of several available in vitro translation systems.

Reverse transcriptase- An enzyme which will make a DNA copy of an RNA template - an RNA-dependent DNA polymerase. RT is used to make cDNA; one begins by isolating polyadenylated mRNA, providing oligo-dT as a primer, and adding nucleotide triphosphates and RT to copy the RNA into cDNA.

Reverse transcription - copying of an RNA molecule into a DNA molecule.

Revertant - see back mutation.

RFLP- Restriction fragment length polymorphism; the acronym is pronounced "riflip". Although two individuals of the same species have almost identical genomes, they will always differ at a few nucleotides. Some of these differences will produce new restriction sites (or remove them), and thus the banding pattern seen on a genomic Southern will be affected. For any given probe (or gene), it is often possible to test different restriction enzymes until you find one which gives a pattern difference between two individuals - a RFLP. The less related the individuals, the more divergent their DNA sequences are and the more likely you are to find a RFLP.
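The way a single nucleotide difference changes a banding pattern can be sketched with a toy digest. Everything here is made up for illustration (the two 50 nt "alleles" and the function name `fragment_lengths`); the only outside fact assumed is that EcoRI recognizes GAATTC and cuts after the first G. Real digests act on double-stranded DNA, and only fragments that hybridize to the probe would be seen on a Southern.

```python
def fragment_lengths(dna: str, site: str, cut_offset: int) -> list[int]:
    """Lengths of the pieces left after cutting a linear DNA at every
    occurrence of `site`, `cut_offset` bases into the site."""
    lengths, last_cut = [], 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        lengths.append(cut - last_cut)
        last_cut = cut
        position = dna.find(site, position + 1)
    lengths.append(len(dna) - last_cut)
    return lengths


# Allele A carries two EcoRI sites; allele B differs by one nucleotide
# (GAATTC -> GAGTTC), which destroys the second site.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 8
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAGTTC" + "T" * 8

print(fragment_lengths(allele_a, "GAATTC", 1))
print(fragment_lengths(allele_b, "GAATTC", 1))
```

Allele A yields three fragments and allele B only two, so the two individuals would show different fragment lengths after the same digest.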

Ribonuclease (RNase) - an enzyme which catalyzes the hydrolysis of RNA. There are many different RNases; some of the more important include: 1) RNase A, which cleaves ssRNA 3' of pyrimidines; 2) RNase T1, which cleaves ssRNA at guanosine nucleotides; 3) RNase V1, which cleaves dsRNA (helical regions); 4) RNase H, which degrades the RNA part of RNA-DNA hybrids.

Riboprobe- A strand of RNA synthesized in-vitro (usually radiolabeled) and used as a probe for hybridization reactions. An RNA probe can be synthesized at very high specific activity, is single stranded (and therefore will not self anneal), and can be used for very sensitive detection of DNA or RNA.

Ribosomal binding sequence (Shine-Dalgarno sequence) - in prokaryotic organisms, part or all of the polypurine sequence AGGAGG located on mRNA just upstream of an AUG initiation codon; it is complementary to the sequence at the 3' end of 16S rRNA, and is involved in binding of the ribosome to mRNA. The internal ribosomal entry site found in some viruses may be an analogous eukaryotic genetic element.

Ribosome - a complex ribonucleoprotein particle (eukaryotic ribosomes contain 4 RNAs and at least 82 proteins) which is the "machine" which translates mRNA into protein molecules. In eukaryotic cells, ribosomes are often in close proximity to the endoplasmic reticulum.

Ribozyme - a catalytically active RNA. A good example is the hepatitis delta virus RNA which is capable of self-cleavage and self-ligation in the absence of protein enzymes.

RNA polymerase - a polymerase which synthesizes RNA (see polymerase).

RNA splicing - a complex and incompletely understood series of reactions occurring in the nucleus of eukaryotic cells in which pre-mRNA transcribed from chromosomal DNA is processed such that noncoding regions of the pre-mRNA (introns) are excised, and coding regions (exons) are covalently linked to produce an mRNA molecule ready for transport to the cytoplasm. Because of splicing, eukaryotic DNA representing a gene encoding any given protein is usually much larger than the mRNA from which the protein is actually translated.

RNAi- 'RNA interference' (a.k.a. 'RNA silencing') is the mechanism by which small double-stranded RNAs can interfere with expression of any mRNA having a similar sequence. Those small RNAs are known as 'siRNA', for short interfering RNAs. The mode of action for siRNA appears to be via dissociation of its strands, hybridization to the target RNA, extension of those fragments by an RNA-dependent RNA polymerase, then fragmentation of the target. Importantly, the remnants of the target molecule appear to then act as siRNAs themselves; thus the effect of a small amount of starting siRNA is effectively amplified and can have long-lasting effects on the recipient cell.

RNase protection assay- This is a sensitive method to determine (1) the amount of a specific mRNA present in a complex mixture of mRNA and/or (2) the sizes of exons which comprise the mRNA of interest. A radioactive DNA or RNA probe (in excess) is allowed to hybridize with a sample of mRNA (for example, total mRNA isolated from tissue), after which the mixture is digested with single-strand specific nuclease. Only the probe which is hybridized to the specific mRNA will escape the nuclease treatment, and can be detected on a gel. The amount of radioactivity which was protected from nuclease is proportional to the amount of mRNA to which it hybridized. If the probe included both intron and exons, only the exons will be protected from nuclease and their sizes can be ascertained on the gel.

RNase- Ribonuclease; an enzyme which degrades RNA. It is ubiquitous in living organisms and is exceptionally stable. The prevention of RNase activity is the primary problem in handling RNA.

rRNA- "ribosomal RNA"; any of several RNAs which become part of the ribosome, and thus are involved in translating mRNA and synthesizing proteins. They are the most abundant RNA in the cell (on a mass basis).

RT/PCR reaction - a series of reactions which result in RNA being copied into DNA and then amplified. A single primer is used to make single-stranded cDNA copies from an RNA template under direction of reverse transcriptase. A second primer complementary to this "first strand" cDNA is added to the reaction mix along with Taq polymerase, resulting in synthesis of double-stranded DNA. The reaction mix is then cycled (denaturation, annealing of primers, extension) to amplify the DNA by conventional PCR.

Runoff transcript - RNA which has been synthesized from plasmid DNA (usually by a bacteriophage RNA polymerase such as T7 or SP6) and which terminates at a specific 3' site because of prior cleavage of the plasmid DNA with a restriction endonuclease.

Run-on- see Nuclear run-on.

S1 end mapping- A technique to determine where the end of an RNA transcript lies with respect to its template DNA (the gene). Can't be described in a short paragraph. See "RNase protection assay" for a closely related technique.

S1 nuclease- An enzyme which digests only single-stranded nucleic acids.

Screening- To screen a library (see "Library") is to select and isolate individual clones out of the mixture of clones. For example, if you needed a cDNA clone of the pituitary glycoprotein hormone alpha subunit, you would need to make (or buy) a pituitary cDNA library, then screen that library in order to detect and isolate those few bacteria carrying alpha subunit cDNA. There are two common screening methods. 1) Screening by hybridization involves spreading the mixture of bacteria out on a dozen or so agar plates to grow tens of thousands of isolated colonies. Membranes are laid onto each plate, and some of the bacteria from each colony stick, producing replicas of each colony in their original growth position. The membranes are lifted and the adherent bacteria are lysed, then hybridized to a radioactive piece of alpha DNA (the source of which is a story in itself - see "Probe"). When X-ray film is laid on the filter, only colonies carrying alpha sequences will "light up". Their position on the membranes shows where they grew on the original plates, so you now can go back to the original plate (where the remnants of the colonies are still alive), pick the colony off the plate and grow it up. You now have an unlimited source of alpha cDNA. 2) Screening by antibody is an option if the bacteria and plasmid are designed to express proteins from the cDNA inserts (see "Expression clones"). The principle is similar to hybridization, in that you lift replica filters from bacterial plates, but then you use the antibody (perhaps generated after olde tyme protein purification rituals) to show which colony expresses the desired protein.

SDS-PAGE - denaturing protein gel electrophoresis (see polyacrylamide gel electrophoresis).

Secondary structure - (also see primary and tertiary structure) local structure within a protein which is conferred by the nature of the side chains of adjacent amino acids (e.g., alpha helix, beta sheet, random coil); local structure within an RNA molecule which is conferred by base pairing of nucleotides which are relatively closely positioned within the sequence (e.g., hairpins, stem-loop structures).

Selection - the use of particular conditions, such as the presence of ampicillin, to allow survival only of cells with a particular phenotype, such as production of beta-lactamase.

Sense strand- A gene has two strands: the sense strand and the anti-sense strand. The sense strand is, by definition, the same 'sense' as the mRNA; that is, it can be translated exactly as the mRNA sequence can. The terms ‘coding strand’ and ‘non-coding strand’ are also used to refer to the sense and antisense strands, respectively. Unfortunately, many people interpret these terms in exactly the opposite way. I consider the terms ‘coding strand’ and ‘non-coding strand’ to be too ambiguous. Some people use the exact opposite definition for ‘sense’ and ‘anti-sense’ that I have given here. Be aware of the possibility of a discrepancy. Textbooks I have consulted generally agree with the nomenclature given herein, albeit some avoid defining these terms at all.

Sequence- As a noun, the sequence of a DNA is a buzz word for the structure of a DNA molecule, in terms of the sequence of bases it contains. As a verb, "to sequence" is to determine the structure of a piece of DNA.
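Using the nomenclature given in this entry, the relationship between the two strands and the mRNA can be sketched as follows. The 9 nt "gene" and the function names `antisense` and `transcribe` are invented for illustration, and all sequences are written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(sense: str) -> str:
    """The opposite strand of the gene (the reverse complement)."""
    return sense.translate(COMPLEMENT)[::-1]


def transcribe(sense: str) -> str:
    """The mRNA has the same sequence as the sense strand, with U for T."""
    return sense.replace("T", "U")


sense = "ATGGCCTAA"   # made-up sense strand
print("sense     ", sense)
print("anti-sense", antisense(sense))
print("mRNA      ", transcribe(sense))
```

The mRNA reads like the sense strand, but it is the anti-sense strand that RNA polymerase physically copies as its template, which is one reason the coding/non-coding terminology gets confused.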

Sequence polymorphism - see polymorphism.

Sequential epitope - see linear epitope.

Shotgun cloning or sequencing - cloning of an entire genome or large piece of DNA in the form of randomly generated small fragments. The individual sequences obtained from the clones will be used to construct contigs.

Shotgun cloning- The practice of randomly clipping a larger DNA fragment into various smaller pieces, cloning everything, and then studying the resulting individual clones to figure out what happened. For example, if one was studying a 50 kb gene, it "may" be a bit difficult to figure out the restriction map. By randomly breaking it into smaller fragments and mapping those, a master restriction map could be deduced. See also Shotgun sequencing.

Shotgun sequencing- A way of determining the sequence of a large DNA fragment which requires little brainpower but lots of late nights. The large fragment is shotgun cloned (see above), and then each of the resulting smaller clones ("subclones") is sequenced. By finding out where the subclones overlap, the sequence of the larger piece becomes apparent. Note that some of the regions will get sequenced several times just by chance.
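The "finding out where the subclones overlap" step can be illustrated with a toy greedy merge. This is a sketch under generous assumptions (error-free reads, all from the same strand, no repeated sequences), not how real assembly software works; the reads and the function names `overlap` and `assemble` are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break   # nothing overlaps any more; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up overlapping "subclone" reads, deliberately out of order.
subclones = ["TAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGC"]
print(assemble(subclones))
```

Here the three 10 nt reads collapse into a single 21 nt sequence; the positions covered by two reads are the regions that "get sequenced several times just by chance".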

Shuttle vector - a small plasmid capable of transfection into both prokaryotic and eukaryotic cells.

Side chain - see amino acid.

Sigma factor - certain small ancillary proteins in bacteria that increase the binding affinity of RNA polymerase to a promoter. Different sigma factors recognize different promoter sequences.

Signal peptidase - an enzyme present within the lumen of the endoplasmic reticulum which proteolytically cleaves a secreted protein at the site of a signal sequence.

Signal sequence - a hydrophobic amino acid sequence which directs a growing peptide chain to be secreted into the endoplasmic reticulum.

Silent mutation - a nucleotide substitution (never a single deletion or insertion) which does not alter the amino acid sequence of an encoded protein due to the degeneracy of the genetic code. Such mutations usually involve the third base (wobble position) of codons.
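Whether a given substitution is silent can be checked directly against the genetic code. The sketch below assumes the standard code; the function name `is_silent` and the chosen codons are for illustration only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def is_silent(codon: str, mutant_codon: str) -> bool:
    """True if both codons specify the same amino acid (or both are stops)."""
    return CODON_TABLE[codon] == CODON_TABLE[mutant_codon]


print(is_silent("GAA", "GAG"))   # third-base change, glutamate either way
print(is_silent("GAA", "GAT"))   # third-base change, glutamate to aspartate
```

The second case is a reminder that a change at the third (wobble) position is often, but not always, silent.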

siRNA- Small (or short) interfering RNA; the small double-stranded RNA which mediates RNAi. See 'RNAi'.

Site-directed mutagenesis - the introduction of a mutation, usually a point mutation or an insertion, into a particular location in a cloned DNA fragment. This mutated fragment may be used to "knock out" a gene in the organism of interest by homologous recombination.

Site-specific recombination - occurs between two specific but not necessarily homologous sequences. Usually catalyzed by enzymes not involved in general or homologous recombination.

Slot blot- Similar to a dot blot, but the analyte is put onto the membrane using a slot-shaped template. The template produces a consistently shaped spot, thus decreasing errors and improving the accuracy of the analysis. See Dot blot.

SNP- Single Nucleotide Polymorphism (SNP) - a position in a genomic DNA sequence that varies from one individual to another. It is thought that the primary source of genetic difference between any two humans is due to the presence of single nucleotide polymorphisms in their DNA. Furthermore, these SNPs can be extremely useful in genetic mapping (see 'Genetic Mapping') to follow inheritance of specific segments of DNA in a lineage. SNP-typing is the process of determining the exact nucleotide at positions known to be polymorphic.

snRNP- "snerps", Small Nuclear RiboNucleoProtein particles, which are complexes between small nuclear RNAs and proteins, and which are involved in RNA splicing and polyadenylation reactions.

Solution hybridization- A method closely related to RNase protection (see "RNase protection assay"). Solution hybridization is designed to measure the levels of a specific mRNA species in a complex population of RNA. An excess of radioactive probe is allowed to hybridize to the RNA, then single-strand specific nuclease is used to destroy the remaining unhybridized probe and RNA. The "protected" probe is separated from the degraded fragments, and the amount of radioactivity in it is proportional to the amount of mRNA in the sample which was capable of hybridization. This can be a very sensitive detection method.

Southern blot - DNA is separated by electrophoresis (usually in agarose gels), then transferred to nitrocellulose paper or other suitable solid-phase matrix (e.g., nylon membrane), and denatured into single strands so that it can be hybridized with a specific probe. The Southern blot was developed by E.M. Southern, a molecular biologist in Edinburgh. Northern and western blots were given contrasting names to reflect the different target substances (RNA and proteins, respectively) that are subjected in these procedures to electrophoresis, blotting and subsequent detection with specific probes.

SP6 RNA polymerase - a bacteriophage RNA polymerase which is commonly used to transcribe plasmid DNA into RNA. The plasmid must contain an SP6 promoter upstream of the relevant sequence.

Splicing - see RNA splicing.

Ss - single stranded.

SSR- Simple Sequence Repeat. See 'Microsatellite'.

Stable transfection- A form of transfection experiment designed to produce permanent lines of cultured cells with a new gene inserted into their genome. Usually this is done by linking the desired gene with a "selectable" gene, i.e. a gene which confers resistance to a toxin (like G418, aka Geneticin). Upon putting the toxin into the culture medium, only those cells which incorporate the resistance gene will survive, and essentially all of those will also have incorporated the experimenter's gene.

Start codon - see initiation codon.

Stem-loop - a feature of RNA secondary structure, in which two complementary, inverted sequences which are separated by a short intervening sequence within a single strand of RNA base pair to form a "stem" with a "loop" at one end. Similar to a hairpin, but these usually have very small loops and longer stems.

Sticky end - the terminus of a DNA molecule which has either a 3' or 5' overhang, and which typically results from a cut by a restriction endonuclease. Such termini are capable of specific ligation reactions with other termini which have complementary overhangs. A sticky end can be "blunt ended" either by the removal of an overhang, or a "filling in" reaction which adds additional nucleotides complementary to the overhang.

Stop codon - a codon (UAA, UAG, UGA) which terminates translation.

Streptavidin - a bacterial analog of egg white avidin.

Stringency- A term used to describe the conditions of hybridization. By varying the conditions (especially salt concentration and temperature) a given probe sequence may be allowed to hybridize only with its exact complement (high stringency), or with any somewhat related sequences (relaxed or low stringency). Increasing the temperature or decreasing the salt concentration will tend to increase the selectivity of a hybridization reaction, and thus will raise the stringency.

Sub-cloning- If you have a cloned piece of DNA (say, inserted into a plasmid) and you need unlimited copies of only a part of it, you might "sub-clone" it. This involves starting with several million copies of the original plasmid, cutting with restriction enzymes, and purifying the desired fragment out of the mixture. That fragment can then be inserted into a new plasmid for replication. It has now been subcloned.

Supercoil - double-stranded circular DNA which is twisted about itself. Commonly observed with plasmids and circular viral DNA genomes (such as that of hepatitis B virus). A nick in one strand of the plasmid may remove the twist, resulting in a relaxed, circular DNA molecule. A complete break in the DNA puts the plasmid in a linear form. Supercoils, relaxed circular DNA, and linear DNA all have different migration properties in agarose gels, even though they contain the same number of base pairs.

T7 RNA polymerase - a bacteriophage RNA polymerase which is commonly used to transcribe plasmid DNA into RNA. The plasmid must contain a T7 promoter upstream of the relevant sequence.

Taq polymerase- A DNA polymerase isolated from the bacterium Thermus aquaticus, which is very stable at high temperatures. It is used in PCR procedures and high temperature sequencing.

TATA box- A sequence found in the promoter (part of the 5' flanking region) of many genes. Deletion of this site (the binding site of transcription factor TFIID) causes a marked reduction in transcription, and gives rise to heterogeneous transcription initiation sites.

Template - a nucleic acid strand to which a primer has annealed and along which a nascent complementary strand is being extended.

Termination codon - see stop codon.

Terminator - a sequence downstream from the 3' end of an open reading frame that serves to halt transcription by the RNA polymerase. In bacteria these are commonly sequences that are palindromic and thus capable of forming hairpins. Sometimes termination requires the action of a protein, such as rho factor in E. coli.

Tertiary structure - (also see primary and secondary structure) refers to higher ordered structures conferred on proteins or nucleic acids by interactions between amino acid residues or nucleotides which are not closely positioned within the sequence (primary structure) of the molecule.

Tet resistance- See "Antibiotic resistance".

Tissue-specific expression- Gene function which is restricted to a particular tissue or cell type. For example, the glycoprotein hormone alpha subunit is produced only in certain cell types of the anterior pituitary and placenta, not in lungs or skin; thus expression of the glycoprotein hormone alpha-chain gene is said to be tissue-specific. Tissue-specific expression is usually the result of an enhancer which is activated only in the proper cell type.

Tm- The melting point for a double-stranded nucleic acid. Technically, this is defined as the temperature at which 50% of the strands are in double-stranded form and 50% are single-stranded, i.e. midway in the melting curve. A primer has a specific Tm because it is assumed that it will find an opposite strand of appropriate character.
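
For short primers, Tm is often estimated with a simple rule of thumb rather than measured. The Python sketch below uses the commonly quoted "Wallace rule" (2 degrees C per A or T, 4 degrees C per G or C), which is not part of the definition above and is only a rough approximation for oligonucleotides of roughly 14-20 bases under typical salt conditions. The function name is hypothetical.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2 * (A + T) + 4 * (G + C).

    Assumption: a short DNA oligonucleotide with a perfectly matched complement.
    """
    primer = primer.upper()
    at_count = primer.count("A") + primer.count("T")
    gc_count = primer.count("G") + primer.count("C")
    return 2 * at_count + 4 * gc_count


print(wallace_tm("ATGCATGCATGCATGC"))  # 48
```

The rule reflects the fact that G-C pairs are more stable than A-T pairs, so GC-rich primers melt at higher temperatures.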

Trans - as used in molecular biology, an interaction that involves two sites which are located on separate molecules.

Transcript - a newly made RNA molecule which has been copied from DNA.

Transcription - the copying of a DNA template into a single-stranded RNA molecule. The processes whereby the transcriptional activity of eukaryotic genes is regulated are complex, involve a variety of accessory transcriptional factors which interact with promoters and polymerases, and constitute one of the most important areas of biological research today.

Transcription factor- A protein which is involved in the transcription of genes. These usually bind to DNA as part of their function (but not necessarily). A transcription factor may be general (i.e. acting on many or all genes in all tissues), or tissue-specific (i.e. present only in a particular cell type, and activating the genes restricted to that cell type). Its activity may be constitutive, or may depend on the presence of some stimulus; for example, the glucocorticoid receptor is a transcription factor which is active only when glucocorticoids are present.

Transcription- The process of copying DNA to produce an RNA transcript. This is the first step in the expression of any gene. The resulting RNA, if it codes for a protein, will be spliced, polyadenylated, transported to the cytoplasm, and by the process of translation will produce the desired protein molecule.
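
At the level of sequence alone, transcription can be sketched in a few lines of Python. The RNA has the same sequence as the sense strand (with U in place of T) and is the reverse complement of the template strand. Both function names are hypothetical, and the sketch ignores promoters, splicing and all other processing.

```python
# Pairing rules for copying a DNA template strand into RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_sense(sense_dna):
    """RNA with the same sequence as the sense strand (5'->3'), T replaced by U."""
    return sense_dna.upper().replace("T", "U")


def transcribe_template(template_dna):
    """RNA (5'->3') copied from a template strand that is also written 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_dna.upper()))


print(transcribe_sense("ATGGCC"))     # AUGGCC
print(transcribe_template("GGCCAT"))  # AUGGCC
```

Both calls give the same RNA because GGCCAT is the template strand paired with the sense strand ATGGCC.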

Transcription/translation reaction - an in vitro series of reactions, involving the synthesis (transcription) of an mRNA from a plasmid (usually with T7 or SP6 RNA polymerase), followed by use of the mRNA to program translation in a cell-free system such as a rabbit reticulocyte lysate. The polypeptide product of translation is usually labelled with [35S]-methionine, and examined in an SDS-PAGE gel with or without prior immunoprecipitation. This series of reactions permits the synthesis of a polypeptide from DNA in vitro.

Transcriptional start site - the nucleotide of a gene or cistron at which transcription (RNA synthesis) starts; the most common triplet at which transcription begins in E. coli is CAT. Primer extension identifies the transcriptional start site.

Transfection- A method by which experimental DNA may be put into a cultured mammalian cell. Such experiments are usually performed using cloned DNA containing coding sequences and control regions (promoters, etc) in order to test whether the DNA will be expressed. Since the cloned DNA may have been extensively modified (for example, protein binding sites on the promoter may have been altered or removed), this procedure is often used to test whether a particular modification affects the function of a gene.

Transformation (with respect to bacteria)- The process by which a bacterium acquires a plasmid and becomes antibiotic resistant. This term most commonly refers to a bench procedure performed by the investigator which introduces experimental plasmids into bacteria.

Transformation (with respect to cultured cells)- A change in cell morphology and behavior which is generally related to carcinogenesis. Transformed cells tend to exhibit characteristics known collectively as the "transformed phenotype" (rounded cell bodies, reduced attachment dependence, increased growth rate, loss of contact inhibition, etc). There are different "degrees" of transformation, and cells may exhibit only a subset of these characteristics. Not well understood, the process of transformation is the subject of intense research.

Transgene - a foreign gene which has been introduced into the germ line of an animal species.

Transgenic - an animal (usually a mouse) or plant into which a foreign gene has been introduced in the germ line. An example- transgenic mice expressing the human receptor for poliovirus are susceptible to human polioviruses.

Transgenic mouse- A mouse which carries experimentally introduced DNA. The procedure by which one makes a transgenic mouse involves the injection of DNA into a fertilized embryo at the pro-nuclear stage. The DNA is generally cloned, and may be experimentally altered. It will become incorporated into the genome of the embryo. That embryo is implanted into a foster mother, who gives birth to an animal carrying the new gene. Various experiments are then carried out to test the functionality of the inserted DNA.

Transient transfection- When DNA is transfected into cultured cells, it is able to stay in those cells for about 2-3 days, but then will be lost (unless steps are taken to ensure that it is retained - see Stable transfection). During those 2-3 days, the DNA is functional, and any functional genes it contains will be expressed. Investigators take advantage of this transient expression period to test gene function.

Transition - a nucleotide substitution in which one pyrimidine is replaced by the other pyrimidine, or one purine is replaced by the other purine (e.g., A is changed to G, or C is changed to T) (contrast with transversion).

Translation- The process of decoding a strand of mRNA, thereby producing a protein based on the code. This process requires ribosomes (which are composed of rRNA along with various proteins) to perform the synthesis, and tRNA to bring in the amino acids. Sometimes, however, people speak of "translating" the DNA or RNA when they are merely reading the nucleotide sequence and predicting from it the sequence of the encoded protein. This might be more accurately termed "conceptual translation".
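
"Conceptual translation" is easy to sketch in Python. The example below assumes the standard genetic code, starts at the first AUG it finds, and stops at the first in-frame stop codon; the function name is hypothetical, and ambiguous bases (such as N) are not handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order at each
# position; "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def conceptual_translation(sequence):
    """Predict the protein (one-letter codes) encoded from the first AUG onward."""
    mrna = sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon terminates translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(conceptual_translation("GGAUGGCCAAAUAAGG"))  # MAK
```

The reading frame here is AUG GCC AAA UAA, giving Met-Ala-Lys followed by a stop.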

Translocation - the process by which a newly synthesized protein is directed toward a specific cellular compartment (e.g., the nucleus, the endoplasmic reticulum).

Transposition - the movement of DNA from one location to another location on the same molecule, or a different molecule within a cell.

Transposon - a transposable genetic element; certain sequence elements which are capable of moving from one site to another in a DNA molecule without any requirement for sequence relatedness at the donor and acceptor sites. Many transposons carry antibiotic resistance determinants and have insertion sequences at both ends, and thus have two sets of inverted repeats.

Transversion - a nucleotide substitution in which a purine replaces a pyrimidine, or vice versa (e.g., A is changed to T, or T is changed to G) (see transition).
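
The distinction between a transition and a transversion reduces to whether the two bases belong to the same chemical class. A minimal Python sketch, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T or U")
    if ref == alt:
        raise ValueError("not a substitution: both bases are the same")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "T"))  # transversion
```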

Triplet - a three-nucleotide sequence; a codon.

tRNA - small, tightly folded RNA molecules which act to bring specific amino acids into translationally active ribosomes in a fashion which is dependent upon the mRNA sequence. One end of the tRNA molecule recognizes the nucleotide triplet which is the codon of the mRNA, while the other end (when activated) is covalently linked to the relevant amino acid.

Tumor suppressor- A gene (or its protein product) that inhibits progression towards neoplastic transformation. The best-known examples of tumor suppressors are the proteins p53 and Rb.

Untranslated RNA - see nontranslated RNA.

Upstream - identifies sequences located in a direction opposite to that of expression; for example, the bacterial promoter is upstream of the initiation codon. In an mRNA molecule, upstream means toward the 5' end of the molecule. Occasionally used to refer to a region of a polypeptide chain which is located toward the amino terminus of the molecule.

Upstream activator sequence- A binding site for transcription factors, generally part of a promoter region. A UAS may be found upstream of the TATA sequence (if there is one), and its function is (like an enhancer) to increase transcription. Unlike an enhancer, it cannot be positioned just anywhere or in any orientation.

Upstream/Downstream- In an RNA, anything towards the 5' end of a reference point is "upstream" of that point, and anything towards the 3' end is "downstream". This orientation reflects the direction of both the synthesis of mRNA, and its translation - from the 5' end to the 3' end. In DNA, the situation is a bit more complicated. In the vicinity of a gene (or in a cDNA), the DNA has two strands, but one strand is virtually a duplicate of the RNA, so its 5' and 3' ends determine upstream and downstream, respectively. NOTE that in genomic DNA, two adjacent genes may be on different strands and thus oriented in opposite directions. Upstream or downstream is only used in conjunction with a given gene.
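
The strand dependence described above can be made concrete with a small Python sketch. It assumes genomic positions are numbered along one reference strand and that a gene's orientation is recorded as "+" (sense strand is the reference strand) or "-" (sense strand is the opposite strand); this coordinate convention and the function name are illustrative assumptions.

```python
def relative_position(position, reference, strand):
    """Say whether `position` is upstream or downstream of `reference` for a gene.

    strand: "+" if the gene's sense strand runs in the direction of increasing
    coordinates, "-" if it runs in the direction of decreasing coordinates.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if position == reference:
        return "same position"
    lower_coordinate = position < reference
    if strand == "+":
        return "upstream" if lower_coordinate else "downstream"
    return "downstream" if lower_coordinate else "upstream"


print(relative_position(90, 100, "+"))  # upstream
print(relative_position(90, 100, "-"))  # downstream
```

The same genomic position is upstream of one gene and downstream of another when the two genes lie on opposite strands.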

Vector- The DNA "vehicle" used to carry experimental DNA and to clone it. The vector provides all sequences essential for replicating the test DNA. Typical vectors include plasmids, cosmids, phages and YACs.

Western blot- A technique for analyzing mixtures of proteins to show the presence, size and abundance of one particular type of protein. Similar to Southern or Northern blotting (see "Blotting"), except that (1) a protein mixture is electrophoresed in an acrylamide gel, and (2) the "probe" is an antibody which recognizes the protein of interest, followed by a radioactive secondary probe (such as 125I-protein A).

Wildtype - the native or predominant genetic constitution before mutations, usually referring to the genetic constitution normally existing in nature.

Wobble position - the third base position within a codon, which can often (but not always) be altered to another nucleotide without changing the encoded amino acid (see degeneracy).
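
The "often, but not always" in this definition can be checked directly against the standard genetic code. The Python sketch below, with a hypothetical function name, tests whether all four codons sharing the same first two bases encode the same amino acid.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def third_base_is_silent(prefix):
    """True if every codon starting with these two bases encodes one amino acid."""
    prefix = prefix.upper().replace("T", "U")
    if len(prefix) != 2 or any(base not in BASES for base in prefix):
        raise ValueError("prefix must be two bases from A, C, G, U (or T)")
    return len({CODON_TABLE[prefix + base] for base in BASES}) == 1


print(third_base_is_silent("GG"))  # True: GGU, GGC, GGA, GGG all encode Gly
print(third_base_is_silent("AU"))  # False: AUG is Met, the other three are Ile
```

In the standard code, 8 of the 16 two-base prefixes behave like GG; for the rest, changing the third base can alter the amino acid or create a stop codon.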

YAC- Yeast artificial chromosome. This is a method for cloning very large fragments of DNA. Genomic DNA in fragments of 200-500 kb are linked to sequences which allow them to propagate in yeast as a mini-chromosome (including telomeres, a centromere and an ARS - an autonomous replication sequence). This technique is used to clone large genes and intergenic regions, and for chromosome walking.

Zinc finger- A protein structural motif common in DNA binding proteins. Four Cys residues are found for each "finger" and one finger can bind one zinc ion. A typical configuration is- CysXxxXxxCys--(intervening 12 or so aa's)--CysXxxXxxCys.
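
The typical configuration given above maps naturally onto a regular expression over one-letter amino acid codes. In the Python sketch below, "12 or so" is interpreted as 10 to 14 residues, which is an assumption made for illustration; the function name is hypothetical, and a pattern match alone does not show that a sequence actually folds into a zinc finger.

```python
import re

# Cys-Xxx-Xxx-Cys, then 10-14 residues (assumed reading of "12 or so"),
# then Cys-Xxx-Xxx-Cys.
ZINC_FINGER_PATTERN = re.compile(r"C.{2}C.{10,14}C.{2}C")


def find_zinc_finger_motifs(protein):
    """Return (start_index, matched_text) for each non-overlapping motif match."""
    return [
        (match.start(), match.group())
        for match in ZINC_FINGER_PATTERN.finditer(protein.upper())
    ]


example = "MK" + "CAAC" + "G" * 12 + "CAAC" + "LL"
print(find_zinc_finger_motifs(example))  # [(2, 'CAACGGGGGGGGGGGGCAAC')]
```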