Hello. After seven-year-old Philip saw the Jurassic Park movie, he dreamed of seeing a real dinosaur one day. The world's most famous paleontologist, Jack Horner, is now 70 years old, but he has the same dream. But how can we recreate a dinosaur if we don't know its genome?
Jack Horner was shy and introverted when he was growing up. He progressed so slowly in reading and mathematics that other kids called him dumb. Nevertheless, his high school project on dinosaurs won the science fair, and he was admitted to the University of Montana. However, after failing five consecutive quarters, he dropped out. Fortunately for Horner, he eventually found his calling. After working as a truck driver, he accepted a job as a technician at Princeton, where he quickly established a reputation as a brilliant researcher. He would go on to become an advisor to Steven Spielberg for the Jurassic Park movie. By that time, Horner had learned that he suffers from dyslexia, a learning disorder that is characterized primarily by difficulty with reading and that, for Horner, also made mathematics a struggle. He was able to succeed despite dyslexia because paleontologists hardly ever use mathematics. However, Horner's own student would show that even paleontology is not immune from computing.
15 years ago, Horner was exploring his favorite dinosaur graveyard and discovered a 68-million-year-old T-Rex fossil. He gave a chunk of this fossil to his student, Mary Schweitzer, who demineralized it and sent it to mass spectrometrist John Asara. In 2007, after analyzing thousands of spectra, Asara, Horner, and Schweitzer published a paper in Science announcing the discovery of T-Rex peptides. Amazingly, these T-Rex peptides were nearly identical to chicken peptides, thus supporting the controversial hypothesis that birds evolved from dinosaurs. Horner even published a book called "How to Build a Dinosaur" to explain how to genetically modify a chicken to re-create a dinosaur. Yet some scientists remain skeptical.
While previous dinosaur studies did not require much computation, the T-Rex analysis was powered by a complex and error-prone algorithm. But how can we know which side is correct? Today, we will investigate the T-Rex peptides by developing a protein, rather than DNA, sequencing algorithm.
We have already talked about Frederick Sanger and his invention of DNA sequencing technology four decades ago. Yet Sanger had already won his first Nobel Prize six decades ago, for determining the amino acid sequence of insulin. Similar to how scientists sequence genomes, Sanger broke multiple molecules of insulin into short peptides, sequenced those peptides, and then assembled them into the amino acid sequence of insulin. In the 1950s, protein sequencing was difficult, but DNA sequencing was impossible. Today, DNA sequencing is trivial, but protein sequencing remains difficult. That is why most proteins are discovered by first sequencing a genome and then predicting all of the genes that this genome encodes. By translating the nucleotide sequence of each protein-coding gene into an amino acid sequence, biologists derive a putative proteome of a species. However, different cells in an organism express different proteins. For example, brain cells express proteins giving rise to neuropeptides, but kidney cells do not. An important problem in the study of proteins, or proteomics, is to identify which specific proteins are present in each biological tissue and how they interact.
Today, instead of using Sanger's old protein sequencing approach, biologists use mass spectrometers: expensive and very accurate molecular scales. But modern mass spectrometers cannot read individual amino acids. Instead, they generate a cryptic fingerprint of each peptide, called a "mass spectrum". Our goal today is to decode these fingerprints. To analyze proteins, we need to start by breaking them into pieces and measuring the masses of the resulting fragments, using mass spectrometers, of course.
Let's recall that different amino acids have different masses. For example, the mass of glycine is 57, but the mass of alanine is 71. A mass spectrometer generally breaks each protein molecule into two parts, which we call "prefix" and "suffix" fragments, and measures their masses. It's important to realize that, when biologists analyze a sample, there are millions of copies of the same peptide in the sample, and each copy may break at a different bond. Mass spectrometers measure the masses of all these fragments.
This simple scenario is a little bit more complicated in practice because, in reality, most mass spectrometers can only measure the masses of relatively short fragments, maybe 30-40 amino acids long. To bypass this limitation, biologists usually use proteases, such as trypsin, to break proteins into smaller pieces called "peptides". Afterwards, a mass spectrometer breaks each peptide into even smaller charged fragment ions and measures the mass-to-charge ratio and intensity of each ion. Intensity is a proxy for the number of fragments of that kind observed in the experiment. Please note that, for simplicity in this chapter, we will assume that all masses are integers and all charges are one. Our goal is to reconstruct the peptide from this rather complex fingerprint. Here is one of T-Rex's spectra published in 2007. Try to figure out what peptide generated this spectrum, and you will learn something about T-Rex proteins.
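To make the prefix and suffix fragments described above concrete, here is a minimal Python sketch of the simplified model from this lecture: integer masses, charge one, and a single break per peptide copy. The function names `fragment_masses` and `ideal_spectrum` and the example peptide "GASP" are hypothetical and chosen only for illustration. The table uses the commonly quoted integer residue masses, of which the lecture itself mentions only glycine (57) and alanine (71). The sketch ignores intensities, noise, and the small mass offsets that real instruments add to each fragment.

```python
# Integer residue masses for the 20 standard amino acids (one-letter codes).
# Assumption: commonly used rounded values; the lecture states only G and A.
MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "C": 103, "I": 113, "L": 113, "N": 114,
    "D": 115, "K": 128, "Q": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def fragment_masses(peptide):
    """Return (prefix_masses, suffix_masses), one pair per breakable bond.

    Breaking a peptide of length n at one of its n - 1 bonds yields a
    prefix fragment and the complementary suffix fragment.
    """
    total = sum(MASS[aa] for aa in peptide)
    prefixes = []
    running = 0
    for aa in peptide[:-1]:  # stop before the last residue: n - 1 bonds
        running += MASS[aa]
        prefixes.append(running)
    # Each suffix mass is whatever remains after removing the prefix.
    suffixes = [total - p for p in prefixes]
    return prefixes, suffixes


def ideal_spectrum(peptide):
    """Sorted list of all prefix and suffix fragment masses (no noise)."""
    prefixes, suffixes = fragment_masses(peptide)
    return sorted(prefixes + suffixes)


if __name__ == "__main__":
    print(fragment_masses("GASP"))  # ([57, 128, 215], [255, 184, 97])
    print(ideal_spectrum("GASP"))   # [57, 97, 128, 184, 215, 255]
```

Decoding a spectrum is the reverse of this computation: given only the list of masses, find a peptide whose fragments could have produced it. Note that some amino acids share an integer mass (I and L, K and Q), so in this simplified model they cannot be told apart.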