Course Description

Rapid developments in biotechnology and computing are changing the way that biomedical scientists interact with data. Traditionally, data were the end result of laborious experimentation, and their interpretation mostly involved careful thought and background knowledge. Today, data are increasingly generated much earlier in the scientific workflow and are much larger in scale. Also, before the data can be interpreted, extensive computational processing is often necessary. Thus, the data deluge now requires the mining and modeling of biomedical data at a large scale, i.e., biomedical data science.

This course aims to equip students with some of the concepts and skills relevant to biomedical data science, with an emphasis on bioinformatics, a sub-discipline of this broader field, through examples of mining and modeling of genomic and proteomic data. More specifically, bioinformatics encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, mining of functional genomics data sets, and machine learning approaches for data integration.

Overall Flow of the Class:

(Module = Group of Lectures)

Lectures:

Discussion Section:

Different headings for this class (4 variants)

This graduate-level version of the course consists of lectures, in-class tests, a discussion section, programming assignments, and a final programming project.

This graduate-level version of the course consists of lectures, in-class tests, a discussion section, written problem sets, and a final project (a semi-computational section and a literature survey). Unlike CBB752, there is no programming required.

For graduate students the course can be broken up into two "modules" (each counting 0.5 credit towards MB&B course requirement):

753 - Biomedical Data Science: Mining (1st half of term)

754 - Biomedical Data Science: Modeling (2nd half of term)

Each module consists of lectures, in-class tests, written problem sets, and a final graduate-level written project that is half the length of the full course's final project.

Prerequisites

The course is geared towards CBB graduate students as well as advanced undergraduates and graduate students wishing to learn about the types of large-scale quantitative analysis that whole-genome sequencing and other forms of large-scale biological data will make possible. It would also be suitable for students from other fields, such as computer science, statistics, or physics, wanting to learn about an important new biological application for computation.

Students should have:

These can be fulfilled by: MBB 200 and Mathematics 115 or permission of the instructor.

Class Requirements

Discussion Section / Readings

Papers will be assigned throughout the course. These papers will be presented and discussed in weekly 60-minute sections with the TFs. A brief summary (a half-page per article) should be submitted at the beginning of the discussion session.

In-class tests: Midterm & Quiz

For reference, please refer to the previous quizzes and answer keys from Fall 2012.

Programming Assignments (Req'd for CBB and CS grad. students)

Non-programming Assignments 

The course syllabus as a single PDF can be found HERE

Pages from previous years

2018 is the 20th time Bioinformatics has been taught at Yale. Pages for the 19 previous iterations of the class are available. Look at how things evolve!