Antibody | Sequence Activity Relationships | SARvision | Numbering
Studying Sequence Activity Relationships in Antibodies
by Mark Hansen, Ph.D.
To study antibodies, a well-defined numbering scheme for the proteins is critical for aligning, annotating and navigating the data. There are a number of well-documented antibody numbering schemes, most of which are implemented in open-source tools such as ANARCI (Bioinformatics: 2016 Jan 15;32(2):298-300. doi: 10.1093/bioinformatics/btv552. Epub 2015 Sep 30). Numbering antibodies consistently with such a tool makes it possible to construct a sequence alignment table that can be coupled with activity data to study Sequence Activity Relationships within families of antibodies. Annotated sequences are arranged in rows and labeled at three levels: chain (light vs. heavy), sequence motif (variable (CDR) vs. framework regions), and sequence position, as assigned by the numbering algorithm of a given scheme (Kabat, Chothia, …). Arranging antibody sequences in rows and focusing the analysis on specific regions of interest can help elucidate correlations between sequence and activity.
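As a rough illustration of the labeling step, the sketch below assigns a Kabat region label to each numbered residue. The CDR boundaries are the standard Kabat definitions; the function names, the label strings and the input shape (pairs of `((position, insertion_code), residue)`, similar to what a numbering tool such as ANARCI returns) are assumptions made for this example, not SARvision's implementation.

```python
# Kabat CDR boundaries (inclusive position numbers) for each chain type.
KABAT_CDRS = {
    "L": {"CDR-L1": (24, 34), "CDR-L2": (50, 56), "CDR-L3": (89, 97)},
    "H": {"CDR-H1": (31, 35), "CDR-H2": (50, 65), "CDR-H3": (95, 102)},
}


def kabat_region(chain, position):
    """Return the CDR label for a Kabat position number, or 'FR' for framework.

    Insertion letters (e.g. 35A, 100B) share the number of their parent
    position, so only the integer part is needed to pick the region.
    """
    for label, (start, end) in KABAT_CDRS[chain].items():
        if start <= position <= end:
            return label
    return "FR"


def annotate(chain, numbered):
    """Label each numbered residue with its position name and region.

    `numbered` is a list of ((position, insertion_code), residue) pairs;
    a blank insertion code means no insertion at that position.
    """
    return [
        (f"{chain}{pos}{ins.strip()}", res, kabat_region(chain, pos))
        for (pos, ins), res in numbered
    ]


# Hypothetical three-residue fragment spanning the start of CDR-L1.
fragment = [((23, " "), "C"), ((24, " "), "R"), ((25, " "), "A")]
labels = annotate("L", fragment)
```

The resulting triples (position name, residue, region) are what an alignment table needs in order to group columns by CDR or framework.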
In the example below, a sequence table is created in which sequences are labeled, numbered and aligned based on the Kabat numbering scheme. Biological data and other information about each sequence are shown in columns to the left of the alignment table for easy reference. The sequence columns of the alignment table are filtered to show only the 3 CDR domains of the light chain, so that substitution patterns in these 3 domains can be compared to the activity: Neutralization potency (µg/mL). Residues are color coded by type (e.g. neutral, positively charged, aliphatic and aromatic), the activity column is heat-mapped by potency, and the rows are sorted from most to least active. At the top are labels for the 3 light chain CDR motifs, the labels for the sequence position numbers, and finally a consensus sequence for the set.
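The consensus row and the potency sort described above can be sketched in a few lines. The antibody names, CDR-L1 strings and potency values here are made-up placeholders, and the sketch assumes that a lower µg/mL value means a more potent antibody.

```python
from collections import Counter


def consensus(aligned):
    """Most common residue in each column of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Hypothetical rows: one aligned CDR-L1 segment and one potency value each.
rows = [
    {"name": "mAb-B", "cdr_l1": "RASQGISSYLN", "potency_ug_ml": 2.0},
    {"name": "mAb-A", "cdr_l1": "RASQSISSYLN", "potency_ug_ml": 0.5},
    {"name": "mAb-C", "cdr_l1": "RASQSISNYLN", "potency_ug_ml": 10.0},
]

# Sort from most to least active (assumption: lower µg/mL = more potent).
ranked = sorted(rows, key=lambda r: r["potency_ug_ml"])

# Consensus over the aligned segment, one residue per column.
consensus_l1 = consensus([r["cdr_l1"] for r in rows])
```

Comparing each sorted row against the consensus makes the substitutions that track with potency easier to spot.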
SARvision|Biologics is a sequence analysis tool that builds antibody alignment tables to facilitate sequence activity relationship analysis. The program can import antibody sequences, number and label them using a routine that mimics ANARCI as closely as possible, and load them into a table with associated data. The best format for the data is tabular, with the light chain and heavy chain concatenated into a single sequence column, delimited by a ‘|’ symbol. The pipe symbol represents chain breaks in SARvision. The data can be loaded (main menu->File->Import) from a file (*.csv, Excel formatted), CDD Vault or Oracle. In the import dialog, check the Analyze antibodies… box and select the numbering scheme (Kabat, IMGT, Chothia, …) in the popup that follows.
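If the light and heavy chains are stored separately, a short script can build the concatenated sequence column before import. This is a minimal sketch: the `Name` and `Potency_ug_mL` column headers and the short sequence fragments are hypothetical, and only the pipe-delimited `Sequence` field follows the format described above.

```python
import csv
import io


def to_sequence_field(light, heavy):
    """Concatenate light and heavy chains with a pipe as the chain break."""
    return f"{light}|{heavy}"


def build_csv(records):
    """Return CSV text with one row per antibody.

    `records` is an iterable of (name, light, heavy, potency) tuples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", "Sequence", "Potency_ug_mL"])  # hypothetical headers
    for name, light, heavy, potency in records:
        writer.writerow([name, to_sequence_field(light, heavy), potency])
    return buf.getvalue()


# Placeholder fragments, not real antibody chains.
csv_text = build_csv([("mAb-A", "DIQMT", "QVQLV", 0.5)])
```

Writing the returned text to a `.csv` file gives a table with one concatenated sequence per row.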
Once data is inside SARvision, views can be added and optimized for interpretation. Use Right click->Sequence table tab->Columns to format column text, show and hide columns, change the order of columns and scroll-lock columns to the left of the display table. Right clicking->Column header opens a menu for sorting and heat-mapping the data in the column. The sequence alignment can be colored by a number of options using the Sequence Table control located to the right of the table. Right click->template->edit template controls which parts of the antibody (CDRs vs. frameworks) are visible in a view. With a few short manipulations, the Sequence Table view can be modified to highlight trends for easy interpretation and presentation. Last, add any views that would be useful for further analysis, including invariant maps, logo plots and mutation cliffs.
If the data set is an antibody conjugate project, the chemical warheads and any linkers can be added and saved with the monomer table. They are attached by appending the warhead and linker to the end of the sequence, and they appear as a new column at the end of the alignment table. This is an advanced operation: contact us for any help.
The recommended format for antibody sequences is the light chain sequence concatenated with the heavy chain sequence, delimited with a chain break: ‘|’ (pipe).
DIQM…..LTVL|QVQLV….TVSS
With a drug conjugate, the chemical warhead can be concatenated onto the end with a pipe to make its own chain:
DIQM…..LTVL|QVQLV….TVSS|[DrugName] where ‘DrugName’ could be registered in the monomer database with structure and details.
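To check sequences before import, the pipe-delimited format can be split back into its chains. This is a minimal sketch: the function name and the returned dictionary layout are assumptions for illustration, and the short sequences are placeholders rather than full chains.

```python
def split_chains(sequence):
    """Split a pipe-delimited antibody sequence into its chains.

    The first chain is taken as the light chain and the second as the heavy
    chain. Any further chains are treated as conjugate payloads, with the
    square brackets around a monomer name removed.
    """
    parts = sequence.split("|")
    if len(parts) < 2:
        raise ValueError("expected at least a light and a heavy chain")
    light, heavy, *extra = parts
    payloads = [
        p[1:-1] if p.startswith("[") and p.endswith("]") else p
        for p in extra
    ]
    return {"light": light, "heavy": heavy, "payloads": payloads}


# Placeholder fragments standing in for full chains.
parsed = split_chains("DIQM|QVQLV|[DrugName]")
```

A sequence without a conjugate returns an empty `payloads` list, so the same function handles both cases.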
This sequence would reside in the Sequence column of the input file which could look something like below.