Last edited: 06/07/2011
By: VLG
This page explains some important concepts and terminology used by SA2. These concepts form the basis of the software, and knowing them is essential to get the most out of SA2.
Providers were already introduced in the first version of SA to group
molecules by the commercial vendors that offer collections of compounds intended for
HTS campaigns or for use as simple building blocks (e.g. Ambinter, Prestwick...).
This concept of Provider is still available in SA2, so that similar analyses can be performed.
In practice, however, there is no restriction on what a Provider is supposed to represent.
In SA2, the notion of Provider has thus been extended to represent
"where your compounds come from".
When you import a new SDF file into an SA2 database, you will always be asked to associate
your molecules with a Provider, whatever or whoever this Provider is.
For example, if you have two SDF files that belong to two different medicinal chemistry projects and you want to differentiate between them, you can associate each input file with a different Provider, so that each Provider actually represents one of your projects. (Or you can use specific name patterns, perform a name search, and save the results in different Libraries; your call!)
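The two-project example above can be pictured with a minimal sketch. This is not SA2's actual import code: the function `import_records`, the record layout and the project names are all hypothetical, and the only point is that every import carries a Provider label that can later be used to group molecules.

```python
from collections import Counter

def import_records(database, names, provider):
    """Append molecules to the database, tagging each one with its Provider.

    Hypothetical helper: SA2 asks for a Provider on every import, which is
    modelled here by rejecting an empty provider name.
    """
    if not provider:
        raise ValueError("every import must be associated with a Provider")
    for name in names:
        database.append({"name": name, "provider": provider})

database = []
# Two input files from two projects, each mapped to its own Provider.
import_records(database, ["kin-001", "kin-002"], provider="Kinase project")
import_records(database, ["gpcr-001"], provider="GPCR project")

# Group molecules by where they come from.
per_provider = Counter(mol["provider"] for mol in database)
```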
Libraries have been introduced in SA2 to represent subsets of molecules. New libraries can be created using various methods, such as a simple filtering procedure, diversity analysis... Once you have created your own library, you can further analyse it and compare it to the entire database, or to other libraries, using the various visualization and chemoinformatics facilities included in SA2. You can of course export a particular library in various formats.
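As a rough illustration of the idea (not SA2's internal data model), a library can be thought of as a named subset of database molecules selected by some criterion. The `Molecule` class, the `make_library` helper and the 500 Da threshold below are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Molecule:
    db_id: int
    name: str
    mol_weight: float  # assumed to be precomputed at import time

def make_library(molecules, predicate):
    """Return the set of database IDs of the molecules accepted by predicate."""
    return {m.db_id for m in molecules if predicate(m)}

database = [
    Molecule(1, "cmpd-A", 180.2),
    Molecule(2, "cmpd-B", 512.6),
    Molecule(3, "cmpd-C", 342.4),
]

# A simple filtering procedure: keep molecules lighter than 500 Da.
light_library = make_library(database, lambda m: m.mol_weight < 500)

# Compare the library to the entire database.
coverage = len(light_library) / len(database)
```

Storing only IDs keeps a library cheap to create and lets the same molecule belong to several libraries at once.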
When importing a new set of molecules in SA2, various operations will be performed by default: calculation of simple molecular descriptors (molecular weight, acceptor / donor count...), calculation of various HTS-related flags (reactive, warhead...), and the calculation of scaffolds and frameworks. The handler represents the *entity* responsible for all these calculations.
Behind the scenes, each handler is associated with a particular chemistry-aware programming library (e.g. the CDK or the JOELib library) that is able to perform chemoinformatics tasks such as descriptor calculation, SMARTS substructure matching, etc. As detailed in Good-to-know, the choice of handler can have a significant impact on various aspects of SA2. Read the documentation carefully!
So why the concept of a handler? We wanted the software to be as flexible as possible, and ideally independent of any particular toolkit. Thus, if you have your own or preferred chemoinformatics library that best suits your needs (in terms of performance, descriptor specifications...), you can create your own handler and register it as a new service. Once done, you will be able to use it to perform your own operations. You can even introduce your own standardization methods directly within the handler instead of using Transformers.
We also wanted to be able to switch from one library to another without too much pain (i.e. SA2 does not depend directly on a particular library), e.g. if a very important function is missing or not working.
Note that a handler must at least be able to read SDF molecules, compute the default descriptors and perform SMARTS matching.
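The three minimum capabilities listed above can be written down as an abstract contract. The sketch below is only an illustration in Python: the class and method names are hypothetical and do not reflect SA2's real handler interface or its service registration mechanism. It shows how such a contract rejects a handler that implements only part of the required operations.

```python
from abc import ABC, abstractmethod

class Handler(ABC):
    """Hypothetical minimal contract for a toolkit-backed handler."""

    @abstractmethod
    def read_sdf(self, sdf_text):
        """Parse SDF text and return a list of molecule records."""

    @abstractmethod
    def compute_default_descriptors(self, molecule):
        """Return a dict mapping descriptor names to values."""

    @abstractmethod
    def matches_smarts(self, molecule, smarts):
        """Return True if the molecule contains the SMARTS pattern."""

class IncompleteHandler(Handler):
    # Only one of the three required operations is provided.
    def read_sdf(self, sdf_text):
        # SDF records are terminated by a line containing "$$$$".
        return [rec.strip() for rec in sdf_text.split("$$$$") if rec.strip()]

class StubHandler(IncompleteHandler):
    # Placeholder bodies: a real handler would delegate to its toolkit here.
    def compute_default_descriptors(self, molecule):
        raise NotImplementedError("delegate to your chemistry toolkit")

    def matches_smarts(self, molecule, smarts):
        raise NotImplementedError("delegate to your chemistry toolkit")

def can_instantiate(handler_cls):
    """Return True if the class fulfils the whole abstract contract."""
    try:
        handler_cls()
        return True
    except TypeError:
        return False

incomplete_ok = can_instantiate(IncompleteHandler)
stub_ok = can_instantiate(StubHandler)
records = StubHandler().read_sdf("mol1\n$$$$\nmol2\n$$$$\n")
```

Keeping the contract this small is what makes swapping one toolkit for another practical: anything beyond these operations stays inside the handler.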
Workers are basically additional sets of operations that can optionally
be performed on each molecule when it is imported into the database. In practice, once a
molecule has been saved, each selected worker will perform its own
additional operations on the molecule.
Several workers are available by default in SA2, allowing you to compute various
descriptors and fingerprints. When importing your molecules, you can
enable (or not!) each worker, depending on your needs.
Learn more about the available workers in the
Workers section of this documentation.
Note that you can also run workers after the molecules have been imported (e.g. if you first want a quick overview of your molecules, and prefer to perform time-consuming calculations such as fingerprints or descriptors later on).
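The worker mechanism described above (optional operations run after a molecule is saved, possibly at a later time) can be sketched as follows. Everything here is hypothetical: the two toy workers, the record layout and the function names are stand-ins, not the workers shipped with SA2.

```python
def name_length_worker(molecule):
    """Toy worker: a cheap property computed from the molecule name."""
    return {"name_length": len(molecule["name"])}

def slow_worker(molecule):
    """Stand-in for a time-consuming calculation such as a fingerprint."""
    return {"fake_fingerprint": sum(ord(c) for c in molecule["name"]) % 256}

def run_workers(molecules, workers):
    """Apply each selected worker to each already-saved molecule."""
    for mol in molecules:
        for worker in workers:
            mol["properties"].update(worker(mol))

def import_molecules(names, workers=()):
    """Save the molecules first, then run the workers enabled for this import."""
    molecules = [{"name": n, "properties": {}} for n in names]
    run_workers(molecules, workers)
    return molecules

# Quick import with only the cheap worker enabled.
db = import_molecules(["aspirin", "caffeine"], workers=[name_length_worker])
quick_view = [dict(mol["properties"]) for mol in db]

# Later on: run the expensive worker on the molecules already in the database.
run_workers(db, [slow_worker])
```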
One important feature of SA2 is that it has been designed to associate any number of properties with each molecule.
In SA2, Properties are organized in Tables. A property can be anything: biological activity, molecular descriptors... Properties are stored in different Tables, which let you group related properties together. Each Table can be seen as a simple spreadsheet table (e.g. in Excel), where rows correspond to molecules and columns correspond to properties.
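The spreadsheet analogy above can be made concrete with a small sketch. The `PropertyTable` class is a hypothetical in-memory stand-in, not SA2's storage layer; it only shows rows keyed by molecule and columns keyed by property name, with missing cells left empty.

```python
class PropertyTable:
    """Hypothetical table: one row per molecule, one column per property."""

    def __init__(self, name):
        self.name = name
        self.columns = []   # property names, in order of first use
        self.rows = {}      # database ID -> {property name: value}

    def set(self, db_id, prop, value):
        if prop not in self.columns:
            self.columns.append(prop)
        self.rows.setdefault(db_id, {})[prop] = value

    def get(self, db_id, prop):
        """Return the stored value, or None if the cell is empty."""
        return self.rows.get(db_id, {}).get(prop)

    def as_grid(self):
        """Return the table as a list of rows, header first."""
        header = ["db_id"] + self.columns
        body = [
            [db_id] + [values.get(col) for col in self.columns]
            for db_id, values in sorted(self.rows.items())
        ]
        return [header] + body

# Group related properties (here, biological activities) in one table.
activities = PropertyTable("Assay results")
activities.set(1, "IC50_nM", 120.0)
activities.set(2, "IC50_nM", 45.5)
activities.set(2, "selectivity", 8.2)

grid = activities.as_grid()
```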
Fingerprints, on the other hand, are stored in dedicated Tables, in a binary format. This is probably not the best choice in terms of storage, but hey... a gigabyte is not the most expensive thing that you will need during your drug discovery pipeline ;)
Learn more about properties in the dedicated section of this documentation.
Molecules can be referred to in three ways in the database:
Note: When a newly imported molecule already exists in the database, the name associated with the new molecule will be attached to the existing molecule. As a consequence, each molecule can be associated with one or several unique original IDs, and each original ID can also be associated with one or several molecules.
Note: When you export molecules or properties, both the original ID and the database ID are exported as well.
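The many-to-many relationship between original IDs and molecules described in the first note can be sketched as below. This is an illustration under stated assumptions, not SA2's implementation: the `MoleculeRegistry` class is hypothetical, and duplicates are detected with a `structure_key` string (for instance a canonical structure representation) that is assumed to be supplied by the caller.

```python
from collections import defaultdict

class MoleculeRegistry:
    """Hypothetical registry linking original IDs and database IDs."""

    def __init__(self):
        self._db_id_by_structure = {}
        self.original_ids = defaultdict(set)  # database ID -> original IDs
        self.db_ids = defaultdict(set)        # original ID -> database IDs

    def import_molecule(self, original_id, structure_key):
        """Register a molecule and return its database ID.

        If the structure is already known, the new original ID is attached
        to the existing molecule instead of creating a duplicate entry.
        """
        db_id = self._db_id_by_structure.get(structure_key)
        if db_id is None:
            db_id = len(self._db_id_by_structure) + 1
            self._db_id_by_structure[structure_key] = db_id
        self.original_ids[db_id].add(original_id)
        self.db_ids[original_id].add(db_id)
        return db_id

registry = MoleculeRegistry()
first = registry.import_molecule("VENDOR-001", "CCO")
second = registry.import_molecule("PROJ-42", "CCO")     # same structure again
third = registry.import_molecule("VENDOR-001", "CCN")   # same name, new structure
```

Here one molecule ends up with two original IDs, and one original ID points to two molecules, which is exactly the situation the note warns about.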