Contact: Andreas Karwath
|Input format:||SMILES files or Weka's ARFF format|
|Output format:||Plain text|
|User-specified parameters:||Evaluation heuristic: compute_v or wracc (weighted relative accuracy); minimum number of instances covered; minimum number of seeds; stopping error rate (default: a priori distribution)|
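The wracc option refers to weighted relative accuracy, which is usually defined as the coverage of a rule times its gain in precision over the class prior: WRAcc = (p + n)/(P + N) * (p/(p + n) - P/(P + N)), where the rule covers p of P positive and n of N negative instances. The following is a minimal sketch under that textbook definition; SMIREP's own implementation may differ, and the function and argument names are hypothetical.

```python
def wracc(p, n, P, N):
    """Weighted relative accuracy of a rule (textbook definition).

    p, n: positive / negative instances covered by the rule
    P, N: total positive / negative instances in the data
    """
    covered = p + n
    total = P + N
    if covered == 0 or total == 0:
        # A rule that covers nothing carries no information.
        return 0.0
    coverage = covered / total       # fraction of instances the rule fires on
    precision = p / covered          # fraction of covered instances that are positive
    prior = P / total                # fraction of positives overall
    return coverage * (precision - prior)


# A rule covering 30 of 50 positives and 10 of 50 negatives.
score = wracc(30, 10, 50, 50)
print(round(score, 3))
```

A rule whose precision equals the class prior scores zero, so the heuristic rewards rules that are both general and better than guessing.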
SMIREP/SMIPPER [KAR06] combines feature generation and rule learning in one integrated
package. It constructs features (subgraphs) by fragmenting the SMILES representations of the training
data and refining these fragments on the fly during the learning process. The underlying learning algorithm is similar to
that of the IREP rule learner, which employs reduced error pruning. SMIREP can incorporate
external, predefined SMARTS patterns, such as functional groups, as well as physico-chemical
properties during rule construction. The models learned by SMIREP are sets of rules. SMIPPER takes
essentially the same approach but repeatedly refines the rule set it has found. The system can be run in three modes:
train/test, k-fold cross-validation, or leave-one-out cross-validation. Optionally, receiver operating
characteristic (ROC) curves are constructed for each test set or fold for visualization purposes.
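As a rough illustration of the reduced error pruning idea, the Python sketch below prunes a single rule against a held-out pruning set. It is not SMIREP's code. Conditions are plain substring tests on SMILES strings (a crude stand-in for real subgraph matching), the value function is the one commonly given for IREP, (p + (N - n)) / (P + N), and the names `covers`, `rule_value` and `prune_rule` are hypothetical.

```python
def covers(rule, smiles):
    """A rule covers a molecule if every fragment occurs in its SMILES string.

    Assumption: substring matching stands in for subgraph matching.
    """
    return all(fragment in smiles for fragment in rule)


def rule_value(rule, prune_pos, prune_neg):
    """IREP-style value: covered positives plus uncovered negatives, normalised.

    Assumes the pruning set is non-empty.
    """
    p = sum(covers(rule, s) for s in prune_pos)
    n = sum(covers(rule, s) for s in prune_neg)
    return (p + (len(prune_neg) - n)) / (len(prune_pos) + len(prune_neg))


def prune_rule(rule, prune_pos, prune_neg):
    """Drop the final sequence of conditions that maximises the value.

    Ties are resolved in favour of the shorter rule.
    """
    best = list(rule)
    best_value = rule_value(best, prune_pos, prune_neg)
    for length in range(len(rule) - 1, 0, -1):
        candidate = list(rule[:length])
        value = rule_value(candidate, prune_pos, prune_neg)
        if value >= best_value:
            best, best_value = candidate, value
    return best


# Toy pruning set: actives contain chlorine, inactives do not.
prune_pos = ["CCCl", "ClC(=O)C", "CCl"]
prune_neg = ["CC=O", "CCO"]
rule = ["Cl", "=O"]  # overly specific rule grown on the growing set

pruned = prune_rule(rule, prune_pos, prune_neg)
print(pruned)
```

In this toy case the second condition only reduces coverage of the actives, so pruning removes it and keeps the single chlorine fragment.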
The software is implemented in the Python programming language and was developed for the Linux operating
system. The SMIREP software is dependent on the OpenBabel (http://www.openbabel.org) chemistry toolbox.
SMIREP is executed via a command line interface. The accepted input formats are plain SMILES files or Weka's
[WIT99] ARFF format, the latter containing a SMILES attribute and the pre-computed physico-chemical properties. The
additional SMARTS file for functional groups is a plain ASCII text file containing the SMARTS patterns and their
group identifiers.
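The exact layout of the SMARTS file is not specified here. As a hedged sketch, the Python snippet below assumes one entry per line, with the SMARTS pattern first and the group identifier after whitespace; the function name `parse_smarts_lines` and the example entries are hypothetical and not taken from the SMIREP distribution.

```python
def parse_smarts_lines(lines):
    """Map group identifiers to SMARTS patterns.

    Assumed layout (not confirmed by the source): "<SMARTS> <identifier>"
    on each non-empty line, separated by whitespace.
    """
    groups = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected '<SMARTS> <identifier>'")
        pattern, identifier = fields
        groups[identifier.strip()] = pattern
    return groups


# Illustrative entries only.
example = [
    "[CX3](=O)[OX2H1] carboxylic_acid",
    "",
    "[NX3;H2] primary_amine",
]
groups = parse_smarts_lines(example)
for identifier, pattern in groups.items():
    print(identifier, pattern)
```

To read a real file, the same function can be passed an open file object, since iterating over it yields lines.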
For further information, we refer to the original publication [KAR06] and the website
Background (publication date, popularity/level of familiarity, rationale of approach, further comments)
Published in 2006. Employs a heuristic way of determining activity by fragmenting
the SMILES strings of instances and refining the resulting fragments during rule
construction. Does not require pre-constructed fragments or features.
Bias (instance-selection bias, feature-selection bias, combined instance-selection/feature-selection bias, independence assumptions?, ...)
Lazy learning/eager learning
Interpretability of models (black box model?, ...)
Very good (sets of rules over SMILES fragments, or constraints based on physico-chemical
properties and/or predefined SMARTS patterns)
Type of Descriptor:
External components: OpenBabel
Programming language(s): Python
Operating system(s): Linux
Input format: SMILES files or Weka's ARFF format
Output format: Plain text
[KAR06] A. Karwath and L. De Raedt: SMIREP: Predicting Chemical Activity from SMILES. In: Journal of Chemical Information and Modeling, 46(6), pp. 2432-2444. (2006)
[WIT99] I. H. Witten and E. Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann. (1999)