Appendix

Which descriptors are computed

The Chemistry Development Kit (CDK) provides a large number of molecular descriptors, as listed in the API documentation of the org.openscience.cdk.qsar.descriptors.molecular package.

We can get the list of such descriptors by running the following code:

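import re
from urllib.request import urlopen

# Javadoc summary page of the CDK molecular descriptors package
API_URL = 'https://cdk.github.io/cdk/2.9/docs/api/org/openscience/cdk/qsar/descriptors/molecular/package-summary.html'
# Capture the class name from links of the form href="<ClassName>.html" title="class in ..."
PATTERN = re.compile(r'href="(\w[^"]+)\.html" title="class in')

# Collect every class name linked from the summary page
cdk_descriptors = set()
with urlopen(API_URL) as inf:
  for line in inf.read().decode('utf-8').splitlines():
    if m := PATTERN.search(line):
      cdk_descriptors.add(m.group(1))

cdk_descriptors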
{'ALOGPDescriptor',
 'APolDescriptor',
 'AcidicGroupCountDescriptor',
 'AminoAcidCountDescriptor',
 'AromaticAtomsCountDescriptor',
 'AromaticBondsCountDescriptor',
 'AtomCountDescriptor',
 'AutocorrelationDescriptorCharge',
 'AutocorrelationDescriptorMass',
 'AutocorrelationDescriptorPolarizability',
 'BCUTDescriptor',
 'BPolDescriptor',
 'BasicGroupCountDescriptor',
 'BondCountDescriptor',
 'CPSADescriptor',
 'CarbonTypesDescriptor',
 'ChiChainDescriptor',
 'ChiClusterDescriptor',
 'ChiPathClusterDescriptor',
 'ChiPathDescriptor',
 'EccentricConnectivityIndexDescriptor',
 'FMFDescriptor',
 'FractionalCSP3Descriptor',
 'FractionalPSADescriptor',
 'FragmentComplexityDescriptor',
 'GravitationalIndexDescriptor',
 'HBondAcceptorCountDescriptor',
 'HBondDonorCountDescriptor',
 'HybridizationRatioDescriptor',
 'IPMolecularLearningDescriptor',
 'JPlogPDescriptor',
 'KappaShapeIndicesDescriptor',
 'KierHallSmartsDescriptor',
 'LargestChainDescriptor',
 'LargestPiSystemDescriptor',
 'LengthOverBreadthDescriptor',
 'LongestAliphaticChainDescriptor',
 'MDEDescriptor',
 'MannholdLogPDescriptor',
 'MomentOfInertiaDescriptor',
 'PetitjeanNumberDescriptor',
 'PetitjeanShapeIndexDescriptor',
 'RotatableBondsCountDescriptor',
 'RuleOfFiveDescriptor',
 'SmallRingDescriptor',
 'SpiroAtomCountDescriptor',
 'TPSADescriptor',
 'VABCDescriptor',
 'VAdjMaDescriptor',
 'WHIMDescriptor',
 'WeightDescriptor',
 'WeightedPathDescriptor',
 'WienerNumbersDescriptor',
 'XLogPDescriptor',
 'ZagrebIndexDescriptor'}
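
To see what the regular expression extracts without fetching the page, the following minimal sketch applies it to a few hand-written lines. The sample HTML is hypothetical: it only imitates the style of a Javadoc package summary and is not copied from the CDK site. The helper name extract_class_names is likewise an illustration, and the pattern is written here with the dot before html escaped.

```python
import re

# Pattern from the snippet above, with the dot before "html" escaped.
PATTERN = re.compile(r'href="(\w[^"]+)\.html" title="class in')

# Hypothetical lines imitating a Javadoc package summary (not the real page).
SAMPLE_HTML = '''\
<th class="colFirst" scope="row"><a href="ALOGPDescriptor.html" title="class in org.openscience.cdk.qsar.descriptors.molecular">ALOGPDescriptor</a></th>
<th class="colFirst" scope="row"><a href="TPSADescriptor.html" title="class in org.openscience.cdk.qsar.descriptors.molecular">TPSADescriptor</a></th>
<li><a href="package-tree.html">Tree</a></li>
<a href="../IMolecularDescriptor.html" title="interface in org.openscience.cdk.qsar">IMolecularDescriptor</a>
'''


def extract_class_names(html):
    """Return the set of class names linked from the given HTML text."""
    names = set()
    for line in html.splitlines():
        if m := PATTERN.search(line):
            names.add(m.group(1))
    return names


found = extract_class_names(SAMPLE_HTML)
print(sorted(found))  # prints ['ALOGPDescriptor', 'TPSADescriptor']
```

The navigation link is skipped because it has no title="class in" attribute, and the interface link is skipped because its target starts with a dot and its title says "interface in".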

However, the jp²rt package does not compute all of these descriptors. The set of descriptors it skips can be obtained as a set difference:

from jp2rt import descriptors

jp2rt_descriptors = set(descriptors())
not_computed = cdk_descriptors - jp2rt_descriptors

not_computed
{'CPSADescriptor',
 'GravitationalIndexDescriptor',
 'IPMolecularLearningDescriptor',
 'LengthOverBreadthDescriptor',
 'LongestAliphaticChainDescriptor',
 'MomentOfInertiaDescriptor',
 'VABCDescriptor',
 'WHIMDescriptor'}
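
A plain difference shows only one direction of the comparison. The sketch below, which uses a hypothetical helper named compare_descriptor_sets and small toy subsets of the names listed above, also reports the shared names and any names present only on the jp²rt side. With the real data, one would pass cdk_descriptors and jp2rt_descriptors instead.

```python
def compare_descriptor_sets(available, computed):
    """Split descriptor names into skipped, shared, and unexpected ones."""
    return {
        'skipped': available - computed,     # offered by CDK, not computed
        'shared': available & computed,      # offered by CDK and computed
        'unexpected': computed - available,  # computed, but not in the CDK list
    }


# Toy subsets built from names in the listings above (illustration only).
cdk_subset = {'ALOGPDescriptor', 'TPSADescriptor', 'WHIMDescriptor', 'CPSADescriptor'}
jp2rt_subset = {'ALOGPDescriptor', 'TPSADescriptor'}

report = compare_descriptor_sets(cdk_subset, jp2rt_subset)
for label, names in report.items():
    print(label, sorted(names))
```

An empty 'unexpected' entry would confirm that every name reported on the jp²rt side also appears in the list scraped from the API page.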

These descriptors are skipped because computing them either returns only NaN values or raises an exception (which jp²rt logs and replaces with NaN, as the warning at the end of the output below shows). One can easily check this with the compute_single_descriptor() function.

import numpy as np
from jp2rt import compute_single_descriptor

# SMILES string of the test molecule
smiles = 'O=C(O)C(N)CC1=CC=C(O)C=C1'

# For each skipped descriptor, report whether all of its values are NaN
for descriptor in not_computed:
  print(descriptor, all(np.isnan(f) for f in compute_single_descriptor(descriptor, smiles)))
CPSADescriptor True
GravitationalIndexDescriptor True
LongestAliphaticChainDescriptor True
VABCDescriptor True
WHIMDescriptor True
IPMolecularLearningDescriptor True
MomentOfInertiaDescriptor True
LengthOverBreadthDescriptor True
Apr 04, 2024 10:49:09 AM it.unimi.di.jp2rt.WrappedMolecularDescriptor calculate
WARNING: Ignoring exception during clone/calculate/getValue of LongestAliphaticChainDescriptor, descriptors replaced with 1 NaN
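
One caveat about the check above: all() returns True for an empty sequence, so a descriptor that produced no values at all would also print True. The following minimal sketch shows a stricter variant. The helper name all_nan and the sample values are hypothetical and are not output of jp²rt; the only assumption about compute_single_descriptor() is that it yields floats, as the loop above suggests.

```python
import math


def all_nan(values):
    """Return True only if there is at least one value and every value is NaN."""
    values = list(values)
    return bool(values) and all(math.isnan(v) for v in values)


# Hypothetical value lists standing in for descriptor results.
nan = float('nan')
print(all_nan([nan, nan, nan]))  # prints True
print(all_nan([0.5, nan]))       # prints False
print(all_nan([]))               # prints False, whereas all() alone gives True
```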

How this documentation is produced

This documentation is generated using Jupyter Book; its source is available in the docs directory of the jp²rt repository.

Every code sample (both in Python and shell) is executed during the build of the documentation, so all the output present in the documentation is up-to-date and corresponds exactly to the output produced by the current version of the package.

To run the code in this documentation yourself, you need to install Jupyter Book in addition to the jp²rt package (with plot dependencies included). Otherwise, you can download a precompiled copy of the documentation from the Releases page of the jp²rt repository.

The following table reports the computation time of the various code samples for every section of this documentation.

Document                Modified          Method  Run Time (s)  Status
example/descriptors     2024-04-04 10:48  cache   155.98
example/estimate        2024-04-04 10:48  cache   35.01
example/example-data    2024-04-04 10:48  cache   0.94
example/predict         2024-04-04 10:49  cache   7.8
install                 2024-04-04 10:49  cache   2.33
reference/appendix      2024-04-04 10:49  cache   3.32
reference/command-line  2024-04-04 10:49  cache   8.85
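
As a quick way to read the table, the sketch below sums the reported run times and finds the slowest document. The numbers are copied from the table above; the variable names are only illustrative.

```python
# Run times in seconds, copied from the table above.
run_times = {
    'example/descriptors': 155.98,
    'example/estimate': 35.01,
    'example/example-data': 0.94,
    'example/predict': 7.8,
    'install': 2.33,
    'reference/appendix': 3.32,
    'reference/command-line': 8.85,
}

total = sum(run_times.values())
slowest = max(run_times, key=run_times.get)

print(f'total run time: {total:.2f} s')
print(f'slowest document: {slowest} ({run_times[slowest]:.2f} s)')
```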

Changelog

You can find the CHANGELOG in the jp²rt repository.