Command line usage

Compute the descriptors

Computing the descriptors is the preliminary step for both retention time prediction and model estimation. The related subcommand is:

$ jp2rt compute-descriptors --help

Usage: jp2rt compute-descriptors [OPTIONS] SRC DST

  Computes molecular descriptions.

  Reads a tab separated values file with SMILES and producing another tab
  separated values file appending molecular descriptor values.

  SRC   The source tab separated values file (must contain SMILES on the last column).
  DST   The destination tab separated values file (will have the same columns of SRC, followed by molecular descriptor values).

Options:
  --help  Show this message and exit.
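
As a minimal sketch of preparing a suitable SRC file, the following Python snippet writes a tab separated values file with the SMILES on the last column, as the help text above requires. The function name, the compound names, and the two-column layout are assumptions made for illustration; the help text does not say whether a header row is expected, so none is written here.

```python
import csv


def write_smiles_tsv(path, rows):
    """Write (name, smiles) pairs to a TSV file, with SMILES as the last column."""
    with open(path, "w", newline="") as out:
        # Use a bare "\n" terminator so that no "\r" trails the SMILES column.
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for name, smiles in rows:
            writer.writerow([name, smiles])


if __name__ == "__main__":
    # Hypothetical compounds, chosen only for illustration.
    write_smiles_tsv("molecules.tsv", [("ethanol", "CCO"), ("benzene", "c1ccccc1")])
```

The resulting file could then be passed as SRC to jp2rt compute-descriptors.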

To see the names of the computed descriptors (i.e., the names of the columns added by the previous subcommand), use the subcommand below. In its output, each descriptor class name is followed by the names of the values it computes:

$ jp2rt list-descriptors

AcidicGroupCountDescriptor
nAcid
ALOGPDescriptor
ALogP
ALogp2
AMR
AminoAcidCountDescriptor
nA
nR
nN
nD
nC
nF
nQ
nE
nG
nH
nI
nP
nL
nK
nM
nS
nT
nY
nV
nW
APolDescriptor
apol
AromaticAtomsCountDescriptor
naAromAtom
AromaticBondsCountDescriptor
nAromBond
AtomCountDescriptor
nAtom
AutocorrelationDescriptorCharge
ATSc1
ATSc2
ATSc3
ATSc4
ATSc5
AutocorrelationDescriptorMass
ATSm1
ATSm2
ATSm3
ATSm4
ATSm5
AutocorrelationDescriptorPolarizability
ATSp1
ATSp2
ATSp3
ATSp4
ATSp5
BasicGroupCountDescriptor
nBase
BCUTDescriptor
BCUTw-1l
BCUTw-1h
BCUTc-1l
BCUTc-1h
BCUTp-1l
BCUTp-1h
BondCountDescriptor
nB
BPolDescriptor
bpol
CarbonTypesDescriptor
C1SP1
C2SP1
C1SP2
C2SP2
C3SP2
C1SP3
C2SP3
C3SP3
C4SP3
ChiChainDescriptor
SCH-3
SCH-4
SCH-5
SCH-6
SCH-7
VCH-3
VCH-4
VCH-5
VCH-6
VCH-7
ChiClusterDescriptor
SC-3
SC-4
SC-5
SC-6
VC-3
VC-4
VC-5
VC-6
ChiPathClusterDescriptor
SPC-4
SPC-5
SPC-6
VPC-4
VPC-5
VPC-6
ChiPathDescriptor
SP-0
SP-1
SP-2
SP-3
SP-4
SP-5
SP-6
SP-7
VP-0
VP-1
VP-2
VP-3
VP-4
VP-5
VP-6
VP-7
EccentricConnectivityIndexDescriptor
ECCEN
FMFDescriptor
FMF
FractionalCSP3Descriptor
Fsp3
FractionalPSADescriptor
tpsaEfficiency
FragmentComplexityDescriptor
fragC
HBondAcceptorCountDescriptor
nHBAcc
HBondDonorCountDescriptor
nHBDon
HybridizationRatioDescriptor
HybRatio
JPlogPDescriptor
JPLogP
KappaShapeIndicesDescriptor
Kier1
Kier2
Kier3
KierHallSmartsDescriptor
khs.sLi
khs.ssBe
khs.ssssBe
khs.ssBH
khs.sssB
khs.ssssB
khs.sCH3
khs.dCH2
khs.ssCH2
khs.tCH
khs.dsCH
khs.aaCH
khs.sssCH
khs.ddC
khs.tsC
khs.dssC
khs.aasC
khs.aaaC
khs.ssssC
khs.sNH3
khs.sNH2
khs.ssNH2
khs.dNH
khs.ssNH
khs.aaNH
khs.tN
khs.sssNH
khs.dsN
khs.aaN
khs.sssN
khs.ddsN
khs.aasN
khs.ssssN
khs.sOH
khs.dO
khs.ssO
khs.aaO
khs.sF
khs.sSiH3
khs.ssSiH2
khs.sssSiH
khs.ssssSi
khs.sPH2
khs.ssPH
khs.sssP
khs.dsssP
khs.sssssP
khs.sSH
khs.dS
khs.ssS
khs.aaS
khs.dssS
khs.ddssS
khs.sCl
khs.sGeH3
khs.ssGeH2
khs.sssGeH
khs.ssssGe
khs.sAsH2
khs.ssAsH
khs.sssAs
khs.sssdAs
khs.sssssAs
khs.sSeH
khs.dSe
khs.ssSe
khs.aaSe
khs.dssSe
khs.ddssSe
khs.sBr
khs.sSnH3
khs.ssSnH2
khs.sssSnH
khs.ssssSn
khs.sI
khs.sPbH3
khs.ssPbH2
khs.sssPbH
khs.ssssPb
LargestChainDescriptor
nAtomLC
LargestPiSystemDescriptor
nAtomP
MannholdLogPDescriptor
MLogP
MDEDescriptor
MDEC-11
MDEC-12
MDEC-13
MDEC-14
MDEC-22
MDEC-23
MDEC-24
MDEC-33
MDEC-34
MDEC-44
MDEO-11
MDEO-12
MDEO-22
MDEN-11
MDEN-12
MDEN-13
MDEN-22
MDEN-23
MDEN-33
PetitjeanNumberDescriptor
PetitjeanNumber
PetitjeanShapeIndexDescriptor
topoShape
geomShape
RotatableBondsCountDescriptor
nRotB
RuleOfFiveDescriptor
LipinskiFailures
SmallRingDescriptor
nSmallRings
nAromRings
nRingBlocks
nAromBlocks
nRings3
nRings4
nRings5
nRings6
nRings7
nRings8
nRings9
SpiroAtomCountDescriptor
nSpiroAtoms
TPSADescriptor
TopoPSA
VAdjMaDescriptor
VAdjMat
WeightDescriptor
MW
WeightedPathDescriptor
WTPT-1
WTPT-2
WTPT-3
WTPT-4
WTPT-5
WienerNumbersDescriptor
WPATH
WPOL
XLogPDescriptor
XLogP
ZagrebIndexDescriptor
Zagreb
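
If you need this listing in a script, a small Python sketch along the following lines could group the value names under the descriptor class that precedes them. It relies on a heuristic that holds for the listing above but is not guaranteed by the tool: class names contain the word "Descriptor", while value names do not. The function name is hypothetical, and the sample is an excerpt copied from the output above.

```python
def group_descriptor_names(lines):
    """Group value names under the descriptor class name that precedes them."""
    groups = {}
    current = None
    for raw in lines:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        if "Descriptor" in name:
            # Heuristic (assumption): only class names contain "Descriptor".
            current = name
            groups[current] = []
        elif current is not None:
            groups[current].append(name)
    return groups


# Excerpt of the listing above (three complete groups).
sample = """ALOGPDescriptor
ALogP
ALogp2
AMR
APolDescriptor
apol
WienerNumbersDescriptor
WPATH
WPOL
""".splitlines()

groups = group_descriptor_names(sample)
print(groups["ALOGPDescriptor"])  # ['ALogP', 'ALogp2', 'AMR']
```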

Predict retention times

To predict the retention times you need a model and the descriptors. The subcommand that runs the prediction is:

$ jp2rt predict-rt --help

Usage: jp2rt predict-rt [OPTIONS] MODEL SRC DST

  Uses the model to predict the retention time.

  Given a tab separated values containing the molecular descriptors and a
  model, produces another tab separated values file prepending the predicted
  value.

  MODEL The model file.
  SRC   The source tab separated values file (the molecular descriptors must be on the last columns).
  DST   The destination tab separated values file (will have the predicted retention time, followed by the same columns of SRC).

Options:
  --help  Show this message and exit.
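
Since the predicted value is prepended to each row of DST, a script can split it from the original columns. The following Python sketch illustrates one way to do this. It assumes that the first column parses as a floating point number and that the file has no header row; neither is stated in the help text above. The function name and the sample rows are made up for illustration and are not real predictions.

```python
import csv
import io


def read_predictions(handle):
    """Yield (predicted_rt, other_columns) for each row of a predict-rt output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        # The predicted retention time is on the first column (see DST above).
        yield float(row[0]), row[1:]


# Made-up content standing in for a DST file.
sample = io.StringIO("1.25\tethanol\tCCO\t0.5\n3.5\tbenzene\tc1ccccc1\t1.0\n")
for predicted_rt, columns in read_predictions(sample):
    print(predicted_rt, columns[0])
```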

Estimate the model

As explained in Estimate the model, it is usually better to analyze the estimation process manually. When that is not needed, a convenient shortcut is given by the subcommand:

$ jp2rt estimate-model --help

Usage: jp2rt estimate-model [OPTIONS] NAME SRC DST

  Estimates a model using the given ensemble regressor.

  NAME  The ensemble regressor name.
  SRC   The source tab separated values file (the retention times must be on the first column, and molecular descriptors must be on the last columns).
  DST   The destination model file.

Options:
  -e, --evaluate  Evaluates the model using cross-validation.
  --help          Show this message and exit.

To list the valid ensemble regressor names (the values accepted as NAME), use the subcommand:

$ jp2rt list-models

AdaBoost
Bagging
ExtraTrees
GradientBoosting
HistGradientBoosting
RandomForest
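
The subcommands documented above can be chained into a complete workflow. The following Python sketch only assembles the command lines, using the argument order shown in the help texts. The function names, the intermediate file names, and the model file name are hypothetical. It also assumes that the training file has the retention times on the first column and the SMILES on the last one, so that the output of compute-descriptors can be passed directly to estimate-model.

```python
import subprocess


def workflow_commands(regressor, train_tsv, query_tsv, model_file, evaluate=False):
    """Return the jp2rt invocations, in order, as argument lists."""
    # Intermediate and output file names are arbitrary choices of this sketch.
    train_desc = "train-descriptors.tsv"
    query_desc = "query-descriptors.tsv"
    predictions = "predictions.tsv"
    estimate = ["jp2rt", "estimate-model"]
    if evaluate:
        estimate.append("--evaluate")  # documented option: cross-validation
    estimate += [regressor, train_desc, model_file]
    return [
        ["jp2rt", "compute-descriptors", train_tsv, train_desc],
        estimate,
        ["jp2rt", "compute-descriptors", query_tsv, query_desc],
        ["jp2rt", "predict-rt", model_file, query_desc, predictions],
    ]


def run_workflow(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands instead of running them.
for command in workflow_commands("RandomForest", "train.tsv", "query.tsv", "model.bin"):
    print(" ".join(command))
```

Calling run_workflow on the returned list would execute the steps, provided that jp2rt is installed and the input files exist.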

Directly using the Java library

If you need to avoid installing Python, you can compute the molecular descriptors directly with the following command:

java -jar jp2rt-all.jar INPUT.tsv OUTPUT.tsv

where INPUT.tsv is the input file, OUTPUT.tsv is the output file, and jp2rt-all.jar is the uber jar obtained by following the specific installation instructions.

You can also run

java -jar jp2rt-all.jar --list-descriptors

to get the list of descriptor names.