Command line usage

Compute the descriptors

Computing descriptors is the preliminary step for both retention time prediction and model estimation. The related subcommand is

$ jp2rt compute-descriptors --help
Usage: jp2rt compute-descriptors [OPTIONS] SRC DST

  Computes molecular descriptions.

  Reads a tab separated values file with SMILES and producing another tab
  separated values file appending molecular descriptor values.

  SRC   The source tab separated values file (must contain SMILES on the last column).
  DST   The destination tab separated values file (will have the same columns of SRC, followed by molecular descriptor values).

Options:
  --help  Show this message and exit.
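
As a minimal sketch of how the input for this subcommand could be prepared from Python, the following writes a tab separated values file with the SMILES on the last column and builds the matching argument list. The compound names, the file names and the helper functions are assumptions made for illustration; the help text above does not say whether a header line is expected, so none is written here.

```python
import csv
import shlex
import tempfile
from pathlib import Path


def write_smiles_tsv(rows, path):
    """Write (name, SMILES) pairs as tab separated values, SMILES last."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for name, smiles in rows:
            writer.writerow([name, smiles])


def compute_descriptors_command(src, dst):
    """Return the argument list for the subcommand; nothing is executed."""
    return ["jp2rt", "compute-descriptors", str(src), str(dst)]


# Hypothetical compounds and file names, used only for this sketch.
workdir = Path(tempfile.mkdtemp())
src = workdir / "molecules.tsv"
dst = workdir / "descriptors.tsv"

write_smiles_tsv([("ethanol", "CCO"), ("benzene", "c1ccccc1")], src)
command = compute_descriptors_command(src, dst)
print(shlex.join(command))
```

Once jp2rt is installed, the resulting list can be passed to `subprocess.run` to actually perform the computation.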

If you want to know the names of the computed descriptors (i.e. the names for the columns added by the previous subcommand), you can use the subcommand

$ jp2rt list-descriptors
AcidicGroupCountDescriptor
	1: nAcid
ALOGPDescriptor
	2: ALogP
	3: ALogp2
	4: AMR
AminoAcidCountDescriptor
	5: nA
	6: nR
	7: nN
	8: nD
	9: nC
	10: nF
	11: nQ
	12: nE
	13: nG
	14: nH
	15: nI
	16: nP
	17: nL
	18: nK
	19: nM
	20: nS
	21: nT
	22: nY
	23: nV
	24: nW
APolDescriptor
	25: apol
AromaticAtomsCountDescriptor
	26: naAromAtom
AromaticBondsCountDescriptor
	27: nAromBond
AtomCountDescriptor
	28: nAtom
AutocorrelationDescriptorCharge
	29: ATSc1
	30: ATSc2
	31: ATSc3
	32: ATSc4
	33: ATSc5
AutocorrelationDescriptorMass
	34: ATSm1
	35: ATSm2
	36: ATSm3
	37: ATSm4
	38: ATSm5
AutocorrelationDescriptorPolarizability
	39: ATSp1
	40: ATSp2
	41: ATSp3
	42: ATSp4
	43: ATSp5
BasicGroupCountDescriptor
	44: nBase
BCUTDescriptor
	45: BCUTw-1l
	46: BCUTw-1h
	47: BCUTc-1l
	48: BCUTc-1h
	49: BCUTp-1l
	50: BCUTp-1h
BondCountDescriptor
	51: nB
BPolDescriptor
	52: bpol
CarbonTypesDescriptor
	53: C1SP1
	54: C2SP1
	55: C1SP2
	56: C2SP2
	57: C3SP2
	58: C1SP3
	59: C2SP3
	60: C3SP3
	61: C4SP3
ChiChainDescriptor
	62: SCH-3
	63: SCH-4
	64: SCH-5
	65: SCH-6
	66: SCH-7
	67: VCH-3
	68: VCH-4
	69: VCH-5
	70: VCH-6
	71: VCH-7
ChiClusterDescriptor
	72: SC-3
	73: SC-4
	74: SC-5
	75: SC-6
	76: VC-3
	77: VC-4
	78: VC-5
	79: VC-6
ChiPathClusterDescriptor
	80: SPC-4
	81: SPC-5
	82: SPC-6
	83: VPC-4
	84: VPC-5
	85: VPC-6
ChiPathDescriptor
	86: SP-0
	87: SP-1
	88: SP-2
	89: SP-3
	90: SP-4
	91: SP-5
	92: SP-6
	93: SP-7
	94: VP-0
	95: VP-1
	96: VP-2
	97: VP-3
	98: VP-4
	99: VP-5
	100: VP-6
	101: VP-7
EccentricConnectivityIndexDescriptor
	102: ECCEN
FMFDescriptor
	103: FMF
FractionalCSP3Descriptor
	104: Fsp3
FractionalPSADescriptor
	105: tpsaEfficiency
FragmentComplexityDescriptor
	106: fragC
HBondAcceptorCountDescriptor
	107: nHBAcc
HBondDonorCountDescriptor
	108: nHBDon
HybridizationRatioDescriptor
	109: HybRatio
JPlogPDescriptor
	110: JPLogP
KappaShapeIndicesDescriptor
	111: Kier1
	112: Kier2
	113: Kier3
KierHallSmartsDescriptor
	114: khs.sLi
	115: khs.ssBe
	116: khs.ssssBe
	117: khs.ssBH
	118: khs.sssB
	119: khs.ssssB
	120: khs.sCH3
	121: khs.dCH2
	122: khs.ssCH2
	123: khs.tCH
	124: khs.dsCH
	125: khs.aaCH
	126: khs.sssCH
	127: khs.ddC
	128: khs.tsC
	129: khs.dssC
	130: khs.aasC
	131: khs.aaaC
	132: khs.ssssC
	133: khs.sNH3
	134: khs.sNH2
	135: khs.ssNH2
	136: khs.dNH
	137: khs.ssNH
	138: khs.aaNH
	139: khs.tN
	140: khs.sssNH
	141: khs.dsN
	142: khs.aaN
	143: khs.sssN
	144: khs.ddsN
	145: khs.aasN
	146: khs.ssssN
	147: khs.sOH
	148: khs.dO
	149: khs.ssO
	150: khs.aaO
	151: khs.sF
	152: khs.sSiH3
	153: khs.ssSiH2
	154: khs.sssSiH
	155: khs.ssssSi
	156: khs.sPH2
	157: khs.ssPH
	158: khs.sssP
	159: khs.dsssP
	160: khs.sssssP
	161: khs.sSH
	162: khs.dS
	163: khs.ssS
	164: khs.aaS
	165: khs.dssS
	166: khs.ddssS
	167: khs.sCl
	168: khs.sGeH3
	169: khs.ssGeH2
	170: khs.sssGeH
	171: khs.ssssGe
	172: khs.sAsH2
	173: khs.ssAsH
	174: khs.sssAs
	175: khs.sssdAs
	176: khs.sssssAs
	177: khs.sSeH
	178: khs.dSe
	179: khs.ssSe
	180: khs.aaSe
	181: khs.dssSe
	182: khs.ddssSe
	183: khs.sBr
	184: khs.sSnH3
	185: khs.ssSnH2
	186: khs.sssSnH
	187: khs.ssssSn
	188: khs.sI
	189: khs.sPbH3
	190: khs.ssPbH2
	191: khs.sssPbH
	192: khs.ssssPb
LargestChainDescriptor
	193: nAtomLC
LargestPiSystemDescriptor
	194: nAtomP
MannholdLogPDescriptor
	195: MLogP
MDEDescriptor
	196: MDEC-11
	197: MDEC-12
	198: MDEC-13
	199: MDEC-14
	200: MDEC-22
	201: MDEC-23
	202: MDEC-24
	203: MDEC-33
	204: MDEC-34
	205: MDEC-44
	206: MDEO-11
	207: MDEO-12
	208: MDEO-22
	209: MDEN-11
	210: MDEN-12
	211: MDEN-13
	212: MDEN-22
	213: MDEN-23
	214: MDEN-33
PetitjeanNumberDescriptor
	215: PetitjeanNumber
PetitjeanShapeIndexDescriptor
	216: topoShape
	217: geomShape
RotatableBondsCountDescriptor
	218: nRotB
RuleOfFiveDescriptor
	219: LipinskiFailures
SmallRingDescriptor
	220: nSmallRings
	221: nAromRings
	222: nRingBlocks
	223: nAromBlocks
	224: nRings3
	225: nRings4
	226: nRings5
	227: nRings6
	228: nRings7
	229: nRings8
	230: nRings9
SpiroAtomCountDescriptor
	231: nSpiroAtoms
TPSADescriptor
	232: TopoPSA
VAdjMaDescriptor
	233: VAdjMat
WeightDescriptor
	234: MW
WeightedPathDescriptor
	235: WTPT-1
	236: WTPT-2
	237: WTPT-3
	238: WTPT-4
	239: WTPT-5
WienerNumbersDescriptor
	240: WPATH
	241: WPOL
XLogPDescriptor
	242: XLogP
ZagrebIndexDescriptor
	243: Zagreb
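
If the listing needs to be processed programmatically, for instance to label the appended columns, it can be parsed with a few lines of Python. The sketch below assumes the layout shown above (an unindented descriptor class name followed by indented `index: name` lines); the function name is hypothetical and the sample is a short excerpt of the listing, not the full output.

```python
def parse_descriptor_listing(text):
    """Map each descriptor class to the column names it contributes."""
    groups = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if line[0].isspace():
            # Indented lines look like "<index>: <name>".
            _, name = line.strip().split(": ", 1)
            groups[current].append(name)
        else:
            # Unindented lines start a new descriptor class.
            current = line.strip()
            groups[current] = []
    return groups


# Excerpt of the listing above, used as sample input.
sample = (
    "ALOGPDescriptor\n"
    "\t2: ALogP\n"
    "\t3: ALogp2\n"
    "\t4: AMR\n"
    "APolDescriptor\n"
    "\t25: apol\n"
)

groups = parse_descriptor_listing(sample)
column_names = [name for names in groups.values() for name in names]
print(column_names)
```

Flattening the groups in order gives the column names in the same order as the listing.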

Predict retention times

To predict the retention times you need a model and the descriptors; the subcommand to run the prediction is

$ jp2rt predict-rt --help
Usage: jp2rt predict-rt [OPTIONS] MODEL SRC DST

  Uses the model to predict the retention time.

  Given a tab separated values containing the molecular descriptors and a
  model, produces another tab separated values file prepending the predicted
  value.

  MODEL The model file.
  SRC   The source tab separated values file (the molecular descriptors must be on the last columns).
  DST   The destination tab separated values file (will have the predicted retention time, followed by the same columns of SRC).

Options:
  --help  Show this message and exit.
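
The layout of DST described above (predicted value first, then the columns of SRC, with the descriptors last) can be read back as in the following sketch. It assumes the prediction is written as a plain decimal number; the function name and the sample rows are made up for illustration, and only three descriptor columns are used instead of the full set to keep the example short.

```python
import csv
import io


def read_predictions(lines, n_descriptors):
    """Yield (predicted_rt, leading_columns) for each row of the output."""
    for row in csv.reader(lines, delimiter="\t"):
        predicted = float(row[0])
        # Columns between the prediction and the descriptors come from SRC.
        leading = row[1:len(row) - n_descriptors]
        yield predicted, leading


# Made-up rows: prediction, name, SMILES, then three descriptor values.
sample = (
    "5.25\tethanol\tCCO\t0.1\t0.2\t0.3\n"
    "7.5\tbenzene\tc1ccccc1\t0.4\t0.5\t0.6\n"
)

predictions = list(read_predictions(io.StringIO(sample), n_descriptors=3))
for predicted, leading in predictions:
    print(predicted, leading)
```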

Estimate the model

Although, as explained in Estimate the model, it is usually better to perform a manual analysis of the estimation process, a convenient shortcut is given by the subcommand

$ jp2rt estimate-model --help
Usage: jp2rt estimate-model [OPTIONS] NAME SRC DST

  Estimates a model using the given ensemble regressor.

  NAME  The ensemble regressor name.
  SRC   The source tab separated values file (the retention times must be on the first column, and molecular descriptors must be on the last columns).
  DST   The destination model file.

Options:
  -e, --evaluate  Evaluates the model using cross-validation.
  --help          Show this message and exit.

If you want to know the list of valid ensemble model names, just use the subcommand

$ jp2rt list-models
AdaBoost
Bagging
ExtraTrees
GradientBoosting
HistGradientBoosting
RandomForest
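
To compare several regressors, the command line can be assembled once per name. This is only a sketch: the helper function and the file names are hypothetical, the names are copied from the listing above, and the commands are built but not executed.

```python
MODEL_NAMES = (
    "AdaBoost",
    "Bagging",
    "ExtraTrees",
    "GradientBoosting",
    "HistGradientBoosting",
    "RandomForest",
)


def estimate_model_command(name, src, dst, evaluate=False):
    """Return the argument list for estimate-model; nothing is executed."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown ensemble regressor name: {name}")
    command = ["jp2rt", "estimate-model"]
    if evaluate:
        command.append("--evaluate")
    command += [name, src, dst]
    return command


# Hypothetical file names: one model file per regressor name.
commands = [
    estimate_model_command(name, "training.tsv", f"{name}.model", evaluate=True)
    for name in MODEL_NAMES
]
for command in commands:
    print(" ".join(command))
```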

Directly using the Java library

If you need to avoid installing Python, you can compute the molecular descriptors with the following command

java -jar jp2rt-all.jar INPUT.tsv OUTPUT.tsv

where INPUT.tsv is the input file, OUTPUT.tsv is the output file, and jp2rt-all.jar is the uber jar installed following the specific installation instructions.

You can also run

java -jar jp2rt-all.jar --list-descriptors

to get the list of descriptor names.