Command line usage
Compute the descriptors
Computing descriptors is the preliminary step for both retention time prediction and model estimation. The related subcommand is
$ jp2rt compute-descriptors --help
Usage: jp2rt compute-descriptors [OPTIONS] SRC DST
Computes molecular descriptions.
Reads a tab separated values file with SMILES and producing another tab
separated values file appending molecular descriptor values.
SRC The source tab separated values file (must contain SMILES on the last column).
DST The destination tab separated values file (will have the same columns of SRC, followed by molecular descriptor values).
Options:
--help Show this message and exit.
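As an illustration, the following is a minimal Python sketch of how an input file for this subcommand could be prepared. It relies only on what the help text states: SRC is tab separated and the SMILES are on the last column. The helper names `write_smiles_tsv` and `compute_descriptors_command`, the file names, and the two example molecules are hypothetical and not part of jp2rt; the command is run only if a `jp2rt` executable is found on the PATH.

```python
import csv
import shutil
import subprocess


def write_smiles_tsv(rows, path):
    """Write rows as tab separated values; the last item of each row is the SMILES."""
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def compute_descriptors_command(src, dst):
    """Build the argument list for `jp2rt compute-descriptors SRC DST`."""
    return ["jp2rt", "compute-descriptors", str(src), str(dst)]


if __name__ == "__main__":
    # Hypothetical example data: a free-form name column, then the SMILES.
    rows = [("ethanol", "CCO"), ("benzene", "c1ccccc1")]
    write_smiles_tsv(rows, "molecules.tsv")
    command = compute_descriptors_command("molecules.tsv", "molecules+descriptors.tsv")
    if shutil.which("jp2rt"):  # assumption: jp2rt is installed and on the PATH
        subprocess.run(command, check=True)
    else:
        print("jp2rt not found; would run:", " ".join(command))
```

Any extra columns placed before the SMILES (here, a name) are copied to DST unchanged, according to the help text above.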
If you want to know the names of the computed descriptors (i.e. the names for the columns added by the previous subcommand), you can use the subcommand
$ jp2rt list-descriptors
AcidicGroupCountDescriptor
1: nAcid
ALOGPDescriptor
2: ALogP
3: ALogp2
4: AMR
AminoAcidCountDescriptor
5: nA
6: nR
7: nN
8: nD
9: nC
10: nF
11: nQ
12: nE
13: nG
14: nH
15: nI
16: nP
17: nL
18: nK
19: nM
20: nS
21: nT
22: nY
23: nV
24: nW
APolDescriptor
25: apol
AromaticAtomsCountDescriptor
26: naAromAtom
AromaticBondsCountDescriptor
27: nAromBond
AtomCountDescriptor
28: nAtom
AutocorrelationDescriptorCharge
29: ATSc1
30: ATSc2
31: ATSc3
32: ATSc4
33: ATSc5
AutocorrelationDescriptorMass
34: ATSm1
35: ATSm2
36: ATSm3
37: ATSm4
38: ATSm5
AutocorrelationDescriptorPolarizability
39: ATSp1
40: ATSp2
41: ATSp3
42: ATSp4
43: ATSp5
BasicGroupCountDescriptor
44: nBase
BCUTDescriptor
45: BCUTw-1l
46: BCUTw-1h
47: BCUTc-1l
48: BCUTc-1h
49: BCUTp-1l
50: BCUTp-1h
BondCountDescriptor
51: nB
BPolDescriptor
52: bpol
CarbonTypesDescriptor
53: C1SP1
54: C2SP1
55: C1SP2
56: C2SP2
57: C3SP2
58: C1SP3
59: C2SP3
60: C3SP3
61: C4SP3
ChiChainDescriptor
62: SCH-3
63: SCH-4
64: SCH-5
65: SCH-6
66: SCH-7
67: VCH-3
68: VCH-4
69: VCH-5
70: VCH-6
71: VCH-7
ChiClusterDescriptor
72: SC-3
73: SC-4
74: SC-5
75: SC-6
76: VC-3
77: VC-4
78: VC-5
79: VC-6
ChiPathClusterDescriptor
80: SPC-4
81: SPC-5
82: SPC-6
83: VPC-4
84: VPC-5
85: VPC-6
ChiPathDescriptor
86: SP-0
87: SP-1
88: SP-2
89: SP-3
90: SP-4
91: SP-5
92: SP-6
93: SP-7
94: VP-0
95: VP-1
96: VP-2
97: VP-3
98: VP-4
99: VP-5
100: VP-6
101: VP-7
EccentricConnectivityIndexDescriptor
102: ECCEN
FMFDescriptor
103: FMF
FractionalCSP3Descriptor
104: Fsp3
FractionalPSADescriptor
105: tpsaEfficiency
FragmentComplexityDescriptor
106: fragC
HBondAcceptorCountDescriptor
107: nHBAcc
HBondDonorCountDescriptor
108: nHBDon
HybridizationRatioDescriptor
109: HybRatio
JPlogPDescriptor
110: JPLogP
KappaShapeIndicesDescriptor
111: Kier1
112: Kier2
113: Kier3
KierHallSmartsDescriptor
114: khs.sLi
115: khs.ssBe
116: khs.ssssBe
117: khs.ssBH
118: khs.sssB
119: khs.ssssB
120: khs.sCH3
121: khs.dCH2
122: khs.ssCH2
123: khs.tCH
124: khs.dsCH
125: khs.aaCH
126: khs.sssCH
127: khs.ddC
128: khs.tsC
129: khs.dssC
130: khs.aasC
131: khs.aaaC
132: khs.ssssC
133: khs.sNH3
134: khs.sNH2
135: khs.ssNH2
136: khs.dNH
137: khs.ssNH
138: khs.aaNH
139: khs.tN
140: khs.sssNH
141: khs.dsN
142: khs.aaN
143: khs.sssN
144: khs.ddsN
145: khs.aasN
146: khs.ssssN
147: khs.sOH
148: khs.dO
149: khs.ssO
150: khs.aaO
151: khs.sF
152: khs.sSiH3
153: khs.ssSiH2
154: khs.sssSiH
155: khs.ssssSi
156: khs.sPH2
157: khs.ssPH
158: khs.sssP
159: khs.dsssP
160: khs.sssssP
161: khs.sSH
162: khs.dS
163: khs.ssS
164: khs.aaS
165: khs.dssS
166: khs.ddssS
167: khs.sCl
168: khs.sGeH3
169: khs.ssGeH2
170: khs.sssGeH
171: khs.ssssGe
172: khs.sAsH2
173: khs.ssAsH
174: khs.sssAs
175: khs.sssdAs
176: khs.sssssAs
177: khs.sSeH
178: khs.dSe
179: khs.ssSe
180: khs.aaSe
181: khs.dssSe
182: khs.ddssSe
183: khs.sBr
184: khs.sSnH3
185: khs.ssSnH2
186: khs.sssSnH
187: khs.ssssSn
188: khs.sI
189: khs.sPbH3
190: khs.ssPbH2
191: khs.sssPbH
192: khs.ssssPb
LargestChainDescriptor
193: nAtomLC
LargestPiSystemDescriptor
194: nAtomP
MannholdLogPDescriptor
195: MLogP
MDEDescriptor
196: MDEC-11
197: MDEC-12
198: MDEC-13
199: MDEC-14
200: MDEC-22
201: MDEC-23
202: MDEC-24
203: MDEC-33
204: MDEC-34
205: MDEC-44
206: MDEO-11
207: MDEO-12
208: MDEO-22
209: MDEN-11
210: MDEN-12
211: MDEN-13
212: MDEN-22
213: MDEN-23
214: MDEN-33
PetitjeanNumberDescriptor
215: PetitjeanNumber
PetitjeanShapeIndexDescriptor
216: topoShape
217: geomShape
RotatableBondsCountDescriptor
218: nRotB
RuleOfFiveDescriptor
219: LipinskiFailures
SmallRingDescriptor
220: nSmallRings
221: nAromRings
222: nRingBlocks
223: nAromBlocks
224: nRings3
225: nRings4
226: nRings5
227: nRings6
228: nRings7
229: nRings8
230: nRings9
SpiroAtomCountDescriptor
231: nSpiroAtoms
TPSADescriptor
232: TopoPSA
VAdjMaDescriptor
233: VAdjMat
WeightDescriptor
234: MW
WeightedPathDescriptor
235: WTPT-1
236: WTPT-2
237: WTPT-3
238: WTPT-4
239: WTPT-5
WienerNumbersDescriptor
240: WPATH
241: WPOL
XLogPDescriptor
242: XLogP
ZagrebIndexDescriptor
243: Zagreb
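If the column names are needed programmatically, a listing like the one above can be parsed with a few lines of Python. This is only a sketch: it assumes that the output keeps the layout shown here (a descriptor class name on its own line, followed by `index: name` lines), and the function name `parse_descriptor_listing` is hypothetical, not part of jp2rt.

```python
import re

# An entry line looks like "12: nE": an integer index, a colon, a column name.
ENTRY = re.compile(r"^\s*(\d+):\s*(\S+)\s*$")


def parse_descriptor_listing(text):
    """Map each descriptor class to the (index, column name) pairs it produces."""
    groups = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = ENTRY.match(line)
        if match:
            if current is None:
                raise ValueError(f"entry before any class name: {line!r}")
            groups[current].append((int(match.group(1)), match.group(2)))
        else:
            # Any other non-empty line is taken to be a descriptor class name.
            current = line.strip()
            groups.setdefault(current, [])
    return groups


# The first lines of the listing above, used as sample input.
sample = """\
AcidicGroupCountDescriptor
1: nAcid
ALOGPDescriptor
2: ALogP
3: ALogp2
4: AMR
"""
groups = parse_descriptor_listing(sample)
names = [name for entries in groups.values() for _, name in entries]
print(names)  # ['nAcid', 'ALogP', 'ALogp2', 'AMR']
```

The flat `names` list gives the appended column names in order, which can be handy for labelling the columns of the file produced by compute-descriptors.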
Predict retention times
To predict the retention times you need a model and the descriptors; the subcommand to run the prediction is
$ jp2rt predict-rt --help
Usage: jp2rt predict-rt [OPTIONS] MODEL SRC DST
Uses the model to predict the retention time.
Given a tab separated values containing the molecular descriptors and a
model, produces another tab separated values file prepending the predicted
value.
MODEL The model file.
SRC The source tab separated values file (the molecular descriptors must be on the last columns).
DST The destination tab separated values file (will have the predicted retention time, followed by the same columns of SRC).
Options:
--help Show this message and exit.
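Since the predicted value is prepended, it can be read back from the first column of DST. The sketch below shows one possible way to do so in Python; it assumes that the file has no header row and that the first column parses as a floating point number, neither of which is stated by the help text. The function name `read_predictions` and the sample values are made up for illustration.

```python
import csv
import io


def read_predictions(lines):
    """Yield (predicted_rt, original_columns) pairs from predict-rt output lines.

    Assumes no header row and a first column that parses as a float.
    """
    for row in csv.reader(lines, delimiter="\t"):
        if row:
            yield float(row[0]), row[1:]


# Made-up output: the predicted value, then the SRC columns (here a name, a
# SMILES, and just two descriptor values instead of the full set).
fake_dst = io.StringIO(
    "3.25\tethanol\tCCO\t0\t-0.14\n"
    "7.5\tbenzene\tc1ccccc1\t0\t1.99\n"
)
for rt, rest in read_predictions(fake_dst):
    print(rest[0], rt)
```

With a real file, the same function can be applied to an open file object, for instance `read_predictions(open("predicted.tsv", newline=""))`, where the file name is again hypothetical.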
Estimate the model
As explained in Estimate the model, it is usually better to perform a manual analysis of the estimation process; nevertheless, a convenient shortcut is given by the subcommand
$ jp2rt estimate-model --help
Usage: jp2rt estimate-model [OPTIONS] NAME SRC DST
Estimates a model using the given ensemble regressor.
NAME The ensemble regressor name.
SRC The source tab separated values file (the retention times must be on the first column, and molecular descriptors must be on the last columns).
DST The destination model file.
Options:
-e, --evaluate Evaluates the model using cross-validation.
--help Show this message and exit.
If you want to know the list of valid ensemble model names, just use the subcommand
$ jp2rt list-models
AdaBoost
Bagging
ExtraTrees
GradientBoosting
HistGradientBoosting
RandomForest
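When the estimation is driven from a script, it can help to check the regressor name before starting a possibly long run. The following Python sketch builds the command line for estimate-model from the usage shown above; the names in `MODEL_NAMES` are copied from the listing, while the function name `estimate_model_command` and the file names are hypothetical.

```python
import shutil
import subprocess

# Names printed by `jp2rt list-models` in the listing above.
MODEL_NAMES = (
    "AdaBoost",
    "Bagging",
    "ExtraTrees",
    "GradientBoosting",
    "HistGradientBoosting",
    "RandomForest",
)


def estimate_model_command(name, src, dst, evaluate=False):
    """Build the argument list for `jp2rt estimate-model [OPTIONS] NAME SRC DST`."""
    if name not in MODEL_NAMES:
        raise ValueError(f"unknown ensemble regressor name: {name!r}")
    command = ["jp2rt", "estimate-model"]
    if evaluate:
        command.append("--evaluate")  # also evaluate with cross-validation
    return command + [name, str(src), str(dst)]


if __name__ == "__main__":
    # Hypothetical file names: SRC has retention times on the first column and
    # molecular descriptors on the last columns; DST is the model file to write.
    command = estimate_model_command(
        "RandomForest", "training+descriptors.tsv", "model.bin", evaluate=True
    )
    if shutil.which("jp2rt"):  # assumption: jp2rt is installed and on the PATH
        subprocess.run(command, check=True)
    else:
        print("jp2rt not found; would run:", " ".join(command))
```

A file with the layout required for SRC can be obtained by running compute-descriptors on a tab separated file that has the retention times on the first column and the SMILES on the last one.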
Directly using the Java library
If you need to avoid installing Python, you can run the molecular descriptor computation with the following command
java -jar jp2rt-all.jar INPUT.tsv OUTPUT.tsv
where INPUT.tsv is the input file, OUTPUT.tsv is the output file, and jp2rt-all.jar is the uber jar installed following the specific installation instructions.
You can also run
java -jar jp2rt-all.jar --list-descriptors
to get the list of descriptor names.