Predict the Retention Time
Now assume we have a list of compounds for which we know only the SMILES. Since
we have no data other than the PlaSMA dataset, we’ll take the fourth column of
the first 100 lines of the plasma.tsv file:
$ cut -f 4 plasma.tsv | head -n 100 > smiles.tsv
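Where cut and head are not available, the same extraction can be sketched in plain Python. This is only an illustration: extract_column is a hypothetical helper, not part of jp2rt, and it assumes a tab-separated UTF-8 file in which the SMILES sits in the fourth field (index 3).

```python
from itertools import islice


def extract_column(src, dst, column=3, limit=100):
    """Copy one tab-separated field from the first `limit` lines of src to dst.

    Hypothetical helper, not part of jp2rt. `column` is zero-based, so the
    default of 3 selects the fourth field.
    """
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        # islice stops after `limit` lines without reading the rest of the file
        for line in islice(fin, limit):
            fields = line.rstrip("\n").split("\t")
            # Lines with too few fields produce an empty output line
            value = fields[column] if column < len(fields) else ""
            fout.write(value + "\n")
            count += 1
    return count
```

Calling extract_column('plasma.tsv', 'smiles.tsv') would then write at most 100 lines to smiles.tsv and return the number of lines written.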
The steps we need to perform are:
compute the molecular descriptors for each SMILES,
load the descriptors and the model,
predict the retention time with the model.
Use the API
Molecular descriptors can be added not only from the command line, as shown in Compute the descriptors, but also with the add_descriptors_via_tsv() function:
from jp2rt import add_descriptors_via_tsv
add_descriptors_via_tsv('smiles.tsv', 'smiles+descriptors.tsv')
Computing 0% │ │ 0/100 (0:00:00 / ?)
Computing 7% │██▎ │ 7/100 (0:00:01 / 0:00:13)
Computing 36% │███████████▉ │ 36/100 (0:00:02 / 0:00:03)
Computing 84% │███████████████████████████▋ │ 84/100 (0:00:03 / 0:00:00)
Computing 100% │█████████████████████████████████│ 100/100 (0:00:03 / 0:00:00)
We are now ready to load the computed descriptors and the model we estimated and saved earlier, and to use the model to predict the retention time:
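from jp2rt import load_model, load_descriptors
# Load the descriptors computed in the previous step
X = load_descriptors('smiles+descriptors.tsv')
# Load the previously saved model
model = load_model('extratrees')
# Predict the retention time for each set of descriptors
y = model.predict(X)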
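To inspect the predictions next to the compounds they belong to, the values can be written out together with the SMILES. The following is a minimal sketch, not part of jp2rt: write_predictions is a hypothetical helper, and it assumes that the predictions contain exactly one value per line of smiles.tsv, in the same order (that is, that no molecule was dropped while computing the descriptors). Placing the retention time in the first column is also an assumption.

```python
def write_predictions(smiles_path, predictions, out_path):
    """Write 'retention time <TAB> SMILES' lines, one per compound.

    Hypothetical helper, not part of jp2rt. `predictions` is any sequence
    of numbers (a list or a NumPy array) aligned with the input lines.
    """
    with open(smiles_path, encoding="utf-8") as fin:
        smiles = [line.rstrip("\n") for line in fin]
    # Refuse to pair values if the two sequences cannot be aligned
    if len(smiles) != len(predictions):
        raise ValueError(
            f"{len(smiles)} SMILES but {len(predictions)} predictions"
        )
    with open(out_path, "w", encoding="utf-8") as fout:
        for rt, smi in zip(predictions, smiles):
            fout.write(f"{float(rt)}\t{smi}\n")
    return len(smiles)
```

With the variables above, write_predictions('smiles.tsv', y, 'rt+smiles.tsv') would produce one line per compound; the output file name here is only an example.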
The command line
The same steps can be performed from the command line. We have already seen in Compute the descriptors how to add the descriptors, so we only need to predict the retention time:
$ jp2rt predict-rt extratrees.jp2rt smiles+descriptors.tsv rt+smiles+descriptors.tsv
Read 100 molecules with 243 descriptor values each...
Predicted retention times written to /home/runner/work/jp2rt/jp2rt/docs/example/rt+smiles+descriptors.tsv...