Many biomedical ontologies are updated regularly and change over time. Each new release of an ontology fixes errors in the previous version and adds many new concepts to keep pace with developments in the domain. Inserting new concepts into their proper positions in a terminology is a challenging problem in the automatic enrichment of ontologies. In the past, new concepts were always created by domain experts, who then ran a traditional classifier or worked manually to insert each new concept in its proper place. More recently, methods based on machine learning (ML) have been proposed to help terminology researchers develop and maintain ontologies. We propose a new approach that requires only the concept name and uses a Graph Convolutional Network (GCN) to aggregate information from sub-string neighbors. We chose a Bidirectional Long Short-Term Memory (Bi-LSTM) model as the classifier for the prediction task. We first tested this method within the January 2020 release of the Gene Ontology (GO) and achieved an average precision of 89.68% and an F1 score of 0.9081 on the task of predicting direct IS-A links. When comparing the January 2020 release with the March 2022 release, we predicted the links related to new concepts and obtained an average accuracy of 0.6996.
SARS-CoV-2 inhibitors play an important role in COVID-19 preclinical drug discovery. Because the existing SARS-CoV-2 inhibitors all show deficiencies to some degree, new candidate inhibitors are urgently needed. De novo molecular design plays a very important role in drug discovery. One popular approach is to use deep learning models to generate candidate drug molecules automatically, and most existing models take SMILES (Simplified Molecular Input Line Entry System) strings as input. In this study, we embed SMILES using a sub-word algorithm, BPE (Byte Pair Encoding), instead of one-hot encoding. First, BPE learns a vocabulary of high-frequency SMILES substrings from a large SMILES dataset; SMILES strings are then tokenized according to that vocabulary. Results show that the BPE algorithm can effectively learn SMILES grammar and can help our generative model generate potential SARS-CoV-2 inhibitors after transfer learning on 1253 known SARS-CoV-2 inhibitors. Overall, this paper provides an effective method for de novo molecular design of SARS-CoV-2 inhibitors.
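The two BPE steps described above (learning a vocabulary of frequent substrings, then tokenizing with it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `learn_bpe`, `apply_merge`, and `tokenize` are hypothetical, the three-molecule corpus is a toy stand-in for a large SMILES dataset, and the stopping rule (stop when no pair occurs at least twice) is an assumption made for illustration.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every adjacent occurrence of `pair` in `tokens` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(smiles_list, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    corpus = [list(s) for s in smiles_list]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best, count = pair_counts.most_common(1)[0]
        if count < 2:  # assumed stopping rule: nothing frequent left to merge
            break
        merges.append(best)
        corpus = [apply_merge(tokens, best) for tokens in corpus]
    return merges


def tokenize(smiles, merges):
    """Tokenize one SMILES string by replaying the learned merges in order."""
    tokens = list(smiles)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Toy corpus (acetic acid, acetamide, ethanol); not data from the study.
toy_corpus = ["CC(=O)O", "CC(=O)N", "CCO"]
merges = learn_bpe(toy_corpus, num_merges=10)
for smiles in toy_corpus:
    print(smiles, "->", tokenize(smiles, merges))
```

Each resulting token would then be mapped to an integer index and an embedding vector in place of a per-character one-hot vector. A production setup would more likely rely on an established tokenizer library and a chemistry-aware pre-tokenization step (for example, keeping two-letter atoms such as `Cl` and `Br` intact), which this sketch omits.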