Quick Start

Display available backend

from biotransformers import BioTransformers
BioTransformers.list_backend()

>>
    *   esm1_t34_670M_UR100
    *   esm1_t6_43M_UR50S
    *   esm1b_t33_650M_UR50S
    *   esm_msa1_t12_100M_UR50S
    *   protbert
    *   protbert_bfd

Compute embeddings on gpu

Please refer to the multi-gpus section to have a full understanding of the functionnality.

import ray

sequences = [
        "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
        "RSKEPVSGFDLIRDHISQTGMPPTRAEIARSKEPVSGRKGVIEIVSGASRGIRLLQEE",
        "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
        "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    ]

ray.init()
bio_trans = BioTransformers(backend="protbert", num_gpus=4)
embeddings = bio_trans.compute_embeddings(sequences, pool_mode=('cls','mean'))

cls_emb = embeddings['cls']
mean_emb = embeddings['mean']

where:

  • pooling_list: kind of aggregation functions to be used. ‘cls’ return the <CLS> token embedding used for classification. ‘mean’ will make the mean of all the tokens a sequence.