`biotransformers.wrappers.language_model`

This module defines a generic template class for language models. Both the ESM and Rostlab language model wrappers should subclass it and implement its abstract methods.

Module Contents

Classes

LanguageModel

Class that implements a language model.

class biotransformers.wrappers.language_model.LanguageModel(model_dir: str, device)

Bases: abc.ABC

Class that implements a language model.

property model_id(self) → str: Model ID, as specified in the model directory.

property clean_model_id(self) → str: Clean model ID, for cases where the model directory name is not itself clean.

property model_vocabulary(self) → List[str]: Returns the full vocabulary as a list.

property vocab_size(self) → int: Returns the size of the full vocabulary.

property mask_token(self) → str: Representation of the mask token (as a string).

property pad_token(self) → str: Representation of the pad token (as a string).

property begin_token(self) → str: Representation of the beginning-of-sentence token (as a string).

property end_token(self) → str: Representation of the end-of-sentence token (as a string).

property does_end_token_exist(self) → bool: Returns True if an end-of-sequence token exists.

property token_to_id(self): Returns a function that maps tokens to IDs.

property embeddings_size(self) → int: Returns the size of the embeddings.

abstract process_sequences_and_tokens(self, sequences_list: List[str]) → Dict[str, torch.Tensor]: Transforms the sequence strings into token IDs; the implementation depends on the model used.
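To make the token-to-ID step concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the toy vocabulary, the helper names make_token_to_id and encode_sequences, and the output keys "input_ids" and "attention_mask" are all assumptions chosen for illustration, and nested lists stand in for torch tensors.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary; a real model ships its own.
VOCAB: List[str] = ["<pad>", "<cls>", "<eos>", "<mask>", "A", "C", "D", "E"]
PAD, BEGIN, END = "<pad>", "<cls>", "<eos>"


def make_token_to_id(vocabulary: List[str]) -> Callable[[str], int]:
    """Return a function mapping a token to its index in the vocabulary."""
    index = {token: i for i, token in enumerate(vocabulary)}
    return lambda token: index[token]


def encode_sequences(sequences: List[str]) -> Dict[str, List[List[int]]]:
    """Turn sequences into right-padded ID lists plus an attention mask."""
    token_to_id = make_token_to_id(VOCAB)
    # One token per character, wrapped in begin/end tokens.
    tokenized = [[BEGIN, *seq, END] for seq in sequences]
    max_len = max(len(tokens) for tokens in tokenized)
    input_ids, attention_mask = [], []
    for tokens in tokenized:
        n_pad = max_len - len(tokens)
        ids = [token_to_id(t) for t in tokens]
        input_ids.append(ids + [token_to_id(PAD)] * n_pad)
        # 1 marks a real token, 0 marks padding.
        attention_mask.append([1] * len(tokens) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = encode_sequences(["ACD", "E"])
```

Here the shorter sequence is padded to the length of the longest one, so every row in the batch has the same length. Whether a given backend adds begin or end tokens, and which side it pads on, depends on the model.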

property model(self) → torch.nn.Module: Return torch model.

abstract set_model(self, model: torch.nn.Module): Set torch model.

abstract model_pass(self, model_inputs: Dict[str, torch.tensor], batch_size: int, silent: bool = False, pba: ray.actor.ActorHandle = None) → Tuple[torch.Tensor, torch.Tensor]

Computes logits and embeddings from a dict of sequence tensors, a given batch size, and an inference configuration. The output is obtained by running a forward pass through the model (“forward inference”).

Parameters

model_inputs (Dict[str, torch.tensor]) – dict of sequence tensors
batch_size (int) – size of the batch
silent (bool) – whether to hide the progress bar (defaults to False)
pba (ray.actor.ActorHandle) – tqdm progress bar for the Ray actor (defaults to None)

Returns

logits [num_seqs, max_len_seqs, vocab_size]
embeddings [num_seqs, max_len_seqs+1, embedding_size]

Return type

Tuple[torch.Tensor, torch.Tensor]
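The batching behind a forward pass of this kind can be sketched as follows. This is an illustration under stated assumptions, not the library's code: iter_batches, batched_pass and toy_forward are hypothetical names, the "input_ids" key is assumed, nested lists stand in for tensors, and the progress-bar arguments (silent, pba) are left out.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Batch = Dict[str, List[List[int]]]
Output = Tuple[List[List[List[float]]], List[List[List[float]]]]


def iter_batches(model_inputs: Batch, batch_size: int) -> Iterator[Batch]:
    """Yield consecutive slices of every entry in model_inputs."""
    num_seqs = len(next(iter(model_inputs.values())))
    for start in range(0, num_seqs, batch_size):
        yield {key: rows[start:start + batch_size] for key, rows in model_inputs.items()}


def batched_pass(model_inputs: Batch, batch_size: int, forward: Callable[[Batch], Output]) -> Output:
    """Run forward on each batch and concatenate the results along the sequence axis."""
    logits, embeddings = [], []
    for batch in iter_batches(model_inputs, batch_size):
        batch_logits, batch_embeddings = forward(batch)
        logits.extend(batch_logits)
        embeddings.extend(batch_embeddings)
    return logits, embeddings


def toy_forward(batch: Batch) -> Output:
    """Stand-in for a model: one-hot 'logits' over 4 tokens, 2-dimensional 'embeddings'."""
    ids = batch["input_ids"]
    logits = [[[1.0 if v == tok else 0.0 for v in range(4)] for tok in row] for row in ids]
    embeddings = [[[float(tok), float(tok) * 2] for tok in row] for row in ids]
    return logits, embeddings


inputs = {"input_ids": [[1, 2, 3], [0, 1, 2], [3, 3, 0]]}
logits, embeddings = batched_pass(inputs, batch_size=2, forward=toy_forward)
```

With three sequences and a batch size of 2, the loop runs twice (batches of 2 and 1), and the concatenated outputs keep one entry per input sequence in the original order.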

abstract get_alphabet_dataloader(self): Defines an alphabet mapping so that common methods can be shared between ProtBert and ESM.
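Since LanguageModel derives from abc.ABC, a backend has to override every abstract member before it can be instantiated. The sketch below shows that mechanism on a hypothetical stand-in class, ToyLanguageModel, which mirrors only a small subset of the documented interface. It does not import biotransformers, and the choice of which members are abstract here is an assumption made for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ToyLanguageModel(ABC):
    """Hypothetical stand-in mirroring part of the documented interface."""

    def __init__(self, model_dir: str, device: str) -> None:
        self._model_dir = model_dir
        self._device = device

    @property
    def model_id(self) -> str:
        return self._model_dir

    @property
    @abstractmethod
    def model_vocabulary(self) -> List[str]:
        """Each backend supplies its own vocabulary (abstract in this toy only)."""

    @property
    def vocab_size(self) -> int:
        # Derived from the vocabulary, so backends do not need to override it.
        return len(self.model_vocabulary)

    @abstractmethod
    def process_sequences_and_tokens(self, sequences_list: List[str]) -> Dict[str, List[List[int]]]:
        """Each backend supplies its own tokenization."""


class ToyBackend(ToyLanguageModel):
    """Concrete backend: overrides every abstract member."""

    @property
    def model_vocabulary(self) -> List[str]:
        return ["<pad>", "A", "C"]

    def process_sequences_and_tokens(self, sequences_list: List[str]) -> Dict[str, List[List[int]]]:
        index = {tok: i for i, tok in enumerate(self.model_vocabulary)}
        return {"input_ids": [[index[c] for c in seq] for seq in sequences_list]}


backend = ToyBackend("toy/model", "cpu")

# The abstract base itself cannot be instantiated.
try:
    ToyLanguageModel("toy/model", "cpu")
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Shared behaviour such as vocab_size lives in the base class, while the model-specific parts are left to each subclass. A subclass that forgets one abstract member fails at instantiation time rather than later during inference.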

bio-transformers v0.1.14
