:mod:`biotransformers.wrappers.esm_wrappers`
============================================

.. py:module:: biotransformers.wrappers.esm_wrappers

.. autoapi-nested-parse::

   This script defines a class which inherits from the LanguageModel class and is
   specific to the ESM model developed by FAIR (https://github.com/facebookresearch/esm).


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   biotransformers.wrappers.esm_wrappers.ESMWrapper


Attributes
~~~~~~~~~~

.. autoapisummary::

   biotransformers.wrappers.esm_wrappers.log
   biotransformers.wrappers.esm_wrappers.path_msa_folder


.. data:: log


.. data:: path_msa_folder


.. class:: ESMWrapper(model_dir: str, device: str)

   Bases: :py:obj:`biotransformers.wrappers.language_model.LanguageModel`

   Class that uses an ESM-type pretrained transformer model to evaluate protein
   likelihoods and to provide other insights.

   .. method:: model(self) -> torch.nn.Module
      :property:

      Return the torch model.

   .. method:: set_model(self, model: torch.nn.Module)

      Set the torch model.

   .. method:: clean_model_id(self) -> str
      :property:

      Clean model ID (in case the model directory is not)

   .. method:: model_vocabulary(self) -> List[str]
      :property:

      Returns the whole vocabulary list.

   .. method:: vocab_size(self) -> int
      :property:

      Returns the whole vocabulary size.

   .. method:: mask_token(self) -> str
      :property:

      Representation of the mask token (as a string).

   .. method:: pad_token(self) -> str
      :property:

      Representation of the pad token (as a string).

   .. method:: begin_token(self) -> str
      :property:

      Representation of the beginning-of-sentence token (as a string).

   .. method:: end_token(self) -> str
      :property:

      Representation of the end-of-sentence token (as a string).

   .. method:: does_end_token_exist(self) -> bool
      :property:

      Returns True if an end-of-sequence token exists.

   .. method:: token_to_id(self)
      :property:

      Returns a function which maps tokens to IDs.

   .. method:: embeddings_size(self)
      :property:

      Returns the size of the embeddings.

   .. method:: process_sequences_and_tokens(self, sequences_list: List[str]) -> Dict[str, torch.Tensor]

      Transform token strings into IDs; the mapping depends on the model used.

   .. method:: model_pass(self, model_inputs: Dict[str, torch.Tensor], batch_size: int, silent: bool = False, pba: ray.actor.ActorHandle = None) -> Tuple[torch.Tensor, torch.Tensor]

      Compute logits and embeddings from a list of sequences, a given batch size and an
      inference configuration. The output is obtained with a forward pass through the
      model ("forward inference").

      The data generator is not the same as the one used for multi-GPU inference. A tqdm
      progress bar, updated by the worker, is instantiated before the ``ray.remote`` call.

      :param model_inputs: tokenized inputs, as returned by :meth:`process_sequences_and_tokens`
      :type model_inputs: Dict[str, torch.Tensor]
      :param batch_size: size of the batch
      :type batch_size: int
      :param silent: if True, do not display the progress bar
      :param pba: ray actor handle used to update the tqdm progress bar
      :returns: * logits [num_seqs, max_len_seqs, vocab_size]
                * embeddings [num_seqs, max_len_seqs+1, embedding_size]
      :rtype: Tuple[torch.Tensor, torch.Tensor]

   .. method:: get_alphabet_dataloader(self)

      Define an alphabet mapping used by the methods common to ProtBert and ESM.
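
The ``process_sequences_and_tokens`` and ``token_to_id`` entries describe mapping protein
sequences to padded token IDs. The following is a minimal, stdlib-only sketch of that idea,
not the wrapper's implementation: the toy vocabulary and its index order, the special-token
strings and the ``input_ids``/``attention_mask`` keys are hypothetical, and the real method
returns ``torch.Tensor`` values rather than nested lists.

.. code-block:: python

   from typing import Dict, List

   # Hypothetical toy vocabulary: NOT the real ESM alphabet or its index order.
   BEGIN, END, PAD, UNK = "<cls>", "<eos>", "<pad>", "<unk>"
   AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
   TOY_VOCAB: Dict[str, int] = {
       tok: i for i, tok in enumerate([BEGIN, PAD, END, UNK, *AMINO_ACIDS])
   }


   def tokenize_batch(sequences: List[str], vocab: Dict[str, int]) -> Dict[str, List[List[int]]]:
       """Map residues to IDs, add begin/end tokens and right-pad to a common length."""
       unk_id = vocab[UNK]
       # Wrap each sequence with begin/end tokens; unknown residues fall back to <unk>.
       encoded = [
           [vocab[BEGIN]] + [vocab.get(aa, unk_id) for aa in seq] + [vocab[END]]
           for seq in sequences
       ]
       max_len = max(len(ids) for ids in encoded)
       input_ids, attention_mask = [], []
       for ids in encoded:
           n_pad = max_len - len(ids)
           input_ids.append(ids + [vocab[PAD]] * n_pad)
           # 1 marks a real token, 0 marks padding.
           attention_mask.append([1] * len(ids) + [0] * n_pad)
       return {"input_ids": input_ids, "attention_mask": attention_mask}


   batch = tokenize_batch(["MKT", "MK"], TOY_VOCAB)

Padding to the longest sequence in the batch is what lets the IDs be stacked into a single
rectangular tensor, with the mask telling the model which positions to ignore.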
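
Given its ``batch_size`` parameter, ``model_pass`` presumably runs the model on consecutive
batches and joins the per-batch logits and embeddings. The sketch below shows only that
batching pattern with plain Python lists, assuming inputs are split into slices of
``batch_size`` rows; ``batched_forward`` and ``dummy_forward`` are hypothetical names, the
forward function stands in for the ESM model, and torch, ray and tqdm are left out. With
tensors, the per-batch outputs would typically be joined with ``torch.cat(..., dim=0)``.

.. code-block:: python

   from typing import Callable, List, Sequence, Tuple

   # Hypothetical forward function: takes a batch of token-ID rows and returns
   # (logits, embeddings) for that batch, one entry per sequence.
   ForwardFn = Callable[[Sequence[List[int]]], Tuple[list, list]]


   def batched_forward(
       input_ids: Sequence[List[int]], batch_size: int, forward: ForwardFn
   ) -> Tuple[list, list]:
       """Run ``forward`` on consecutive slices of ``batch_size`` rows and concatenate results."""
       if batch_size <= 0:
           raise ValueError("batch_size must be positive")
       all_logits: list = []
       all_embeddings: list = []
       for start in range(0, len(input_ids), batch_size):
           logits, embeddings = forward(input_ids[start:start + batch_size])
           # Concatenate along the sequence axis (what torch.cat(..., dim=0) does for tensors).
           all_logits.extend(logits)
           all_embeddings.extend(embeddings)
       return all_logits, all_embeddings


   # Usage with a dummy model (hypothetical sizes: vocab_size=3, embedding_size=2).
   VOCAB_SIZE, EMBEDDING_SIZE = 3, 2
   batch_sizes_seen: List[int] = []


   def dummy_forward(batch: Sequence[List[int]]) -> Tuple[list, list]:
       batch_sizes_seen.append(len(batch))
       logits = [[[0.0] * VOCAB_SIZE for _ in row] for row in batch]
       embeddings = [[[float(tok)] * EMBEDDING_SIZE for tok in row] for row in batch]
       return logits, embeddings


   inputs = [[0, 14, 2], [0, 12, 2], [0, 20, 2]]
   logits, embeddings = batched_forward(inputs, batch_size=2, forward=dummy_forward)

Three sequences with ``batch_size=2`` give two forward calls (of 2 and 1 rows), while the
concatenated outputs keep one entry per input sequence, matching the ``num_seqs`` leading
dimension in the documented return shapes.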