:mod:`biotransformers.wrappers.esm_wrappers`
============================================

.. py:module:: biotransformers.wrappers.esm_wrappers

.. autoapi-nested-parse::

   This script defines a class which inherits from the LanguageModel class, and is
   specific to the ESM model developed by FAIR (https://github.com/facebookresearch/esm).


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   biotransformers.wrappers.esm_wrappers.ESMWrapper


Attributes
~~~~~~~~~~

.. autoapisummary::

   biotransformers.wrappers.esm_wrappers.log
   biotransformers.wrappers.esm_wrappers.path_msa_folder


.. data:: log
   

.. data:: path_msa_folder
   

.. class:: ESMWrapper(model_dir: str, device: str)


   Bases: :py:obj:`biotransformers.wrappers.language_model.LanguageModel`

   Class that uses an ESM type of pretrained transformers model to evaluate
   a protein likelihood so as other insights.

   .. method:: model(self) -> torch.nn.Module
      :property:

      Return torch model.


   .. method:: set_model(self, model: torch.nn.Module)

      Set torch model.


   .. method:: clean_model_id(self) -> str
      :property:

      Clean model ID (in case the model directory is not)


   .. method:: model_vocabulary(self) -> List[str]
      :property:

      Returns the whole vocabulary list


   .. method:: vocab_size(self) -> int
      :property:

      Returns the whole vocabulary size


   .. method:: mask_token(self) -> str
      :property:

      Representation of the mask token (as a string)


   .. method:: pad_token(self) -> str
      :property:

      Representation of the pad token (as a string)


   .. method:: begin_token(self) -> str
      :property:

      Representation of the beginning of sentence token (as a string)


   .. method:: end_token(self) -> str
      :property:

      Representation of the end of sentence token (as a string)


   .. method:: does_end_token_exist(self) -> bool
      :property:

      Returns true if a end of sequence token exists


   .. method:: token_to_id(self)
      :property:

      Returns a function which maps tokens to IDs


   .. method:: embeddings_size(self)
      :property:

      Returns size of the embeddings


   .. method:: process_sequences_and_tokens(self, sequences_list: List[str]) -> Dict[str, torch.Tensor]

      Function to transform tokens string to IDs; it depends on the model used


   .. method:: model_pass(self, model_inputs: Dict[str, torch.Tensor], batch_size: int, silent: bool = False, pba: ray.actor.ActorHandle = None) -> Tuple[torch.Tensor, torch.Tensor]

      Function which computes logits and embeddings based on a list of sequences,
      a provided batch size and an inference configuration. The output is obtained
      by computing a forward pass through the model ("forward inference")

      The datagenerator is not the same the multi_gpus inference. We use a tqdm progress bar
      that is updated by the worker. The progress bar is instantiated before ray.remote

      :param model_inputs: [description]
      :type model_inputs: Dict[str, torch.tensor]
      :param batch_size: size of the batch
      :type batch_size: int
      :param silent: display or not progress bar
      :param pba: tqdm progress bar for ray actor

      :returns:         * logits [num_seqs, max_len_seqs, vocab_size]
                        * embeddings [num_seqs, max_len_seqs+1, embedding_size]
      :rtype: Tuple[torch.tensor, torch.tensor]


   .. method:: get_alphabet_dataloader(self)

      Define an alphabet mapping for common method between
      protbert and ESM