biotransformers.utils.msa_utils

Module Contents

Functions

get_translation() → Dict[int, Any]

Get the translation dict used to convert unused characters in an MSA.

read_sequence(filename: str) → Tuple[str, str]

Reads the first (reference) sequence from a FASTA or MSA file.

remove_insertions(sequence: str) → str

Removes any insertions from the sequence.

read_msa(filename: str, nseq: int) → List[Tuple[str, str]]

Reads the first nseq sequences from an MSA file and automatically removes insertions.

get_msa_list(path_msa: Optional[str]) → List[str]

Get all files in the MSA folder and check their file format.

get_msa_lengths(list_msa: List[List[Tuple[str, str]]], nseq: int) → List[int]

Get the lengths of a list of MSAs.

msa_to_remove(path_msa: str, n_seq) → List[str]

Get the list of MSAs with fewer than n_seq sequences.

biotransformers.utils.msa_utils.get_translation() → Dict[int, Any]

Get the translation dict used to convert unused characters in an MSA.

biotransformers.utils.msa_utils.read_sequence(filename: str) → Tuple[str, str]

Reads the first (reference) sequence from a FASTA or MSA file.

biotransformers.utils.msa_utils.remove_insertions(sequence: str) → str

Removes any insertions from the sequence. Needed to load aligned sequences in an MSA.
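
The following is a minimal sketch of how a translation table and insertion removal of this kind could work. It assumes the a3m convention that lowercase letters and "." mark insertions, and that "*" is also dropped. The names build_translation and strip_insertions are hypothetical and are not taken from the library's source.

```python
import string
from typing import Any, Dict


def build_translation() -> Dict[int, Any]:
    """Build a table that maps every character to drop onto None (hypothetical helper)."""
    # Assumption: lowercase letters, "." and "*" are the characters to remove.
    delete_keys = dict.fromkeys(string.ascii_lowercase + ".*")
    # str.maketrans turns single-character keys into their integer code points.
    return str.maketrans(delete_keys)


def strip_insertions(sequence: str) -> str:
    """Drop insertion characters so that every aligned row has the same length."""
    return sequence.translate(build_translation())


print(strip_insertions("MK-aLv.T*"))  # MK-LT
```

Uppercase residues and the "-" gap character are kept, so the aligned columns stay intact.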

biotransformers.utils.msa_utils.read_msa(filename: str, nseq: int) → List[Tuple[str, str]]

Reads the first nseq sequences from an MSA file and automatically removes insertions.
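
As an illustration only, reading the first few records of a FASTA-style MSA file could look like the sketch below. It assumes ">" header lines, sequences that may span several lines, and insertions marked by lowercase letters, "." and "*". The helpers iter_records and read_first_records are hypothetical stand-ins, not the library's implementation.

```python
import itertools
import string
from typing import Iterable, Iterator, List, Tuple

# Assumption: lowercase letters, "." and "*" mark insertions to remove.
_DROP = str.maketrans(dict.fromkeys(string.ascii_lowercase + ".*"))


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_first_records(filename: str, nseq: int) -> List[Tuple[str, str]]:
    """Return at most nseq records with insertions stripped (hypothetical helper)."""
    with open(filename) as handle:
        # islice stops parsing as soon as nseq records have been produced.
        records = itertools.islice(iter_records(handle), nseq)
        return [(name, seq.translate(_DROP)) for name, seq in records]
```

If the file holds fewer than nseq records, this sketch simply returns all of them.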

biotransformers.utils.msa_utils.get_msa_list(path_msa: Optional[str]) → List[str]

Get all files in the MSA folder and check their file format.

Parameters

path_msa (Optional[str]) – path of the folder containing the a3m files
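
A rough sketch of a folder scan with a format check is shown below. It assumes that the check means every file must carry the .a3m extension and that a missing path is reported as an error; both are assumptions, since the behaviour is not spelled out here. The name list_a3m_files is hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def list_a3m_files(path_msa: Optional[str]) -> List[str]:
    """Return the sorted paths of the .a3m files in a folder (hypothetical helper)."""
    # Assumption: a missing path is an error rather than an empty result.
    if path_msa is None:
        raise ValueError("path_msa must point to a folder of .a3m files")
    folder = Path(path_msa)
    if not folder.is_dir():
        raise FileNotFoundError(f"{path_msa} is not a folder")
    files = sorted(p for p in folder.iterdir() if p.is_file())
    # Assumption: the format check only looks at the file extension.
    bad = [p.name for p in files if p.suffix != ".a3m"]
    if bad:
        raise ValueError(f"files without the .a3m extension: {bad}")
    return [str(p) for p in files]
```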

biotransformers.utils.msa_utils.get_msa_lengths(list_msa: List[List[Tuple[str, str]]], nseq: int) → List[int]

Get the lengths of a list of MSAs.

Every MSA must contain at least nseq sequences.

Parameters
  • list_msa (List[List[Tuple[str,str]]]) – list of MSAs; each MSA is a list of tuples

  • nseq (int) – minimum number of sequences each MSA must contain

Returns

[description]

Return type

List[int]

biotransformers.utils.msa_utils.msa_to_remove(path_msa: str, n_seq) → List[str]

Get the list of MSAs with fewer than n_seq sequences.

Parameters

path_msa (str) – [description]

Returns

List of MSA file paths that don’t have enough sequences.
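
One way to find such files, sketched under the assumption that each sequence in an a3m file starts with a ">" header line, is to count headers per file and keep the files that fall below the threshold. The names count_sequences and files_below_threshold are hypothetical and do not come from the library.

```python
from pathlib import Path
from typing import List


def count_sequences(filepath: Path) -> int:
    """Count records by counting ">" header lines (assumed FASTA-style layout)."""
    with filepath.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def files_below_threshold(path_msa: str, n_seq: int) -> List[str]:
    """Return the .a3m files in path_msa holding fewer than n_seq sequences."""
    candidates = sorted(Path(path_msa).glob("*.a3m"))
    return [str(p) for p in candidates if count_sequences(p) < n_seq]
```

Counting header lines avoids parsing the sequences themselves, which keeps the scan cheap for large alignments.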