biotransformers.utils.msa_utils
Module Contents
Functions
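| get_translation() | Get the translation dict used to convert unused characters in an MSA. |
| read_sequence(filename) | Reads the first (reference) sequence from a fasta or MSA file. |
| remove_insertions(sequence) | Removes any insertions into the sequence. |
| read_msa(filename, nseq) | Reads the first nseq sequences from an MSA file and automatically removes insertions. |
| get_msa_list(path_msa) | Get all files of the MSA folder and check the file format. |
| get_msa_lengths(list_msa, nseq) | Get the lengths of a list of MSAs. |
| msa_to_remove(path_msa, n_seq) | Get the list of MSAs with fewer than n_seq sequences. |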
- biotransformers.utils.msa_utils.get_translation() → Dict[int, Any]
Get the translation dict used to convert unused characters in an MSA.
- biotransformers.utils.msa_utils.read_sequence(filename: str) → Tuple[str, str]
Reads the first (reference) sequence from a fasta or MSA file.
- biotransformers.utils.msa_utils.remove_insertions(sequence: str) → str
Removes any insertions into the sequence. Needed to load aligned sequences in an MSA.
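A minimal sketch of how insertion removal commonly works for a3m-style alignments. It assumes the usual convention that lowercase letters mark insertions and that `.` and `*` are unused characters; the names `build_translation` and `strip_insertions` are hypothetical stand-ins and are not taken from the library's source.

```python
import string
from typing import Dict, Optional


def build_translation() -> Dict[int, Optional[int]]:
    """Build a str.translate table that deletes lowercase letters, '.' and '*'.

    Assumption: this mirrors the a3m convention, not necessarily the
    exact table returned by get_translation().
    """
    delete_keys = dict.fromkeys(string.ascii_lowercase)
    delete_keys["."] = None
    delete_keys["*"] = None
    return str.maketrans(delete_keys)


def strip_insertions(sequence: str) -> str:
    """Drop insertion characters so the sequence matches the alignment width."""
    return sequence.translate(build_translation())


print(strip_insertions("MKt-LLa.V*"))  # prints "MK-LLV"
```

Gap characters (`-`) are kept because they occupy alignment columns; only characters that do not belong to a column of the reference are removed.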
- biotransformers.utils.msa_utils.read_msa(filename: str, nseq: int) → List[Tuple[str, str]]
Reads the first nseq sequences from an MSA file and automatically removes insertions.
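The behaviour described above can be approximated with the standard library alone. This is a sketch under stated assumptions, not the library's implementation: `iter_fasta` and `read_first_sequences` are hypothetical names, each returned tuple is assumed to be (description, sequence), and insertions are assumed to follow the a3m convention (lowercase letters, `.` and `*`).

```python
import itertools
import string
from typing import Iterable, Iterator, List, Tuple

# Assumed a3m convention: lowercase letters, "." and "*" are dropped.
_DELETE = str.maketrans("", "", string.ascii_lowercase + ".*")


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA/a3m-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_first_sequences(filename: str, nseq: int) -> List[Tuple[str, str]]:
    """Return the first nseq records of the file with insertions removed."""
    with open(filename) as handle:
        records = itertools.islice(iter_fasta(handle), nseq)
        return [(desc, seq.translate(_DELETE)) for desc, seq in records]
```

Using `itertools.islice` stops parsing after `nseq` records, which matters for large alignments where only the top of the file is needed.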
- biotransformers.utils.msa_utils.get_msa_list(path_msa: Optional[str]) → List[str]
Get all files of the MSA folder and check the file format.
- Parameters
path_msa (Optional[str]) – path to the folder containing the a3m files
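One way such a folder check could look is sketched below. The documentation does not say what happens when a file has the wrong format or when `path_msa` is None, so raising `ValueError` on any non-`.a3m` file is an assumption made for illustration, and `list_a3m_files` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def list_a3m_files(folder: str) -> List[str]:
    """Return the sorted file paths in folder, raising if any is not .a3m.

    Assumption: a strict check that rejects the whole folder; the real
    function may filter or warn instead.
    """
    paths = sorted(p for p in Path(folder).iterdir() if p.is_file())
    bad = [p.name for p in paths if p.suffix != ".a3m"]
    if bad:
        raise ValueError(f"Expected only .a3m files, found: {bad}")
    return [str(p) for p in paths]
```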
- biotransformers.utils.msa_utils.get_msa_lengths(list_msa: List[List[Tuple[str, str]]], nseq: int) → List[int]
Get the lengths of a list of MSAs.
Every MSA must contain at least nseq sequences.
- Parameters
list_msa (List[List[Tuple[str,str]]]) – list of MSAs; each MSA is a list of (str, str) tuples, one per sequence
nseq (int) – minimum number of sequences each MSA must contain
- Returns
[description]
- Return type
List[int]
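The return value is not described above, so the following sketch rests on two assumptions: that "length" means the aligned length of the first (reference) sequence of each MSA, and that an MSA with fewer than `nseq` sequences raises `ValueError`. The name `msa_lengths` is hypothetical.

```python
from typing import List, Tuple

Msa = List[Tuple[str, str]]


def msa_lengths(list_msa: List[Msa], nseq: int) -> List[int]:
    """Return the reference-sequence length of each MSA.

    Assumption: each tuple is (description, sequence) and every MSA
    must hold at least nseq sequences, otherwise ValueError is raised.
    """
    for index, msa in enumerate(list_msa):
        if not msa or len(msa) < nseq:
            raise ValueError(
                f"MSA {index} has {len(msa)} sequences, expected at least {nseq}"
            )
    return [len(msa[0][1]) for msa in list_msa]


example = [
    [("ref", "MKLL"), ("s2", "M-LL")],
    [("ref", "ACDEFG"), ("s2", "AC--FG"), ("s3", "ACDEFG")],
]
print(msa_lengths(example, 2))  # prints [4, 6]
```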
- biotransformers.utils.msa_utils.msa_to_remove(path_msa: str, n_seq) → List[str]
Get the list of MSAs with fewer than n_seq sequences.
- Parameters
path_msa (str) – path to the folder containing the a3m files
- Returns
List of MSA file paths that do not have enough sequences.
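A minimal sketch of this kind of filter, assuming `path_msa` is a folder of `.a3m` files and that the number of sequences in a file equals the number of `>` header lines. The names `count_sequences` and `files_with_too_few_sequences` are hypothetical and are not part of the library.

```python
from pathlib import Path
from typing import List


def count_sequences(path: Path) -> int:
    """Count records in a FASTA/a3m file by counting '>' header lines."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def files_with_too_few_sequences(folder: str, n_seq: int) -> List[str]:
    """Return the .a3m file paths in folder holding fewer than n_seq sequences."""
    return [
        str(path)
        for path in sorted(Path(folder).glob("*.a3m"))
        if count_sequences(path) < n_seq
    ]
```

Counting header lines avoids loading whole alignments into memory when only the sequence count is needed.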