Breaking news: bad infections kill many people. Antibiotics do nothing.
A violent bacterial outbreak is under way, killing many people, and the usual antibiotics have no effect. This has happened before: the phenomenon is known as antimicrobial resistance (AMR). Until now, pharma labs have managed to circumvent the resistance (source).
Fortunately, an outstanding biologist managed to isolate two different strains of the bacterium: a tetracycline-resistant strain and a wild type.
TATFAR is issuing a call to develop computational tools that determine the origin of the problem and, if possible, give biologists some insight into what could be done.
What the TATFAR wants
TATFAR is calling for tools that address the current crisis and can be reused for future events. The specifications are:
- T1 - one tool that takes two simple FASTA sequencing files (Illumina reads) and outputs a list of SNPs (different from SNP),
- T2 (optional) - apply your tool to identify mutations between two variants of SARS-CoV-2,
- T3 - make a proof of concept (POC) of the tools of WP1 on the current AMR crisis: understand what differs between the strains and which gene(s) are involved, build a model of the structure of the protein associated with the gene, and give a possible explanation.
Allowed resources are the Biopython library and the software available in the resources directory on the git.
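As an illustration of what a T1 prototype might start from, here is a minimal sketch that compares the k-mer content of two read sets. It is one possible approach, not the expected solution: the k-mer strategy, the function names (parse_fasta, canonical, count_kmers, specific_kmers) and the min_count threshold are all assumptions made for this example. The parser uses only the standard library so the sketch runs anywhere; Biopython's Bio.SeqIO.parse(handle, "fasta") could be used instead.

```python
from collections import Counter

# Translation table used to complement A, C, G and T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_fasta(lines):
    """Yield upper-cased sequences from an iterable of FASTA lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record, if any.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seqs, k):
    """Count canonical k-mers over all reads, skipping k-mers with N."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def specific_kmers(counts_a, counts_b, min_count=2):
    """K-mers seen at least min_count times in A and fewer times in B.

    The threshold is a hypothetical filter against sequencing errors.
    """
    return {
        kmer
        for kmer, n in counts_a.items()
        if n >= min_count and counts_b.get(kmer, 0) < min_count
    }
```

With reads loaded through something like `with open("readsA.fasta") as handle: reads_a = list(parse_fasta(handle))`, the k-mers returned by specific_kmers mark candidate variant sites; turning them into an actual SNP list (position, reference base, alternative base) is left to the tool itself. For a fixed k, counting makes a single pass over the reads, so the running time grows linearly with the total read length.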
You answer the TATFAR call as a consultant. You will provide a cost estimate (DNA sequencing, your workload), justified by a basic description of the piece of software you will develop. You will post your quote on TEIDE by next week.
Since you are a responsible citizen and a trusted collaborator, you have already sent the two strains for sequencing before knowing the outcome of the call. The company will make the sequences available as soon as possible, i.e. in about a week.
As you know that deadlines are always tight, and you are sooo serious, you start developing the tools, together with some tests, even before getting the sequencing data.
To test your prototype, you can use the FASTA file of reads available on the git at hands-on/reference-data.
Evaluation of your project
The final evaluation of your proposal (max. 5 pages) will take into account:
- the soundness of your approach (description and mathematical justification of the underlying statistical model of your method)
- algorithmic complexity (anything slower than linear time is not allowed)
- the empirical proof (e.g. using k-mer abundance simulation) to assess the performance of your method (e.g. up to what sequencing error rate you estimate that your method is robust)
- the application to the data provided (correctness of the identified gene)
- the biological soundness of the identified mutated gene
- the clarity and simplicity of the provided code (simple execution such as ‘python3 snpDetect.py readsA.fasta readsB.fasta’, which outputs the list of SNPs)
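The empirical proof mentioned above could be approached with a small simulation such as the following sketch. Everything in it is an assumption made for illustration: the random genome, the uniform substitution-only error model, forward-strand reads only, the parameter defaults, and the hypothetical names simulate_reads, solid_kmers and evaluate. It measures, for a given error rate, which fraction of the true genomic k-mers is still seen at least min_count times, and how many erroneous k-mers pass the same threshold.

```python
import random
from collections import Counter


def simulate_reads(genome, read_len, coverage, error_rate, rng):
    """Sample reads uniformly from the genome and add substitution errors."""
    n_reads = coverage * len(genome) // read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace the base with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def solid_kmers(reads, k, min_count):
    """Return the k-mers observed at least min_count times in the reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def evaluate(error_rate, genome_len=500, read_len=50, coverage=30,
             k=21, min_count=3, seed=0):
    """Return (recall of genomic k-mers, number of erroneous solid k-mers)."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    truth = {genome[i:i + k] for i in range(genome_len - k + 1)}
    reads = simulate_reads(genome, read_len, coverage, error_rate, rng)
    solid = solid_kmers(reads, k, min_count)
    recall = len(solid & truth) / len(truth)
    false_solid = len(solid - truth)
    return recall, false_solid


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.05, 0.1):
        recall, false_solid = evaluate(rate)
        print(f"error rate {rate}: recall {recall:.2f}, "
              f"erroneous solid k-mers {false_solid}")
```

Under this error model a k-mer is read without error with probability (1 - e)^k, where e is the per-base error rate, so the recall is expected to drop as e or k grows. Sweeping the error rate and the threshold in such a simulation is one way to support a robustness claim in the proposal.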