Welcome to trnanalysis’s documentation!¶
This pipeline was developed to accurately map small RNA sequencing data and then perform accurate mapping of tRNA reads and qualitatively analyse the resulting data. trnanalysis has an emphasis on profiling nuclear and mitochondrial tRNA fragments.
Citation¶
Currently we have a bioRxiv pre-print available:
Support¶
- Please refer to the FAQ section
- For bugs and issues, please raise and issue on _github
Indices and tables¶
Installation¶
The following sections describe how to install tRNAnalysis.
Conda Installation¶
The our preffered method of installation is using conda. If you dont have conda installed then please install conda using miniconda or anaconda.
tRNAnalysis is currently installed using the bioconda channel and the recipe can be found on `github `_.
To install tRNAnalysis:
conda install -c bioconda trnanalysis
Conda environment¶
Conda is an awesome project, however it can suffer from significant issues relating to how long it takes the solver to fix installation issues. For more information regarding these conda issues please see bioconda issues.
In order to try and speed things up we have provided a conda environment for installation. Currently only linux is supported and it can be installed by doing the following:
wget https://raw.githubusercontent.com/Acribbs/tRNAnalysis/master/conda/environments/trnanalysis-linux.yml
conda env create -f trnanalysis-linux.yml
conda activate trnanalysis-env
Pip installation¶
We recommend installation through conda because it manages the dependencies. However, tRNAnalysis can also be installed easily using the pip package manager. However, you will also have to install other dependencies manually:
pip install trnanalysis
Manual installation¶
To obtain the latest code, check it out from the public git repository and activate it:
git clone https://github.com/Acribbs/tRNAnalysis.git
cd tRNAnalysis
python setup.py install
Once checked-out, you can get the latest changes via pulling:
git pull origin master
Installing additonal software¶
When building your own workflows we recomend using conda to install software into your environment where possible.
This can easily be performed by:
conda search <package>
conda install <package>
Cluster configuration¶
Our pipeline is developed using CGAT-core as the workflow engine. For more information on how tRNAnalysis is written and executed. In order for our workflows to be executed over a cluster you will need to configure the cluster options by following the example below:
Currently SGE, SLURM, Torque and PBSPro workload managers are supported. The default cluster options for
cgatcore are set for SunGrid Engine (SGE). Therefore, if you would like to run an alternative workload manager
then you will need to configure your settings for your cluster. In order to do this you will need to
create a .cgat.yml
within the user`s home directory.
This will allow you to overide the default configurations. To view the hardcoded parameters for cgatcore please see the parameters.py file.
For an example of how to configure a PBSpro workload manager see this link to this config example.
The .cgat.yml is placed in your home directory and when a pipeline is executed it will automatically prioritise the
.cgat.yml
parameters over the cgatcore hard coded parameters. For example, adding the following to the
.cgat.yml file will implement cluster settings for PBSpro:
memory_resource: mem
options: -l walltime=00:10:00 -l select=1:ncpus=8:mem=1gb
queue_manager: pbspro
queue: NONE
parallel_environment: "dedicated"
Running a pipeline¶
Running tRNAnalysis is easy using the commandline. If you have installed trnanalysis using conda then all the software dependancies should have been installed and you are ready to go. A step by step tutorial pipeline can be found here: getting_started-Tutorial_.
Introduction¶
This pipeline requires the following input:
- a single end fastq file - if you have paired end data we recommend flashing the reads together to make a single file or only using the first read of your paired end data.
- a bowtie indexed genome
- ensembl gtf: we recomend that you download out gtf files that have been sanitised for out pipeline here.
Optionally to make the pipeline run faster you can also use a downloaded tRNAscan-SE output. The most time consuming part of the pipeline is running tScan-SE to identify tRNAs across the genome. In order to speed the pipeline execution we have pre-ran tScan-SE and generated the outputs that can be found in the following directory . You can then tell the pipeline the location of the file using the yml configuration file.
Running tRNAnalysis¶
Command line usage information is available by running:
trnanalysis trna --help
The basic syntax for running tRNAnalysis is:
trnanalysis trna [workflow options] [workflow arguments]
workflow options
can be one of the following:
make <task>
run all tasks required to build task
show <task>
show tasks required to build task without executing them
plot <task>
plot image of workflow (requires inkscape) of pipeline state for task
touch <task>
touch files without running task or its pre-requisites. This sets the timestamps for files in task and its pre-requisites such that they will seem up-to-date to the pipeline.
config
write a new configuration filepipeline.ini
with default values. An existing configuration file will not be overwritten.
clone <srcdir>
clone a pipeline fromsrcdir
into the current directory. Cloning attempts to conserve disk space by linking.
Fastq naming convention¶
tRNAanalysis assume that input fastq files follows the following naming convention(with the read inserted between the fastq and the gz). The reason for this is so that regular expressions do not have to acount for the read within the name. It is also more explicit:
sample1-condition-R1.fastq.1.gz
sample1-condition-R2.fastq.2.gz
Additional options¶
In addition to running tRNAanalysis with default command line options, running trnaanalysis
with –help will allow you to see additional options for workflow arguments
when running the pipelines. These will modify the way the pipeline in ran.
- -no-cluster
This option allows the pipeline to run locally.
- -input-validation
This option will check the pipeline.ini file for missing values before the pipeline starts.
- -debug
Add debugging information to the console and not the logfile
- -dry-run
Perform a dry run of the pipeline (do not execute shell commands)
- -exceptions
Echo exceptions immidietly as they occur.
-c - -checksums
Set the level of ruffus checksums.
Building tRNAnalysis reports¶
Reports are generated using the following command once a the full command has completed:
tranalysis trna make build_report
Troubleshooting¶
Many things can go wrong while running the pipeline. Look out for
- bad input format. The pipeline does not perform sanity checks on the input format. If the input is bad, you might see wrong or missing results or an error message.
- pipeline disruptions. Problems with the cluster, the file system or the controlling terminal might all cause the pipeline to abort.
- bugs. The pipeline makes many implicit assumptions about the input files and the programs it runs. If program versions change or inputs change, the pipeline might not be able to deal with it. The result will be wrong or missing results or an error message.
If tRNAnalysis aborts, locate the step that caused the error by
reading the logfiles and the error messages on stderr
(nohup.out
). See if you can understand the error and guess the
likely problem (new program versions, badly formatted input, …). If
you are able to fix the error, remove the output files of the step in
which the error occured and restart the pipeline. Processing should
resume at the appropriate point.
Note
Look out for upstream errors. For example, the pipeline might build a geneset filtering by a certain set of contigs. If the contig names do not match, the geneset will be empty, but the geneset building step might conclude successfully. However, you might get an error in any of the downstream steps complaining that the gene set is empty. To fix this, fix the error and delete the files created by the geneset building step and not just the step that threw the error.
Common errors¶
One of the most common errors when runnig the tRNAnalysis is:
GLOBAL_SESSION = drmaa.Session()
NameError: name 'drmaa' is not defined
This error occurrs because you are not connected to the cluster. Alternatively you can run the pipleine in local mode by adding - -no-cluster as a command line option.
Running tRNAnalysis - Tutorial¶
Before beginning this tutorial make sure you have tRNAnalysis installed correctly, please see here (see Installation) for installation instructions.
In the following section we will run a toy example pipeline that demonstrates the functionality of tRNAnalysis. tRNAnalysis can be run locally or distributed across a cluster. This tutorial will explain the steps required to run tRNAnalysis.
Tutorial start¶
1. First download the tutorial data:
wget https://www.cgat.org/downloads/public/adam/trnanalysis/test_trna.tar.gz
tar -zxvf test_trna.tar.gz
2. Next we will generate a configuration yml file so the pipeline output can be modified:
cd test_trna
trnanalysis trna config
This will generate a pipeline.yml file containing the configuration parameters than can be used to modify the output of the pipeline. These parameters can all be modified to change the output or running of tRNAnalysis. However, for this tutorial you do not need to modify the parameters to run the pipeline as the default ones are appropriate in this instance.
Note
There is already a pipeline.yml file within the test data so you dont really need to run the command above this time.
3. Next we will run the pipeline:
trnanalysis trna make full -v5 --no-cluster
This --no-cluster
will run the pipeline locally if you do not have access to a cluster. Alternatively if you have a
cluster remove the --no-cluster
option and the pipleine will distribute your jobs accross the cluster.
For information on how to configure your cluster please see the
cluster config help.
Note
There are many commandline options available to run the pipeline. To see available options please run trnanalysis --help
.
The pipeline is mostly quite quick to execute. However, it can take a considerable time to generate the bowtie indexes, in the future we will parameterise this so you can use pre-configured bowtie indexes so these do not have to keep being generated each time the pipeline is ran.
4. Generate a report
The final step is to generate a report to display the output of the tRNAnalysis. In order to generate these reports run the command:
trnanalysis trna make build_report -v 5 --no-cluster
This will generate a MultiQC report in the folder MultiQC_report.dir/ and an Rmarkdown report in R_report.dir/Final_report/index.html.
This completes the tutorial for running the tRNAnalysis , hope you find it as useful as we do for analysing tRNA sequencing data.
Contributing¶
Contributions are very much encouraged and we greatly appreciate the time and effort people make to help maintain and support out tools. Every contribution helps, please dont be shy, we dont bite.
You can contribute to the development of our pipeline in a number of different ways:
Reporting bug fixes¶
Bugs are annoying and reporting them will help us to fix your issue.
Bugs can be reported using the issue section in github
When reporting issues, please include:
- Steps in your code/command that led to the bug so it can be reproduced.
- The error message from the log message.
- Any other helpful info, such as the system/cluster engine or version information.
Proposing a new feature/enhancement¶
If you wish to contribute a new feature to the pipeline then the best way is to raise this as an issue and label it as an enhancement in github
If you propose a new feature then please:
- Explain how your enhancement will work
- Describe as best as you can how you plan to implement this.
- If you dont think you have the necessary skills to implement this on your own then please say and we will try our best to help (or implement this for you). However, please be aware that this is a community developed software and our volunteers have other jobs. Therefore, we may not be able to work as fast as you hoped.
Pull Request Guidelines¶
Why not contribute to our project, its a great way of making the project better, your help is always welcome. We follow the fork/pull request model. To update our documentation, fix bugs or add extra enhancements you will need to create a pull request through github.
To create a pull request perform these steps:
- Create a github account.
- Create a personal fork of the project on github.
- Clone the fork onto your local machine. Your remote repo on github is called
origin
. - Add the orginal repository as a remote called
upstream
. - If you made the fork a while ago then please make sure you
git pull upstream
to keep your repository up to date - Create a new branch to work on! We usually name our branches with capital first and last followed by a dash and something unique. For example:
git checkout -b AC-new_doc
. - Impliment your fix/enhancement and make sure your code is effectively documented.
- Our code has tests and these will be ran when a pull request is submitted, however you can run our tests before you make the pull request, we have a number written in the
tests/
directory. For example: to run our import tests please runnosetests tests/test_import.py
. - Add or change our documentation in the
docs/
directory. - Squash all of your commits into a single commit with gits interactive rebase.
- Push your branch to your fork on github
git push origin
- From your fork in github.com, open a pull request in the correct branch.
- … This is where someone will review your changes and modify them or approve them …
- Once the pull request is approved and merged you can pull the changes from the
upstream
to your local repo and delete your branch.
Note
Always write your commit messages in the present tense. Your commit messages should describe what the commit does to the code and not what you did to the code.
FAQs¶
As our workflow develops we will add frequently asked questions here.
In the meantime please add issues to the github page
Licence¶
tRNAnalysis is an open-source project and we have made the repository available under the open source permissive free MIT software licence, allowing free and full use of the code for both commercial and non-commercial purposes. A copy of the licence is shown below:
MIT License¶
Copyright (c) 2019 Adam Cribbs
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.