Welcome to tRNAnalysis’s documentation!

This pipeline was developed to map small RNA sequencing data accurately, then map tRNA reads and qualitatively analyse the resulting data. tRNAnalysis places particular emphasis on profiling nuclear and mitochondrial tRNA fragments.

Support

  • Please refer to the FAQ section
  • For bugs and issues, please raise an issue on GitHub

Installation

The following sections describe how to install tRNAnalysis.

Conda Installation

Our preferred method of installation is conda. If you don't have conda installed, please install it using Miniconda or Anaconda.

tRNAnalysis is currently installed using the bioconda channel, and the recipe can be found on GitHub.

To install tRNAnalysis:

conda install -c bioconda trnanalysis

Conda environment

Conda is an awesome project; however, its dependency solver can take a long time to resolve an installation. For more information on these conda issues, please see the bioconda issues.

To speed things up, we have provided a conda environment for installation. Currently only Linux is supported, and the environment can be installed as follows:

wget https://raw.githubusercontent.com/Acribbs/tRNAnalysis/master/conda/environments/trnanalysis-linux.yml
conda env create -f trnanalysis-linux.yml
conda activate trnanalysis-env

Pip installation

We recommend installation through conda because it manages the dependencies. tRNAnalysis can also be installed using the pip package manager, but you will then have to install the other dependencies manually:

pip install trnanalysis

Manual installation

To obtain the latest code, check it out from the public git repository and install it:

git clone https://github.com/Acribbs/tRNAnalysis.git
cd tRNAnalysis
python setup.py install

Once checked out, you can get the latest changes by pulling:

git pull origin master

Installing additional software

When building your own workflows, we recommend using conda to install software into your environment where possible.

This can easily be performed by:

conda search <package>
conda install <package>

Cluster configuration

Our pipeline is developed using CGAT-core as the workflow engine. For more information on how tRNAnalysis is written and executed, see the CGAT-core documentation. For our workflows to be executed over a cluster, you will need to configure the cluster options by following the example below.

Currently the SGE, SLURM, Torque and PBSPro workload managers are supported. The default cluster options for cgatcore are set for Sun Grid Engine (SGE). If you would like to use a different workload manager, you will need to configure the settings for your cluster. To do this, create a .cgat.yml file within the user's home directory.

This will allow you to override the default configuration. To view the hardcoded parameters for cgatcore, please see the parameters.py file.

For an example of how to configure a PBSPro workload manager, see this config example.

The .cgat.yml file is placed in your home directory, and when a pipeline is executed it will automatically prioritise the .cgat.yml parameters over the cgatcore hardcoded parameters. For example, adding the following to the .cgat.yml file will implement cluster settings for PBSPro:

memory_resource: mem
options: -l walltime=00:10:00 -l select=1:ncpus=8:mem=1gb
queue_manager: pbspro
queue: NONE
parallel_environment: "dedicated"
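The precedence rule described above can be pictured as a simple dictionary merge. The following is a minimal sketch, not cgatcore's actual implementation: the function name and the default values are hypothetical placeholders, and only the override values are copied from the PBSPro example.

```python
# Hypothetical stand-ins for cgatcore's hardcoded defaults; the real
# values live in cgatcore's parameters.py and may differ.
HYPOTHETICAL_DEFAULTS = {
    "queue_manager": "sge",
    "queue": "all.q",
    "memory_resource": "mem_free",
}

# Values taken from the PBSPro .cgat.yml example above.
USER_SETTINGS = {
    "memory_resource": "mem",
    "options": "-l walltime=00:10:00 -l select=1:ncpus=8:mem=1gb",
    "queue_manager": "pbspro",
    "queue": "NONE",
    "parallel_environment": "dedicated",
}


def merge_settings(defaults, user):
    """Return a new dict in which user settings override the defaults."""
    merged = dict(defaults)  # copy so the defaults are left untouched
    merged.update(user)      # user-supplied keys win on conflict
    return merged


effective = merge_settings(HYPOTHETICAL_DEFAULTS, USER_SETTINGS)
```

Keys that appear only in the user file (such as options here) are simply added, while keys present in both take the user's value.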

Running a pipeline

Running tRNAnalysis is easy using the command line. If you have installed tRNAnalysis using conda, then all the software dependencies should have been installed and you are ready to go. A step-by-step tutorial can be found in the Running tRNAnalysis - Tutorial section.

Introduction

This pipeline requires the following input:

  • a single-end fastq file: if you have paired-end data, we recommend merging the read pairs into a single file (for example with FLASH) or using only the first read of each pair.
  • a bowtie indexed genome
  • ensembl gtf: we recommend that you download our gtf files that have been sanitised for our pipeline here.

Optionally, to make the pipeline run faster, you can supply a pre-computed tRNAscan-SE output. The most time-consuming part of the pipeline is running tRNAscan-SE to identify tRNAs across the genome. To speed up execution, we have pre-run tRNAscan-SE and generated the outputs that can be found in the following directory . You can then tell the pipeline the location of the file using the yml configuration file.

Running tRNAnalysis

Command line usage information is available by running:

trnanalysis trna --help

The basic syntax for running tRNAnalysis is:

trnanalysis trna [workflow options] [workflow arguments]

workflow options can be one of the following:

make <task>

run all tasks required to build task

show <task>

show tasks required to build task without executing them

plot <task>

plot an image of the pipeline state for task (requires inkscape)

touch <task>

touch files without running task or its pre-requisites. This sets the timestamps for files in task and its pre-requisites such that they will seem up-to-date to the pipeline.

config

write a new configuration file pipeline.yml with default values. An existing configuration file will not be overwritten.

clone <srcdir>

clone a pipeline from srcdir into the current directory. Cloning attempts to conserve disk space by linking.

Fastq naming convention

tRNAnalysis assumes that input fastq files follow the naming convention shown below, with the read number inserted between the fastq and the gz. This means that regular expressions do not have to account for the read within the sample name. It is also more explicit:

sample1-condition-R1.fastq.1.gz
sample1-condition-R2.fastq.2.gz
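File names can be checked against this convention before starting a run. The sketch below is an assumption based only on the two example names above: the pattern, the function name and the restriction to reads 1 and 2 are hypothetical, and the pipeline's own regular expressions may differ.

```python
import re

# Assumed pattern: <stem>.fastq.<read number>.gz, inferred only from the
# two example names; the pipeline's own regex may be different.
FASTQ_NAME = re.compile(r"^(?P<stem>.+)\.fastq\.(?P<read>[12])\.gz$")


def parse_fastq_name(filename):
    """Return (stem, read) if filename follows the convention, else None."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return match.group("stem"), int(match.group("read"))
```

For example, parse_fastq_name("sample1-condition-R1.fastq.1.gz") yields the stem and the read number, while a name such as sample1.fastq.gz, which lacks the read number, yields None.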
Additional options

In addition to running tRNAnalysis with the default command line options, running trnanalysis with --help will show additional options for workflow arguments. These modify the way the pipeline is run.

--no-cluster

This option allows the pipeline to run locally.

--input-validation

This option will check the pipeline.yml file for missing values before the pipeline starts.

--debug

Add debugging information to the console and not the logfile.

--dry-run

Perform a dry run of the pipeline (do not execute shell commands).

--exceptions

Echo exceptions immediately as they occur.

-c, --checksums

Set the level of ruffus checksums.

Building tRNAnalysis reports

Reports are generated using the following command once the full pipeline run has completed:

trnanalysis trna make build_report

Troubleshooting

Many things can go wrong while running the pipeline. Look out for:

  • bad input format. The pipeline does not perform sanity checks on the input format. If the input is bad, you might see wrong or missing results or an error message.
  • pipeline disruptions. Problems with the cluster, the file system or the controlling terminal might all cause the pipeline to abort.
  • bugs. The pipeline makes many implicit assumptions about the input files and the programs it runs. If program versions change or inputs change, the pipeline might not be able to deal with it. The result will be wrong or missing results or an error message.

If tRNAnalysis aborts, locate the step that caused the error by reading the logfiles and the error messages on stderr (nohup.out). See if you can understand the error and guess the likely problem (new program versions, badly formatted input, …). If you are able to fix the error, remove the output files of the step in which the error occurred and restart the pipeline. Processing should resume at the appropriate point.

Note

Look out for upstream errors. For example, the pipeline might build a geneset filtering by a certain set of contigs. If the contig names do not match, the geneset will be empty, but the geneset building step might conclude successfully. However, you might get an error in any of the downstream steps complaining that the gene set is empty. To fix this, fix the error and delete the files created by the geneset building step and not just the step that threw the error.
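One way to spot a silent upstream failure of this kind is to look for empty output files in the working directory. The following is a minimal sketch with a hypothetical helper name. It only finds files of exactly zero bytes, so it is an assumption that the empty output looks like this: a compressed file with no records, for instance, is not zero bytes and would be missed.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return sorted paths of zero-byte regular files under directory."""
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )
```

Calling find_empty_files(".") from the pipeline's working directory lists candidate files to inspect; which step produced each file still has to be worked out from the logfiles.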

Common errors

One of the most common errors when running tRNAnalysis is:

GLOBAL_SESSION = drmaa.Session()
NameError: name 'drmaa' is not defined

This error occurs because you are not connected to the cluster. If you do not have access to a cluster, you can run the pipeline in local mode by adding --no-cluster as a command line option.
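Before launching a run, it is possible to check whether the drmaa Python package is present at all. This is a minimal sketch under the assumption that the NameError arises because the drmaa module could not be imported; both function names are hypothetical, and the check does not verify that a cluster's DRMAA library is actually reachable.

```python
import importlib.util


def drmaa_installed():
    """Return True if the drmaa Python package can be found.

    This only checks that the package is installed; it does not verify
    that the DRMAA library of a cluster can be loaded.
    """
    return importlib.util.find_spec("drmaa") is not None


def extra_pipeline_flags():
    """Suggest --no-cluster when the drmaa package is not installed."""
    return [] if drmaa_installed() else ["--no-cluster"]
```

The returned list can be appended to the trnanalysis command line so that the pipeline falls back to local execution when drmaa is missing.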

Running tRNAnalysis - Tutorial

Before beginning this tutorial, make sure that tRNAnalysis is installed correctly; see Installation for instructions.

In the following section we will run a toy example pipeline that demonstrates the functionality of tRNAnalysis. tRNAnalysis can be run locally or distributed across a cluster. This tutorial will explain the steps required to run tRNAnalysis.

Tutorial start

1. First download the tutorial data:

wget https://www.cgat.org/downloads/public/adam/trnanalysis/test_trna.tar.gz
tar -zxvf test_trna.tar.gz

2. Next we will generate a configuration yml file so the pipeline output can be modified:

cd test_trna
trnanalysis trna config

This will generate a pipeline.yml file containing the configuration parameters that can be used to modify the output of the pipeline. These parameters can all be modified to change the output or running of tRNAnalysis. However, for this tutorial you do not need to modify them, as the defaults are appropriate in this instance.

Note

There is already a pipeline.yml file within the test data, so you don't need to run the command above this time.

3. Next we will run the pipeline:

trnanalysis trna make full -v5 --no-cluster

The --no-cluster option runs the pipeline locally, which is useful if you do not have access to a cluster. If you do have a cluster, remove the --no-cluster option and the pipeline will distribute your jobs across the cluster. For information on how to configure your cluster, please see the Cluster configuration section.

Note

There are many command line options available to run the pipeline. To see the available options, please run trnanalysis --help.

The pipeline is mostly quick to execute. However, it can take considerable time to generate the bowtie indexes. In the future we will parameterise this so that you can use pre-built bowtie indexes, which will then not have to be regenerated each time the pipeline is run.

4. Generate a report

The final step is to generate a report to display the output of tRNAnalysis. To generate the reports, run the command:

trnanalysis trna make build_report -v 5 --no-cluster

This will generate a MultiQC report in the folder MultiQC_report.dir/ and an Rmarkdown report in R_report.dir/Final_report/index.html.
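The two make commands from steps 3 and 4 can also be driven from a small script. This is a sketch only: the function names are hypothetical, the argument lists simply reproduce the commands shown in the tutorial, and run_tutorial assumes that the test_trna directory from step 1 exists and that trnanalysis is on the PATH.

```python
import shutil
import subprocess


def tutorial_commands(local=True, verbosity=5):
    """Return the tutorial's two pipeline commands as argument lists."""
    extra = ["--no-cluster"] if local else []
    return [
        ["trnanalysis", "trna", "make", task, f"-v{verbosity}"] + extra
        for task in ("full", "build_report")
    ]


def run_tutorial(workdir="test_trna", local=True):
    """Run the full pipeline, then the report, stopping on the first failure."""
    if shutil.which("trnanalysis") is None:
        raise RuntimeError("trnanalysis is not on PATH; see Installation")
    for command in tutorial_commands(local=local):
        # check=True raises CalledProcessError if a command exits non-zero
        subprocess.run(command, cwd=workdir, check=True)
```

Keeping the command construction separate from execution makes it easy to print the commands first and confirm that they match what you would type by hand.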

This completes the tutorial for running tRNAnalysis. We hope you find it as useful as we do for analysing tRNA sequencing data.

Developers

The following individuals are the main developers of tRNAnalysis:

Adam Cribbs

Anna James-Bott

Contributing

Contributions are very much encouraged, and we greatly appreciate the time and effort people make to help maintain and support our tools. Every contribution helps, so please don't be shy; we don't bite.

You can contribute to the development of our pipeline in a number of different ways:

Reporting bugs

Bugs are annoying, and reporting them will help us to fix your issue.

Bugs can be reported using the issues section on GitHub.

When reporting issues, please include:

  • Steps in your code/command that led to the bug so it can be reproduced.
  • The error message from the log file.
  • Any other helpful info, such as the system/cluster engine or version information.

Proposing a new feature/enhancement

If you wish to contribute a new feature to the pipeline, the best way is to raise it as an issue on GitHub and label it as an enhancement.

If you propose a new feature then please:

  • Explain how your enhancement will work
  • Describe as best as you can how you plan to implement this.
  • If you don't think you have the necessary skills to implement this on your own, please say so and we will try our best to help (or implement it for you). However, please be aware that this is community-developed software and our volunteers have other jobs, so we may not be able to work as fast as you hoped.

Pull Request Guidelines

Why not contribute to our project? It's a great way of making the project better, and your help is always welcome. We follow the fork/pull request model. To update our documentation, fix bugs or add enhancements, you will need to create a pull request through GitHub.

To create a pull request perform these steps:

  1. Create a GitHub account.
  2. Create a personal fork of the project on GitHub.
  3. Clone the fork onto your local machine. Your remote repo on GitHub is called origin.
  4. Add the original repository as a remote called upstream.
  5. If you made the fork a while ago, please make sure you git pull upstream to keep your repository up to date.
  6. Create a new branch to work on! We usually name our branches with our capitalised first and last initials, followed by a dash and something unique. For example: git checkout -b AC-new_doc.
  7. Implement your fix/enhancement and make sure your code is effectively documented.
  8. Our code has tests, and these will be run when a pull request is submitted. However, you can run our tests before you make the pull request; we have a number written in the tests/ directory. For example, to run our import tests, please run nosetests tests/test_import.py.
  9. Add or change our documentation in the docs/ directory.
  10. Squash all of your commits into a single commit with git's interactive rebase.
  11. Push your branch to your fork on GitHub with git push origin.
  12. From your fork on github.com, open a pull request in the correct branch.
  13. … This is where someone will review your changes and modify them or approve them …
  14. Once the pull request is approved and merged you can pull the changes from the upstream to your local repo and delete your branch.

Note

Always write your commit messages in the present tense. Your commit messages should describe what the commit does to the code and not what you did to the code.

FAQs

As our workflow develops we will add frequently asked questions here.

In the meantime, please add issues to the GitHub page.

Licence

tRNAnalysis is an open-source project and we have made the repository available under the open source permissive free MIT software licence, allowing free and full use of the code for both commercial and non-commercial purposes. A copy of the licence is shown below:

MIT License

Copyright (c) 2019 Adam Cribbs

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.