This repository contains all scripts files of the GxP project

This repository has been superseded by the GxP repository.

General report

This document contains all the analyses performed so far for the GxP project. I processed the raw fastq files through several steps, including read alignment, quality checks, sorting and annotation, using a pipeline that incorporates tools such as FastQC, HISAT2, Samtools, HTSeq and DESeq2.

List of batches analysed: GxP1:

• 220324_NS500258_0518_AHJG33BGXK (considered as old sequences)

• 221020_NS500258_0531_AHJF3GBGXK (considered as new sequences)

The two batches were then merged.

Scripts used: for each batch, a folder dedicated to the analysis and results was created. Inside each batch folder, three folders were created: “bams” for the alignment, “counts” for read counting, and “deseq” for the differential gene expression analysis. First step: in the “bams” folder, the following scripts were added and launched.
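
Before launching them, the per-batch folder layout described above can be created in one go. This is only a minimal sketch: the function name `make_batch_layout` and the scratch directory are assumptions for illustration and are not part of the original scripts.

```shell
#!/bin/bash
# Sketch: create the per-batch analysis layout (bams/, counts/, deseq/).
# make_batch_layout is a hypothetical helper, not one of the original scripts.
make_batch_layout() {
    # $1 = analysis folder for one batch
    mkdir -p "$1/bams" "$1/counts" "$1/deseq"
}

# Example: build the layout for one batch inside a scratch directory
demo_dir=$(mktemp -d)
make_batch_layout "${demo_dir}/220324_NS500258_0518_AHJG33BGXK"
ls "${demo_dir}/220324_NS500258_0518_AHJG33BGXK"
rm -rf "${demo_dir}"
```

Using `mkdir -p` makes the helper safe to re-run on a batch folder that already exists.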

set={Folder to the fastq files for the specific batch}
ls /wsu/home/groups/piquelab/OurData/Nextseq/GxP/${set} | while read fastq_file; do echo ${fastq_file%_R*}; done | uniq > names.txt
cat names.txt | while read var; do sbatch --export=var=${var} {alignment script}; done

Here {alignment script} stands for the bash script file called by sbatch, shown below:

#!/bin/bash
# Job name
#SBATCH --job-name alignment
# Submit to the primary QoS
#SBATCH -q primary
# Total number of cores (tasks): 8
#SBATCH -n 8
# Request memory
#SBATCH --mem=100G
# Request a node with the avx2 instruction set
#SBATCH --constraint=avx2
# Mail when the job begins, ends, fails, requeues
#SBATCH --mail-type=ALL
# Where to send email alerts
#SBATCH --mail-user={email address}
# Create an output file that will be output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file that will be errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set maximum time limit (1 day and 1 hour)
#SBATCH --time=1-01:00:00

module load hisat2/2.0.4
module load samtools/1.4

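set={Folder to the fastq files for the specific batch}

#Update this "path" if copied from another directory
filePath=/wsu/home/groups/piquelab/OurData/Nextseq/GxP/${set}

#genomeindex (the HISAT2 index prefix used below) is not defined in this script and must be set here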


###Align Reads###

hisat2 -p 8 -x ${genomeindex} -1 ${filePath}/${var}_R1_001.fastq.gz \
                              -2 ${filePath}/${var}_R2_001.fastq.gz \
      2> ${var}_aligned.bam.e | samtools view -b1 - > ${var}_aligned.bam

###Sort Reads###
samtools sort -@ 4 -T tmp_${var}_aligned.bam -o ${var}_sorted.bam ${var}_aligned.bam
samtools index ${var}_sorted.bam
samtools view -c ${var}_sorted.bam > ${var}_sorted_count.txt

###Quality Filter###
samtools view -b1 -q10 ${var}_sorted.bam > ${var}_quality.bam
samtools index ${var}_quality.bam
samtools view -c ${var}_quality.bam > ${var}_quality_count.txt

samtools rmdup ${var}_quality.bam ${var}_clean.bam
samtools index ${var}_clean.bam
samtools view -c ${var}_clean.bam > ${var}_clean_count.txt

echo ${var} >> finished.txt
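
The `${var}` sample names used throughout the alignment script come from the `${fastq_file%_R*}` expansion in the launch commands, which strips everything from the last `_R` onward so that each R1/R2 pair collapses to a single name. The sketch below illustrates that step on made-up file names; the function name `sample_names` is hypothetical, and it uses `sort -u` rather than `uniq` so that it does not depend on the input already being sorted.

```shell
#!/bin/bash
# Sketch: derive one sample name per R1/R2 pair of FASTQ files.
# sample_names is a hypothetical helper; the file names below are made up.
sample_names() {
    # Reads file names on stdin, removes the shortest suffix starting
    # with "_R", and keeps each resulting name once.
    while read -r fastq_file; do
        echo "${fastq_file%_R*}"
    done | sort -u
}

printf '%s\n' \
    sampleA_S1_R1_001.fastq.gz \
    sampleA_S1_R2_001.fastq.gz \
    sampleB_S2_R1_001.fastq.gz \
    sampleB_S2_R2_001.fastq.gz | sample_names
# prints:
# sampleA_S1
# sampleB_S2
```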

After all alignment jobs have finished, the following script, run in the same folder, collects the read counts of every sample into one table:

#!/bin/bash
# Job name
#SBATCH --job-name alignmentsummary
# Submit to the primary QoS
#SBATCH -q primary
# Total number of cores (tasks): 8
#SBATCH -n 8
# Request memory
#SBATCH --mem=100G
# Request a node with the avx2 instruction set
#SBATCH --constraint=avx2
# Mail when the job begins, ends, fails, requeues
#SBATCH --mail-type=ALL
# Where to send email alerts
#SBATCH --mail-user={email address}
# Create an output file that will be output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file that will be errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set maximum time limit (1 day and 1 hour)
#SBATCH --time=1-01:00:00

#Making all_counts.txt (one row per sample: sorted, quality-filtered and deduplicated read counts)
for i in `ls *sorted_count.txt`; do echo "$i"| cut -d_ -f1,2,3 >m ; cat m| tr "\n" " "; cat $i; done > sorted.txt
for i in `ls *quality_count.txt`; do echo "$i"| cut -d_ -f1,2,3 >m ; cat m| tr "\n" " "; cat $i; done > quality.txt
for i in `ls *clean_count.txt`; do echo "$i"| cut -d_ -f1,2,3 >m ; cat m| tr "\n" " "; cat $i; done > clean.txt

join sorted.txt quality.txt > tmp.txt
join tmp.txt clean.txt > all_counts.txt

rm m
rm tmp.txt
rm sorted.txt
rm quality.txt
rm clean.txt

#To make a list of the total number of reads processed per sample (first line of each HISAT2 log)
for i in `cat names.txt`; do echo -n "$i " >> total_reads.txt; head -1 ${i}_aligned.bam.e|cut -d ' ' -f1 >> total_reads.txt;done
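
The summary script above writes one row per sample to all_counts.txt, with the sorted, quality-filtered and deduplicated read counts in that order. A quick way to read that table is as the percentage of sorted reads that survive each filter. The following is a minimal sketch under that column-order assumption; the function name `retention_table` and the numbers are made up for illustration.

```shell
#!/bin/bash
# Sketch: percentage of sorted reads kept after the quality filter and
# after duplicate removal, per sample.
# Assumes space-separated columns: sample sorted quality clean.
retention_table() {
    # Rows with a zero sorted count are skipped to avoid dividing by zero.
    awk '$2 > 0 {
        printf "%s\t%.1f\t%.1f\n", $1, 100 * $3 / $2, 100 * $4 / $2
    }' "$1"
}

# Example with made-up numbers
demo=$(mktemp)
printf 'sampleA 1000 900 450\nsampleB 2000 1500 1200\n' > "$demo"
retention_table "$demo"
# prints (tab-separated):
# sampleA	90.0	45.0
# sampleB	75.0	60.0
rm -f "$demo"
```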

names.txt: this file lists the sample names derived from all fastq files found and used in the alignment step. It is generated by the launch commands shown at the start of this step.

Second step: in the “counts” folder, the following scripts were added and launched.

set={Folder to the fastq files for the specific batch}
ls /wsu/home/groups/piquelab/OurData/Nextseq/GxP/${set} | while read fastq_file; do echo ${fastq_file%_R*}; done | uniq > names.txt
mkdir counts2 #all files containing counts will be written to this folder
cat names.txt | while read var; do sbatch --export=var=${var} {counting script}; done

Here {counting script} stands for the bash script file called by sbatch, shown below:

#!/bin/bash
# Job name
#SBATCH --job-name readscounting
# Submit to the primary QoS
#SBATCH -q primary
# Total number of cores (tasks): 8
#SBATCH -n 8
# Request memory
#SBATCH --mem=100G
# Request a node with the avx2 instruction set
#SBATCH --constraint=avx2
# Mail when the job begins, ends, fails, requeues
#SBATCH --mail-type=ALL
# Where to send email alerts
#SBATCH --mail-user={email address}
# Create an output file that will be output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file that will be errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set maximum time limit (1 day and 1 hour)
#SBATCH --time=1-01:00:00

module unload python
module load anaconda3.python
source activate htseq


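#gtffile (the gene annotation in GTF format) is not defined in this script and must be set here
htseq-count -f bam --stranded=reverse ../bams/${var}_clean.bam ${gtffile} > counts2/${var}.cnts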

echo ${var} >> Finished.txt
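
Each .cnts file written by htseq-count has two tab-separated columns, a feature ID and its read count, followed by special counters whose names start with two underscores (for example `__no_feature` and `__ambiguous`). As a rough check of how many reads were assigned to genes, the two groups can be summed separately. This is a minimal sketch; the function name `summarise_cnts` and the demo gene IDs are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: split one htseq-count output file into reads assigned to
# features and reads that fell into the special "__" counters.
# summarise_cnts is a hypothetical helper; the demo data are made up.
summarise_cnts() {
    awk -F '\t' '
        /^__/ { special += $2; next }   # e.g. __no_feature, __ambiguous
              { assigned += $2 }
        END   { printf "assigned\t%d\nunassigned\t%d\n", assigned, special }
    ' "$1"
}

demo=$(mktemp)
printf 'geneA\t10\ngeneB\t5\n__no_feature\t3\n__ambiguous\t2\n' > "$demo"
summarise_cnts "$demo"
# prints (tab-separated):
# assigned	15
# unassigned	5
rm -f "$demo"
```

In the “counts” folder this could be called as `summarise_cnts counts2/${var}.cnts` for any sample name in names.txt.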

Next, move to the “counts2” folder, which was created by the launch commands above. Then add and launch the following script, which merges the per-sample counts into a single table:

#!/bin/bash
# Job name
#SBATCH --job-name readscountingsummary
# Submit to the primary QoS
#SBATCH -q primary
# Total number of cores (tasks): 8
#SBATCH -n 8
# Request memory
#SBATCH --mem=100G
# Request a node with the avx2 instruction set
#SBATCH --constraint=avx2
# Mail when the job begins, ends, fails, requeues
#SBATCH --mail-type=ALL
# Where to send email alerts
#SBATCH --mail-user={email address}
# Create an output file that will be output_<jobid>.out
#SBATCH -o output_%j.out
# Create an error file that will be errors_<jobid>.err
#SBATCH -e errors_%j.err
# Set maximum time limit (1 day and 1 hour)
#SBATCH --time=1-01:00:00

a=`head -n 1 ../names.txt`
## cd counts
# start counts.txt with the gene ID column of the first sample
cut -f 1 "$a".cnts >counts.txt
# append the count column of every sample
for i in `cat ../names.txt`
do
	cut -f 2 "$i".cnts >tmp1
	cp counts.txt tmp2
	paste tmp2 tmp1 >counts.txt
done
# add header
echo "Genes" >tmp1.txt
cat ../names.txt >tmp2.txt
cat tmp1.txt tmp2.txt|paste -s -d '\t' >tmp3.txt
cp counts.txt tmp1.txt
cat tmp3.txt tmp1.txt >counts.txt
rm tmp1.txt
rm tmp2.txt
rm tmp3.txt
rm tmp1
rm tmp2
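
The merge above pastes the count columns side by side, so it relies on every .cnts file listing the same feature IDs in the same order. A small check before merging can confirm this. The sketch below is an assumption-labelled helper, not part of the original scripts: `check_gene_order` is a hypothetical name, and the demo files are made up.

```shell
#!/bin/bash
# Sketch: confirm that every .cnts file has the same first column
# (feature IDs, same order) as the first file given.
# check_gene_order is a hypothetical helper, not an original script.
check_gene_order() {
    reference_ids=$(cut -f 1 "$1")
    shift
    for cnts_file in "$@"; do
        if [ "$(cut -f 1 "$cnts_file")" != "$reference_ids" ]; then
            echo "gene order differs in: $cnts_file" >&2
            return 1
        fi
    done
    return 0
}

# Example with two made-up files that agree
demo_dir=$(mktemp -d)
printf 'geneA\t10\ngeneB\t5\n' > "${demo_dir}/s1.cnts"
printf 'geneA\t3\ngeneB\t8\n'  > "${demo_dir}/s2.cnts"
check_gene_order "${demo_dir}/s1.cnts" "${demo_dir}/s2.cnts" && echo "gene order consistent"
rm -rf "${demo_dir}"
```

Inside “counts2”, one way to call it would be `check_gene_order $(sed 's/$/.cnts/' ../names.txt)`, which assumes the sample names contain no spaces.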

