NEWS
HTSeqGenie 4.17.1
- fix bug in targetLenght plotting in reportSummaryQC
- make countGenomicFeatuers work with subclasses of GRangesList
HTSeqGenie 4.5.1
- changed variant calling tests in accordance with changes made to bam_tally in gmapR
HTSeqGenie 4.4
- bioc release version bump
HTSeqGenie 4.2
- bioc release version bump
HTSeqGenie 4.1.15
- investigating a memory fault produced by tallyVariants() in the test.callVariantsVariantTools.genotype() unit test
HTSeqGenie 4.1.13
- fixing NGSPIPE-191: removed mergeReads and peared references ; changed BioC imports ; should build on the BioC server (but cannot
right now due to VariantTools::tallyVariants memory issues)
HTSeqGenie 4.1.11
- fixed NGSPIPE-182: updated the documentation of generateSingleGeneDERs(), highlighting that
the function generates DEXSeq-ready exons by disjoining the whole exon set, keeping only the exons of coding regions
and keeping only the exons that belong to unique genes.
HTSeqGenie 4.1.10
- add simple check for duplicate ids to prepocessReads. Note this will only catch dups within a chunk.
HTSeqGenie 4.1.9
- fixed a small glitch that prevented truncateReads to work with trimReads.trim5 config parameter
only (without length truncation)
HTSeqGenie 4.1.8
- fixed NGSPIPE-167: added 5'-read trimming with the configuration parameter trimReads.trim5
(code, unit test, configuration check code, handling corner cases)
HTSeqGenie 4.1.7
- fixed the NGSPIPE-150 ticket: cleaned up readRNASeqEnds and removed consolidateByRead
HTSeqGenie 4.1.6
- fixed a buglet (absence of summary_alignment.tab file) that prevented test.callVariantsVariantTools.genotype()
to run
- moved the loading of genomic_features in countGenomicFeatures and not countGenomicFeaturesChunk, for speedup
- R CMD check ok
HTSeqGenie 4.1.5
- fixed the NGSPIPE-172 ticket on "variant calling slow on GRCH38". The selftest took about 1.5 h to complete,
due to the analyzeVariants() phase of the two test.runPipeline.Exome() tests, each lasting about 0.6 h.
This was due to (1) a wrong loading of the huge dbSNP database for avgNborCount filtering (needed only if
the avgNborCount keyword is present in the config parameter "analyzeVariants.postFilters". And due to (2) a
costly tiling of the genome, which is not required for small datasets (with a number of reads lower
than 100e6). The selftest now takes about 34 minutes to complete.
HTSeqGenie 4.1.4
- added test.readRNASeqEnds.dupmark() to test if readRNASeqEnds() handles properly dups (which is the case; the
bug NGSPIPE-180 on "Ignore Dup marked reads during feature counting" is due to the fact that markdup is done
on whole bams, while featuring counting is done on chunks). This revision closes NGSPIPE-180.
HTSeqGenie 4.1.3
- filter out alignments that are marked as dups and QC failed using the proper bam flag before feature counting and coverage
HTSeqGenie 4.1.2
- activate read quality trimming for GATK-rescaled
- move sanger quality at the end of the list of possible qualities, as we do not do quality trimming for real sanger.
Current and Recent data from illuma should always come with illumina 1.5 or 1.8 range, so we want to make
sure the read triming is triggered.
HTSeqGenie 4.1.1
- createTmpDir() is now more robust and retries if dir already exists.
this is needed as the indel realigner step uses this function to parallelize by chromosomes.
For genomes with lots of chromomes/contigs such as GRCh38 (>200) this fixed a rare bug.
HTSeqGenie 4.1.0
HTSeqGenie 3.17.9
- add variant_effect_predictor step
HTSeqGenie 3.17.8
- make sure chunk_dir points to some non-existant dir
HTSeqGenie 3.17.7
- VT changed its default Filters, probably due to an error.
Until that is resolved, we manually create out Filterset ourselves.
HTSeqGenie 3.17.6
- de-activate indel Realignment step in mergeLanes again.
Doesn't work as expected as chunk_firs are set to merged in config which throws up
indel realignment step.
HTSeqGenie 3.17.5
- update buildGenomicFeaturesFromTxdb with GNE internal version
- add tests
HTSeqGenie 3.17.4
- add config mergeReads.pear_param passed down to pear binary view peared::pear()
HTSeqGenie 3.17.3
- activate indel Realignment step in mergeLanes
HTSeqGenie 3.17.2
- add mergeReads step using pear
HTSeqGenie 3.17.1
- new dev cycle
- fix some missing imports that causes problems when functions re-used in other package
HTSeqGenie 3.15.7
- use less cores for VT genotyping to use less memory
HTSeqGenie 3.15.6
- adapt to changes in API of VariantTools BSgenome.Hsapiens.UCSC.hg19
- use GeneGenome from gmapR to create test genomes
HTSeqGenie 3.15.5
- call genotypes via VariantTools controlled by config value
HTSeqGenie 3.15.4
- do some cleanup and gc during coverage computation
HTSeqGenie 3.15.3
- parallelize Indel realignment by chromosome
HTSeqGenie 3.15.2
- use a binary tree-reduce like algorithm to merge coverages.
For a big WGS this reduced runtime from 12h to 2 h.
HTSeqGenie 3.15.1
- start next dev cycle
- remove count_transcripts and count_ncRNA_nongenic from QA
HTSeqGenie 3.14.1
- 3.14.1 release
- fixed the fragmentLength NA bug spotted by Jinfeng
HTSeqGenie 3.13.20
- use .gz suffix for vcf files to make IGV happy
- reactivate genome tiling with larger tiles of 1e8
HTSeqGenie 3.13.19
- force coverage to use SimpleIntegerLists if data becomes too dense
- disable genome tiling during bam_tallying, as it incures a huge time overhead
under some circumstances
HTSeqGenie 3.13.18
- new convenience function findVariantFile
- use genome tiling with VariantTools to keep memory footprint small
HTSeqGenie 3.13.17
- depend on VariantTools 1.6.1 to get the bamtally memory leak fix
HTSeqGenie 3.13.16
- version bump to match HTSeqGenie.gne
HTSeqGenie 3.13.14
- turn off read_min_length check in mergeLanes as due to qual trimming we now expect lanes to differ
HTSeqGenie 3.13.13
- speed up coverage step for WGS data by switching back to SimpleIntegerList if the coverage becomes too ragged.
HTSeqGenie 3.13.12
- repurpose the gatk.params config value, for general GATK params. This param will be passed on to all
GATK calls. An example would be
gatk.params: -L intervals.bed
to restrict realignment and variant calling to a certain subset of the genome thus saving
huge amounts of time for targeted sequencing approaches with a few hundred genes.
- now uses TP53Which() (instead of which.rds) for variant calling tests
- now builds in BioC
HTSeqGenie 3.13.11
- now use TxDb.Hsapiens.UCSC.hg19.knownGene's version number in the TP53 genomic feature cache file, to avoid
bugs when updating TxDb.Hsapiens.UCSC.hg19.knownGene
HTSeqGenie 3.13.10
- added an optional which config to limit VariantTools variant calling to certain regions
HTSeqGenie 3.13.9
HTSeqGenie 3.13.8
- fixed the "as.vector(x, mode) : invalid 'mode' argument" bug by adding the "importFrom(BiocGenerics,table)" NAMESPACE directive
HTSeqGenie 3.13.7
- enable indelRealigner in runPipeline controlled by
config value of alignReads.do
HTSeqGenie 3.13.6
- set gatk.path in options if GATK_PATH env is set to an existing file.
This is mostly for allowing the unit tests to know if and where a GATK is installed
HTSeqGenie 3.13.5
- require bamtally bugfixed version of gmapR
HTSeqGenie 3.13.4
- fixed bug in generateCountFeaturesPlots() when no genes left after filtering.
Practically only happens for bacterial or viral genomes when every gene is hit more than 200 times
HTSeqGenie 3.13.3
- now saving fragmentLength in summary_coverage.txt
- using Jinfeng's weighted mean to estimate fragmentLength
- removed the parallelization in saveCoverage (since saveCoverage is used in parallel)
HTSeqGenie 3.13.2
- activate read trimming for illumina 1.5 quality using the 'B' as qual indicator
HTSeqGenie 3.13.1
- deactived the unit tests (test.wrap.callVariants, test.wrap.callVariants.parallel, test.wrap.callVariants.rmsk_dbsnp)
until VariantAnnotation is fixed, to be able to compile HTSeqGenie on BioC servers
HTSeqGenie 3.13.0
- change to new dev cycle version number
HTSeqGenie 3.12.0
HTSeqGenie 3.11.20
- add extra config value to turrn off gatk repeat filtering
- do not filter GATK variants for repeats by default, as this produces buggy vcf
HTSeqGenie 3.11.19
- make indel calling default for VT
HTSeqGenie 3.11.18
- -make the summary qa report much more robust. Now runs on single and paired end,
Varaint calling is optional. Also works on non human/mouse genomes
HTSeqGenie 3.11.17
- make sure VRanges are sorted before writing as vcf until Variantannotatin fixes bug
HTSeqGenie 3.11.16
- fixed the NA bug when the file indicated by template_config is not found, now reports an explicit error message
HTSeqGenie 3.11.15
- filter GATK variants by overlap with low_complexit regions if bed file given in config
HTSeqGenie 3.11.14
- -fixed bug in VariantCalling where dbsnp whitelist was effectively not passed but lost in the '...' orkus.
HTSeqGenie 3.11.13
- remove getRDataFromFile and replace with safeGetObject() which is used in a couple of scripts
HTSeqGenie 3.11.12
- fix bug in sanity check that did not count circular mappings
HTSeqGenie 3.11.11
- now breaks properly when input FastQ gzipped files are corrupted/truncated
HTSeqGenie 3.11.10
- vignette updated to build on BioC dev
HTSeqGenie 3.11.9
- TP53GenomicFeatures() now uses the same temp dir as TP53Genome(), to fix a build issues on BioC servers
HTSeqGenie 3.11.8
- -choice of VT filters configurable via config
HTSeqGenie 3.11.7
- mergeBams(..., sort=FALSE) in alignReads.R: since bams are already sorted in chunks, merging preserves the order
and no re-sorting should is necessary
HTSeqGenie 3.11.6
- saveCoverage now outputs a summary.coverage summary file
HTSeqGenie 3.11.5
- directly export coverage from Rle with rtracklayer::export (instead of to make a memory-costly convertsion to a RangedData object)
HTSeqGenie 3.11.4
- update variant calling code to work with VariantTools 1.3.6
- include Jens' minlength=1 modification to handle reads fully trimmed
HTSeqGenie 3.11.3
- exports loginfo, logdebug, logwarn, logerror
- uses TallyVariantsParam and as(,"VRanges") to fix a bug preventing compilation on BioC
- the number of threads used during processChunks is divided by 2 due to an erroneous extra mcparallel(...) step in sclapply/safeExecute
- use a maximum of 12 cores in preprocessReads
HTSeqGenie 3.11.2
- trim lowest quality tail of reads according to Illumina manual.
This new quality step precedes the regular quality filtering that is still done.
HTSeqGenie 3.11.1
- allow to use better filters for variant tools
by default, uses standard filters (bad),
but if repeat masked track and dbsnp are given in config it uses those.
HTSeqGenie 3.11.0
- starting the new development branch
HTSeqGenie 3.10.1
HTSeqGenie 3.9.31
- -use new VariantTools API to do id verifu in mergeLanes.
works for both GATK and VariantTools
HTSeqGenie 3.9.30
- fix logger bug in mergeLanes
HTSeqGenie 3.9.29
- make report_pipeline_QA work with new version, since some
fields in the summaries are gone or renamed
HTSeqGenie 3.9.28
- revert recent changes to gatk.R.
We now make our own gz and index file, as GATK
repeatedly failed on some samples.
HTSeqGenie 3.9.27
- allow multiple paths in HTSEQGENIE_CONFIG, separated by :
HTSeqGenie 3.9.26
- removed preschedule=FALSE in wrapGsnap
- added vcfStat to produce summary_variants.tab file
- changed vcf name from "_variants.vcf.gz" to ".variants.vcf.gz", to stay consistent within the results/ directory
HTSeqGenie 3.9.25
- VariantTools "analyzeVariants.indels" are OK
HTSeqGenie 3.9.24
- aloow for new quality score range "GATK-rescaled" from 1-50 (33-83 in ASCII)
HTSeqGenie 3.9.23
- added the config parameter "analyzeVariants.indels"
HTSeqGenie 3.9.22
- removed the dependency towards the "logging" package
HTSeqGenie 3.9.21
HTSeqGenie 3.9.20
- preparation to BioC submission
- now using detectRRNA.do: FALSE in default-config.txt
HTSeqGenie 3.9.19
- removed mc.preschedule=FALSE from mergeBAMsAcrossDirs
- added the configuration parameter 'analyzeVariants.method' (GATK check has yet to be done)
HTSeqGenie 3.9.18
- now depends on VariantTools 1.1.13 that fixes the mclapply(mc.preschedule=FALSE) bug
HTSeqGenie 3.9.17
- added the config parameter 'alignReads.analyzedBam' to control how analyzed.bam are built
- removed the former config parameter 'alignReads.analyzed_bamregexp' that could not work on single ends
HTSeqGenie 3.9.16
- sessionInfo() is not called anymore in writePreprocessAlignReport() during generation of report,
to prevent crash when PACKAGES have been updated while the pipeline is running
- sclapply() now uses a 'finally' cleanup procedure to kill all threads it has created
- added some unit tests to check that no leftover threads are present after sclapply() in different scenarios
HTSeqGenie 3.9.15
- use low lever variant calling interface from VariantTools.
This allows for access to the raw_variants as well as the filtered ones/
- variant calling now included in mergeLanes()
HTSeqGenie 3.9.14
- added the config parameter "alignReads.use_gmapR_gsnap" to control if gsnap should be called from gmapR or from the PATH
- default config parameter "alignReads.use_gmapR_gsnap" is now TRUE
- removed the duplicated default config parameters: path.picard_tools, markDuplicates.do
- added a check in checkConfig() to stop if some config paramters are duplicated
HTSeqGenie 3.9.13
- add variant calling using VariantTools (not yet parallelized yet)
HTSeqGenie 3.9.12
- include markDuplicates into runPipeline(), controlled by markDuplicates.do config
HTSeqGenie 3.9.11
- refactor setupTestFramework() to allow for injection of TP53 genome template
HTSeqGenie 3.9.10
- add function to mark duplicates via picard tools
HTSeqGenie 3.9.9
- fixed detectRRNA code, including bug in wrapGsnap
- add test for detectRRNA working on tp53 genome
HTSeqGenie 3.9.8
- the system command 'samtools' is no used anymore in the code
- removed unused functions: indexBAMFiles, filterBam, getReadLengthFromBam, getBamIndexStats
- (filterBam will be back in the xenograft module)
HTSeqGenie 3.9.7
- works with Biobase 2.18.0 (Bioconductor release 2.11)
- fixed the "x is not present in the PATH" bogus message
- old gmapR stuffs are now gone: parallelized_gsnap, consolidateSAM, consolidateGsnapOutput, consolidateBAm
- now use wrapGsnap, to facilitate the transition to the gsnap offered by gmapR
- now depends on gmapR (to load TP53Genome())
HTSeqGenie 3.9.6
- make remaining tests run with TP53 genome
- move detectRRNA tests to HTSeqGenie.gne as they depend on IGIS
HTSeqGenie 3.9.5
- remove runPipeline tests depending on IGIS.
Instead use simple integration test based on TP53 genome.
This requried additon of :
R/runPipeline.R
R/TP53GenomicFeatures.R
copied from bioc branch and dependance on gmapR for the TP53Genome
HTSeqGenie 3.9.4
- converging with the BioC version: adding @internal keyword
- configuration parameter
HTSeqGenie 3.9.3
- minor comments (converging with the BioC version...)
- checks OK on module apps/ngs_pipeline/dev
HTSeqGenie 3.9.2
- removed everything related to calculateJunctionReads, junctionReads (due to the usage of an obsolete newCompressedList in BioC)
- checks OK on apps/ngs_pipeline/dev
HTSeqGenie 3.9.1
- removed everything related to SNVsOmuc, analyzeVariants, variantConcordance (due to gmapR conflict)
- renamed CHANGES into NEWS
HTSeqGenie 3.9.0
HTSeqGenie 3.8.0
- added the configuration parameter "filterQuality.minLength", to remove reads shorter than filterQuality.minLength during preprocessReads(). Default is NULL.
- added the configuration parameter "alignReads.analyzed_bamregexp", to specify the regexp to select bam files to build analyzed.bam. Default is "_uniq.*\.bam$".
- added the configuration parameter "coverage.do" to enable/disable coverage computation. Default is TRUE.
- added the configuration parameter "coverage.maxFragmentLength" to remove long read pairs, as suggested by Thomas when analysing ChIP-Seq. Default is NULL.
- the ChIP-Seq config files now have "alignReads.analyzed_bamregexp: concordant_uniq.*\.bam$" and "coverage.maxFragmentLength: 1e4" by default
- coverage computation now uses SimpleRleList and should be faster
HTSeqGenie 3.6.1
- speedup: calculateCoverage() now uses a map/reduce technique to speed up coverage computation
- speedup: bamCountUniqueReads() does not scan bam file per chromosome any more
- mergeLanes() now supports missing variants or missing countGenomicFeatures
- BioC 2.11 fix: using queryLength() instead of nrow() after findOverlaps()
HTSeqGenie 3.6
- support of IGIS 2.2
- support of R 2.15.0 and Bioconductor 2.10
- support for ChIP-Seq analysis (template configurations and coverage read extension)
- support for merging lanes (ngs_merge)
- support for variant concordance comparison (ngs_vconcord)
- results are different from 3.4.1 (due to the new splice sites of IGIS 2.2 used during alignment and due to the new variant caller)
correlation of RNA-Seq RPKM is usally higher than 0.99 between 3.4.1 and 3.6.0. Variant concordance is also typically higher than 0.999
- now computes intronic RNA counts
- minor bug fixes
HTSeqGenie 3.5.17
- mergeLanes (and therefore, ngs_merge) now checks that sample versions are identical before merging
- mergeLanes and has an improved interface to include parameters that have to be ignored during checkInputConfigs()
HTSeqGenie 3.5.16
- fixed the warning message "replacing previous import ‘density’ when loading ‘stats’" (coming from the chipseq package, version 1.6.1 fixes this message)
- fixed the gzfile(description) bug caused in calculateJunctionReads, due to the fact that "countGenomicFeatures.gfeatures" was needed when computing calculateJunctionReads()
- fixed the GenomeSeq-USA300-config.txt (removed the "path.genomic_features:", which is not need anymore, and added "countGenomicFeatures.do: FALSE")
HTSeqGenie 3.5.15
- added configuration parameters: analyzeVariants.bin_fraction
- new TxDb.*.BioMart.igis 2.2.0 with correct seqlenghts
HTSeqGenie 3.5.14
HTSeqGenie 3.5.13
- added ChIP-Seq config files
- added configuration parameters: countGenomicFeatures.do, analyzeVariants.do, coverage.extendReads, coverage.fragmentLength
- now uses SNVsOmuC 1.0.1
- ngs_merge can now merge only one file (not optimized)
- fixed ngs_vconcord (doesn't display the subgraph igraph 0.6 version issue any longer)
- added tmp_dir to config. If set will be used to store temporary chunk dirs
HTSeqGenie 3.5.12
- use min and max_processed_read_length for merge_checks instead of bam file
- use lsf_ngs_merge script that worked well in CGP3
HTSeqGenie 3.5.11
- added filterBam, to filter bam files based on a logical vector
- now writes merged ... </merged> in merged config files
- now trim target length at 600 in calculateTargetLengths, fixing a bug reported by Gregory Zynda
HTSeqGenie 3.5.10
- now building the "intron" track
- added the "intron" track in gfeatures-human-IGIS_2.1.0b.RData and gfeatures-mouse-IGIS_2.10b.RData; all other tracks are strictly identical to the ones in
gfeatures-human-IGIS_2.1.0.RData and gfeatures-mouse-IGIS_2.1.0.RData, respectively
HTSeqGenie 3.5.8
- copied HTSeq and RNASeqGenie in HTSeqGenie
- checks are OK!
HTSeqGenie 3.5.6
- copied HTSeq into HTSeqGenie
HTSeqGenie 3.5.5
- now works with R 2.15.0/Bioconductor 2.10 (and still works with R 2.14/ngs_pipeline environment)
- added ngs_concord script, to compute variant concordance between samples
- coverage.RData is now a RangedData object
HTSeqGenie 3.5.3
- added lsf_ngs_merge and ngs_merge scripts
- ngs_pipeline does not crash anymore when fed with bogus arguments
HTSeqGenie 3.5.2
- mergeLanes now uses safeExecute to save memory between merging steps
HTSeqGenie 3.5.1
- initPipelineFromSaveDir now updates save_dir
- mergeLanes.R does not check for identical quality_encoding any longer
- mergePreprocessSummary does not check for identical read length any longer
- mergeLanes now accepts config_update
HTSeqGenie 3.5.0
HTSeqGenie 3.4.1
- set default config parameter analyzeVariants.use_read_length to FALSE, to prevent the
buggy 3*2 Fisher's test used in SNVsOmuC/variantFilter.R to crash the pipeline
- overload logdebug, loginfo and logwarn with a try() statement, to prevent errors when
concurrent threads are logging at the same time
HTSeqGenie 3.4
- change gsnap param -E from 4 to 1. This should allow gsnap to find more translocations.
HTSeqGenie 3.3.11
- get rid of analysis type. At this point the only thing we do differently for Exome vs RNASeq
is the call to gsnap. Since that is actually created from the snp, splice and gsnap_param option
in the config, we donl;t need this explicite type any more.
HTSeqGenie 3.3.10
- now creating summary_analyzed_bamstats.tab
- reportQA now includes analysed bam stats
HTSeqGenie 3.3.9
- added computeBamStats, createSummaryAlignment, mergeSummaryAlignment
- now summary_alignment are merged (and not recomputed on the merged bams)
- new bam statistics in computeBamStats
HTSeqGenie 3.3.8
- fixed buildSplicesIIT, to build correct IIT splicing file, with checks
- generated splices-human-IGIS_2.1.0b and splices-mouse-IGIS_2.1.0b
- now using splices-human-IGIS_2.1.0b and splices-mouse-IGIS_2.1.0b in RNA-Seq template configs
- deleted bogus splices-human-IGIS_2.1.0 and splices-mouse-IGIS_2.1.0
HTSeqGenie 3.3.6
- added input_min_read_length, input_max_read_length, processed_min_read_length, processed_max_read_length in summary_preprocess
- removed reportwarning
HTSeqGenie 3.3.5
- added reportwarning (to log warnings in {save_dir})
- the pipeline now reports a warning if reads are of variable lengths
HTSeqGenie 3.3.4
- added mergeLanes, to merge lanes
- added the unit test: test.mergeLanes
- now check during preprocessReads that read are of constant length
- added concatListElements, used when building DEXSeq in buildGenomicFeatures
- added getReadLengthFromBam
- added test.runPipeline.identical320, to test if the results are identical compared to NGS 3.2.0
- the pipeline now fails if reads are of variable length
HTSeqGenie 3.3.3
- checkConfig now checks for absence of whitespace in: input_file, input_file2, save_dir, prepend_str, alignReads.sam_id
- the pipeline now fails if one chunk fails (stop.onfail=TRUE in processChunks)
- fixed the empty chunk bug, added the unit test: test.alignReads.sparsechunks
- added statCountFeatures(), to compute diverse quantile statistics on read/feature counts
HTSeqGenie 3.2
HTSeqGenie 3.1.5
- num_cores is now 1 by default
- alignReads.nbthreads_perchunk is now empty by default
- if unspecified, alignReads.nbthreads_perchunk is set to min(4, num_cores)
HTSeqGenie 3.1.4
- safeUnlink now stops if it can't delete a file
HTSeqGenie 3.1.3
- disabling ShortRead OPEN_MP, which crashes R when used in combination with mcparallel()
- safeExecute now executes an expression in a child thread, to avoid allocating memory in the main thread (this is the cause of memory leaks,
since R has a internal hashtable to store strings that keeps growing and is never cleared up)
- quality encoding upper limit of "illumina1.5" is now 105 instead of 104, to accomodate Phred-qualities of 41 (instead of 40)
HTSeqGenie 3.1.2
- added GenomeSeq-USA300-config.txt
HTSeqGenie 3.1.1
- added parseProgressLog() to parse progress.log files
- added gc() in processChunks() to save memory before firing new threads
- default.config: num_cores set to 4 and alignReads.nbthreads_perchunk to 4, for performance reasons
- now using safeUnlink(), to not follow symlink dirs when deleting files/dirs
- checkConfig.template() now first looks in the local directory for a template config file
- trimReads can now trim reads of variable lengths and now keeps the input quality encoding
HTSeqGenie 2.99.39
- now using quality_encoding: sanger, solexa, illumina1.3, illumina1.5, illumina1.8
HTSeqGenie 2.99.38
- now uses the config parameter quality_encoding, which can take a value out of: sanger, solexa, illumina13, illumina15, illumina18
- added detectQualityInFASTQFile
- added deterministic subsampling test
- the argument filname is now optional in writeConfig and writeAudit
HTSeqGenie 2.99.36
- umask is now set in both initPipelineFromConfig and initPipelineFromSaveDir
- now produce an "analyzed.bam" file instead of "main.bam"
HTSeqGenie 2.99.35
- added config parameters: alignReads.nbthreads_perchunk, alignReads.static_parameters, analysis_type
- added ExomeSeq-human-config.txt and ExomeSeq-mouse-config.txt
- processChunks now accepts nb.parallel.jobs
- DEXSeq OK
- new buildAlignerParams that accepts nbthreads_perchunk and alignReads.static_parameters
HTSeqGenie 2.99.34
- creation of SNP indexes for human and mouse, now used in the RNASeq templates
HTSeqGenie 2.99.33
- path.gsnap and path.samtools are now gone
- alignReads.do, countGenomicFeatures.do, processUniqueMappers.do, calculateJunctions.do are now gone
- added checkConfig.tools to check that gsnap, samtools, get-genome and bam_tally
HTSeqGenie 2.99.32
- now outputs summary_alignment.tab
- now uses main.bam in the RNASeqPipeline
- added bamCountUniqueReads()
HTSeqGenie 2.99.31
- renamed processRawFastq by preprocessReads
- defined initPipelineFromConfig and initPipelineFromSaveDir
- output of preprocessReads is now summary_preprocess.tab
- creation of {prepend_str}.main.bam
- implemented safeExecute (which now does the memory tracing) instead of logErrorOnFail
HTSeqGenie 2.99.29
- IGIS 2.1 for human and mouse
HTSeqGenie 2.99.26
- implemented the subsampler (controlled by subsample_nbreads)
- added config parameters: remove_processedfastq and remove_chunkdir
- removed mergeChunks
- the summaryTable doesn't contain the preprocess_summary information anymore
HTSeqGenie 2.99.25
- major release!
- using IGIS and our internal gsnap version for pipeline 3.0
- output tabulated results for counts
- added finally argument to logErrorOnFail (to kill memtracer in case of exceptions)
- using a 2 s delay between fired jobs in processChunks (to prevent firing all jobs at the same time, avoiding I/O collisions)
- checks OK (except the mouse genome)
HTSeqGenie 2.99.24
- removed rpkm_old
- fused detectNcRNA with countGenomicFeatures
- removed config parameters related to detectNcRNA
- removed depluralization code
- changed config parameters countGenomicFeatures.granges by countGenomicFeatures.gfeatures
- count output is now tab files with name, count, width and rpkm
- addec config parameters alignReads.extra_parameters
- saveWithID now supports tab-separated file format
HTSeqGenie 2.99.23
- now using new 3.0 gsnap aligner
- now using hg19_IGIS21
- the aligner found highqualAdapterContamIn3PrimeEnd:1:1:1:7#0/1 was rRNA-contaminated (which is true, with CIGAR 30C3C3C1): updated test.processRawFastq_single_end()
- tests OK except mouse-related tests (genome mm9_IGIS21 has to be built)
HTSeqGenie 2.99.21
- added config parameter calculateJunctions.do
- now stops if all jobs fail
- renamed log/ by logs/
- remove chunks/ at the end
HTSeqGenie 2.99.20
- plot insert lenghts OK
- added config parameter: debug.remove_chunkdir
- renamed output directory profile/ to log/
- renamed output directory RData/ to results/
- removed fastq_for_aligner12 fields
HTSeqGenie 2.99.19
- now uses ShortRead 1.13.12 that fixes a FastqSampler bug (that causes random crashes!)
- buildShortReadReports works (now subsampling by default 20e6 reads)
HTSeqGenie 2.99.18
- uses gmapR 0.12.5 to log gsnap system calls
HTSeqGenie 2.99.17
- new file permissions are now -rw-r--r-- and dir permissions are drwxr-xr-x
- detectAdapterContam: cutoff is now 13.87229 (independent of read length, see estimateCutoffs)
- detectAdapterContam: now save read names
- detectAdapterContam: added mergeDetectAdapterContam
- no max_mismatches in the pipeline anymore: using gsnap's defaults or alignReads.max_mismatches if specified
HTSeqGenie 2.99.16
- removed txdb_info
- added config parameter max_mismatches
- config is now written in RData/
- preprocessed reads are now merged
HTSeqGenie 2.99.15
- supports the HTSEQ_CONFIG environment variable to look for template config files
- passes samtools path to gmapR
- checks the presence of non-empty config parameters
- now produces the output directory, with chunks/chunk_%06d, with audit.txt and progress.log in profile/
HTSeqGenie 2.99.9
- added config parameters: detectNcRNA.do and detectNcRNA.granges
HTSeqGenie 2.99.8
- bumped version number to stay in sync with RNASeqGenie version 2.99.7
- preprocess_summary$adapter_contam and preprocess_summary$rRNA_contam_reads are now
set to 0 if their modules are disabled, to prevent unexpected behaviors in the final report
- added getChunkDirs, mergePreprocessSummary
- the merge/ directory is created mergeLanes and not in initPipeline any more
HTSeqGenie 2.99.6
- uses gmapR 0.12.3 to fix a deadly bug in consolidateSAMFiles() causing random crashes
- now consolidateSAMFiles is silent
HTSeqGenie 2.99.5
- chunked loggs
- continue on fail
- added logErrorOnFail(), encaspulating tryKeepTraceback and getTraceback
- sclapply now passes chunkid as an additional argument
- sclapply accepts now a tracer function
- added processChunks(), that does sclapply + logErrorOnFail + continue on fail + chunked logs
HTSeqGenie 2.99.4
- added max_nbchunks for debug purposes
- added runTophalf, runPreprocessing, runAlignment
- ready to test on new CGP 2011 data!
HTSeqGenie 2.99.3
- using path-config.txt to store system-dependent paths
- renamed resync() by resource(dirname); which can reload any package R directory
- implemented checkConfig.countGenomicFeatures()
- cleaned up writeAudit()
- removed package dependencies: snow and gtools
- implemented tests for tryKeepTraceback, writeAudit and minichunks for processRawFastq
- implemented mergeProcessRawFastq
- new alignReads() that does the parallel/chunking job
- new merge/ directory
HTSeqGenie 2.99.2
- templated configuration using the parameter "template_config"
- moved parameters from globals() and HTSeqGenieBase_globals-default.dcf to our configuration file:
-- removed globals.R
-- new config parameters: path.gsnap, path.samtools, path.gsnap_bin_dir, path.genomic_features, path.gsnap_genomes
-- new config parameters: countGenomicFeatures.do, countGenomicFeatures.grange, countGenomicFeatures.txdb_info
-- removed config parameter: alignReads.gsnap (now path.gsnap)
-- removed HTSeqGenieBase_globals-default.dcf
- sclapply now uses the argument max.parallel.jobs in the third position
- removed setGenomeFiles() and added config parameter: detectRRNA.rrna_genome
HTSeqGenie 2.99.1
- version number roll to be ready to release the 3.0.0 version
- now used by RNASeqGenie
- FastQStreamer.init() and FastQStreamer.getReads() do not need the configuration environment any more
- renamed trimReadsList by trimReads, mismatches_per_readwidth by getMismatchesPerReadwidth
- getConfig() and getConfig.*() now stops if the parameter is not declared and returns NULL if empty
- NAMESPACE does not export all function any longer
HTSeqGenie 0.0.17
- full stream version
- make_dir was renamed in makeDir()
- overwrite_save_dir now takes a parameter out of "never", "overwrite" or "erase"
- parseDCF doesn't remove empty parameters any longer
- added getConfig.nonempty() to check if a paramter is non-empty
- alignReads is now in parallel using sclapply
HTSeqGenie 0.0.16
- transition to the stream version
- alignReads now uses a single core and uses "-B 2" mode, sharing genome memory between processes, to save memory
- this version is not expected to R CMD check OK
HTSeqGenie 0.0.15
- save_dir/chunk_%06d/ output
- save_dir/progress.log
HTSeqGenie 0.0.14
- processRawFastqChunks works well (TODO: collect data)
- added unit tests for: FastQStreamer.init(), FastQStreamer.getReads(), sclapply()
- added initPipeline() (to initialise the pipeline)
- preprocessReads() is now independent
HTSeqGenie 0.0.13
- moved setGenomeFiles in detectRRNA.R
- added config parameters: filterQuality.do, alignReads.do
- added config parameters: chunk_size, subsample_nbreads
- fixed another stupid bug in sclapply()
HTSeqGenie 0.0.12
- renamed myTry() by tryKeepTraceback()
- added the checkConfig.noextraparameters() config check
- fixed a stupid bug: sclapply() does not throw chunks when waiting any more
- added temporary processRawFastqChunks.R
HTSeqGenie 0.0.11
- implementation of scProcessReads() for a simple, efficient parallel processing of read chunks
HTSeqGenie 0.0.10
- restyled code (using " instead of ', using <- instead of =m, added comments)
- "align.*" config parameters renamed to "alignReads.*"
- config parameter "genome" renamed to "alignReads.genome"
- added writeFastQFiles(), to write generic FastQ files
- traceMem() is now failsafe
- detectRRNA() now accepts lreads as a first argument
- added setupTestFramework(), to set up test frameworks
HTSeqGenie 0.0.9
- the version is now able to process LIB2478_SAM634423_L1.R122!
- added traceMem() to track memory peak usage
- added the num_cores config parameter. If unspecified the parameter is guessed from the environment variable NCPUS (set by PBS) or by the multicore package
- detectAdapterContam.R/detectAdapterContam() is now parallelized using mcProcessReads()
HTSeqGenie 0.0.8
- added getMemoryUsage() to track memory peak usage
- added mcProcessReads() a safe version of mclapply, to process reads in a parallel fashion
- added the parameter config: debug.tracemem
- added isConfig(), to test the presence of a parameter
- getConfig() without parameter now returns the config list
- added initLog(), starting logging information with R session info and config parameters
- now, filterQuality() uses mcProcessReads() to filter reads in parallel: this should help to process LIB2478_SAM634423
HTSeqGenie 0.0.7
- created updateConfig()
- now loadConfig() doesn't call checkConfig() any longer
- the "local" mode is now activated only and only in interactive sessions
- setUpDirs() has been renamed and now accepts the argument overwrite
HTSeqGenie 0.0.4
- implemented the RUnit test suite
HTSeqGenie 0.0.3
HTSeqGenie 0.0.2
- stricter test.filterQuality() tests
HTSeqGenie 0.0.1