Restrictions and Licence
Use of WebScipio and Scipio by non-academic users requires permission. WebScipio and Scipio may be obtained upon request and used under a Creative Commons License.
If you use Scipio for your research, please cite:
O. Keller, F. Odronitz, M. Stanke, M. Kollmar & S. Waack (2008)
Scipio: Using protein sequences to determine the precise exon/intron structures of genes and their orthologs in closely related species.
BMC Bioinformatics 9, 278.
Download Scipio
Download the latest version of the Scipio command line tool:
Scipio Version 1.4
Download search example files to test the Scipio tool (see help pages for more information):
Search Examples
Download expert search example files to test the Scipio tool (see help pages for more information):
Expert Search Examples
Version history
2013-05-24 Version 1.4.1
- Fix for GenePainter: prot_start and prot_end positions are given for each exon in the YAML result file.
2010-08-09 Version 1.4
- added user option --exhaust_gap_size (maximum protein region that Needleman/Wunsch is executed on; default: 3*tilesize)
- added timestamp and parameter list to YAML output
- some minor bugfixes
(from 2010-07-22)
- allow lower case protein inputs
- bugfixes for initial/terminal exon search
(from 2010-07-14)
- removed unneeded and undocumented option --nomargins
- extending terminal exons to the next start/stop codon
- remove all white space from target sequences read in (especially CRs from Windows-style files have caused problems)
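The whitespace clean-up described in the last item can be illustrated with a minimal Python sketch. This is not Scipio's own code; the function name `strip_sequence_whitespace` is hypothetical and only shows the idea of dropping every whitespace character, including the carriage returns left by Windows-style line endings.

```python
import re


def strip_sequence_whitespace(raw: str) -> str:
    """Remove every whitespace character (spaces, tabs, CR, LF) from a sequence.

    Hypothetical helper for illustration; not taken from the Scipio source.
    """
    return re.sub(r"\s+", "", raw)


# A sequence read from a Windows-style file carries "\r\n" line endings.
windows_style = "ACGT ACGT\r\nTTGA\r\n"
print(strip_sequence_whitespace(windows_style))
```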
(from 2010-06-14)
- replaced user option --multiple_from (previously --force_score) by a new option --multiple_results showing all results with at least one hit with a score exceeding --min_score. (Hence, the old behaviour can be obtained with --multiple_results --min_score=f instead of --multiple_from=f)
- added new option --accepted_intron_penalty=f legalizing introns with nonstandard splice site patterns that were previously classified as "intron?". In particular, a value of at least 1.1 will allow for at-ac to be accepted, and 1.2 for ga-ag and gg-ag.
2010-05-12 Version 1.3
Scipio Version 1.3
Scipio
- new user option --max_move_exon (how far away from a BLAT prediction Scipio checks candidate intron locations, in codons; default: 2; 4 is recommended for cross-species searches)
- A translation table option has been implemented. If you search against genomes from species that use a different translation table, like Pichia or Tetrahymena, you will not get a long list of "mismatches" anymore.
- usage information rewritten
- Some genome files have space characters in the FASTA headers, so that several FASTA entries share the same name up to the first space character. BLAT ignores everything after the first space but aborts if FASTA entries have the same name. Oliver Keller has written a small script that replaces space characters with underscore characters:
space2score.pl <genome_file.fa>
- bugfix: target overlaps containing whole exons now work
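The header clean-up that space2score.pl is described as performing can be sketched in Python as follows. This is an assumption-based illustration, not the script itself: the function name `underscore_headers` is hypothetical, and the sketch assumes that every space in a header line is replaced while sequence lines are left untouched.

```python
def underscore_headers(lines):
    """Yield FASTA lines with spaces in header lines replaced by underscores.

    Hypothetical re-implementation for illustration; the real space2score.pl
    may treat edge cases differently.
    """
    for line in lines:
        if line.startswith(">"):
            # Strip the line ending first so that only header text is changed.
            header = line.rstrip("\r\n")
            yield header.replace(" ", "_") + "\n"
        else:
            yield line


# Two entries that share the name "contig1" up to the first space.
example = [
    ">contig1 length=100\n",
    "ACGT\n",
    ">contig1 length=200\n",
    "TTGA\n",
]
cleaned = list(underscore_headers(example))
print("".join(cleaned), end="")
```

After the substitution the two headers differ from their first character to their last, so a tool that reads names only up to the first space no longer sees duplicates.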
yaml2log
- Some example cases for mismatches/frameshifts that were missing from the log-file header in the first version have now been included.
- yaml2log now inserts line breaks if a maximum width is specified:
yaml2log.pl --width=NN <input.yaml> output.log
(from 2009-12-16)
- new user options:
--min_coverage (minimal portion in % of query that is aligned; rejects hits with large gaps; default:60)
--min_dna_coverage (minimal portion in % of target that is aligned; rejects small partial hits joined by large marginal introns; default:0, maximal 0.2)
--max_assemble_size (maximal size of marginal intron; default:75 kbps)
--gap_to_close (maximal size of query gap that will be closed by adding mismatches; default:6 residues)
- parameters controlling the Needleman-Wunsch algorithm can now be set by user options (--nw_*_penalty), readjusted default values, introduced stop penalty
- refined splice site evaluation, which is now used for both Needleman-Wunsch and ordinary intron search
- "intron?" type is now used for all splice site patterns except gt-ag and gc-ag
- several minor bugfixes concerning the addition of exons
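As an illustration of the --min_coverage option listed above, the following Python sketch computes the aligned portion of a query in percent and compares it with the threshold. It is not Scipio's implementation: the function names are hypothetical, and treating a value exactly equal to the threshold as accepted is an assumption.

```python
def query_coverage_percent(aligned_residues: int, query_length: int) -> float:
    """Return the portion of the query that is aligned, in percent."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * aligned_residues / query_length


def passes_min_coverage(aligned_residues: int, query_length: int,
                        min_coverage: float = 60.0) -> bool:
    """Accept a hit whose coverage reaches min_coverage (default 60, as documented).

    Assumption: a hit exactly at the threshold is accepted.
    """
    return query_coverage_percent(aligned_residues, query_length) >= min_coverage


# A 400-residue query with 300 aligned residues is kept; one with 200 is rejected.
print(passes_min_coverage(300, 400))
print(passes_min_coverage(200, 400))
```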
(from 2009-11-17)
- new user option:
--min_intron_len (minimum length of an intron; default:22 bases) This was previously fixed at 6 nucleotides.
- re-implementation of the Needleman-Wunsch algorithm to recognize splice site patterns; new default parameters
- minor changes to the status output
- prefer adding mismatches to leaving small gaps now
- user option --force_score now called --multiple_from
- The penalties for gaps/insertions have been adjusted so that longer sequence differences between query and target are allowed. This is especially useful for cross-species searches. Now, 12 extra bases in the target sequence and 6 extra residues in the query sequence are allowed and marked as frameshifts instead of creating a gap.
- some error and warning messages were clarified
- several minor bugfixes concerning the addition of exons and the assembling of partial hits
2009-09-13 Version 1.2
- new default parameters for the Needleman-Wunsch algorithm
- new user options:
--exhaust_align_size (maximum DNA region that Needleman/Wunsch is executed on)
--single_target_hits (or --chromosome) (prevents the assembly of partial hits. When Scipio is run with this option against chromosome-assembly data (or data with large supercontigs), hits spanning several targets are ruled out, on the assumption that a gene cannot be spread across different chromosomes.)
--plusstrand / --minusstrand (some YAML parsers do not like the value '-')
- If the query protein sequence is spread over more than one contig, all sequence between the last exon on a contig and the contig end, and all sequence from the contig start to the first exon on the following contig, is reported as "intron" or "intron?". In Scipio version 1.0 only 2000 bp were reported, because hits could be assembled from several chromosomes, producing extremely large YAML files.
- allow queries containing stop codons
- many bugfixes
2008-12-15 Version 1.1
- changed default behaviour when --blat_output is not given: BLAT is now always started in this case (rather than trying to find a .psl file in the current directory) and the output is written to a temporary file. The new user option --keep_blat can be used to prevent deleting the blat output file.
- a first version of the Needleman-Wunsch algorithm has been implemented to identify, in gaps, short exons that BLAT did not find
- query sequences may now contain gap characters
- prevent huge marginal introns (those joining partial hits over target boundaries)
- user option --min_identity now given in % (analogously to BLAT)
- prefer intron location at codon boundaries if a better splice site pattern is not found
- minor bugfixes