Preprocess sc3DG data
=====================

Navigation
----------

* :ref:`scHic`
* :ref:`scHi-C+`
* :ref:`Dip-C`
* :ref:`HiRES`
* :ref:`sn-m3C`
* :ref:`scSPRITE`
* :ref:`sciHi-C`
* :ref:`snHi-C`
* :ref:`snHi-C+`
* :ref:`scNanoHi-C`
* :ref:`scMethyl`
* :ref:`LiMAC`
* :ref:`GAGE-seq`
* :ref:`Droplet`
* :ref:`Paired`

Typical Workflow
----------------

.. _scHic:

1. scHi-C
~~~~~~~~~

Test data: /tutorial/scHi-C/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/scHic_no \
        -t scHic \
        -e MboI \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60

Parameter Description:

- **-o**: Location where the results are saved. Note that **all paths must be absolute paths.**
- **-f**: Directory containing the sequencing data.
- **-t**: Type of single-cell Hi-C.
- **-e**: Restriction enzyme used; it must match the one used in the experiment.
- **-r**: Resolution used to convert pairs files into cool files.
- **-i**: Path to the genome used for alignment. The final component (mm10.fa in the example above) names the genome and is not part of the directory; it must be consistent with the -g parameter of STARK index.
- **-a**: Software used for alignment; one of bwa, bowtie2, bismark, or minimap2. It must be consistent with the index produced by STARK index.

.. _scHi-C+:

2. scHi-C+
~~~~~~~~~~

Test data: /tutorial/scHic_index/data

This technique produces four fastq files: _1 and _4 contain the reads, and _2 and _3 contain the corresponding barcodes.

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/scHic_index \
        -t scHic \
        -e MboI \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --exist-barcode

The only difference from scHi-C is the ``--exist-barcode`` parameter, which tells STARK that the data contain barcodes.

.. _Dip-C:

3. Dip-C
~~~~~~~~

Test data: /tutorial/Dip-C/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/dipC \
        -t dipC \
        -e MboI \
        -i /absolute/path/to/hg38/hg38.fa \
        --thread 60

As with scHi-C, STARK automatically carries out the subsequent steps according to the ``-t`` parameter.

.. _HiRES:

4. HiRES
~~~~~~~~

Test data: /tutorial/HRIES/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/HIRES \
        -t HIRES \
        -e MboI \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60

As with scHi-C, STARK automatically carries out the subsequent steps according to the ``-t`` parameter.

.. _sn-m3C:

5. sn-m3C
~~~~~~~~~

Test data: /tutorial/sn-m3C/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/sn_m3c \
        -t sn_m3c \
        -e MboI \
        -r 10000 \
        -i /absolute/path/to/bowtie2/hg38/hg38.fa \
        --aligner bowtie2 \
        --thread 60

Because sn-m3C sequences methylation and Hi-C simultaneously, only bismark can be used for alignment, so there is no choice to make for ``-i`` and ``-a``. Since bismark is based on bowtie2, you can pass bowtie2 as the aligner.

.. _scSPRITE:

6. scSPRITE
~~~~~~~~~~~

Test data: /tutorial/scSPRITE/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/scSPRITE \
        -t scSPRITE \
        -e HpyCH4V \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --repeat-masker /absolute/path/to/mm10_rmsk.bed \
        --exist-barcode

Parameter Description:

- **--sprite-config**: A txt file used to generate the scSPRITE barcodes.
- **--repeat-masker**: A bed file used to mask repetitive regions of the genome.

.. _sciHi-C:

7. sciHi-C
~~~~~~~~~~

Test data: /tutorial/sciHi-C/data

Each sequencing data directory must contain not only the two fastq files but also the barcode files inner_barcode.txt and outer_barcode.txt. Their format is shown in the test data.

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/sciHic \
        -t sciHic \
        -e DpnII \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --exist-barcode

As with scHi-C+ (scHic_index), STARK automatically carries out the subsequent steps according to the ``-t`` parameter.

.. _snHi-C:

8. snHi-C
~~~~~~~~~

Test data: /tutorial/snHi-C/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/sciHic \
        -t sciHic \
        -e DpnII \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --exist-barcode

As with scHi-C, STARK automatically carries out the subsequent steps according to the ``-t`` parameter.

.. _snHi-C+:

9. snHi-C+
~~~~~~~~~~

Test data: /tutorial/snHi-C+/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/snHic_index \
        -t snHic \
        -e MboI \
        -i /absolute/path/to/hg38/hg38.fa \
        --thread 60 \
        --exist-barcode

As with scHi-C, STARK automatically carries out the subsequent steps according to the ``-t`` parameter.

.. _scNanoHi-C:

10. scNanoHi-C
~~~~~~~~~~~~~~

Test data: /tutorial/scNanoHi-C/data

scNanoHi-C uses third-generation sequencing. The sequencing data directory should contain the fastq file together with TN5.txt, PCR.txt, and index.txt.

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/scNano/data \
        -t scNano \
        -e MboI \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --exist-barcode

Because the reads come from third-generation sequencing, STARK aligns them with minimap2 by default, so the ``-a`` parameter has no effect.

.. _scMethyl:

11. scMethyl
~~~~~~~~~~~~

Test data: /tutorial/scNanoHi-C/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/scMeth \
        -t scMethyl \
        -e DpnII \
        -i /absolute/path/to/bowtie2/mm10/mm10 \
        --aligner bowtie2 \
        --thread 60 \
        --exist-barcode

.. _LiMAC:

12. LiMAC
~~~~~~~~~

Test data: /tutorial/LiMAC/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/LiMAC \
        -t LiMAC \
        -e MboI \
        -i /absolute/path/to/hg38/hg38.fa \
        --thread 60

The parameters are the same as those described under :ref:`scHic`; here the final component of ``-i`` is hg38.fa.

.. _GAGE-seq:

13. GAGE-seq
~~~~~~~~~~~~

Test data: /tutorial/GAGE-seq/data

.. code-block:: bash

    stark count -o /absolute/path/to/result \
        -f /absolute/path/to/data/GAGE-seq \
        -t GAGE-seq \
        -e MboI \
        -i /absolute/path/to/mm10/mm10.fa \
        --thread 60 \
        --exist-barcode

The parameters are the same as those described under :ref:`scHic`, with ``--exist-barcode`` added because the data contain barcodes.

.. _Droplet:

14. Droplet Hi-C
~~~~~~~~~~~~~~~~

Test data: /tutorial/Droplet/data

Make sure bowtie is installed before running:

.. code-block:: bash

    conda install bioconda::bowtie==1.3.1

Next, build the bowtie index for the 10x barcode reference:

.. code-block:: bash

    bowtie-build /path/to/10x/barcode/reference/ref.fa /path/to/10x/bowtie/index

Then run the following command to process the droplet data:

.. code-block:: bash

    stark count -t droplet \
        --ref-10x /path/to/10x/bowtie/index \
        -f /absolute/path/to/data/droplet \
        -i /absolute/path/to/mm10/mm10.fa \
        -e MboI \
        -o /absolute/path/to/result \
        --exist-barcode \
        --thread 32

Parameter Description:

- **-o**: Location where the results are saved. Note that **all paths must be absolute paths.**
- **-f**: Directory containing the sequencing data.
- **-t**: Type of single-cell Hi-C.
- **--ref-10x**: Location of the bowtie index built for the 10x barcode reference.
- **-i**: bwa index of the genome file used for alignment.

.. _Paired:

15. Paired
~~~~~~~~~~

Test data: /tutorial/Paired/data

As for :ref:`Droplet`, bowtie must be installed and the bowtie index for the 10x barcode reference must be built before running. Then run the following command to process the data:

.. code-block:: bash

    stark count -t Paired \
        --ref-10x /path/to/10x/bowtie/index \
        -f /absolute/path/to/data/droplet \
        -i /absolute/path/to/mm10/mm10.fa \
        -e MboI \
        -o /absolute/path/to/result \
        --exist-barcode \
        --thread 32

The parameters are the same as those described under :ref:`Droplet`.

Description of the Results
--------------------------

A typical output directory looks like this:

.. code-block:: text

    scSPRITE_test_tmp/
    ├── Result
    │   ├── cool_folder
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1]_10000.cool
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1]10000.KR.cool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31]_10000.cool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31]10000.KR.cool
    │   │   ├── [Even2Bo11][Odd2Bo1][DPM6bot75]_10000.cool
    │   │   └── [Even2Bo11][Odd2Bo1][DPM6bot75]10000.KR.cool
    │   ├── mcool_folder
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1].mcool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31].mcool
    │   │   └── [Even2Bo11][Odd2Bo1][DPM6bot75].mcool
    │   └── SCpair
    │       ├── [Even2Bo10][Odd2Bo69][DPM6bot1].pairs.gz
    │       ├── [Even2Bo11][Odd2Bo19][DPM6bot31].pairs.gz
    │       └── [Even2Bo11][Odd2Bo1][DPM6bot75].pairs.gz
    ├── test.bam
    ├── test_logging.log
    └── trimmed.fastp.json

- **scSPRITE_test_tmp**: Root directory of the output, where 'test' is the name of the sample.
- **Result**: Directory in which the main results are saved.
- **cool_folder**: Directory that stores every cell's cool files, before and after **KR** correction.
- **mcool_folder**: Directory that stores every cell's mcool file.
- **SCpair**: Directory that stores every cell's pairs file.
- **test.bam**: bam file of the sequencing data.
- **test_logging.log**: Records all parameters STARK used to process the data, as well as the time spent on each step.
- **trimmed.fastp.json**: Records the result of processing the data with fastp.
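
As a quick sanity check after a run, the layout described above can be summarized with a small shell helper. This is only a sketch and not part of STARK: the function names ``count_files`` and ``summarize_result`` are hypothetical, and the helper assumes the file suffixes seen in the example tree (.cool, .KR.cool, .mcool, .pairs.gz).

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helpers (not part of STARK): count per-cell output files.

    count_files() {
        # $1: directory to search, $2: file name pattern for find
        find "$1" -type f -name "$2" 2>/dev/null | wc -l | tr -d ' '
    }

    summarize_result() {
        # $1: root output directory (the one containing Result/)
        local root="$1"
        if [ ! -d "$root/Result" ]; then
            echo "no Result directory under $root" >&2
            return 1
        fi
        local all_cool kr_cool
        # '*.cool' also matches the KR-corrected files, so subtract them
        all_cool=$(count_files "$root/Result" '*.cool')
        kr_cool=$(count_files "$root/Result" '*.KR.cool')
        echo "raw_cool=$((all_cool - kr_cool))"
        echo "kr_cool=$kr_cool"
        echo "mcool=$(count_files "$root/Result" '*.mcool')"
        echo "pairs=$(count_files "$root/Result" '*.pairs.gz')"
    }

After sourcing the functions, ``summarize_result /absolute/path/to/scSPRITE_test_tmp`` prints one ``name=count`` line per file type. In the example tree each cell contributes one file of each type, so unequal counts may point to cells that did not complete every step.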