Preprocess the sc3DG data

Typical Workflow

1. scHi-C

Test data: /tutorial/scHi-C/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/scHic_no \
 -t scHic \
 -e  MboI \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60

Parameter Description:

-o: Directory where the results are saved. All paths must be absolute.
-f: Directory containing the sequencing data.
-t: Type of single-cell Hi-C technique.
-e: Restriction enzyme used; it must match the experiment.
-r: Resolution used when converting pairs files into cool files.
-i: Path to the genome used for alignment. The final component (for example hg38.fa or mm10.fa) identifies the genome and is not part of the directory; it must be consistent with the -g parameter of STARK index.
-a: Aligner to use: bwa, bowtie2, bismark, or minimap2. It must match the index produced by STARK index.
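
Because every path must be absolute, it can help to check the paths before launching a long run. The following is only a sketch: all_absolute is a hypothetical helper, not part of STARK, and it only tests that each argument starts with "/".

```shell
#!/bin/sh
# Hypothetical helper (not part of STARK): succeed only if every argument
# is an absolute path, i.e. begins with "/".
all_absolute() {
    for p in "$@"; do
        case "$p" in
            /*) ;;                                    # absolute path, nothing to do
            *)  echo "not an absolute path: $p" >&2   # report the first offender
                return 1 ;;
        esac
    done
}

# Example use with the placeholder paths from the command above:
# all_absolute /absolute/path/to/result \
#              /absolute/path/to/data/scHic_no \
#              /absolute/path/to/mm10/mm10.fa && echo "paths look fine"
```

The check is deliberately minimal: it does not verify that the paths exist, only that they are written in absolute form.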

2. scHi-C+

Test data: /tutorial/scHic_index/data

Note that this technique produces four fastq files: _1 and _4 contain the reads, and _2 and _3 contain the corresponding barcodes.
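
The four-file layout can be checked before running STARK. This is a minimal sketch under stated assumptions: check_four_fastq is a hypothetical helper, and the naming pattern (a _1 to _4 suffix followed by .fastq or .fq, optionally gzipped, directly inside the given directory) is an assumption rather than something STARK documents here.

```shell
#!/bin/sh
# Hypothetical helper (not part of STARK). Assumes files are named
# <sample>_1 ... <sample>_4 with a .fastq or .fq extension, optionally .gz,
# and sit directly inside the directory given as the first argument.
check_four_fastq() {
    dir=$1
    for n in 1 2 3 4; do
        found=0
        # Unmatched globs stay literal, so the -f test filters them out.
        for f in "$dir"/*_"$n".fastq "$dir"/*_"$n".fastq.gz \
                 "$dir"/*_"$n".fq "$dir"/*_"$n".fq.gz; do
            if [ -f "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing _$n fastq file in $dir" >&2
            return 1
        fi
    done
}

# Example use (placeholder path):
# check_four_fastq /absolute/path/to/data/scHic_index
```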

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/scHic_index \
 -t scHic \
 -e  MboI \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60 \
 --exist-barcode

The only difference from scHi-C is that you must declare the barcodes with the --exist-barcode parameter.

3. Dip-C

Test data: /tutorial/Dip-C/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/dipC \
 -t dipC \
 -e  MboI \
 -i /absolute/path/to/hg38/hg38.fa \
 --thread 60

As with scHi-C, STARK automatically runs the subsequent steps according to the -t parameter.

4. HiRES

Test data: /tutorial/HRIES/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/HIRES \
 -t HIRES \
 -e  MboI \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60

As with scHi-C, STARK automatically runs the subsequent steps according to the -t parameter.

5. sn-m3C

Test data: /tutorial/sn-m3C/data

stark count -o /absolute/path/to/result \
    -f /absolute/path/to/data/sn_m3c \
    -t sn_m3c \
    -e MboI \
    -r 10000 \
    -i /absolute/path/to/bowtie2/hg38/hg38.fa \
    --aligner bowtie2 \
    --thread 60

Because sn-m3C profiles methylation and Hi-C contacts simultaneously, only bismark can be used for alignment, so -i and -a offer no alternatives. However, since bismark is built on bowtie2, you can pass bowtie2 for these parameters.

6. scSPRITE

Test data: /tutorial/scSPRITE/data

stark count -o /absolute/path/to/result \
     -f /absolute/path/to/data/scSPRITE \
     -t scSPRITE \
     -e HpyCH4V \
     -i /absolute/path/to/mm10/mm10.fa \
     --thread 60 \
     --repeat-masker /absolute/path/to/mm10_rmsk.bed \
     --exist-barcode

Parameter Description:

--sprite-config: A txt file used for scSPRITE barcode generation.
--repeat-masker: A bed file of repetitive regions used to mask the genome.

7. sciHi-C

Test data: /tutorial/sciHi-C/data. Note that each sequencing data directory must contain not only the two fastq files but also the inner_barcode.txt and outer_barcode.txt files. Their format is shown in the example.
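
The barcode-file requirement can be verified with a small check before the run. This is only a sketch: require_files is a hypothetical helper, not part of STARK, and the one-directory-per-sample loop in the usage comment is an assumption about the data layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of STARK): check that the directory given as
# the first argument contains every file name listed after it.
require_files() {
    dir=$1
    shift
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing $name in $dir" >&2
            return 1
        fi
    done
}

# Example use (assumes one subdirectory per sequencing sample):
# for d in /absolute/path/to/data/sciHic/*/; do
#     require_files "$d" inner_barcode.txt outer_barcode.txt
# done
```

The same helper could be pointed at other techniques that need auxiliary files alongside the fastq data by changing the list of file names.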

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/sciHic \
 -t sciHic \
 -e  DpnII \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60 \
 --exist-barcode

As with scHic_index, STARK automatically runs the subsequent steps according to the -t parameter.

8. snHi-C

Test data: /tutorial/snHi-C/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/snHic \
 -t snHic \
 -e  DpnII \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60 \
 --exist-barcode

As with scHi-C, STARK automatically runs the subsequent steps according to the -t parameter.

9. snHi-C+

Test data: /tutorial/snHi-C+/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/snHic_index \
 -t snHic \
 -e  MboI \
 -i /absolute/path/to/hg38/hg38.fa \
 --thread 60 \
 --exist-barcode

As with scHi-C, STARK automatically runs the subsequent steps according to the -t parameter.

10. scNanoHi-C

Test data: /tutorial/scNanoHi-C/data

Note that scNanoHi-C uses third-generation sequencing. The sequencing data directory should contain the fastq file together with TN5.txt, PCR.txt, and index.txt.

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/scNano/data \
 -t scNano \
 -e  MboI \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60 \
 --exist-barcode

Because scNanoHi-C uses third-generation sequencing, STARK aligns the reads with minimap2 by default, so the -a parameter has no effect.

11. scMethyl

Test data: /tutorial/scNanoHi-C/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/scMeth \
 -t scMethyl \
 -e  DpnII \
 -i /absolute/path/to/bowtie2/mm10/mm10 \
 --aligner bowtie2 \
 --thread 60 \
 --exist-barcode

12. LiMAC

Test data: /tutorial/LiMAC/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/LiMAC \
 -t LiMAC \
 -e  MboI \
 -i /absolute/path/to/hg38/hg38.fa \
 --thread 60

Parameter Description:

-o: Directory where the results are saved. All paths must be absolute.
-f: Directory containing the sequencing data.
-t: Type of single-cell Hi-C technique.
-e: Restriction enzyme used; it must match the experiment.
-r: Resolution used when converting pairs files into cool files.
-i: Path to the genome used for alignment. The final component (for example hg38.fa or mm10.fa) identifies the genome and is not part of the directory; it must be consistent with the -g parameter of STARK index.
-a: Aligner to use: bwa, bowtie2, bismark, or minimap2. It must match the index produced by STARK index.

13. GAGE-seq

Test data: /tutorial/GAGE-seq/data

stark count -o /absolute/path/to/result \
 -f /absolute/path/to/data/GAGE-seq \
 -t GAGE-seq \
 -e  MboI \
 -i /absolute/path/to/mm10/mm10.fa \
 --thread 60 \
 --exist-barcode

Parameter Description:

-o: Directory where the results are saved. All paths must be absolute.
-f: Directory containing the sequencing data.
-t: Type of single-cell Hi-C technique.
-e: Restriction enzyme used; it must match the experiment.
-r: Resolution used when converting pairs files into cool files.
-i: Path to the genome used for alignment. The final component (for example hg38.fa or mm10.fa) identifies the genome and is not part of the directory; it must be consistent with the -g parameter of STARK index.
-a: Aligner to use: bwa, bowtie2, bismark, or minimap2. It must match the index produced by STARK index.

14. Droplet Hi-C

Test data: /tutorial/Droplet/data

Make sure bowtie is installed before running:

conda install bioconda::bowtie==1.3.1

After that, build the bowtie index for the 10x barcode reference:

bowtie-build /path/to/10x/barcode/reference/ref.fa /path/to/10x/bowtie/index
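
Before moving on, it can be worth confirming that the index was actually written. The sketch below relies on the fact that bowtie-build (bowtie 1) writes files named <prefix>.1.ebwt, or <prefix>.1.ebwtl for large indexes, among others. bowtie_index_exists is a hypothetical helper and is not part of STARK or bowtie.

```shell
#!/bin/sh
# Hypothetical helper (not part of STARK or bowtie): succeed if a bowtie 1
# index exists at the given prefix. Only the first index file is tested.
bowtie_index_exists() {
    prefix=$1
    [ -f "${prefix}.1.ebwt" ] || [ -f "${prefix}.1.ebwtl" ]
}

# Example use with the placeholder prefix from the command above:
# bowtie_index_exists /path/to/10x/bowtie/index || echo "index not found" >&2
```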

Then you can run the following command to process the droplet data.

stark count -t droplet \
        --ref-10x /path/to/10x/bowtie/index \
        -f /absolute/path/to/data/droplet \
        -i /absolute/path/to/mm10/mm10.fa \
        -e MboI \
        -o /absolute/path/to/result \
        --exist-barcode \
        --thread 32

Parameter Description:

-o: Directory where the results are saved. All paths must be absolute.
-f: Directory containing the sequencing data.
-t: Type of single-cell Hi-C technique.
--ref-10x: Path to the bowtie index for the 10x barcode reference.
-i: bwa index of the genome file used for alignment.

15. Paired

Test data: /tutorial/Paired/data

Make sure bowtie is installed before running:

conda install bioconda::bowtie==1.3.1

After that, build the bowtie index for the 10x barcode reference:

bowtie-build /path/to/10x/barcode/reference/ref.fa /path/to/10x/bowtie/index

Then run the following command to process the Paired data.

stark count -t Paired \
        --ref-10x /path/to/10x/bowtie/index \
        -f /absolute/path/to/data/droplet \
        -i /absolute/path/to/mm10/mm10.fa \
        -e MboI \
        -o /absolute/path/to/result \
        --exist-barcode \
        --thread 32

Parameter Description:

-o: Directory where the results are saved. All paths must be absolute.
-f: Directory containing the sequencing data.
-t: Type of single-cell Hi-C technique.
--ref-10x: Path to the bowtie index for the 10x barcode reference.
-i: bwa index of the genome file used for alignment.

Explanation of the Results

The output directory is organized as follows:

scSPRITE_test_tmp/
    ├── Result
    │   ├── cool_folder
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1]_10000.cool
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1]10000.KR.cool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31]_10000.cool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31]10000.KR.cool
    │   │   ├── [Even2Bo11][Odd2Bo1][DPM6bot75]_10000.cool
    │   │   ├── [Even2Bo11][Odd2Bo1][DPM6bot75]10000.KR.cool
    │   ├── mcool_folder
    │   │   ├── [Even2Bo10][Odd2Bo69][DPM6bot1].mcool
    │   │   ├── [Even2Bo11][Odd2Bo19][DPM6bot31].mcool
    │   │   ├── [Even2Bo11][Odd2Bo1][DPM6bot75].mcool
    │   └── SCpair
    │       ├── [Even2Bo10][Odd2Bo69][DPM6bot1].pairs.gz
    │       ├── [Even2Bo11][Odd2Bo19][DPM6bot31].pairs.gz
    │       ├── [Even2Bo11][Odd2Bo1][DPM6bot75].pairs.gz
    ├── test.bam
    ├── test_logging.log
    └── trimmed.fastp.json

scSPRITE_test_tmp is the root directory of the output, where ‘test’ is the name of the sample.

Result is the directory in which the main results are saved.

cool_folder is the directory that stores the cool files of all cells, before and after KR correction.

mcool_folder is the directory that stores the mcool files of all cells.

SCpair is the directory that stores the pairs files of all cells.

test.bam is the bam file of the sequencing data.

test_logging.log records all the parameters of processing the data by STARK, as well as the time spent on each step.

trimmed.fastp.json records the result of fastp processing the data.
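
Given the layout described above, a quick way to see which cells were produced is to walk the SCpair directory and check for a matching mcool file. This is a sketch only: list_cells is a hypothetical helper, not part of STARK, and it assumes the directory and file naming shown in the tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of STARK): print one line per cell found in
# <root>/Result/SCpair, flagging cells without a matching .mcool file.
list_cells() {
    root=$1
    for f in "$root"/Result/SCpair/*.pairs.gz; do
        [ -f "$f" ] || continue                  # skip the literal glob if nothing matched
        cell=$(basename "$f" .pairs.gz)          # cell name without the extension
        if [ -f "$root/Result/mcool_folder/$cell.mcool" ]; then
            echo "$cell ok"
        else
            echo "$cell missing-mcool"
        fi
    done
}

# Example use (placeholder path):
# list_cells /absolute/path/to/result/scSPRITE_test_tmp
```

Quoting every expansion matters here because the cell names contain square brackets, which the shell would otherwise treat as glob patterns.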
