GRN Inference

GRN inference using integrated methods

To infer a GRN for a given dataset using an integrated method, first download the test datasets:

aws s3 sync s3://openproblems-data/resources_test/grn/grn_benchmark \
  resources_test/grn_benchmark --no-sign-request

Use resources_test/ paths for development and testing. Switch to resources/grn_benchmark/ for full-scale runs (requires downloading the main datasets — see Datasets).

With Docker (Viash)

viash run src/methods/pearson_corr/config.vsh.yaml -- \
  --rna resources_test/grn_benchmark/inference_data/op_rna.h5ad \
  --prediction output/net.h5ad \
  --tf_all resources_test/grn_benchmark/prior/tf_all.csv

Without Docker (conda or Singularity)

bash src/methods/pearson_corr/run_local.sh \
  --rna resources_test/grn_benchmark/inference_data/op_rna.h5ad \
  --tf_all resources_test/grn_benchmark/prior/tf_all.csv \
  --prediction output/net.h5ad

Replace pearson_corr with any method under src/methods/. Methods that require Singularity (e.g. GRNBoost2, scPRINT, Scenic/Scenic+, CellOracle) will invoke it automatically from their run_local.sh script.

GRN inference without method integration

Download the inference datasets from the Datasets page. For each dataset, perform GRN inference using your own algorithm. Consider the following:

We evaluate only the top TF-gene pairs, currently limited to 50,000 edges, ranked by their assigned weight.
The inferred network should follow this format:

Columns:
- source: Transcription factor (TF)
- target: Target gene
- weight: Regulatory importance/likelihood score

Since geneRNIB works with AnnData, your inferred network should be saved in this format. If your network is a pandas DataFrame with three columns (source, target, weight), you can save it as follows:

Python

import anndata as ad

net['weight'] = net['weight'].astype(str)  # weight must be stored as a string

output = ad.AnnData(
    X=None,
    uns={
        "method_id": "your_method_name",  # e.g. "grnboost2"
        "dataset_id": "dataset_name",     # e.g. "replogle"
        "prediction": net[["source", "target", "weight"]]
    }
)
output.write_h5ad("save_to_file.h5ad")

R

library(zellkonverter)

net$weight <- as.character(net$weight)

output <- AnnData(
    X = matrix(nrow = 0, ncol = 0),
    uns = list(
        method_id = "your_method_name",  # e.g. "grnboost2"
        dataset_id = "dataset_name",     # e.g. "replogle"
        prediction = net[, c("source", "target", "weight")]
    )
)
output$write_h5ad("save_to_file.h5ad", compression = "gzip")

Once you have inferred GRNs for one or more datasets, proceed to the GRN evaluation page to run the evaluation metrics.