Prioritizing target genes from large clinical datasets using Watson Drug Discovery (WDD)

Associate Professor Mark Fear1, Dr Andrew Stevenson1, Professor Fiona Wood1

1Burn Injury Research Unit, University of Western Australia, Crawley, Australia


Decreased costs have led to the widespread use of ‘omics’ approaches to identify mutations and genes important in pathobiology (Ioannidis and Khoury, 2011). Often, large datasets are generated from small sample sizes, which limits the ability to interpret these data and to identify genes of interest effectively (Sung et al., 2012). Other methods of identifying genes worth testing are therefore needed.

WDD is a cloud-based platform that applies machine learning and natural language processing to heterogeneous content, including medical journal articles, patents, and ontologies. Limited patient numbers in a recently conducted genome-wide association study (GWAS) on scarring after injury yielded a large number of target genes, with no rationale from statistical analysis alone for restricting the list. We leveraged the cognitive capabilities of WDD to rank genes likely to have important roles in fibrosis.

WDD ranked 600 candidate genes identified as associated with increased height and decreased pliability of scars. The ranking was based on a semantic similarity analysis of these candidates against ~30 known fibrosis-related genes. The WDD algorithm created a distance matrix comparing every gene with every other gene, based on the frequency and relevance of words and phrases used in documents. A graph diffusion algorithm then used the distance matrix to score and rank every gene by similarity to the overall set of known fibrosis-related genes.
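The ranking idea described above can be illustrated with a minimal sketch. It is not the WDD implementation, whose internals are not described here. As stand-ins, it assumes TF-IDF term weights for "frequency and relevance of words", cosine similarity in place of a distance (distance being one minus similarity), and a random walk with restart as the graph diffusion. The gene texts, the placeholder names GENE_A and GENE_B, the two seed genes, and the restart value are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn {gene: text} into {gene: {term: TF-IDF weight}}."""
    tokens = {gene: text.lower().split() for gene, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for gene, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks) or 1
        vectors[gene] = {
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Pairwise gene-gene similarity with a zero diagonal."""
    genes = list(vectors)
    return {
        g: {h: 0.0 if g == h else cosine(vectors[g], vectors[h]) for h in genes}
        for g in genes
    }


def diffuse(sim, seeds, restart=0.3, n_iter=100):
    """Random walk with restart from the seed genes over the similarity graph."""
    genes = list(sim)
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in genes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {g: restart * p0[g] for g in genes}
        for g in genes:
            out = sum(sim[g].values())
            if out == 0.0:
                continue  # isolated gene: nothing to pass on
            for h in genes:
                nxt[h] += (1 - restart) * p[g] * sim[g][h] / out
        p = nxt
    return p


# Hypothetical toy corpus: one short text per gene (not real literature).
docs = {
    "TGFB1": "fibrosis collagen scar fibroblast signalling",
    "COL1A1": "collagen fibrosis scar matrix deposition",
    "GENE_A": "collagen scar fibroblast matrix",
    "GENE_B": "retina photoreceptor vision light",
}
seeds = {"TGFB1", "COL1A1"}  # illustrative stand-ins for the known gene set

scores = diffuse(similarity_matrix(tfidf_vectors(docs)), seeds)
ranked = sorted(
    ((g, s) for g, s in scores.items() if g not in seeds),
    key=lambda item: item[1],
    reverse=True,
)
for gene, score in ranked:
    print(gene, round(score, 4))
```

In this toy example GENE_A shares vocabulary with both seed genes and so receives diffused score, while GENE_B shares none and stays at zero. With a real corpus, the quality of the ranking would depend heavily on how the text for each gene is collected and weighted.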

Genes of potential interest that were not readily identified through traditional methods ranked highly. These genes were cross-validated by graph network analysis of their associated genes and pathways to confirm relevance to fibrosis.

Ioannidis JP and Khoury MJ. (2011) Improving validation practices in “omics” research. Science 334: 1230-1232.
Sung J, Wang Y, Chandrasekaran S, et al. (2012) Molecular signatures from omics data: from chaos to consensus. Biotechnology Journal 7: 946-957.


A/Prof Fear received his PhD in 2003 from the University of London. He subsequently worked with a biotechnology company in Perth, Australia, before rejoining academia, and has worked with CIA in burn injury and scarring since 2007.