Bioinformatics: biological and medical sciences in the age of big data

What is bioinformatics?

Bioinformatics is a term which we will use as an umbrella to encompass work that may also be referred to as computational biology or biomedical informatics. It is an interdisciplinary field of science that combines biology, computer science, mathematics, statistics and engineering (including their subfields such as control theory, information theory, thermodynamics, machine learning and artificial intelligence) to analyse and interpret biological and medical data. Indeed, both molecular and phenotypic data has become available on an unprecedented scale from genomics, proteomics, functional genomics, high-throughput cell screening, metabolomics and imaging platforms (amongst others).

This data has enabled researchers to study biological systems with both greater specificity (enabling us to look at individual molecules, molecular species or cells) and greater breadth (enabling us to look at entire proteomes, genomes, metabolomes or transcriptomes in a single assay). This data has further been combined with higher-level medical data, such as the presence of a disease or disease subtype, response to treatment, relapse, comorbidity and medical history, to develop a better understanding of the aetiology of diseases and to tailor therapies to patients’ molecular profiles.

Exponential growth

The field was arguably born in the mid-nineties and has grown exponentially since, underpinned by parallel major technological advances in two areas: biological data collection technologies and computer science / information technology.

In the area of computer science, major improvements in our abilities to store, process and share data via improvements in CPUs, disk storage, the development of the internet and more recently cloud computing, have revolutionised just about every aspect of human life in the last three decades. 

In the area of biological data collection technologies, perhaps the first paradigm shift that contributed to the birth of the bioinformatics field relates to nucleic acid sequencing technologies. Whereas in the 1980s and 1990s Sanger sequencing allowed the sequencing of a single short fragment (up to a few hundred bases) at a time in a lengthy process, today high-throughput sequencing (born in the mid-nineties, also referred to as ‘next generation sequencing’ - NGS) allows the sequencing of an entire genome in a few hours and at an ever-decreasing cost.

Today, most types of biological or medical data that can be collected have a high-throughput and/or high-content equivalent. High-throughput assays (enabling the rapid collection of data from a large number of samples) and high-content assays (enabling the rapid collection of data about a large number of features from each sample) have dramatically changed the nature and sheer amount of data that comes out of biological and biomedical assays, and opened new possibilities as detailed in my previous blog.

In order to support, organise and share the vast amount of data that has been generated through the above-mentioned technologies, databases and data resources have been created which are pivotal to bioinformatics, and hence to modern biological and medical sciences. These include well-known resources such as GenBank (a genetic sequence database comprising nucleotide sequences for over 300k organisms), UniProt (a database of protein sequence and function comprising 500k+ manually curated entries and over 180M automatically annotated entries), KEGG (a database of pathways, functions and utilities), Ensembl (a vertebrate genome browser for 200+ species), GEO (a gene expression data repository) and COSMIC (a database of somatic mutations in cancer), amongst many others. Tremendous and invaluable effort is spent on developing tools to create, maintain, share and analyse these data resources, many of which are freely accessible and at least partially publicly funded, for example by the NIH or EMBL-EBI.

How has bioinformatics changed the biological and medical science landscape?

This new approach to the biological and biomedical sciences, with large amounts of data at its core, has had a large impact from health services to R&D, where applications and projects that involve at least one bioinformatics element are becoming increasingly common, if not the norm. For example, in the pharmaceutical industry, bioinformatics tools and resources are used to identify drug targets, to perform drug repurposing, to identify drug candidates, to stratify patients, to study the effect of compounds in model systems, and to perform simulations that can reduce the experimental burden associated with drug development, optimisation and testing. In health services, bioinformatics is used to analyse genetic tests, to process and visualise medical images, and to build improved medical devices such as heart rate monitors and intelligent insulin pumps. Further, bioinformatics is also central to the fields of personalised and precision medicine, which impact both the pharma side and the healthcare provider side.

As a result, companies that have bioinformatics as a central element of their R&D, or even of their product, are becoming increasingly numerous. These include giants such as Illumina, Roche (see e.g. Roche’s Ariosa), Google (via Google Health) and AstraZeneca, and smaller players such as Sano Genetics, Inivata, Eagle Genomics, Genomics plc, Cambridge Cancer Genomics and Seven Bridges Genomics, amongst many others. Further, the recognition that data sharing, openness and collaboration are essential to realise the full potential of the field has led to an increasing prevalence of academic-industrial partnerships (which were previously uncommon in the pharmaceuticals field), such as the Open Targets initiative, and to a push towards open science.

What does this mean for the protection of intellectual property?

Bioinformatics projects will often result in the creation of various types of potentially valuable assets, including data (whether raw or organised as a database), computer-implemented methods / tools and new insights generated by the methods (e.g. biomarkers, drug candidates, patient stratification criteria, etc.). These assets can be protected through a combination of rights including copyright, design rights, database rights, trade secrets and patents.

The choice of a protection strategy can seem complicated, and many aspects have to be considered such as:

  • what can legally be protected (which may be jurisdiction-dependent)
  • what is practical to protect (e.g. trade secrets may not be appropriate where disclosure is necessary, for example for regulatory purposes, or is inherent to the product)
  • what is desirable/necessary to be able to publish and publicise
  • what should be protected in view of the long-term strategy of the company and planned future developments of the product (which can be particularly challenging, for example, for machine learning-based products where many aspects of the product are expected to change quickly, or even where constant learning and evolution are intrinsic to the product).

Patents can be particularly well-suited to a strategy that balances commercial success, through protection of the investment made in R&D, with a commitment to openness. However, the perceived (and often unjustified) difficulty of obtaining valuable protection for computer-implemented inventions (arguably the most common type of bioinformatics invention) has often discouraged applicants.

On the contrary, the current European approach to the patentability of computer-implemented inventions in the field of bioinformatics and medical informatics can often be leveraged efficiently to obtain valuable method claims. In a commercial context where, for an increasing number of companies, bioinformatics analysis pipelines or platforms are the commercially valuable asset and the main product, rather than the means towards a more classical ‘physical product’, failing to consider options for protection of these assets can be a very costly missed opportunity.

However, efficiently protecting this type of invention typically requires: (a) a good understanding of the combined challenges associated with the protection of computer-implemented inventions as well as diagnostic/therapeutic methods, and (b) a good understanding of both the biological/medical aspects and the computer science/mathematics aspects of the invention, such that both can be appropriately described. Considering that the field is relatively young, still in its growth phase and by its very nature multidisciplinary, this combined expertise can be hard to find and time-consuming to acquire. At Mewburn Ellis, we understand that bioinformatics does not fit the traditional model whereby inventions belong either to the life sciences or to the software/engineering fields. We have built a bioinformatics team with specific expertise in this field, which builds upon and expands our skills in the multiple fields that underlie bioinformatics inventions.

What lies ahead in the field?

In our opinion, the future potential of the field, as is the case in similarly data-driven fields, will lie in what can be learnt by combining different types and sources of data. Artificial intelligence, including machine learning and deep learning, will be key to this, as will tools that enable efficient organisation and sharing of data. The availability of multiple ever-growing data sources, and the ability to combine them, will open new possibilities for machine learning-driven protein engineering, drug design, precision medicine, diagnostic systems, etc.

This will be accompanied by an increase in the number of solutions that automatically generate predictions, simulations and recommendations in fields where we are perhaps more used to trusting human judgement, such as medicine. This comes with enormous potential benefits in terms of, at least, access to healthcare, diagnostic and prognostic accuracy and speed, improved therapies and treatment management. However, this will also pose new challenges associated with the regulation of such algorithmic tools, the assessment of liability (e.g. where an erroneous recommendation is made by a machine as opposed to a medical practitioner), and data privacy and availability.

Conclusion

Bioinformatics, the use of computer science, mathematics and statistics to analyse vast amounts of biological and medical data, is arguably the natural adaptation of the biological and medical sciences to the age of big data. It is now an integral part of how R&D in the fields of biology and medicine is done, whether in academia, industry or private-public partnerships. Its role is only going to grow as the amount of data available increases, along with our ability to learn from, analyse, share and integrate this data. Businesses, products and services that rely wholly or partially on bioinformatics tools have started to flourish, and their numbers can only be expected to grow. The interdisciplinary and fast-evolving nature of this field has enormous potential to improve human life, but also raises many questions in relation to (at least) IP strategy, data privacy, regulation and liability. Join us for a deep dive into these topics over the next few weeks…