Biobank characteristics

Some considerations for a successful project collaboration

Biobanks differ widely in the nature of their sample collections and associated data. Below are some factors to consider when establishing a collaboration, illustrated with examples.


Cohort size

Biobanks that are of interest for industrial partnerships have the following features:
• Rare disease cohorts.
• Population isolates.
• High prevalence of consanguinity in society.
• Deep endophenotype profiles.
• Clinically very well-characterized patient populations.
• Deep longitudinal electronic health information.


Biological samples

One feature that makes biobanks more valuable than registries is the availability of biological samples for research. Registries offer prospective observational data, but biological samples make it possible to search for predictive biomarkers, assess causality, and develop deeper mechanistic insights into disease pathophysiology.
• The majority of research biobanks simply store blood samples, while clinical biobanks keep almost every type of patient sample.
• The more types of biological material a biobank stores at large scale, the more differentiated from other collections it becomes, and thus the more attractive it is.
• Any of the following sample types are of interest: 1) whole blood; 2) saliva; 3) urine; 4) stool; 5) any tissue other than blood; 6) DNA extracted from whole blood; 7) plasma; 8) serum; 9) white blood fraction.


Molecular profiles

If the main focus areas are understanding disease pathophysiology, finding or developing biomarkers, and drug discovery, then biological data that have already been profiled at the molecular level are of interest.
• Molecular profiles offer direct readouts of biological processes and carry substantially more potential intellectual property than questionnaires or health records alone.
• Very few biobanks have molecular profiles other than genetic data available.
• Industry has a growing interest in incorporating molecular profiles into R&D processes. In this regard, biobanks offer an opportunity for industry-sponsored research models.
• Any of the following molecular profiles are of great interest: 1) whole genome sequencing data (WGS); 2) whole exome sequencing data (WES); 3) genotyping array-based genetic data; 4) transcriptomes (e.g. gene expression); 5) proteomes; 6) metabolomes; 7) methylomes; 8) metagenomes (e.g. microbiomes from human samples); and 9) lipidomes, among many others.


Source and depth of phenotype data

Most biobanks are cross-sectional: all available phenotype information was collected during the recruitment process, either through questionnaires or through objective measurements and experiments. In such settings most of the lifestyle and disease data are self-reported and thus less valuable for R&D.
• Biobanks that regularly carry out physical evaluation visits at recruitment centers, or that can retrieve lifestyle and health status updates from electronic databases, offer a more comprehensive overview of participants’ life trajectories, which enables insights into more sophisticated phenomena or behavioral patterns. Another way to gather longitudinal data is to send questionnaires to participants at regular intervals. This approach, however, has several limitations, including low response rates and incompleteness (e.g. participants answer the questions selectively). At the same time, if the participant pool is very large, as with 23andMe and its more than 8 million clients, electronic surveys are the only feasible model, and even a very low response rate (around 1%) will still provide on the order of 80,000 unique observations.
• Industry values longitudinal data over cross-sectional data, and electronic health records or measured readouts over self-reported questionnaires. However, shallow phenotype data can be compensated for by large sample sizes; two good examples are 23andMe, with 8 million customers, and UK Biobank, with self-reported disease status for 500,000 participants (of note, many sub-cohorts have undergone very sophisticated cognitive tests or imaging screens, or have a wider set of biological samples available).
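
As a rough check on the survey arithmetic above, the short Python sketch below computes the expected number of responses for a given participant pool and response rate, and the response rate needed to reach a target number of observations. The function names are hypothetical and chosen only for illustration; the pool size (8 million) and response rate (1%) are the approximate figures quoted in the text, and the calculation assumes each response comes from a distinct participant.

```python
def expected_responses(pool_size: int, response_rate: float) -> int:
    """Expected number of survey responses, rounded to the nearest whole person."""
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must be between 0 and 1")
    return round(pool_size * response_rate)


def required_rate(pool_size: int, target_responses: int) -> float:
    """Response rate needed to obtain target_responses from pool_size participants."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return target_responses / pool_size


# Approximate figures from the text: about 8 million participants, about 1% responding.
responses = expected_responses(8_000_000, 0.01)

# Response rate needed to reach 100,000 unique observations from the same pool.
rate_for_100k = required_rate(8_000_000, 100_000)

print(f"Expected responses at 1%: {responses:,}")
print(f"Rate needed for 100,000 responses: {rate_for_100k:.2%}")
```

Under these assumptions a 1% response rate yields about 80,000 observations, and a rate of about 1.25% would be needed to reach 100,000, so the order of magnitude holds even when the response rate is very low.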


Legislation and policies for data sharing

It is estimated that there are more than 50 million biological samples banked in Europe alone. At the same time, only a small fraction of the samples have been properly consented for research, and an even smaller fraction has longitudinal health data available or the possibility to recall participants.
Often, participants have given consent for only a specific study, or for non-commercial use. The value of such collections is therefore limited, and they are not very attractive to industry. However, if the research and analytical steps are performed by an academic partner and only the results are shared, some consent limitations no longer apply.
• Of similar importance are regulations on data access, storage abroad, and sharing. In some countries, such as China and Russia, transferring biological samples out of the country is very complicated. In others, if appropriate IRB approvals are in place, no particular limitations apply. In the case of the Estonian Biobank, authorization must be sought from the government or the University Senate. Use of digital data is much less regulated, but in countries such as Denmark, Iceland and the UK (in the case of Genomics England) all data processing must be carried out on local servers, and use of cloud services is not allowed.
• In general, cloud providers are accepted by European biobanks, but the cloud data warehouses must be physically located in Europe. For some projects, industry partners prefer to interact directly with the raw data. In such cases, if data cannot leave the biobank's servers, the biobank needs to provide secure and scalable IT solutions in order to engage with certain types of projects.


Scientific track record of the data custodians

Biobanks that are actively used by the research community (such as UK Biobank) stand out both in the number of commercial partnerships and in the total amount invested by industry. UK Biobank is widely used by the genetic epidemiology community, and has attracted more than £500 million over the past 10 years. When data custodians themselves have an outstanding scientific track record, the biobank is much more likely to attract commercial partnerships.
• Scientists at deCODE Genetics have published an unprecedented number of high-impact scientific reports and have attracted more than 1 billion USD over the past decade, which has helped to transform deCODE Genetics into a global leader in human genetics.
• Actively researched biobank data is more structured and the quality is usually substantially higher due to more extensive harmonisation, validation and quality control processes.
• Furthermore, if access to biobank data is limited for third parties, the biobank must have a competent team of specialists (including experts in statistics, bioinformatics, genetic epidemiology, informatics and medicine) to process and analyze the data according to industry needs. Having such a team on the payroll will substantially increase the biobank's costs, but in the long term it will lead to an increase in both scientific publications and the number of commercial contracts.

The “Registers and Biobanks in Transition” strategic initiative is funded by EIT Health.