InnoHealth Datalake Project


Today, the large and rapidly growing volume of healthcare data, together with its ever-growing exploitation potential, demonstrates the importance of data in modern healthcare. A roughly six-fold increase in total data volume (from 153 exabytes to 985 exabytes; 1 exabyte = 1 billion gigabytes) is expected over seven years (2013-2020). In parallel with this development, similar trends have emerged in all fields of the healthcare sector and biomedicine. Impressive results in research and innovation clearly demonstrate that the potential of data analysis and utilization goes far beyond the optimization of patient care and health services. In this context, it is also worth mentioning that the pathomechanism of most diseases is complex, i.e. it depends on a variety of factors (cf. ‘Genetics loads the gun and environment pulls the trigger’, F. Collins, 2010). This further underlines the importance of data in understanding diseases and developing new therapies. To exploit all these opportunities, information technology and data science need to meet new challenges, from data collection to the development of new types of analytical tools and methods.

In the past few years, health policy in Hungary has recognized these trends and taken progressive steps to improve the quality and cost-effectiveness of patient care and health services through health data utilization. Among these actions, the establishment of the National Electronic Health Service Space (‘EESZT’) is an excellent example. This experience shows that this path should be pursued further in the interest of modern patient care and prevention. At the same time, the utilization of data in the R&D&I areas of domestic healthcare and the health industry is also very important and poses additional conceptual and practical tasks.

Project Participants and Goals

The above tasks form the key focus areas of our four-year project, which is implemented by a consortium of E-Group ICT Software Zrt. and the University of Pécs (UP). The Project is registered in the National Grant System as ‘Network analytical and data utilization opportunities in healthcare’ (project code: GINOP-2.2.1-15-2017-00067; short name: InnoHealth DataLake, IHDL).

The main aim of the project is to develop and implement the concept of a novel, complex IT system (‘Datalake’) capable of collecting, storing and analyzing all types of health data generated in healthcare activities and services at UP, together with all relevant external data. The system is also intended to serve as a prototype for domestic and regional (e.g. V4) healthcare systems. One of the most important tasks of the Datalake is to support innovation.

The Datalake’s principal capabilities were defined according to the current and future needs of healthcare. They comprise the following operations: i) data collection (independent of the size, type and source of the data), ii) data storage, and iii) data analysis using state-of-the-art analytical methods and tools, in support of healthcare services and R&D&I activities.

Implementation of the Project

Three closely interacting pillars were created to develop the Datalake concept and put it into practice: 1. an information technology (IT) pillar, 2. a medical and health sciences pillar, and 3. a legal pillar for data management, which develops and monitors compliance with the GDPR. This project structure ensures interaction among all fields and types of activities, while providing a common platform for the IT experts, engineers, clinicians, health scientists and lawyers working on the project. Specific aspects of the IT implementation are carefully designed based on published results and on the project's own medical and health subprojects.

Because of the diversity of health data, collecting a wide variety of data is a difficult task. As model studies for the data collection step, subprojects utilizing various data types (including retrospective health records and prospective sensor-collected data) are organized so that they complement one another. However, the subprojects have roles beyond this. They themselves demonstrate the many-sided utilization potential of the data and the Datalake system by opening new ways to improve the quality and cost-efficiency of health care and by achieving new, innovative diagnostic and/or therapeutic results. The following medical and health science subprojects are in progress.

Significance of the Project

Undoubtedly, each specific project outcome (in information technology, medical and health services and sciences, and law) is significant in itself. However, the significance of the fully realized project goes well beyond these individual results. The Datalake system opens the door to new opportunities for data utilization in many fields of domestic healthcare practice and innovation.

All these capabilities and opportunities underline the enormous national value and strategic importance of healthcare data. Data is a key factor not only in national healthcare but also, because of its innovation potential, in the competitiveness of the national economy.