Data-Intensive Research

Large and complex datasets have quickly become a pervasive aspect of our world, and research is no exception. The growing need for data-intensive research has prompted many scientists to rethink how they carry out their work.

The increasingly popular term Big Data refers to massive volumes of “both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques”1. As advances in digital hardware and software technologies continue at an accelerating pace, researchers now have to get to grips with an unprecedented amount of data. This has renewed interest in applied Artificial Intelligence, in which algorithms can to some extent be trained to mimic human behaviour and intuition when tackling sophisticated tasks. One of the key challenges for universities in this era of the Fourth Industrial Revolution is therefore how to equip researchers and students with the skills required to make the most of such Big Data.

In this context, the new interdisciplinary research field of Data Science uses scientific methods, processes, algorithms and systems to extract knowledge and insights from all forms of data. Data-Intensive Research (DIR) can be characterized by the extraction of knowledge from “the huge amounts of data produced through experiments and high-throughput technologies…and disseminated through cyberinfrastructures”2.

As one of South Africa’s leading universities, with a network of international partners, UWC has several instances of data-intensive research, including the following:

The UWC Meerkat Cluster, managed by the eResearch Office, supports data-intensive research for UWC faculty, research staff, postdocs and postgraduate students. Meerkat consists of about 1000 cores and 5 TB of RAM and relies on the Slurm Workload Manager to allocate resources dynamically among users, enabling optimal use of the cluster.

Ilifu (the isiXhosa word for cloud) is a data-intensive research cloud established in 2018 by a consortium of universities and research organisations in South Africa, which invited researchers in astronomy and bioinformatics to start using the infrastructure. Today, the ilifu facility is a regional node in the national infrastructure, bringing together six partner institutions to create a hub for data-intensive research in astronomy and bioinformatics.

IDIA (the Inter-university Institute for Data-Intensive Astronomy) aims to build capacity and expertise in data-intensive research within the South African astronomy research community and to support local researchers in making the most of the MeerKAT radio telescope and, eventually, the Square Kilometre Array (SKA). It is a partnership of three South African universities: UWC, the University of Cape Town, and the University of Pretoria.

SANBI (the South African National Bioinformatics Institute) is situated at UWC and is a leading bioinformatics entity in Africa. It fosters local and regional collaborations on health-related topics covering both communicable and non-communicable diseases.

1 What is Big Data?
2 Data-Intensive Research