Research projects and collaborations

Agent frameworks in data science programming education

Summary: This project explores agent frameworks and open language models in the context of quantitative courses, especially those with elements of programming and data science. The main goals comprise:

build a shared repository of curated datasets, reproducible and customisable hands-on teaching and learning projects based on multimodal content and open models, and reference papers and tutorials.
liaise with colleagues teaching data science and quantitative courses at Statistics and other departments to help them in designing AI-assisted teaching materials whenever there is interest.
work with other fellows on effective pedagogical approaches to “teach with AI” and share experiences and resources.

Project details:

Period: 2025-2026
Principal investigator: Dr. Marcos Barreto (LSE).
Funding: LSE & LinkedIn - LSE AI and Education Fellowships

What promotes experiential learning?

Summary: This project aims to identify skills gaps in the MSc programmes offered by the Department of Statistics compared to those skills required by employers, and necessary changes to the curricula of such programmes. The methodology involves:

surveying alumni of MSc programmes from 2018-2025
presenting and discussing findings across the School, especially with faculty and programme directors
engaging with faculty and students to implement key changes addressing such gaps

Project details:

Period: March to June 2025
Principal investigator: Dr. Marcos Barreto (LSE). Students: Chandrika Kompalli and Iheb Bouzaiane. Supervision: Dr Madeleine Stevens (LSE)
Funding: LSE & LSESU - Change Makers 2025 (Poster)

Capacity Building on Generative AI Tools for Programming Education

Summary: This project aims to:

deepen research on pedagogical approaches related to programming education
build a community of practice around the use of generative artificial intelligence (AI) tools applied to programming education in quantitative courses

With the continuous and accelerated development of such tools and their wide use by students, educators have been challenged to acquire new skills and adapt their practices to keep students engaged and courses technically relevant. This project will allow for a more in-depth study on generative AI tools used as auxiliary resources in programming courses, and establish a community of practice comprising teaching staff and students focusing on capacity building on generative AI tools, design of teaching materials supported by a mixture of pedagogical methods, and the exchange of ideas, case studies, best practices, and mutual development.

Project details:

Period: March 2025 - June 2026
Principal investigator: Dr. Marcos Barreto (LSE). Students: Qurat-ul-ain Gul.
Funding: The British Academy - Talent Development Awards 2024-2025

Project outputs:

AI in Education Exchange: The AI and Education Exchange is a new online repository featuring GenAI case studies in teaching, learning, assessment, and curriculum development spanning both qualitative and quantitative disciplines in the social sciences. This platform will serve as an active space for colleagues to share experiences and learn from each other's evolving practice with generative AI in education. Information and contact here.
AI in Education Showcase: The AI in Education Showcase will host monthly sessions during the Autumn, Winter and Spring terms of the 2025/26 academic year. These 45-minute interactive meet-ups provide an opportunity for colleagues to present their innovative use of AI in teaching, learning, assessment, or curriculum design and development in both qualitative and quantitative disciplines in the social sciences.

Harnessing Artificial Intelligence and Data Linkage to Improve Outcomes for Justice-involved Young People

Summary: This project will look at how artificial intelligence can help solve problems in research on justice involvement and health, respectfully and ethically, with a focus on using “linked” data from databases of people who come into contact with the criminal justice system in Australia, the UK, and Brazil. We have already identified some ways in which AI can improve the body of evidence on health outcomes for young people charged or sentenced to community orders or youth detention. For example, family court data (and, sadly, coronial reports) hold a great deal of information on circumstances in which ill-health, self-harm, and even suicide among justice-involved people could potentially have been avoided. We want to ensure AI is there to help and not to further exacerbate the disadvantages faced by some of the most vulnerable young people in our society.

Project details:

Period: 2024-2025
Principal investigators: Dr. Marcos Barreto (LSE), Dr. Lucas Calais-Ferreira (University of Melbourne)
Funding: University of Melbourne - Dyason Fellowships

Understanding Pedagogical Approaches for Large Language Models (LLMs) in Programming Education

Summary: This project explores how LLMs can be used as a programming aid in quantitative courses. The main goals comprise:

Identify current approaches regarding the use of closed LLM-based tools, such as ChatGPT, GitHub Copilot, and Bard, in programming tasks, based on literature review and educational case studies.
Assess the feasibility and efficacy of incipient open LLMs, such as StarCoder, as candidate tools for the design of teaching materials related to programming education.
Test different LLM capabilities, such as a) text to code, b) code to code, and c) code to text.
Design teaching resources and encourage discussion around practices involving AI-assisted coding tools.

Project details:

Period: 2024-2025
Funding: LSE Eden Centre - Eden Development Fellowships

GENIAL - GENerative AI Tools as a Catalyst for Learning

Summary: A collaborative focus group on generative AI tools and their use for teaching and learning. Joint research and development project with Dr. Jonathan Cardoso-Silva (DSI).

Project details:

Period: 2023-2024
Funding: LSE Eden Centre - Scholarship of Teaching and Learning/Catalyst Fund
Useful links: GENIAL Website

Decolonising data science teaching and learning

Summary: Is the way data science is taught adequate and inclusive, or is everyone forced to follow the same approach? Are the datasets and case studies used for teaching and evaluation representative of diverse contexts and historical perspectives, or do they represent a biased vision of such contexts? Do data science assessment activities effectively measure students' critical thinking and innovative skills, or do they simply assess their technical ability to memorize and repeat the same solutions? This education-focussed project concentrates on two aspects of data science in light of decolonising: teaching and assessment. We will investigate these and other questions, and produce some guidelines for incorporating decolonising concepts into teaching and assessment resources related to data science courses.

Project details:

Period: 2022
Funding: IEAP Fellowships (LSE Eden Centre)
Useful links: LSE Inclusive Education Action Plan Fellowships

Evaluating effects of social inequalities on the COVID-19 pandemic in a low- and middle-income country

Summary: This project aims to create a Social Disparities Index (SDI) to measure inequalities relevant to the COVID-19 pandemic, such as unequal access to healthcare and regions more vulnerable to infection. In Brazil, markers of inequality are associated with COVID-19 morbidity and mortality. The SDI will capture these markers from COVID-19 surveillance data, and the project will build a public visualisation dashboard to share the index and patterns of COVID-19 incidence and mortality with the broader community. This will enable health managers and policymakers to monitor the pandemic situation in the most vulnerable populations and target social and health interventions.

Project details:

Period: 2021 - 2022

Team: CIDACS, UFBA, LSE, London School of Hygiene and Tropical Medicine

Funding: International COVID-19 Data Alliance (ICODA - HDR UK)

Useful links: HDR UK

AI as a service for tackling COVID-19 in Brazil

Summary: This project aims to establish a cloud-based AI platform to support research and inform decisions related to COVID-19 in Brazil. The emphasis will be on the activities conducted by Rede CoVida, a Brazilian network of around 180 academics, policymakers, health workers, and the general public established in March 2020 to i) monitor the spread of the disease in Brazil, ii) design multi-purpose, real-time prediction models, and iii) synthesize and disseminate scientific evidence. We will focus on the following research goals: i) design of a large-scale data lake and integration platform; ii) design and validation of mixed AI models for prediction and decision-making support; and iii) design of an interactive bibliometrics platform focusing on synthesis of evidence and correlations within the increasing literature related to COVID-19.

Project details:

Period: 2020 - 2021

Team: CIDACS, UFBA, LSE, London School of Hygiene and Tropical Medicine

Funding: AI for Social Good (GOOGLE)

Useful links: COVID-19 AI and Data Analytics Awards

Alert-early system of outbreaks with pandemic potential

Summary: This collaboration aims to design a data-driven early-warning system for respiratory viral disease outbreaks, contributing to preparedness against epidemics.

Project details:

Period: 2021 - 2022

Team: CIDACS, UFRJ, UFBA, London School of Economics and Political Science, Northeastern University, Nanyang Technological University, Pavia University, Meta (Facebook)

Funding: TBC

Useful links: AESOP Website, Presentation video

Risk of chronic clinical condition following previous hospitalisations by psychiatric disorder

Summary: This project aims to increase knowledge of the relationship between mental disorders and other chronic conditions in order to improve the lives of those affected. More specifically, we want to i) estimate the risk of hospitalisation or death by diabetes mellitus, cardiovascular diseases or stroke following a hospitalisation due to depressive disorders, alcohol and substance use-related disorders, and schizophrenia; ii) estimate the risk of the occurrence of or death by tuberculosis following a hospitalisation due to depressive disorders, alcohol and substance use-related disorders, and schizophrenia; iii) investigate how these chronic conditions cluster together and how these patterns evolve over time and with ageing.

Project details:

Period: 2020 - 2022
Team: CIDACS, UFBA, University College London, London School of Hygiene and Tropical Medicine
Funding: Global Multimorbidity Seed Funding (UKRI)
Useful links: UKRI Global multimorbidity: seed funding 2019

Scaling up multimodal data fusion and analytical models over multiple-GPU systems

Summary: This project focuses on the exploitation of multi-GPU systems to i) accelerate our probabilistic data fusion tool (AtyImo), more specifically preprocessing and data linkage methods, and ii) deploy and validate complex machine and deep learning models to analyze huge amounts of data built from Brazilian socioeconomic and public health care databases.

Project details:

Period: 2019 - 2021
Team: UFBA, SENAI-CIMATEC
Funding: Large-scale Applied Data Science (NVIDIA)
Useful links:

Design and validation of personalised risk prediction models over Brazilian health care data

Summary: This project aims to i) define a set of diseases, at individual and municipality level, for which risk prediction models can effectively contribute to early detection and/or guidance of treatment; ii) establish proof-of-concept studies; iii) identify existing models adjustable to the Brazilian population; iv) perform deep experimentation of the proposed models; and v) generate a set of results to be validated by a panel of epidemiologists and statisticians, as well as governmental staff.

Project details:

Period: 2019 - 2021
Team: UFBA, University College London
Funding: Newton International Fellowship Follow-on Funding (The Royal Society)
Useful links:

Standardisation of wearable-based algorithms for healthcare applications in developing countries

Summary: This project aims to develop a novel standardised framework to better inform algorithms for a more harmonised gait assessment in Parkinson's disease (PD), particularly for developing countries where guidance is lacking. This project will lead to the design of an online simulation tool to test algorithms. Additionally, it will outline an educational process for all clinicians to better understand the functionality of wearables/algorithms and resulting outcomes. This will better guide PD assessment for sustainable health, promoting and encouraging low-cost wearables as routine diagnostics in developing countries. This framework will also be adapted to the needs of those in developed regions.

Project details:

Period: 2018 - 2019
Team: Northumbria University, Instituto de Biociências de Rio Claro, University of Birmingham, University College London, UFBA
Funding: Frontiers of Engineering Seed Funding (Royal Academy of Engineering)
Useful links: RAEng current and recent awards

IMAPI - early childhood friendly municipal index

Summary: IMAPI was created to describe municipal contexts less or more favorable to early childhood development in Brazil and to support decision-making about early childhood. It has 31 indicators related to the provision of public policies, actions, and services, as well as family practices aimed at child development that reflect the five domains of the Nurturing Care Framework recommended by the World Health Organization, UNICEF and World Bank.

Project details:

Period: 2018 - 2020
Team: UnB, UFBA, Yale School of Public Health (USA), São Paulo Health Institute
Funding: Grand Challenges Explorations: Data science approaches to improve maternal and child health in Brazil (Gates Foundation), CNPq (Brazilian National Research Council)
Useful links: IMAPI Website

Integrating socioeconomic and health data to combat malaria

Summary: This project aims to build a platform that routinely integrates data from malaria surveillance systems with healthcare data (incidence and hospitalization) and socioeconomic data (income and living conditions) captured from Brazilian governmental systems. An interactive visual mining dashboard will provide open access and support for data analysis, including forecast and multilayer visualisation models.

Project details:

Period: 2016 - 2019
Team: UFBA, Health Surveillance Foundation (Amazonas), Oswaldo Cruz Foundation (FIOCRUZ)
Funding: Design New Analytics Approaches for Malaria Elimination (Round 17) (Gates Foundation)
Useful links: Global Grand Challenges, Malaria database and visual analytics tool

Treating heterogeneity and uncertainty in data integration: case study on Brazilian databases

Summary: This project aims to i) design and validate a data integration model and related computing tools addressing heterogeneity, uncertainty and scalability in big data integration; and ii) support some ongoing Brazil-UK projects: the 100 million cohort, the surveillance platform for Zika and microcephaly, and predictive analytics methods applied to malaria data (Post-doctoral research).

Project details:

Period: 2016 - 2018
Team: UFBA, University College London
Funding: Newton International Fellowships (The Royal Society, UK)
Useful links: Denaxas Lab

Design of a scientific repository (data lake) for big data applications

Summary: This project aims to design and deploy a data repository (data lake) for big data applications. The first prototype comprises malaria surveillance data to support predictive analytics.

Project details:

Period: 2016 - 2021
Team: UFBA, Health Surveillance Foundation (Amazonas), Oswaldo Cruz Foundation (FIOCRUZ)
Funding: Bahia State Research Agency (FAPESB)
Useful links:

BAMBU - metropolitan network for trial and innovation on future internet

Summary: This project aims to develop and implement an experimental metropolitan network for trial and innovation on future internet issues. This network will be based on the existing REMESSA network. Besides serving as an experimental sandbox for educational and research institutions of Bahia, we plan to link BAMBU with other national and international networks, through the FIBRE project.

Project details:

Period: 2015 - 2020
Team: UFBA, IFBA, Oswaldo Cruz Foundation (FIOCRUZ), RNP, LNCC, UFES, Florida International University, Philips
Funding: Bahia State Research Agency (FAPESB)
Useful links: BAMBU WebHome

Computational infrastructure to support big data applications in health

Summary: This project aims to design a middleware for probabilistic record linkage of governmental databases: Cadastro Único (socioeconomic data), PBF (payments from Bolsa Família) and SUS (Brazilian National Health System). This middleware will provide data warehouse (ETL) routines for data quality assessment, data cleansing, and anonymization, as well as a Spark-based execution engine to support data linkage from these databases. The generated data marts are used by statisticians and epidemiologists to assess the effect of social programmes on the incidence of some diseases (leprosy, tuberculosis, HIV/AIDS) in the beneficiary population.

Project details:

Period: 2014 - 2016
Team: UFBA, Oswaldo Cruz Foundation (FIOCRUZ)
Funding: Early Doctor Research Grant (UFBA)
Useful links:

Cloud computing infrastructure to support Bioinformatics and Robotics applications

Summary: This project aims to i) improve our BOINC implementation designed for the GT-MC² and ii) develop a new implementation to support highly distributed applications based on Hadoop. We evaluated a number of Bioinformatics applications in both implementations (SGA for BOINC and SGA for Hadoop). We are also considering the utilization of hybrid parallel architectures (multicore + multi-GPU) in order to efficiently run these applications.

Project details:

Period: 2013 - 2015
Team: UFBA, UNEB, Polytechnic University of Valencia (UPV)
Funding: Scientific Initiation Grant (UFBA)
Useful links:

JiT-Clouds: highly scalable infrastructure-as-a-service

Summary: JiT-Clouds is a research effort carried out by a group of Brazilian Universities and Research Centers, sponsored by the Centro de Pesquisa e Desenvolvimento em Tecnologias Digitais para Informação e Comunicação (CTIC) held by the Ministry of Sciences and Technology. It aims at developing an alternative way to build public cloud infrastructures, based on the concept of Just-in-Time (JiT) deployment of the computing infrastructure.

Project details:

Period: 2011 - 2013
Team: UFCG, UFRGS, UFBA + 11 other universities and research laboratories
Funding: CTIC (Brazilian Ministry of Sciences and Technology)
Useful links: CTIC - JitClouds

GT-MC²: my scientific cloud

Summary: MC² is a cloud computing platform designed to support e-science applications. It provides access to a large amount of computational resources for brief time intervals, storage, reproducibility of experiments and control of data provenance. This platform uses a PaaS model, allowing for the easy development and deployment of customized services and portals, accessed at the SaaS level. At the IaaS level, MC² employs a broker to efficiently provide access to high performance clusters, volunteer computing resources (based on BOINC), peer-to-peer computing resources (based on OurGrid) and cloud resources (based on Eucalyptus).

Project details:

Period: 2011 - 2013
Team: LNCC, UFCG, UFBA, UFC, UFRGS
Funding: CTIC/RNP (Brazilian Ministry of Sciences and Technology)
Useful links: RNP

Analysis of performance models applied to high-performance hybrid architectures

Summary: This project aims to study performance models used for high performance processing in hybrid architectures composed of multicore CPUs and manycore GPUs. We want to identify and measure some metrics related to performance and processing capacity/elasticity, as well as limitations related to application execution, tools for application development and other aspects related to each architecture. A set of applications belonging to different classes (highly coupled, bag of tasks and data-intensive) will be evaluated in terms of their requirements (resources needed, data movement, etc.), aiming to define a set of operating characteristics for each class. As major outcomes, the project must generate a detailed analysis of the suitability of current performance models applied to hybrid architectures and propose some extensions in order to efficiently support such architectures.

Project details:

Period: 2011 - 2013
Team: UFBA, UNEB, UNIVASF
Funding: Scientific Initiation Grant (UFBA)
Useful links:

GT-UniT: monitoring the BitTorrent universe

Summary: This project aims to develop a software infrastructure to monitor BitTorrent networks. The specific goals comprise the monitoring of Portuguese content, the popularity of specific contents and the traffic observed in some sub-networks. Experiments were executed on 6 servers hosted in the Brazilian internet backbone (points of presence) and more than 95 nodes in PlanetLab.

Project details:

Period: 2010 - 2012
Team: UFRGS, UFCG
Funding: CTIC/RNP (Brazilian Ministry of Sciences and Technology)
Useful links: RNP

PMM: modular multimedia platform

Summary: Design of a middleware and applications for a modular multimedia platform, offering services for digital video recording and interaction focused on digital television.

Project details:

Period: 2006 - 2008
Team: UFRGS, UNILASALLE, UFSC, Digitel
Funding: FINEP (Brazilian Ministry of Sciences and Technology)
Useful links:

MultiCluster: support for parallel programming on multiple clusters

Summary: This project aims to define an integration model for heterogeneous cluster-based architectures composed of Myrinet, SCI, and Fast Ethernet. The main goals are to identify hardware and software requirements and provide a complete programming environment that allows the user to configure such an architecture and distribute tasks according to the application's needs. To this end, we integrate different DECK implementations and use JXTA to aggregate resources from heterogeneous clusters (PhD research).

Project details:

Period: 2000 - 2006
Team: UFRGS, Universität Paderborn, Laboratoire d'Informatique de Grenoble (LIG/UJF)
Funding: CAPES (Brazilian Ministry of Education) - PhD Fellowship
Useful links: UFRGS - LUME Repository

DECK: parallel programming applied to cluster computing

Summary: This project focuses on the development of a parallel programming library called DECK (Distributed Execution and Communication Kernel) applied to clusters composed of different communication technologies (Fast Ethernet, Myrinet, and SCI). We developed a DECK version for each communication technology and evaluated its performance against MPI and Athapascan-0 (MSc research).

Project details:

Period: 1998 - 2000
Team: UFRGS, Laboratoire d'Informatique de Grenoble (LIG/UJF)
Funding: CAPES (Brazilian Ministry of Education) - MSc Fellowship
Useful links: UFRGS - LUME Repository

DPC++: distributed processing in C++

Summary: DPC++ applies object-orientation as a basis for distributed programming. The main focus is to extend the C++ programming language with abstractions for object distribution and communication, as well as good load balancing among the resources. The user does not need to be aware of such operating aspects, as the DPC++ preprocessor performs all operations needed to distribute, communicate and coordinate distributed tasks and objects.

Project details:

Period: 1995 - 1998
Team: UFRGS
Funding: CNPq (Brazilian National Research Council)
Useful links:

ArMA-GAPP: study and application of vector architectures

Summary: This project uses a vector processor architecture (NCR GAPP) and some C-based tools we have developed to run and evaluate image processing applications.

Project details:

Period: 1993 - 1994
Team: UFRGS
Funding: CNPq (Brazilian National Research Council)
Useful links:
