- Meeting ID: 867 6409 6440
- passcode: 149120
13h00 - 13h50 – Carito Guziolowski (LS2N, École centrale de Nantes)
Machine Learning, Logic Programming and Learning Boolean Networks
This talk will present a work published on 2018 (Chebouba et al., BMC Bioinformatics),
applied to discriminate the response of Acute Myeloid Leukemia (AML) patients. The data we
used was obtained from the DREAM 9 challenge, specifically: given a phosphoproteomic
dataset composed of 231 measured proteins across 191 AML individuals, and knowing which
prognostic (“complete remission” or “resistant”) the individuals had after receiving treatment,
find the computational model that fits both prognostics. Our model was a family of Boolean
networks (BNs) that explained the data of remission/resistant classes of AML patients. We
later confirmed that the different rewiring of protein PIK3CA present in the learned BN
families, confirmed biological reality.
The core of this research focused on how phosphoproteomic data was handled in order to
produce discriminant data points. Such discriminant data points constitute an essential input
of a tool, caspo, we developed to learn BNs. Retrieving discriminant models is classically done
by machine learning and statistical methods. For the Dream 9 challenge these methods were
proposed and focused mainly on clinical variables, since they found them more discriminant of
AML patient response to treatment than proteomics data. Our graph-based approach applied
a machine learnign logical program to obtain a discriminant dataset, used later by caspo to
discover boolean gates between discriminant species connected in a prior regulatory network.
This approach is interesting in our current research because it allows to obtain discriminant
input-data, focusing on a specific subset of species across a specific subset of individuals, to
learn BNs. It relied on the high dimensionality of the dataset: 231 species measured for 191
AML individuals. Similar problems, for example considering single-cell data, have larger
dimensions (e.g. 30000 genes across 1500 cells) and may generate multiple choices for input-
data as well. Current machine learning approaches to handle single-cell data focus on
similarites or co-expression regions (e.g. WGCNA) to reduce the dimension of genes. We
believe it can be also interesting to focus on the differences, mutual-exclusiveness and
contrasts of single-cell expression, across multiple states to select genes behaviors that allow
us to learn a consensual model or families of BNs that explain different paths taken by cells to
arrive to the same cell-fate.
Dernière modification le 02/09/2022