Cardiac Proteomics and Signalling Laboratory

Open Student Positions

Are you a student searching for an opportunity to do independent research (199), a rotation project, or a summer internship? Please get in contact with us!

Positions are open starting Fall 2020.

High School Students: David Liem, MD, PhD (see additional information below).

Undergraduate and Graduate Students: Harry Caufield, PhD

We are particularly interested in students with interests or experience in medicine, biological science, data science, and/or computer science. You do not need experience in all areas to be eligible for a research project.

Current research interests in our group with project opportunities include:

  1. Massive Information Extraction and Deep Analysis of Text Data: Implementing machine learning (ML)-supported information extraction approaches, adapting advanced data structures (e.g., knowledge graphs), and leveraging clinical coding systems (e.g., ICD-11) to extract relationships among clinical observations and disease diagnoses from clinical text documents. (Caufield, Zhou, & Garlid et al., 2018; Caufield et al., 2018; Liem & Maruli et al., 2018; and Ping et al., 2017)
  2. Harmonizing Knowledge and New Discovery of Therapeutics in Cardiovascular Medicine: Applying data science strategies to uncover hidden relationships among cardiovascular drugs, their molecular targets, and potential adverse effects in the setting of the pathogenesis of heart disease. (Caufield, Zhou, & Garlid et al., 2018; Ping et al., 2017; see rotation project by Samir Akre)
  3. ML-supported Omics Phenotyping of Human Diseases: Developing technological platforms and computational tools for in-depth analysis of integrated omics data to elucidate protein temporal dynamics, post-translational modifications, and metabolic responses during cardiac remodeling and disease progression. (Chung & Mirza et al., 2019; Wang & Choi et al., 2018; Lau et al., 2018; Perez-Riverol & Bai et al., 2017; Lam et al., 2016; and Lau et al., 2016)

Examples of recent rotation projects include “Evaluating Disease Normalization Methods on Biomedical Text” (Henry Zheng, Fall 2019) and “Developing Knowledge Graphs relating Cardiovascular Disease and Oxidative Stress” (Samir Akre, Fall 2019).


The Ping research group has a history of promoting high-quality research, advocating for integration between omics and data science, and receiving recognition from the international scientific community. Understanding cardiovascular physiology and pathophysiology has been a long-term goal of Dr. Ping’s research program. Her research also seeks to adapt emerging data science, natural language processing, machine learning, and information extraction techniques to solve relevant questions in cardiovascular medicine. Providing education and training to the next generation of investigators is a major mission of our laboratory. Dr. Ping has 20 years of experience in mentoring students; 17 of her former trainees currently hold positions in academic institutions, including UCLA, UC Davis, University of Colorado, University of Heidelberg, and Fudan University. In parallel, 41 students trained in her laboratory hold positions in industry, including Amazon, Google, Microsoft, and Intel.


We request that interested students please provide a cover letter describing their research interests and a resume/CV.


We believe a diverse team will come up with the most creative solutions. We encourage applications from people with a diverse set of backgrounds.


Please see 'Research Projects for Undergraduate and Graduate Students' below for specific project opportunities.

High School Student Summer Internship

Program Description: The high school summer internship program offers stipend support for two high school students annually. Each awarded recipient will receive financial support for up to four weeks. The award may be used for travel, lodging, and living allowance, as we do NOT provide housing. Participants in the high school summer internship program are expected to work under instructors’ supervision on an assigned project. A report is required at the end of the internship, including an essay describing a specific topic, a slide presentation concisely visualizing key points, and an oral presentation showing communication/public speaking skills. An evaluation from the project supervisor and Dr. Ping will be provided. Previous recipients of this summer internship award were all accepted by prominent universities including UCLA, UC Berkeley, Harvard, Duke, and Stanford. Requests for recommendation letters from the project supervisor and Dr. Ping must be received 6-8 weeks prior to their due date.

Internship positions are available for Summer 2020!

Applicants for the 2020 summer internship: please submit the following materials in a single email to Dr. David Liem.

Letters of recommendation can be sent separately.

Postdoctoral Research Positions in Cardiovascular Data Science

NIH-supported Postdoctoral Fellowships are available in the Cardiovascular Data Science Program at UCLA, on the Westwood campus in Los Angeles, California.

Successful applicants will have an exciting opportunity to work with a group of highly talented and creative investigators on projects focusing on one of the following two areas of data science: 1) machine learning, specifically text mining of both biomedical literature and electronic health records; and 2) the development of analytics and data sharing platforms for proteomic data. Salaries are commensurate with experience.

Qualifications: Candidates should have or be very close to obtaining a PhD, MD, MD/PhD, or equivalent doctoral degree. Applicants must have a strong background in quantitative biology or a related field. Experience in computation is highly desirable, but not required. It is essential that applicants have previously demonstrated a passionate work ethic, strong enthusiasm for Big Data research, and a propensity for scientific achievement. Please note NIH requirements for trainees.

All applications should include the following:

  1. Curriculum vitae;
  2. Contact information of at least three references;
  3. Brief description of past research experience and accomplishments, current and future research interests, and expected availability date.

Please send inquiries to Dr. Dominic Ng and specify the position for which you are applying. Applications will be reviewed as they are received. Start dates are flexible.

The University of California, Los Angeles is an Equal Opportunity/Affirmative Action Employer advancing inclusive excellence.

Research Projects for Undergraduate and Graduate Students


A study of Drug to Cardiovascular Disease (CVD) Associations with SemRep and Deep Learning

Description: Starting with well-defined oxidative stress categories (e.g., Initiation, Regulation, and Outcome of Oxidative Stress) and a list of drugs used in cardiovascular disease (CVD), we will use SemRep to extract all relevant subject-predicate-object (SPO) triplets. We will then build knowledge graphs from these triplets and prepare a multi-order association matrix to represent the graph data structure. Using this graph structure, we will build a sequence prediction model for drug-to-CVD associations. This project will provide a detailed analysis of drug-to-CVD associations with both qualitative evidence and quantitative scores.

Project leaders: David Liem, Dibakar Sigdel

Education goals: The students will learn how to work with innovative text mining tools (e.g., SemRep, CaseOLAP, Neo4j) for biomedical documents and with machine learning approaches (e.g., RNN, LSTM) for model development and implementation to answer important biomedical questions.

Scientific goals: The students will explore knowledge graphs for drug and CVD associations with a focus on oxidative stress categories (e.g., Initiation, Regulation, and Outcome) and the underlying molecular mechanisms.
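
As a rough illustration of the triplet-to-graph-to-matrix steps named in this project, the following is a minimal sketch in plain Python. It is not the lab's implementation. The triplets are hypothetical placeholders rather than real SemRep output, and the "multi-order association matrix" is assumed here to be a decayed sum of powers of the adjacency matrix (paths of length 1 up to max_order); the project may define it differently. The sequence prediction model is not covered.

```python
# Hypothetical SPO triplets (placeholder names, not real SemRep output).
TRIPLETS = [
    ("drug_A", "INHIBITS", "process_X"),
    ("process_X", "CAUSES", "cvd_Y"),
    ("drug_B", "TREATS", "cvd_Y"),
]


def build_adjacency(triplets):
    """Return (sorted node names, 0/1 adjacency matrix) with subject -> object edges."""
    nodes = sorted({t[0] for t in triplets} | {t[2] for t in triplets})
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for subj, _pred, obj in triplets:
        adj[index[subj]][index[obj]] = 1
    return nodes, adj


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def multi_order_association(adj, max_order=2, decay=0.5):
    """Assumed definition: sum of decay**(k-1) * A**k for k = 1..max_order."""
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    power = adj  # A**1
    for order in range(1, max_order + 1):
        weight = decay ** (order - 1)
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
        power = matmul(power, adj)  # next power of A
    return total


nodes, adj = build_adjacency(TRIPLETS)
scores = multi_order_association(adj)
idx = {name: i for i, name in enumerate(nodes)}
print(scores[idx["drug_A"]][idx["cvd_Y"]])  # indirect (second-order) association
print(scores[idx["drug_B"]][idx["cvd_Y"]])  # direct (first-order) association
```

In this toy graph, drug_B reaches cvd_Y directly while drug_A reaches it only through process_X, so the decay factor gives the indirect association a lower score than the direct one.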


A study of Covid-19 Knowledge Graphs for different Age Groups and CVD Cases

Description: Covid-19 is caused by a coronavirus called SARS-CoV-2 and often presents with symptoms of high fever, cough, and shortness of breath. In severe cases, Covid-19 may lead to acute respiratory distress syndrome (ARDS), multiple organ dysfunction, and eventually death. Covid-19 has caused far more severe illness and death worldwide than any other known coronavirus disease. New data from Covid-19 cases indicate that the severity and mortality of this disease are significantly higher in elderly patients and patients with a history of CVD. Applying a text mining approach, the students will explore the role of risk factors such as ageing and several cardiovascular diseases (e.g., coronary artery disease) in the severity of Covid-19, and unravel possible underlying mechanisms.

Project leaders: David Liem, Dibakar Sigdel

Education goals: Students will learn how to apply innovative tools in text mining and knowledge graphs (e.g., Neo4j and Spark) for data exploration and for the development of search algorithms for specific tasks in biomedical scenarios.

Scientific goals: Students will learn how to formulate meaningful biomedical questions from available tools and databases in CVD and Covid-19 (e.g., which age groups and pre-existing CVD significantly increase the risk of mortality in Covid-19, and what are the underlying mechanisms?). The search results can be further explored to investigate the underlying age-based mechanisms.


A study of Covid-19 Knowledge Graphs for Drugs and CVD Cases

Description: Covid-19 is caused by a coronavirus called SARS-CoV-2. The virus is believed to enter cells in the body through a pivotal interaction with the renin-angiotensin-aldosterone system. Accordingly, concerns exist that certain CVD drugs such as angiotensin-converting enzyme (ACE) inhibitors and angiotensin receptor blockers (ARBs) may increase susceptibility to SARS-CoV-2 as well as the severity of Covid-19. In this project, the students will apply a text mining approach to create a Covid-19 knowledge graph (KG) for ACE inhibitors and ARBs and identify relevant underlying molecular pathways and mechanisms that may play a role in Covid-19.

Project leaders: David Liem, Dibakar Sigdel

Education goals: The students will learn how to work with innovative tools in text mining and knowledge graphs (e.g., Neo4j and Spark) for data exploration and for the development of search algorithms for specific tasks in biomedical scenarios.

Scientific goals: Students will learn how to formulate meaningful biomedical questions from available tools and databases in CVD and Covid-19 (e.g., which drug or drug category has a significant effect on the severity and mortality of Covid-19, and what are the underlying mechanisms?). The search results can be further explored to investigate underlying age-based molecular mechanisms.


Mapping Collective Knowledge of the Cardiac Proteome

Description: By definition, we expect that a proteome lists each protein within a particular tissue or organ. A cardiac proteome, for example, should include the identities and amounts of each protein in the heart. This definition becomes clouded once we begin considering specific conditions: how does an unhealthy (e.g., hypertrophic or failing) heart’s proteome differ from that of a healthy one? Does the proteome change over time? How may the proteome vary between hearts from male and female individuals? Our ability to address these questions may be limited by the samples used to define each proteome as well as by inherent experimental variability. We may search across current and past literature to rigorously define and merge differing (and in some cases, conflicting) observations of cardiac protein expression, with the goal of assembling an updated proteome of the human heart. This process requires intensive application of text mining coupled with an understanding of cardiac-specific biological pathways. This project will place particular focus on three types of proteins: contractile proteins, proteins impacted by oxidative stress, and proteins with metabolic functions (especially those involved in branched-chain amino acid, or BCAA, metabolism), as these topics are foci of other lab efforts. Assembly of an updated cardiac proteome will produce a crucial reference for classifying a peptide’s relevance to the heart.

Project leader: Harry Caufield

Education goals: An understanding of PubMed and the language used in biomedical research literature. Experience with obtaining text data through an API. Familiarity with computational methods for bibliometrics, text mining, information extraction, and natural language processing. Knowledge of biomolecular pathways in cardiac function.

Scientific goals: Construction of a literature-derived cardiac proteome, serving as a comprehensive resource for identification of proteins most relevant to healthy and diseased cardiac phenotypes.


Constructing an Integrated Cardiovascular Knowledge Graph to Discover Disease Phenotype Relationships

Description: Modern bioinformatics and biomedical informatics projects rely upon well-curated knowledge bases and data repositories. These resources contain structured information describing proteins (e.g., UniProtKB), biomolecular interactions (e.g., IntAct), or genotype-phenotype relationships (e.g., OMIM), among numerous other topics. Similarly, carefully engineered ontologies and coding systems define relationships between diseases (e.g., Disease Ontology; ICD) or broader sets of biomedical concepts (e.g., MeSH). Though each of these resources is data-rich and highly valuable, we rarely need to use any one of them in its entirety, and we would like to use knowledge curated from multiple sources, even when their structures present obstacles to data integration. By exploring a subset of each knowledge base and ontology from the perspective of cardiovascular disease research, we may identify the most relevant elements and unify them within a single graph structure. The resulting knowledge graph supports asking complex questions about cardiovascular phenomena. With some additional engineering, higher-level representations of these knowledge graphs can drive machine learning approaches for understanding cardiovascular disease.

Project leader: Harry Caufield

Education goals: An understanding of the technical methods required to integrate heterogeneous biomedical relationships described in text and knowledge bases. Students will gain familiarity with data retrieval through APIs, text data analysis and natural language processing with Python, and data management in Neo4j. They will also gain experience with the data formats and structures used to store biomolecular data and metadata, as well as ontologies (e.g., OBO or OWL formats) and other data (e.g., JSON).

Scientific goals: Assemble a consistently structured knowledge resource optimized for phenomena relevant to cardiovascular disease, including relationships between disease phenotypes, biomolecules, biomolecular pathways, symptoms, and therapeutics. Identify best practices for merging specific knowledge sources. Develop reusable code for obtaining and integrating knowledge base contents.
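
To give a sense of what unifying relationships from several sources within a single graph structure can look like at its simplest, here is a minimal sketch in plain Python. All names are assumptions for illustration: the records, identifiers, and source labels are hypothetical placeholders and do not reflect the actual schemas of UniProtKB, IntAct, OMIM, or any other resource, nor the lab's code. A real project would also need identifier normalization and would likely store the result in a graph database such as Neo4j.

```python
from collections import defaultdict

# Hypothetical relationship records from two sources (placeholder identifiers).
SOURCE_A = [
    ("protein_1", "interacts_with", "protein_2"),
]
SOURCE_B = [
    ("protein_1", "interacts_with", "protein_2"),
    ("protein_2", "associated_with", "disease_1"),
]


def merge_sources(named_sources):
    """Merge edges from several sources, recording which sources support each edge."""
    merged = defaultdict(set)
    for source_name, edges in named_sources.items():
        for subj, pred, obj in edges:
            merged[(subj, pred, obj)].add(source_name)
    return dict(merged)


graph = merge_sources({"source_a": SOURCE_A, "source_b": SOURCE_B})
for edge, sources in sorted(graph.items()):
    print(edge, sorted(sources))
```

Keeping the set of supporting sources on each edge preserves provenance, which helps when sources differ or conflict.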


Knowledge Graph construction and analysis to support heart failure classification

Description: New cases of heart failure (HF) are diagnosed by the millions each year. Not all hearts fail in the same manner, however: HF cases may be categorized by ejection fraction (EF), expressed as a percentage. HF with an EF below 40% is considered HF with reduced EF (HFrEF), while HF with an EF greater than 50% (a value that is often physiologically normal outside the context of disease) constitutes HF with preserved EF (HFpEF). HFpEF is increasingly common and is distinguished from HFrEF by a variety of presentation factors, patient traits, comorbidities, and other factors such as systemic inflammation. How may we organize these varied factors in a consistent manner? If clinical and biomolecular correlates of HFrEF or HFpEF are structured as relationships, may we assemble them into a knowledge graph? What may this knowledge graph allow us to infer regarding HF classification?

Project leader: Harry Caufield

Education goals: An understanding of the technical methods required to integrate heterogeneous biomedical relationships described in text and knowledge bases. Students will gain familiarity with data retrieval through APIs, text data analysis and natural language processing with Python, and data management in Neo4j. They will also develop the ability to analyze knowledge graphs (and, by extension, other networks of biomedical relationships) to identify relationships supporting conclusions about cardiovascular disease, and will gain knowledge of the symptomatology of heart disease.

Scientific goals: Identify specific patterns of biomedical relationships associated with specific subtypes of heart failure, such that text describing heart failure may be classified without explicit definitions being present (e.g., HFpEF may be described implicitly).
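
For reference, the explicit EF-based definition that this project aims to move beyond can be written as a simple rule. The sketch below encodes only the two thresholds stated in the project description (below 40% and above 50%). The function name and the "unclassified" label for the 40-50% range are assumptions for illustration, since the description does not say how that range is handled. It presumes a heart failure diagnosis has already been made.

```python
def classify_hf_by_ef(ef_percent: float) -> str:
    """Classify a diagnosed heart failure case by ejection fraction (percent)."""
    if not 0 <= ef_percent <= 100:
        raise ValueError("ejection fraction must be a percentage between 0 and 100")
    if ef_percent < 40:
        return "HFrEF"  # HF with reduced ejection fraction
    if ef_percent > 50:
        return "HFpEF"  # HF with preserved ejection fraction
    return "unclassified"  # 40-50% is not defined in the description above


for ef in (30, 45, 60):
    print(ef, classify_hf_by_ef(ef))
```

Text that never states an EF value cannot be handled by a rule like this, which is why the project looks for patterns of relationships that describe a subtype implicitly.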


Mass Spectrometry (MS)-based Proteomics in Cardiovascular Research 

Description: Proteomics is the large-scale study of proteomes within a biological system. Building on advances in mass spectrometry and data science, proteomics approaches offer powerful means of understanding cardiovascular diseases. Massive mass spectrometry datasets lie at the intersection of proteomics and data science. In this project, students will learn proteomics sample processing techniques and gain the knowledge of mass spectrometry needed to apply downstream data analysis to the study of cardiovascular diseases.

Project leaders: Dr. Ding Wang, Dr. Dominic Ng, Dr. Howard Choi

Education goals: Students will learn the fundamental concepts of mass spectrometry, become familiar with sample preparation protocols and the data acquisition workflow for MS-based proteomics, and learn how to extract MS data for downstream data analysis.

Scientific goals: Introduce fundamental concepts of mass spectrometry and proteomics to students. After the training, the students will be able to tell the differences between top-down and bottom-up approaches, understand standard proteomic applications in biomedical research, and know what information can be retrieved from proteomic datasets.


Bioinformatics Pipelines for Proteomics Data Analyses

Description: Bioinformatics tools, including the Integrated Proteomics Pipeline (IP2) and in-house software packages, are employed to characterize the properties of individual proteins at the proteome level in a high-throughput fashion. Publicly available knowledge bases (e.g., UniProt and Reactome) support proteomics data analyses and enable further data interpretation.

Project leader: Dr. Howard Choi

Education goals: Students will be introduced to several bioinformatics tools essential for proteomics data analyses. After the training, they will be able to independently use these resources to characterize biological variables of interest (e.g., proteins, O-PTMs) from raw proteomics datasets.

Scientific goals: Understand the fundamental concepts and/or algorithms of these bioinformatics resources. Become comfortable applying bioinformatics tools to better characterize biological systems. Students should develop a data-driven mindset distinct from the conventional hypothesis-driven approaches that once dominated biomedical investigations.


O-PTM in Cardiovascular Biology and Medicine

Description: In a cardiac cell, the proteome consists of more than 200,000 proteins. Multiple proteins interact with each other to form a biological pathway. Each pathway performs a function and supports a cellular process. Changing the function of an individual protein may lead to alterations in the function of the entire pathway. Post-translational modification (PTM) is a common mechanism regulating protein structure and function. Oxidative stress is a redox imbalance that arises when the generation and accumulation of reactive oxygen species (ROS) exceed the endogenous antioxidant capacity of living organisms. It is often involved in the progression of cardiovascular diseases (CVD). Oxidative stress-sensitive post-translational modifications (O-PTMs) are typical features of proteins in human hearts; these O-PTMs are associated with healthy and/or diseased conditions.

Project leaders: Dr. Ding Wang, Dr. Dominic Ng, Dr. Howard Choi

Education goals: Oxidative stress biology: become familiar with common reactive oxygen species (ROS), ROS-generating enzymes, and antioxidants. O-PTMs: become familiar with 15 types of O-PTMs and know their amino acid targets and the resulting changes in m/z value. Extracting O-PTM signatures of proteins: obtain the components associated with a CV-relevant biological pathway, along with their identification, subcellular distribution, and O-PTMs (e.g., modification type, modification site, occupancy).

Scientific goals: Identify O-PTM changes unique to healthy and diseased conditions of human hearts. Similarities and differences between human and mouse protein homologues will be compared. These findings may offer opportunities to interpret phenotypic observations in human HF and mouse models under stress.