With the clean data scrolls archived, we begin the most important phase: Data Analysis. This is where we stop counting and start understanding—by linking the exposure (SuperPaste use) to the outcome (reduction in Dental Caries) to measure the impact on health.
1. Objectives of Data Analysis
The primary goals of analyzing the data are:
- To plan and program the analysis steps systematically.
- To account for chance (random errors), biases, and third factors (confounders).
- To assess causality (linking exposure to outcome).
- To measure the impact of the exposure.
2. The Seven-Step Analysis Strategy
Data analysis is a structured, sequential process. Do not skip steps, and avoid Post Hoc analysis, that is, analysis driven by the data rather than by a prior plan.
| Step | Action | Purpose |
| --- | --- | --- |
| 1. Identify Study Type | Review the protocol to confirm whether the study is Descriptive (measuring a quantity or indicator) or Analytical (testing a hypothesis). | Establishes the main framework: a descriptive study measures Incidence or Prevalence, while an analytical study calculates a measure of association such as Relative Risk or Odds Ratio. |
| 2. Identify Main Variables | Clearly list the Outcomes, Exposures, and Potential Third Factors (confounders) that will be analyzed. | Focuses the effort on the core study questions. |
| 3. Get Familiar with Data | Perform Frequency Distribution of all variables, check for blanks/missing values, check ranges against the data dictionary, and look for duplicates/inconsistencies. | This is the crucial data cleaning and quality check phase. |
| 4. Characterize Population | Describe the study population using Descriptive Statistics (e.g., frequencies by age, gender, income, clinical features). | Gives a clear baseline picture of the study groups. |
| 5. Examine Association | Compare groups to test the a priori hypothesis (e.g., is SuperPaste use associated with fewer Caries?). Use the measure of association appropriate for the study design (Relative Risk for Cohort, Odds Ratio for Case-Control). | The most interesting step—determining the primary link. |
| 6. Create Additional Tables | Analyze new or interesting variables using simple two-way tables based on initial findings. | Exploratory analysis guided by the data. |
| 7. Conduct Advanced Analysis | Perform Dose-Response assessments, Stratification, and Multivariate Modeling. | Final, in-depth analysis to control for confounders and predict outcomes. |
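Step 3 is largely mechanical and easy to script. The sketch below assumes records arrive as a list of Python dictionaries and uses a small, hypothetical data dictionary: the variable names `age`, `gender` and `superpaste_use` and their allowed values are illustrative, not taken from any study protocol. It produces a frequency distribution for each variable, counts blanks, flags values that fall outside the data dictionary, and lists duplicate IDs.

```python
from collections import Counter

# Hypothetical data dictionary: a (min, max) range or a set of allowed codes per variable.
DATA_DICTIONARY = {
    "id": None,                                        # identifier: no range check
    "age": (5, 90),                                    # years (assumed range)
    "gender": {"M", "F"},
    "superpaste_use": {"daily", "occasional", "none"},
}


def familiarize(records, dictionary):
    """Return frequency tables, missing counts, out-of-range values and duplicate IDs."""
    # Frequency distribution of every variable, blanks included.
    frequencies = {var: Counter(r.get(var) for r in records) for var in dictionary}
    # Blanks/missing values per variable.
    missing = {var: sum(1 for r in records if r.get(var) in (None, ""))
               for var in dictionary}
    # Values that break the data dictionary rules.
    out_of_range = []
    for r in records:
        for var, rule in dictionary.items():
            value = r.get(var)
            if rule is None or value in (None, ""):
                continue  # no rule, or a blank already counted as missing
            if isinstance(rule, tuple):
                low, high = rule
                ok = low <= value <= high
            else:
                ok = value in rule
            if not ok:
                out_of_range.append((r.get("id"), var, value))
    # IDs that appear more than once.
    id_counts = Counter(r.get("id") for r in records)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return frequencies, missing, out_of_range, duplicates
```

Each flagged item identifies the record and variable, so it can be checked against the original collection form before any analysis begins.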
3. Practical Tips for Analysis Planning
| Tip | Description |
| --- | --- |
| Prior Plan | Analysis must be planned well in advance. |
| Use Empty Tables | Prepare empty table shells (dummy tables) showing exactly how your results will look. The analysis phase is simply filling these shells. |
| Analyze by Stages | Proceed sequentially: Recoding $\rightarrow$ Descriptive Analysis $\rightarrow$ Analytical Analysis. |
| Avoid Post Hoc Analysis | Do not trawl through the data without a plan looking for “something” to report, and do not present an association found by chance as a finding. |
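A dummy table can be treated as a fixed structure that the analysis only fills in. This minimal sketch uses a hypothetical row and column layout for the SuperPaste association table (the labels are illustrative assumptions). It builds an empty shell, renders it with blank cells, and refuses to fill any cell that was not part of the prior plan, which is one simple way to guard against post hoc additions.

```python
# Hypothetical layout for the main association table, fixed before any data are analysed.
ROWS = ["daily", "occasional", "none"]                        # SuperPaste use (exposure)
COLUMNS = ["n", "Reduced caries (n)", "Reduced caries (%)"]


def empty_shell(rows, columns):
    """Build a dummy table in which every cell starts empty (None)."""
    return {row: {col: None for col in columns} for row in rows}


def fill(shell, row, column, value):
    """Fill one planned cell; refuse any cell that was not in the prior plan."""
    if row not in shell or column not in shell[row]:
        raise KeyError(f"({row!r}, {column!r}) is not in the planned table")
    shell[row][column] = value


def render(shell, columns):
    """Render the shell as a pipe table, showing unfilled cells as ____."""
    lines = ["| SuperPaste use | " + " | ".join(columns) + " |",
             "|" + "---|" * (len(columns) + 1)]
    for row, cells in shell.items():
        values = ["____" if cells[c] is None else str(cells[c]) for c in columns]
        lines.append(f"| {row} | " + " | ".join(values) + " |")
    return "\n".join(lines)
```

Printing `render(empty_shell(ROWS, COLUMNS), COLUMNS)` before data collection shows exactly which numbers the analysis is expected to produce.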
Example: Analyzing the SuperPaste Study (Exposure and Outcome)
This process shows the sequential nature of analysis using the SuperPaste and Dental Caries example:
| Stage | Action (Sequential) | Purpose |
| --- | --- | --- |
| Recoding | Create new variables: the Outcome (e.g., “Reduced Caries: Yes/No”) and grouped Key Variables (e.g., cut Age into groups, group income levels, group SuperPaste use into “daily/occasional/none”). | Prepares all variables for statistical testing. |
| Descriptive | Calculate the frequency of the outcome in each group (e.g., “What percentage of ‘daily users’ achieved Caries reduction?”). | Provides baseline insight. |
| Analytical (Univariate) | Examine the outcome against one variable at a time (Univariate Analysis), e.g., Caries reduction by age or Caries reduction by gender. | Finds crude associations. |
| Analytical (Stratified) | Examine Dose-Response (e.g., outcome by quartiles of SuperPaste use). Then, examine the main relationship (SuperPaste $\rightarrow$ Caries reduction) stratified by confounding factors (e.g., stratified by income level). | Assesses the influence of third factors. |
| Analytical (Multivariate) | Use a Logistic Regression Model to determine whether SuperPaste use is an independent predictor of Caries reduction while controlling for the key measured confounders (age, gender, income). | Provides the final adjusted estimate of association, the strongest evidence the analysis can offer toward causality. |
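The recoding, dose-response and stratified stages can be sketched with plain 2x2 tables. This is a minimal sketch under several assumptions: each record has hypothetical fields `brushings_per_week`, `reduced_caries` and `income`; the cut-points in `recode_use` are illustrative; and exposure is taken as any SuperPaste use versus none. For the stratified step it uses the Mantel-Haenszel pooled Odds Ratio, a standard stratified technique that is not named in the table above; the logistic regression of the final stage is not shown.

```python
from collections import defaultdict


def recode_use(brushings_per_week):
    """Recode a hypothetical raw count into use groups (cut-points are assumptions)."""
    if brushings_per_week >= 7:
        return "daily"
    if brushings_per_week >= 1:
        return "occasional"
    return "none"


def build_tables(records, stratify_by=None):
    """Count 2x2 cells (a, b, c, d) per stratum; one pooled 'all' table if unstratified.

    a = exposed with reduced caries, b = exposed without,
    c = unexposed with reduced caries, d = unexposed without.
    Exposure is assumed to be any SuperPaste use versus none.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r in records:
        key = r[stratify_by] if stratify_by else "all"
        exposed = recode_use(r["brushings_per_week"]) != "none"
        index = (0 if exposed else 2) + (0 if r["reduced_caries"] else 1)
        tables[key][index] += 1
    return {key: tuple(cells) for key, cells in tables.items()}


def risk_ratio(a, b, c, d):
    """Relative Risk (cohort design): risk in exposed / risk in unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds Ratio (case-control design): (a*d) / (b*c)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Pooled Odds Ratio across strata, adjusted for the stratifying factor."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


def proportion_by_level(records):
    """Share with reduced caries at each use level, for a dose-response check."""
    counts = defaultdict(lambda: [0, 0])  # level -> [with outcome, total]
    for r in records:
        level = recode_use(r["brushings_per_week"])
        counts[level][0] += int(bool(r["reduced_caries"]))
        counts[level][1] += 1
    return {level: hits / total for level, (hits, total) in counts.items()}
```

`build_tables(records)` gives the crude table and `build_tables(records, "income")` gives one table per income level. A large gap between `odds_ratio(*crude)` and `mantel_haenszel_or(strata.values())` suggests that income confounds the SuperPaste and Caries association.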
4. Software Recommendations
Crucially, avoid the temptation to use spreadsheets (like Excel) for data management and analysis, regardless of the study size. Spreadsheets lack the necessary data management and quality assurance capabilities.
It is highly recommended to use dedicated software that offers both data management and statistical analysis capabilities. One free example is Epi Info, developed by the US Centers for Disease Control and Prevention (CDC), which can create data collection forms, enter data, analyze it, and even map the results.