This executive session combines four lecture modules:
The purpose is to move from broad interests to an initial, feasible, and justifiable research direction for your capstone proposal.
By the end of this session, you should be able to:
GRAD 695 is the first course in a two-course capstone sequence:
Proposal assignments are iterative. Revision based on feedback is expected each week.
Science is a systematic and creative way to investigate the world. It is not a single event but a dynamic process of asking questions, collecting evidence, and refining explanations.
Common goals include:
Goal clarity determines both research design and data requirements.
Objectivity does not mean having no bias. It means:
Analytics research often combines:
## 1.5 Hypothesis Formation
A hypothesis is a testable, falsifiable statement predicting the relationship between variables. Scientific inquiry requires stating hypotheses before collecting data (or, with secondary data, before analyzing it) to avoid confirmation bias and selective reporting.
Two hypotheses anchor every empirical test:
In analytics contexts this translates to: “Does feature X improve predictive accuracy meaningfully beyond a baseline?”
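One way to make this question testable is to phrase it as a null and an alternative hypothesis and compare paired scores from the same cross-validation folds. The sketch below is illustrative only: the fold accuracies are made-up numbers, `paired_permutation_pvalue` is a hypothetical helper rather than a library function, and an exact sign-flip test is just one reasonable choice for a small number of folds.

```python
import itertools
from statistics import mean

def paired_permutation_pvalue(baseline, candidate):
    """One-sided exact sign-flip test for paired scores.

    H0: adding the feature does not improve accuracy (mean difference <= 0).
    H1: adding the feature improves accuracy (mean difference > 0).
    """
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = mean(diffs)
    # Under H0 each paired difference is equally likely to carry either sign,
    # so enumerate every sign assignment (2**n of them; fine for small n).
    at_least_as_large = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if mean(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / total

# Hypothetical accuracies from the same 8 cross-validation folds.
baseline_acc = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.74, 0.70]
with_feature_x = [0.74, 0.71, 0.75, 0.73, 0.73, 0.70, 0.77, 0.72]

alpha = 0.05  # significance threshold, fixed before looking at the results
p_value = paired_permutation_pvalue(baseline_acc, with_feature_x)
mean_gain = mean(c - b for b, c in zip(baseline_acc, with_feature_x))
print(f"mean gain = {mean_gain:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

A small p-value only speaks to statistical significance. Whether the gain is meaningful is a separate judgment about effect size, which is why the sketch also reports the mean gain.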
## 1.6 Types of Research Design
Design choice determines the strength of your causal claims, feasibility, and generalizability. No design is universally best — the right choice depends on your research question, available data, and practical constraints.
Analytics context: Most capstone projects use observational or quasi-experimental designs because randomized experiments require controlled conditions rarely available with secondary data. This is entirely appropriate: observational research can yield rigorous insights when validity threats are explicitly documented and addressed.
By the end of GRAD 695, you should have a publication-ready preliminary study manuscript of approximately 10 single-column pages (~5 dual-column pages in IEEE Access format, excluding references) with:
This serves as a feasibility and preliminary study submitted to TechRxiv. In the following course (GRAD 699), you will undertake the full research phase (~20 single-column / ~10 dual-column pages), executing this proposal and reporting the complete findings.
A strong proposal topic should be:
A research proposal is not an essay; it is a structured argument. Each section has a specific job in convincing the reader that your problem matters and your methods are sound. While the exact lengths will vary based on your topic, here is a typical page breakdown for a ~10 single-column page (~5 dual-column page in IEEE Access format) quantitative analytics manuscript with preliminary results (excluding references):
Good problem definition requires alignment among:
A topic can be interesting but still not feasible; feasibility is part of quality.
A clear scope should specify:
Many analytics projects involve abstract constructs that need measurable proxies.
Examples:
Construct: customer satisfaction
Variable: 1 to 7 satisfaction score
Construct: investment risk
Variable: monthly return volatility
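As a minimal sketch of the second example, the construct "investment risk" can be turned into a number by computing the sample standard deviation of monthly returns. The prices are hypothetical, the function names are illustrative, and simple (not log) returns are an assumption made here for brevity.

```python
from statistics import stdev

def monthly_returns(prices):
    """Simple returns between consecutive month-end prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]

def return_volatility(prices):
    """Sample standard deviation of monthly returns (the measurable proxy)."""
    return stdev(monthly_returns(prices))

# Hypothetical month-end closing prices for one asset.
month_end_prices = [100.0, 110.0, 99.0, 108.9]

returns = monthly_returns(month_end_prices)
volatility = return_volatility(month_end_prices)
print([round(r, 3) for r in returns], round(volatility, 4))
```

Writing the proxy down as code forces the operational choices into the open (return definition, sampling frequency, sample versus population standard deviation), and those choices belong in the proposal's methods section.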
Evaluate your project in four dimensions:
| Dimension | Guiding Question | Current Status | Action Next Week |
|---|---|---|---|
| Data | Is relevant, usable, and sufficiently representative data available? | | |
| Technology | Are required tools, software, and compute resources available? | | |
| Skills | Do I have or can I gain required domain and analytic skills quickly? | | |
| Time | Can this project be completed with realistic milestones in two semesters? | | |
## 3.5 Variable Roles and Relationships
Correctly identifying and labeling variables is one of the most important steps in proposal writing. Misclassifying a confounder as a covariate, or omitting a mediator, changes the interpretation of every result.
| Variable Role | Definition | Example |
|---|---|---|
| Independent (IV) | The predictor; what you manipulate or observe as a cause | Treatment group, age, education level |
| Dependent (DV) | The outcome; what you measure | Readmission rate, exam score, revenue |
| Confounder | Affects both IV and DV; must be controlled | SES affecting both diet and health outcomes |
| Moderator | Changes the strength of the IV → DV relationship | Gender moderates the stress → performance link |
| Mediator | Explains how IV affects DV | Motivation mediates study time → grade |
## 3.6 Literature Review Strategy
A rigorous literature review follows a systematic process to find, screen, and synthesize existing knowledge. This prevents duplication, establishes your study’s contribution, and informs your methods.
Step 1 – Build a keyword list: Generate synonyms and related terms for each concept in your research question.
Step 2 – Search databases: Google Scholar first for breadth; then HU Library, IEEE Xplore, PubMed, or ACM Digital Library for depth.
Step 3 – Screen results: Read titles and abstracts; exclude irrelevant or low-quality sources.
Step 4 – Full-text review: Read retained articles; note methods, findings, limitations.
Step 5 – Synthesize: Group findings by theme, not article. Identify gaps your proposal fills.
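Steps 1 and 2 can be sketched as a small helper that turns a keyword list into a Boolean search string. This is an illustration, not a prescribed tool: `build_query` and the example keywords are hypothetical, and the exact Boolean syntax each database accepts should be checked against that database's own search help.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts.values():
        quoted = [f'"{term}"' for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Hypothetical keyword list for a student-dropout topic:
# one entry per concept in the research question, each with its synonyms.
keywords = {
    "outcome": ["student dropout", "attrition"],
    "signal": ["learning analytics", "LMS engagement"],
}

query = build_query(keywords)
print(query)
```

Saving the generated query alongside the search date and the database name makes the screening in Step 3 easier to document and repeat.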
The biggest reason proposals fail is lack of alignment. Your proposal must contain a “Golden Thread” flowing logically from the original problem down to the specific analytical method used.
If your research question asks for a prediction, your methods cannot rely solely on descriptive statistics or a group-mean comparison such as ANOVA; you need a predictive model. If you are testing a causal effect, you need to control for confounders.
Analytics research uses data and analytical methods to answer practical or theoretical questions. Typical outputs include explanation, prediction, comparison, and decision support.
Common categories in analytics research:
For each candidate topic, ask:
| Topic | Problem Statement | Data Source | Candidate Method | Potential Contribution |
|---|---|---|---|---|
| Social Media Sentiment & Stock Returns | Can daily Twitter/X sentiment scores reliably predict next-day stock price movement for S&P 500 companies? | Twitter/X Academic API; Yahoo Finance historical prices | LSTM / Transformer sentiment model; regression or classification on returns | Quantify lag between social sentiment and price movement; compare signal strength across sectors |
| Hospital Readmission Prediction | Which patient features at discharge best predict 30-day readmission in Medicare records? | CMS Medicare claims data (public); hospital EHR de-identified datasets | Logistic regression; gradient boosting (XGBoost); survival analysis | Identify modifiable discharge factors; provide actionable risk scores for care teams |
| Student Dropout Early Warning | Can early course engagement signals identify at-risk students before the midterm withdrawal deadline? | University LMS logs (Canvas); institutional enrollment records | Logistic regression; random forest; time-series feature engineering | Build validated early-alert dashboard; reduce dropout by enabling timely advising interventions |
Validity is the degree to which a study measures what it claims to measure and supports the conclusions drawn. Four types of validity are critical for every proposal.
| Validity Type | Question It Answers | Common Threats |
|---|---|---|
| Internal | Does the IV actually cause changes in the DV? | Confounders, selection bias, maturation |
| External | Can findings generalize beyond the study sample? | Non-representative sample, artificial setting |
| Construct | Do our measures actually capture intended constructs? | Poor operationalization, mono-method bias |
| Statistical | Are our statistical conclusions trustworthy? | Low power, violated assumptions, multiple testing |
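
The multiple-testing threat in the last row can be illustrated with a Bonferroni correction, which divides the significance level by the number of tests. This is a minimal sketch with hypothetical p-values. `bonferroni_reject` is an illustrative helper, and Bonferroni is a deliberately conservative choice among several available corrections.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical p-values from testing five candidate features.
p_values = [0.004, 0.020, 0.030, 0.045, 0.300]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_reject(p_values)
print(sum(uncorrected), "significant before correction;", sum(corrected), "after")
```

With these illustrative numbers, most of the apparent findings do not survive the correction, which is why the number of planned tests should be stated in the proposal before the analysis is run.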
Ethical and reproducible practice is not a formality — it determines whether your work can be trusted, built upon, and defended.
Ethics essentials for analytics projects:
Reproducibility best practices:
Choose one claim and discuss how you would test it with data:
For your selected claim, identify:
Pick one concept in your topic area and complete:
A quick confirmation of course policies:

- All sources must be properly cited in APA format.
- Plagiarism (including undisclosed copy-pasting from generative AI) results in a zero.
- Review the Academic Integrity module in Canvas.
Before submission, ensure your topic includes:
## 5.4 Evaluating Research Questions: FINER Criteria
The FINER framework (Hulley et al., 2013) provides a systematic checklist for determining whether a research question is ready to anchor a proposal.
| Criterion | Meaning | Self-Check Question |
|---|---|---|
| Feasible | Achievable with available resources | Can I realistically collect or access the data within two semesters? |
| Interesting | Matters to a community | Would an analytics practitioner or researcher find this worth reading? |
| Novel | Adds something new | Does this extend, replicate with variation, or challenge existing work? |
| Ethical | Meets ethical standards | Does the study comply with data privacy and IRB requirements? |
| Relevant | Addresses a meaningful problem | Does this connect to real-world analytics practice or public good? |
Do not wait until Week 13 to write the proposal. The proposal is built sequentially.
Strong projects begin with clear thinking, realistic scope, and disciplined iteration. Prioritize feasibility and clarity now so that later methods and writing become easier and more rigorous.