Introduction

1. Session Overview

This executive session combines four lecture modules:

  • Scientific Thinking
  • Course Overview
  • Problem Definition
  • Analytics Research Overview

The purpose is to move from broad interests to an initial, feasible, and justifiable research direction for your capstone proposal.

2. Learning Objectives

By the end of this session, you should be able to:

  1. Explain how scientific inquiry supports analytics research.
  2. Distinguish problem statements from research questions.
  3. Define project scope, population, sample, and key variables.
  4. Evaluate feasibility based on data, tools, skills, and timeline.
  5. Draft and justify initial capstone topic ideas.

3. Course Context (From Syllabus)

GRAD 695 is the first course in a two-course capstone sequence:

  • GRAD 695: a feasibility or preliminary study (~10 single-column / ~5 dual-column pages in IEEE Access format, excluding references). You will build a publication-ready manuscript with preliminary results for TechRxiv submission.
  • GRAD 699: the full research phase (~20 single-column / ~10 dual-column pages) where you execute the proposal and report complete findings.

Proposal assignments are iterative. Revision based on feedback is expected each week.


Part 1: Scientific Thinking and Inquiry

1.1 What Is Science?

Science is a systematic and creative way to investigate the world. It is not a single event but a dynamic process of asking questions, collecting evidence, and refining explanations.

1.2 Goals of Science

Common goals include:

  • Describe phenomena accurately.
  • Explain why patterns occur.
  • Predict likely outcomes.
  • Support decisions and interventions.

Goal clarity determines both research design and data requirements.

1.3 Scientific Objectivity

Objectivity does not mean having no bias. It means:

  • Recognizing your assumptions.
  • Being transparent about decisions.
  • Following ethical and evidence-based practices.
  • Remaining open to disconfirming evidence.

1.4 Ways of Knowing in Research

Analytics research often combines:

  • Practical knowledge (what works)
  • Experiential knowledge (what we observe in context)
  • Rational knowledge (logical reasoning)
  • Discursive knowledge (peer discussion and critique)
  • Empirical evidence (data)


1.5 Hypothesis Formation

A hypothesis is a testable, falsifiable statement predicting the relationship between variables. Scientific inquiry requires stating hypotheses before data collection (or, when working with secondary data, before analysis) to avoid confirmation bias and selective reporting.

Two hypotheses anchor every empirical test:

  • Null Hypothesis (H₀): No effect, no difference, or no relationship exists.
  • Alternative Hypothesis (H₁): An effect, difference, or relationship exists. It may be directional (X increases Y) or non-directional (X and Y are related).

In analytics contexts this translates to: “Does feature X improve predictive accuracy meaningfully beyond a baseline?”
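As a rough illustration of how that question maps onto H₀ and H₁, the sketch below runs a one-sided sign-flip permutation test on paired accuracy scores. It is a minimal example under stated assumptions, not a prescribed method: the function name `paired_permutation_pvalue` and the per-fold accuracies are hypothetical, and cross-validation folds are treated as independent pairs, which is only approximately true in practice.

```python
import random
from statistics import mean


def paired_permutation_pvalue(baseline, candidate, n_permutations=10_000, seed=0):
    """One-sided sign-flip permutation test on paired scores.

    H0: adding the feature does not change accuracy.
    H1: the candidate model is more accurate than the baseline.
    """
    if len(baseline) != len(candidate):
        raise ValueError("paired scores must have the same length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = mean(diffs)
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    hits = 0
    for _ in range(n_permutations):
        # Under H0 the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if mean(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-fold accuracies (made-up numbers for illustration).
baseline_acc = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.71, 0.70, 0.69, 0.72]
with_feature_x = [0.74, 0.71, 0.75, 0.73, 0.74, 0.70, 0.73, 0.72, 0.72, 0.74]

improvement = mean(c - b for b, c in zip(baseline_acc, with_feature_x))
p_value = paired_permutation_pvalue(baseline_acc, with_feature_x)
print(f"mean improvement  = {improvement:.3f}")
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value counts as evidence against H₀ only; whether the improvement is "meaningful" is a separate judgment about effect size that you should state in advance.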

1.6 Types of Research Design

Design choice determines the strength of your causal claims, feasibility, and generalizability. No design is universally best — the right choice depends on your research question, available data, and practical constraints.

Analytics context: Most capstone projects use observational or quasi-experimental designs because randomized experiments require controlled conditions rarely available with secondary data. This is entirely appropriate — observational research yields rigorous insights when validity threats are explicitly documented.


Part 2: Course Overview and Expectations

2.1 What You Will Produce

By the end of GRAD 695, you should have a publication-ready preliminary study manuscript of approximately 10 single-column pages (~5 dual-column pages in IEEE Access format, excluding references) with:

  • Introduction
  • Literature review
  • Methods section
  • Preliminary results

This serves as a feasibility and preliminary study submitted to TechRxiv. In the following course (GRAD 699), you will undertake the full research phase (~20 single-column / ~10 dual-column pages), executing this proposal and reporting the complete findings.

2.2 Proposal Expectations

A strong proposal topic should be:

  • Relevant to data analytics
  • Feasible within available time and resources
  • Supported by data (public, secondary, or appropriately simulated)
  • Justified with prior work and clear contributions

2.3 Policy Highlights

  • Attendance and communication are essential.
  • Proposal assignments build on each other and should be submitted on time.
  • Academic integrity applies to all writing and analysis.
  • Generative AI may support brainstorming and editing, but submitted work must be original and properly cited.

2.4 The Anatomy of a Scientific Proposal

A research proposal is not an essay; it is a structured argument. Each section has a specific job in convincing the reader that your problem matters and your methods are sound. While the exact lengths will vary based on your topic, here is a typical page breakdown for a ~10 single-column page (~5 dual-column page in IEEE Access format) quantitative analytics manuscript with preliminary results (excluding references):


Part 3: Problem Definition and Scope

3.1 Triangulation Mindset

Good problem definition requires alignment among:

  • Research goal
  • Existing evidence
  • Available resources
  • Scope boundaries
  • Expected outcomes

A topic can be interesting but still not feasible; feasibility is part of quality.

3.2 Define Scope Early

A clear scope should specify:

  • Population: who or what your findings apply to
  • Sample: what data you will actually analyze
  • Context boundaries: timeframe, geography, platform, or domain

3.3 Defining Constructs and Variables

Many analytics projects involve abstract constructs that need measurable proxies.

Examples:

  • Construct: customer satisfaction → Variable: 1 to 7 satisfaction score
  • Construct: investment risk → Variable: monthly return volatility
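To make the second example concrete, here is a minimal sketch of turning the construct "investment risk" into the variable "monthly return volatility". The function names and price series are hypothetical, and sample standard deviation of simple returns is only one of several reasonable operationalizations.

```python
from statistics import stdev


def monthly_returns(prices):
    """Simple month-over-month returns from a list of month-end prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def return_volatility(prices):
    """Sample standard deviation of monthly returns (the operational variable)."""
    returns = monthly_returns(prices)
    if len(returns) < 2:
        raise ValueError("need at least three prices to estimate volatility")
    return stdev(returns)


# Hypothetical month-end prices for two assets (illustrative numbers only).
steady_fund = [100, 101, 102, 103, 104, 105]
volatile_stock = [100, 112, 95, 118, 90, 110]

print(f"steady fund volatility:    {return_volatility(steady_fund):.4f}")
print(f"volatile stock volatility: {return_volatility(volatile_stock):.4f}")
```

Writing the measurement down as code forces the decisions a proposal should document anyway: which return definition, which time window, and which dispersion statistic.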

3.4 Feasibility Checklist

Evaluate your project in four dimensions:

  1. Data: Is appropriate data available and representative?
  2. Technology: Do you have required computational tools?
  3. Skills: Can you collect/process/analyze the data?
  4. Time: Can this be completed within two semesters?

Project Feasibility Worksheet

| Dimension | Guiding Question | Current Status | Action Next Week |
|---|---|---|---|
| Data | Is relevant, usable, and sufficiently representative data available? | | |
| Technology | Are required tools, software, and compute resources available? | | |
| Skills | Do I have or can I gain required domain and analytic skills quickly? | | |
| Time | Can this project be completed with realistic milestones in two semesters? | | |

3.5 Variable Roles and Relationships

Correctly identifying and labeling variables is one of the most important steps in proposal writing. Misclassifying a confounder as a covariate, or omitting a mediator, changes the interpretation of every result.

| Variable Role | Definition | Example |
|---|---|---|
| Independent (IV) | The predictor; what you manipulate or observe as a cause | Treatment group, age, education level |
| Dependent (DV) | The outcome; what you measure | Readmission rate, exam score, revenue |
| Confounder | Affects both IV and DV; must be controlled | SES affecting both diet and health outcomes |
| Moderator | Changes the strength of the IV → DV relationship | Gender moderates the stress → performance link |
| Mediator | Explains how IV affects DV | Motivation mediates study time → grade |
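The confounder row can be illustrated with a small simulation. This is a sketch on entirely synthetic data: SES drives both diet and health, while diet is given no true effect on health. All probabilities, effect sizes, and names (`diet_gap`, `rows`) are assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed for repeatability

# Simulate records of (high_ses, good_diet, health).
rows = []
for _ in range(20_000):
    high_ses = rng.random() < 0.5                         # confounder
    good_diet = rng.random() < (0.8 if high_ses else 0.2)  # IV, influenced by SES
    health = (10 if high_ses else 0) + rng.gauss(50, 5)    # DV depends on SES only
    rows.append((high_ses, good_diet, health))


def diet_gap(records):
    """Mean health of good-diet records minus mean health of the rest."""
    treated = [h for _, d, h in records if d]
    control = [h for _, d, h in records if not d]
    return mean(treated) - mean(control)


# Naive comparison ignores SES; the adjusted one compares within SES strata.
naive_gap = diet_gap(rows)
adjusted_gap = mean(
    diet_gap([r for r in rows if r[0] == stratum]) for stratum in (True, False)
)

print(f"naive diet effect:        {naive_gap:.2f}")
print(f"SES-adjusted diet effect: {adjusted_gap:.2f}")
```

The naive comparison reports a sizable "diet effect" that disappears once records are compared within SES strata, which is why an uncontrolled confounder changes the interpretation of the result.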

3.6 Literature Review Strategy

A rigorous literature review follows a systematic process to find, screen, and synthesize existing knowledge. This prevents duplication, establishes your study’s contribution, and informs your methods.

Step 1 – Build a keyword list: Generate synonyms and related terms for each concept in your research question.

Step 2 – Search databases: Google Scholar first for breadth; then HU Library, IEEE Xplore, PubMed, or ACM Digital for depth.

Step 3 – Screen results: Read titles and abstracts; exclude irrelevant or low-quality sources.

Step 4 – Full-text review: Read retained articles; note methods, findings, limitations.

Step 5 – Synthesize: Group findings by theme, not article. Identify gaps your proposal fills.
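Steps 1 and 2 can be sketched in code: group synonyms by concept, then combine them into a Boolean search string. The helper `build_search_query` and the keyword list are hypothetical, and the exact operator syntax accepted by each database varies, so treat the output as a starting point to adapt.

```python
def build_search_query(concepts):
    """Join synonyms with OR inside each concept, and concepts with AND."""
    groups = []
    for synonyms in concepts.values():
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical keyword list for a readmission-prediction topic.
keywords = {
    "outcome": ["readmission", "rehospitalization"],
    "population": ["Medicare", "older adults"],
    "method": ["machine learning", "predictive model"],
}

query = build_search_query(keywords)
print(query)
```

Keeping the keyword list in one place also gives you a record of the search strategy to report in the literature review.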

3.7 The “Golden Thread” of Alignment

The biggest reason proposals fail is lack of alignment. Your proposal must contain a “Golden Thread” flowing logically from the original problem through the research question and variables down to the specific analytical method used.

If your research question asks for a prediction, your methods cannot rely solely on descriptive statistics or a comparison of group means such as ANOVA; you need a predictive model evaluated on held-out data. If you are testing a causal effect, you need to control for confounders.


Part 4: Analytics Research Overview

4.1 What Counts as Analytics Research?

Analytics research uses data and analytical methods to answer practical or theoretical questions. Typical outputs include explanation, prediction, comparison, and decision support.

4.2 Data and Method Families

Common categories in analytics research:

  • Structured data analysis
  • Machine learning (supervised and unsupervised)
  • Deep learning
  • Natural language processing
  • Computer vision
  • Signal processing
  • Predictive AI and generative AI applications

4.3 Topic Brainstorm Questions

For each candidate topic, ask:

  1. What is the concrete problem or unresolved question?
  2. What is already known in the literature or practice?
  3. What data can I access now?
  4. What method families are realistic for me?
  5. What impact could the results have?

Research Topic Brainstorm Worksheet: Example Entries

| Topic | Problem Statement | Data Source | Candidate Method | Potential Contribution |
|---|---|---|---|---|
| Social Media Sentiment & Stock Returns | Can daily Twitter/X sentiment scores reliably predict next-day stock price movement for S&P 500 companies? | Twitter/X Academic API; Yahoo Finance historical prices | LSTM / Transformer sentiment model; regression or classification on returns | Quantify lag between social sentiment and price movement; compare signal strength across sectors |
| Hospital Readmission Prediction | Which patient features at discharge best predict 30-day readmission in Medicare records? | CMS Medicare claims data (public); hospital EHR de-identified datasets | Logistic regression; gradient boosting (XGBoost); survival analysis | Identify modifiable discharge factors; provide actionable risk scores for care teams |
| Student Dropout Early Warning | Can early course engagement signals identify at-risk students before the midterm withdrawal deadline? | University LMS logs (Canvas); institutional enrollment records | Logistic regression; random forest; time-series feature engineering | Build validated early-alert dashboard; reduce dropout by enabling timely advising interventions |

4.4 Research Validity

Validity is the degree to which a study measures what it claims to measure and supports the conclusions drawn. Four types of validity are critical for every proposal.

| Validity Type | Question It Answers | Common Threats |
|---|---|---|
| Internal | Does the IV actually cause changes in the DV? | Confounders, selection bias, maturation |
| External | Can findings generalize beyond the study sample? | Non-representative sample, artificial setting |
| Construct | Do our measures actually capture intended constructs? | Poor operationalization, mono-method bias |
| Statistical | Are our statistical conclusions trustworthy? | Low power, violated assumptions, multiple testing |
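One of the statistical threats listed above, multiple testing, has a simple guard. The sketch below applies Holm's step-down correction to a set of p-values; the function name `holm_reject` and the p-values are hypothetical, and Holm's procedure is one common choice among several corrections.

```python
def holm_reject(p_values, alpha=0.05):
    """Return True where H0 is rejected under Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as we move to larger p-values.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values from five separate tests on the same dataset.
p_values = [0.001, 0.04, 0.03, 0.20, 0.012]

naive = [p <= 0.05 for p in p_values]
corrected = holm_reject(p_values)
print("uncorrected:", naive)
print("Holm:       ", corrected)
```

With these made-up numbers, four tests pass the uncorrected 0.05 threshold but only two survive the correction, which is the kind of gap a proposal's analysis plan should anticipate.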

4.5 Ethics and Reproducibility

Ethical and reproducible practice is not a formality — it determines whether your work can be trusted, built upon, and defended.

Ethics essentials for analytics projects:

  • Does your dataset contain personally identifiable information (PII)?
  • Has appropriate consent or a data use agreement been obtained?
  • Are protected groups (race, gender, health status) present, and how are fairness concerns addressed?
  • Have you considered potential harms from model errors or misuse?

Reproducibility best practices:

  • Document all data sources with version or access date.
  • Record all preprocessing decisions, not just final code.
  • Use a fixed random seed for any stochastic steps.
  • Store data and code in a version-controlled repository (e.g., GitHub).
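The practices above can be sketched as a small manifest written alongside the analysis. This is a minimal example under stated assumptions: the file name, access date, preprocessing notes, seed value, and helper names (`make_manifest`, `train_test_split_indices`) are all hypothetical, and the byte string stands in for the contents of a real data file.

```python
import hashlib
import json
import random

SEED = 695  # arbitrary fixed seed, chosen for illustration


def make_manifest(data_bytes, source, access_date, preprocessing_steps, seed=SEED):
    """Record where the data came from, its fingerprint, and what was done to it."""
    return {
        "source": source,
        "access_date": access_date,
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "preprocessing": list(preprocessing_steps),
        "random_seed": seed,
    }


def train_test_split_indices(n_rows, test_fraction=0.2, seed=SEED):
    """Shuffle row indices with a fixed seed so the split is repeatable."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# Stand-in for file contents; in practice read the file in binary mode.
raw = b"id,score\n1,5\n2,7\n3,6\n"
manifest = make_manifest(
    raw,
    source="hypothetical_survey.csv",
    access_date="2025-01-15",
    preprocessing_steps=["dropped rows with missing score", "scaled score to 0-1"],
)
train_idx, test_idx = train_test_split_indices(10)
print(json.dumps(manifest, indent=2))
print("train:", train_idx, "test:", test_idx)
```

Committing the manifest to the same repository as the code lets a reader confirm they are working from the same data and the same split.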


Part 5: In-Class Activities

5.1 Claim Testing Discussion (Small Group)

Choose one claim and discuss how you would test it with data:

  • Social media presence increases business profits.
  • News headlines can predict stock prices.
  • Positive mentions on social media predict election outcomes.
  • Bedside manner improves patient treatment adherence.

For your selected claim, identify:

  1. Outcome variable
  2. Key predictor(s)
  3. Data source
  4. Likely limitations or confounders

5.2 Concept Definition Exercise

Pick one concept in your topic area and complete:

  • Concept definition (plain language)
  • Operational variable(s)
  • Measurement scale
  • Expected direction or hypothesis

5.3 Academic Integrity & Citations Module Check-in

A quick confirmation of course policies:

  • All sources must be properly cited in APA format.
  • Plagiarism (including undisclosed copy-pasting from generative AI) results in a zero.
  • Review the Academic Integrity module in Canvas.

5.4 Research Topic Brainstorm Assignment Prep

Before submission, ensure your topic includes:

  • A clear analytics-relevant problem
  • A feasible scope
  • Preliminary evidence that data exists
  • A short statement of expected contribution


5.5 Evaluating Research Questions: FINER Criteria

The FINER framework (Hulley et al., 2013) provides a systematic checklist for determining whether a research question is ready to anchor a proposal.

| Criterion | Meaning | Self-Check Question |
|---|---|---|
| Feasible | Achievable with available resources | Can I realistically collect or access the data within two semesters? |
| Interesting | Matters to a community | Would an analytics practitioner or researcher find this worth reading? |
| Novel | Adds something new | Does this extend, replicate with variation, or challenge existing work? |
| Ethical | Meets ethical standards | Does the study comply with data privacy and IRB requirements? |
| Relevant | Addresses a meaningful problem | Does this connect to real-world analytics practice or public good? |


Part 6: Week 1 Deliverables and Next Steps

6.1 Immediate Deliverables

  • Interest survey completion
  • Initial research topic brainstorm
  • Notes from concept-definition activity

6.2 Preparation for Week 2

  1. Build a keyword list for your topic.
  2. Find 5 to 8 relevant sources (start with review papers when possible).
  3. Draft 1 to 3 preliminary research questions.
  4. Record feasibility risks and mitigation plans.

6.3 Semester Timeline for Proposal Writing

Do not wait until Week 13 to write the proposal. The proposal is built sequentially.

6.4 Closing Note

Strong projects begin with clear thinking, realistic scope, and disciplined iteration. Prioritize feasibility and clarity now so that later methods and writing become easier and more rigorous.