This week we move from a broad research idea to a structured, justifiable research plan. Good research is always contextualized within the history of what has been done — you must show that you understand the existing landscape before you can make a meaningful contribution to it. To do this, we focus on three foundational skills:
A Framework is the intellectual structure for your study — an existing analytic, theoretical, or methodological tradition that guides how you approach your problem and interpret your results.
A Benchmark is your yardstick for success — an established standard from prior work that lets you answer the question “How good are my results compared to the best that’s been done?”
A Literature Review is the written argument that ties it all together — connecting existing research to your specific problem, establishing the importance of your research questions, and justifying your approach.
For any given class of problem, frameworks exist that capture the best — or at least the most common — approaches to that class of problem. A framework is not something you invent; it is something you adopt, adapt, and justify. Your choice signals to readers that you understand the research tradition your work belongs to.
Research frameworks fall into three broad categories, and a single study may incorporate more than one:
Analytic frameworks define the type of analysis you will conduct and the computational or statistical tools best suited for that analysis.
| Framework | Description | Example Use Case |
|---|---|---|
| Predictive Modeling | Build a model to estimate an unknown outcome from known inputs | Predicting student dropout from engagement data |
| Forecasting | Estimate future values of a time-series variable | Projecting enrollment trends over the next 5 years |
| Classification | Assign observations to discrete categories | Diagnosing disease from medical imaging |
| Clustering / Segmentation | Discover natural groupings in unlabeled data | Identifying patient risk profiles |
Theoretical frameworks provide the conceptual lens through which you interpret your problem and results. They connect your study to a body of scholarly thought.
| Framework | Description | Example Use Case |
|---|---|---|
| Game Theory | Models strategic decision-making among rational agents | Analyzing competitive pricing behavior |
| Information Theory | Quantifies information, uncertainty, and communication limits | Measuring entropy in communication networks |
| Tinto’s Model | Explains student persistence through academic and social integration | Predicting college dropout |
| Technology Acceptance Model | Explains user adoption of new technologies | Studying e-learning platform adoption |
Methodological frameworks define the overall research design strategy — how you will collect, generate, or analyze evidence.
| Framework | Description | Example Use Case |
|---|---|---|
| Experimental Design | Controlled manipulation of variables with randomization | Testing the effect of an intervention on outcomes |
| Quasi-Experimental | Comparison without full randomization | Pre/post analysis without a true control group |
| Data Mining | Extraction of patterns from large datasets | Mining electronic health records for adverse events |
| Survey Research | Systematic collection of self-reported data | Measuring student satisfaction across programs |
After reviewing the literature, choose the framework(s) that best fit your problem. A study may legitimately combine more than one — for example, using a predictive modeling analytic framework, guided by Tinto’s theoretical model, and executed through a data mining methodological approach.
Your justification should explain which framework(s) you chose, why they fit your problem, and how they will guide your analysis and the interpretation of your results.
For the majority of analytic problems, a model or solution already exists, and any new model or solution must be compared against it. As you define your problem and methodological approach, you must establish what the baseline or best current performance is for your task.
Without benchmarks, your result is just a number without meaning. A model with 83% accuracy could be excellent or mediocre — it depends entirely on what the field has already achieved.
Established datasets created specifically to test new models or solutions on common tasks. Using a standard benchmark dataset allows direct, apples-to-apples comparison with every other study that used it.
Examples include MNIST for handwritten digit classification, ImageNet for object recognition, and GLUE for natural language understanding.
Recorded performance values from prior high-quality studies for a given task under given conditions. These answer the question: “What is the best performance anyone has achieved on this specific problem?”
When reporting benchmark metrics, always capture:
| Field | Description |
|---|---|
| Author & Year | Who achieved this result |
| Task / Problem | Exactly what problem was being solved |
| Method | What model or approach was used |
| Performance Metric | The metric used (Accuracy, F1, AUC, RMSE, etc.) |
| Result | The numeric value |
| Conditions | Dataset, data splits, key parameters |
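Benchmark results are only comparable if you compute your metric the same way prior studies did. As a reminder of how the common classification metrics relate, here is a minimal sketch in Python; the labels and predictions are invented purely for illustration:

```python
# Minimal sketch: accuracy and F1 for a binary task, from the confusion matrix.
# y_true / y_pred are made-up values for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.2f}  Precision={precision:.2f}  "
      f"Recall={recall:.2f}  F1={f1:.2f}")
```

Note that accuracy and F1 can diverge sharply on imbalanced data, which is one reason the Conditions field above matters: a result is only meaningful alongside the dataset and metric it was measured under.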
Exploratory or descriptive research generally does not have concrete numeric benchmarks, but it must still be compared to what is currently known about the topic or area. Your contribution is measured by how much new understanding you add relative to the existing body of knowledge — what patterns were previously unknown, what relationships were previously assumed but not demonstrated, or what populations were previously understudied.
Reading the chart: Your hypothetical F1 of 0.83 sits above the baseline (0.78) and prior work (0.82), and is competitive with the SOTA (0.85). Without this comparison, 0.83 has no interpretable meaning.
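The same comparison can be written down numerically. A small sketch, using the hypothetical scores from the text above:

```python
# Place a hypothetical result among published benchmark scores
# (values taken from the comparison described above).
benchmarks = {"baseline": 0.78, "prior work": 0.82, "SOTA": 0.85}
my_f1 = 0.83

# Which published results does this score exceed?
beaten = [name for name, score in benchmarks.items() if my_f1 > score]
print(f"F1={my_f1} exceeds: {', '.join(beaten)}")
print(f"Gap to SOTA: {benchmarks['SOTA'] - my_f1:.2f}")
```

The point is not the code itself but the habit: a reported score should always be stated relative to the baseline and the state of the art, never in isolation.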
To determine the appropriate frameworks and benchmarks for your study, you must conduct a literature search. The goal is to accumulate 10–20 peer-reviewed sources, after which you have enough foundation to begin writing your literature review section.
| Source | Notes |
|---|---|
| Google Scholar | Best starting point; broad coverage; shows citation counts |
| Ask ChatGPT | Can suggest frameworks and papers — but always verify the sources exist and say what is claimed |
| Reference sections | Once you find a key paper, mine its reference list for related foundational work |
Source quality: Peer-reviewed journal articles and conference papers are the gold standard. White papers from reputable institutions are acceptable. Blog posts, news articles, and other informal sources are not acceptable as primary citations.
As you accumulate sources, track benchmark studies in a structured table:
| Author (Year) | Task / Problem | Method | Metric | Result | Notes |
|---|---|---|---|---|---|
| Smith et al. (2019) | Predicting dropout in MOOCs | Logistic Regression | Accuracy | 78% | Demographic features only. Good baseline. |
| Jones & Lee (2021) | Real-time dropout prediction | Random Forest | F1-Score | 0.85 | Clickstream data. Considered SOTA. |
| Chen (2020) | Identifying at-risk students | SVM | AUC | 0.82 | First two weeks of course only. |
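A table like this is easy to keep as structured records alongside your notes, which makes it simple to query as your source list grows. A sketch, assuming each row is stored as a dict (entries copied from the table above; note that the three studies report different metrics, so results are only directly comparable within the same metric):

```python
# Track benchmark studies as structured records (rows from the table above).
benchmarks = [
    {"author": "Smith et al. (2019)", "method": "Logistic Regression",
     "metric": "Accuracy", "result": 0.78},
    {"author": "Jones & Lee (2021)", "method": "Random Forest",
     "metric": "F1-Score", "result": 0.85},
    {"author": "Chen (2020)", "method": "SVM",
     "metric": "AUC", "result": 0.82},
]

def best_by_metric(rows, metric):
    """Return the best-performing study for one metric (never mix metrics)."""
    candidates = [r for r in rows if r["metric"] == metric]
    return max(candidates, key=lambda r: r["result"]) if candidates else None

sota = best_by_metric(benchmarks, "F1-Score")
print(f"SOTA for F1-Score: {sota['author']} with {sota['result']}")
```

Keeping the records structured also makes it trivial to fill in the synthesis statement below with the current SOTA and baseline values.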
Synthesis statement template: “The current literature shows that for [your task], state-of-the-art models achieve [SOTA metric]. Simpler baseline models achieve [baseline metric]. My goal is to exceed this baseline while [your specific contribution — e.g., using a more interpretable model, applying to a novel population, etc.].”
A literature review is not a summary exercise — it is a structured argument that does three specific things: connects existing research to your specific problem, establishes the importance of your research questions, and justifies your approach.
A well-written literature review is:
Avoid these common mistakes:
| What students write | What it actually is | Why it fails |
|---|---|---|
| “Smith (2020) found that X. Jones (2021) found that Y. Chen (2022) found that Z.” | An annotated bibliography | No argument, no connection, no synthesis |
| A list of paper summaries with no linking commentary | A data dump | Reader cannot see how it all relates to your project |
| Extensive technical explanation of algorithms you will use | A methods primer | Unless you are developing a new algorithm, briefly justify why the method fits — don’t explain how it works |
A good literature review reads like a coherent essay that happens to have citations, not like a collection of summaries stapled together.
Managing 10–20+ sources manually is error-prone. Zotero (https://www.zotero.org/) is the recommended free, open-source citation management tool.
What Zotero does: it captures citation metadata from your browser with one click, organizes sources into collections, and generates formatted citations and bibliographies directly in Word and Google Docs.
HU’s Library has put together resources and guides to help you get started with Zotero.
Before writing, you need a plan. Three proven methods for organizing a literature review are:
A literature map is a diagram that links important concepts and clusters your sources around those concepts. It is the most visual approach and is excellent for seeing the structure of a large body of literature at a glance.
Many templates and examples for literature maps are available online. Search for “literature map examples template” to find formats that match your project’s complexity.
Essentially a literature map formatted as a bulleted outline. Each subheading represents a major concept or theme, and sources are listed under the subheading they belong to.
Rules of thumb:
- Start from the broadest concepts and oldest, most foundational methods and progressively narrow to what is unknown or evolving
- No more than 2 subheadings per page of finished text
- At least 4 citations under each subheading
Example Topic: The influence of Twitter marketing on tech stocks
Subheading 1: Twitter Marketing
Subheading 2: Tech Industry Marketing
Subheading 3: Factors Impacting Tech Stocks
The most rigorous organizational method. Instead of just listing topics, you write the actual first sentence of each paragraph in your literature review before writing the full text. Each first sentence is a claim or premise that your supporting citations will back up.
This approach forces you to think through the logical argument your literature review is making — every sentence must connect to the next, moving from broad context toward your specific research questions.
Example Topic: Changes in communication styles of political leaders
- Most scholars and political commentators agree that Donald Trump is unlike any previous US president.
- During his campaign and early presidency, many claimed that Trump’s simplicity and directness were keys to his popularity.
- A second major distinguishing feature of Trump’s language is his self-confidence.
- We build on an extensive research tradition of exploring political leadership traits.
- If conveying simple messages with confidence is a powerful way to persuade voters, it is incumbent on researchers to determine effective ways to measure these dimensions reliably, quickly, and efficiently.
- A common distinction in language is between content words and function words.
- The eight categories of function words capture an underlying psychological dimension known as analytic thinking.
- Past work has validated the use of function words as markers of the analytic thinking dimension, including a study of college admission essays from over 25,000 incoming students.
- The analysis of function words has also been useful in identifying people’s relative status or “clout” in a social hierarchy.
- The computerized text analysis program LIWC was updated in 2015 to include empirically established measures of both analytic thinking and confidence.
- The algorithm developed from the work of Kacewicz et al. is formally referred to as clout.
- The purpose of this project was to apply linguistic analyses to large corpora of texts from the American presidency, non-US leaders, legislative bodies, and cultural contexts spanning multiple decades.
Notice how each sentence builds logically on the last — this is a complete argumentative skeleton for a literature review before a single full paragraph is written.
Key reminders:
Your framework, benchmarks, and literature review are not separate deliverables — they are one integrated argument.
Example narrative statement:
“My research uses a predictive modeling analytic framework, grounded in an adapted version of Tinto’s Student Integration Model, to predict student dropout in online courses. A literature search revealed that the state-of-the-art F1-Score for this task is 0.85 (Jones & Lee, 2021), using clickstream features, with a baseline of 78% accuracy from simpler demographic models (Smith et al., 2019). My study extends this work by applying theoretically grounded features tied explicitly to academic and social integration constructs, which prior benchmark studies did not use. By doing so, my model aims to be both predictive and interpretable — addressing a limitation in existing SOTA work that prioritizes performance over explainability.”