📊 Data Sciences Repository

ANLY 500 - Analytics I | Harrisburg University

Welcome to ANLY 500 Analytics I

This course provides comprehensive R programming instruction for data analytics, covering foundational concepts through advanced statistical modeling and visualization. Each week builds upon previous knowledge with hands-on, code-first workflows grounded in statistical theory and reproducible research practices.

Primary Textbook: Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE Publications.

Author: Ziyuan Huang | Course: Analytics I | Institution: Harrisburg University

View Repository on GitHub | 📄 Course Syllabus (PDF)

📚 Weekly Tutorials

Week 01: Introduction to R

Complete beginner's guide to R programming! Learn R basics, data types, structures (vectors, matrices, data frames), functions, data manipulation, visualization, and practical analysis with the Palmer Penguins dataset. 40+ visualizations with plain English explanations.

Week 02: R for Data Analytics

Introduction to R programming, data types, descriptive statistics, and basic hypothesis testing. Learn central tendency, dispersion measures, and visualization fundamentals.

Week 03: Data Wrangling & EDA

Master data import, tidy data principles, dplyr verbs, handling missing values, and exploratory data analysis with ggplot2. Includes correlation and basic inference.

Week 04: Statistical Inference

Deep dive into hypothesis testing, confidence intervals, effect sizes (Cohen's d), statistical power, and Type I/II errors. Learn the fundamental equation: Outcome = Model + Error.
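
As a hedged illustration of the effect-size idea above, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation. It is written in Python rather than the course's R, and the function name `cohens_d` and the sample scores are hypothetical, invented for this example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical scores, for illustration only.
treatment = [6, 7, 8, 9, 10]
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)
print(round(d, 3))  # 1.265
```

The sign of d follows the order of the arguments, so swapping the groups flips it without changing the magnitude.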

Week 05: Data Visualization

Comprehensive ggplot2 tutorial covering histograms, scatterplots, bar graphs, line graphs, data reshaping, and professional themes. Master the grammar of graphics!

Week 06: Data Screening Part 1

Essential data cleaning techniques: accuracy checking, handling missing data (MCAR, MAR, MNAR), multiple imputation with MICE, and multivariate outlier detection using Mahalanobis distance.

Week 07: Data Screening Part 2

Comprehensive assumption checking with 60+ visualizations! Master independence, multicollinearity, linearity, normality, and homoscedasticity. Includes influential cases, Cook's Distance, and a transformation guide.

Week 08: Correlation Analysis

Complete correlation guide: Pearson, Spearman, Kendall, point-biserial, partial & semi-partial correlations. Modern visualizations with corrplot & ggpairs. Includes comparing correlations, effect sizes, and APA reporting.

Week 09: Linear Regression

Introduction to Linear Regression. Simple and multiple regression, hierarchical models, dummy coding, and regression diagnostics. Learn to predict outcomes and test specific hypotheses.

Week 10: Mediation & Moderation

Advanced regression topics: Mediation (mechanisms) and Moderation (interactions). Learn modern bootstrapping methods, simple slopes analysis, and how to interpret "It Depends" effects.

Week 11: Comparing Two Means

Comparing two means using the t-test (Independent & Paired). Learn about the Signal-to-Noise Ratio, Assumptions (Normality, Homogeneity of Variance), and Effect Sizes (Cohen's d).
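
To make the signal-to-noise reading of the t-test concrete, here is a minimal sketch of the independent-samples t statistic with a pooled variance estimate. It uses Python rather than the course's R, and the function name `independent_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, df) for an independent t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    signal = mean(group_a) - mean(group_b)          # difference between group means
    noise = sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of that difference
    return signal / noise, n_a + n_b - 2

# Hypothetical scores, for illustration only.
group_a = [12, 14, 16, 18, 20]
group_b = [10, 12, 14, 16, 18]
t_stat, df = independent_t(group_a, group_b)
print(t_stat, df)
```

This sketch covers only the equal-variance independent case; a paired test would instead work on the per-pair differences.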

Week 12: Comparing Several Means (ANOVA)

Introduction to One-Way ANOVA. Understand the F-Ratio (Signal-to-Noise), Post Hoc Tests (Bonferroni), and Effect Sizes (Omega Squared). Visualize the total, model, and residual sums of squares (SST, SSM, and SSR).
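
The relationship between the sums of squares and the F-ratio can be sketched as follows. This is an illustrative Python version (the course itself works in R); the function name `one_way_anova` and the three small groups are hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SST, SSM, SSR, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total sum of squares: every observation against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Model sum of squares: group means against the grand mean, weighted by group size.
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual sum of squares: observations against their own group mean.
    ssr = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_model = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    f_ratio = (ssm / df_model) / (ssr / df_resid)
    return sst, ssm, ssr, f_ratio

# Hypothetical data, for illustration only.
groups = [[2, 3, 4], [4, 5, 6], [7, 8, 9]]
sst, ssm, ssr, f_ratio = one_way_anova(groups)
print(sst, ssm, ssr, f_ratio)
```

Up to floating-point rounding, SST equals SSM + SSR, and F is the model mean square divided by the residual mean square.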

🎯 Key Features

  • Code-First Approach: All examples are executable and reproducible
  • Theory-Grounded: IEEE-style citations to Field et al. (2012)
  • Beginner-Friendly: Extensive explanations and interpretations
  • Professional Quality: Publication-ready figures and tables
  • Reproducible: Seedhash integration for consistent results

📖 References

  • Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Lawrence Erlbaum Associates.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE Publications.
  • Hays, W. L. (1994). Statistics (5th ed.). Harcourt Brace.
  • Howell, D. C. (2012). Statistical Methods for Psychology (8th ed.). Cengage Learning.
  • Little, R. J. A., & Rubin, D. B. (2020). Statistical Analysis with Missing Data (3rd ed.). Wiley.
  • Tabachnick, B. G., & Fidell, L. S. (2019). Using Multivariate Statistics (7th ed.). Pearson.
  • Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Graphics Press.
  • Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). CRC Press.
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (2nd ed.). Springer.

ANLY 530 - Machine Learning | Harrisburg University

Welcome to ANLY 530 Machine Learning

This course introduces the theory and practice of machine learning using Python with industry-standard libraries. Topics progress from classical supervised and unsupervised learning through deep neural networks, covering the end-to-end ML project workflow, model evaluation, hyperparameter tuning, and modern architectures.

Primary Textbook: Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly Media.

Author: Ziyuan Huang | Course: Machine Learning | Institution: Harrisburg University

View Repository on GitHub | 📄 Course Syllabus (PDF)

📚 Weekly Tutorials

Week 01: Introduction to Machine Learning (NEW)

Overview of the ML landscape: supervised vs. unsupervised learning, batch vs. online learning, and instance-based vs. model-based learning. Covers the end-to-end ML project workflow and key challenges.

Week 02: Decision Trees (NEW)

Deep dive into the CART algorithm: Gini impurity, entropy, class probabilities, decision boundaries, regularization (cp, max_depth, minsplit), instability, regression trees, feature importance, and partial dependence plots. Includes IEEE-cited references to Géron (2019) and Boehmke & Greenwell (2020).
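
As a minimal sketch of the two impurity measures named above, the functions below compute Gini impurity and entropy for the class labels in a single tree node. The function names and the toy label lists are hypothetical and are not taken from the tutorial itself.

```python
from collections import Counter
from math import log2

def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k)) over the classes present."""
    n = len(labels)
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())

# Toy nodes, for illustration only.
pure_node = ["a", "a", "a", "a"]
mixed_node = ["a", "a", "b", "b"]

print(gini_impurity(pure_node), entropy(pure_node))    # 0.0 0.0
print(gini_impurity(mixed_node), entropy(mixed_node))  # 0.5 1.0
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why either can serve as a splitting criterion.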

Week 03: Ensemble Learning & Random Forests (NEW)

From bagging and voting classifiers to Random Forests, Extra-Trees, and Gradient Boosting. Covers OOB evaluation, feature importance, hyperparameter tuning (mtry, nodesize), regression forests, and stacking. Includes IEEE-cited references to Géron (2019) and Boehmke & Greenwell (2020).
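
OOB evaluation works because each bootstrap sample leaves some training instances undrawn. The sketch below, using only the Python standard library, estimates that out-of-bag share for one bootstrap sample and compares it with the closed-form probability. The function name `oob_fraction`, the sample size, and the seed are assumptions chosen for illustration.

```python
import random

def oob_fraction(n_samples, seed=0):
    """Draw one bootstrap sample of size n_samples (with replacement) and
    return the share of instances that were never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples

n = 10_000
fraction = oob_fraction(n)
# Probability that a given instance is never drawn in n draws with replacement.
expected = (1 - 1 / n) ** n
print(fraction, expected)
```

For large n the expected share approaches 1/e, about 37%, so each predictor in a bagging ensemble has a sizeable held-out set available for evaluation without a separate validation split.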

Weeks 04–15 (Coming Soon)

Gradient Boosting, XGBoost, SVMs, dimensionality reduction, neural networks, CNNs, RNNs, model evaluation, hyperparameter tuning, and production deployment strategies.

🎯 Key Features

  • Code-First Approach: All examples are executable and reproducible in Python
  • Theory-Grounded: Concepts tied to Géron (2019) and peer-reviewed sources
  • Hands-On: Real datasets with end-to-end pipelines
  • Modern Toolchain: Scikit-Learn, Keras, TensorFlow 2.x

📖 References

  • Boehmke, B. C., & Greenwell, B. M. (2020). Hands-On Machine Learning with R. CRC Press. https://bradleyboehmke.github.io/HOML/DT.html
  • Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly Media.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). Springer.

GRAD 695 - Research Methodology & Writing | Harrisburg University

Welcome to GRAD 695 Research Methodology & Writing

This course prepares students for analytics research through IEEE Access-format proposal development, research design fundamentals, and ethical research practices. Students learn to frame research questions, design methodological approaches, and communicate findings effectively in academic publication format.

Target Format: IEEE Access 2024 (dual-column) | Ethics: CITI Program Training Required

Author: Ziyuan Huang | Course: Research Methodology & Writing | Institution: Harrisburg University

View Repository on GitHub | 📄 Course Syllabus (PDF)

📚 Course Content

Executive Session 01: Research Foundations (NEW)

Introduction to the research process, problem definition, an overview of analytics research methodology, and the structure of the GRAD 695/699 two-course capstone sequence. Includes page count expectations and IEEE Access formatting requirements.

Week 02: Frameworks, Benchmarks & Literature Reviews (NEW)

Types of research frameworks (analytic, theoretical, methodological), benchmark datasets vs. metrics, conducting a literature search, and three methods for organizing a literature review (literature maps, subheadings, first sentences). Includes Zotero citation management and final guidance on writing a coherent literature review argument.

Weeks 03–15 (Coming Soon)

Research question formulation, methodology design, data collection, analysis planning, results presentation, discussion frameworks, and APA 7th edition writing conventions.

🎯 Key Features

  • Publication-Ready Format: IEEE Access 2024 dual-column template with LaTeX/Pandoc
  • Ethics First: CITI Program training required for IRB preparation
  • Two-Course Sequence: GRAD 695 (proposal ~5 dual-column pages) → GRAD 699 (full thesis ~10 dual-column pages)
  • Professional Writing: APA 7th edition citation and style requirements

📖 Key Resources

  • American Psychological Association. (2020). Publication Manual of the American Psychological Association (7th ed.). APA.
  • IEEE. (2024). IEEE Access Author Instructions. Available at ieeeaccess.ieee.org
  • CITI Program. Human Subjects Research Ethics Training. Available at about.citiprogram.org

ANLY 699 - Applied Project in Analytics | Harrisburg University

Welcome to ANLY 699 Applied Project in Analytics

This capstone course involves the application of analytics methods and techniques to real-world problems. Students conduct comprehensive research projects demonstrating mastery of analytics principles, tools, and APA 7th edition scientific writing standards. The course emphasizes end-to-end project execution from problem definition through final presentation.

Writing Standard: APA 7th Edition | Format: Research Paper & Presentation

Author: Ziyuan Huang | Course: Applied Project in Analytics | Institution: Harrisburg University

View Repository on GitHub | 📄 Course Syllabus (PDF)

📚 Research Writing Guide

Week 01: Research Foundation

Understanding the scientific research process, components of research papers, and APA 7 overview with basic formatting standards for analytics research.

Week 02: Literature Review

Searching academic databases, evaluating source credibility, synthesizing research findings, and building theoretical frameworks for analytics projects.

Week 03: Research Questions

Formulating research questions, developing testable hypotheses, and defining variables and constructs in analytics research contexts.

Week 04: Methodology

Research design fundamentals, quantitative methodology, sampling strategies, validity and reliability, ethics and IRB considerations.

Week 05: Data Collection

Data sources and acquisition, survey design and measurement, data quality assessment, and documentation requirements.

Week 06: Data Analysis

Selecting appropriate statistical tests, checking assumptions and diagnostics, effect sizes and power analysis.

Week 07: Results

Reporting statistical findings, organizing results logically, integrating tables and figures, what to include vs. exclude.

Week 08: Discussion

Interpreting findings, linking results to literature, addressing limitations, implications and future research directions.

Week 09: Abstract & Introduction

Writing effective abstracts, introduction structure (funnel approach), establishing research significance, stating purpose and hypotheses.

Week 10: Citations & References

In-text citation formats, reference list formatting, managing citations in R Markdown, avoiding plagiarism.

Week 11: Figures & Tables

APA-compliant figure formatting, table design and numbering, captions and notes, R code for publication-ready graphics.

Week 12: Revision & Editing

Self-editing strategies, peer review process, common writing pitfalls, strengthening arguments and clarity.

Week 13: Final Draft

Integrating all components, final formatting checks, preparing submission materials, quality assurance checklist.

Week 14: Presentation

Creating effective research presentations, designing clear slides, delivering findings professionally, Q&A preparation.

🎯 Key Features

  • APA 7th Edition: Complete guide to scientific writing standards for analytics research
  • 14-Week Tutorial: Structured progression from foundation through final presentation
  • Applied Focus: Real-world analytics projects demonstrating end-to-end execution
  • Research Integration: Connecting data analysis with scientific methodology and reporting

📖 Key Resources

  • American Psychological Association. (2020). Publication Manual of the American Psychological Association (7th ed.). APA.
  • Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. SAGE Publications.
  • 14-Week Research Writing Guide: RESEARCH_WRITING_GUIDE.md