Graduate student: | 詹益杰 Chan, Yi-Chieh |
---|---|
Thesis title: | 棒球投手受傷的復發事件分析 Statistical Analysis on the Baseball Pitcher’s Recurrent Injuries |
Advisor: | 呂翠珊 Lu, Tsui-Shan |
Oral defense committee: | 呂翠珊 Lu, Tsui-Shan; 蔡碧紋 Tsai, Pi-Wen; 温啟仲 Wen, Chi-Chung |
Oral defense date: | 2024/06/25 |
Degree: | Master |
Department: | Department of Mathematics |
Publication year: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 37 |
Keywords (Chinese): | 存活分析, 復發事件, 邊際模型, 棒球, 投手 |
Keywords (English): | survival analysis, recurrent event, marginal model, baseball, pitcher |
Research method: | Experimental design |
DOI URL: | http://doi.org/10.6345/NTNU202401050 |
Thesis type: | Academic thesis |
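Baseball data analysis has traditionally focused on evaluating player ability, for example through Wins Above Replacement (WAR), a metric that measures how many wins a player contributes relative to a replacement player. Such data are also used to predict a player's future performance and a team's win total in coming seasons. These analyses belong to sabermetrics, which emphasizes the mathematical and statistical models behind the data. Past analyses, however, have overlooked the importance of player health in games. What makes a player prone to injury? Is it the use of breaking balls? Is it related to pitch type? Or is it age, pitch velocity, number of appearances, and related fatigue indicators? From 2015 to 2023, Major League Baseball (MLB) recorded pitchers who met a certain innings threshold along with their related data, including breaking-ball types, velocity, age, and so on. A pitcher may be injured several times over a career, which means that a single pitcher can have multiple survival events. To study this problem, this thesis uses recurrent-event marginal models commonly used in survival analysis, including the Andersen-Gill model (AG), the Prentice-Williams-Peterson model (PWP), and the Wei-Lin-Weissfeld model (WLW). These models allow us to analyze the relevant variables for pitchers with multiple injury events and thus to understand more fully how various factors affect pitcher health and injury risk. We also compare the three models in light of their characteristics. For variable selection, the variables are first grouped into categories, and the best model is then built to predict injury risk.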
In the past, baseball data analysis has primarily focused on evaluating player performance, including metrics such as Wins Above Replacement (WAR), which measures the number of wins a player contributes compared to a replacement-level player. These metrics are also used to predict future player performance and to forecast a team’s wins in upcoming seasons. This analysis falls within the realm of sabermetrics, which emphasizes the mathematical and statistical models behind the data. However, past data analysis has overlooked the importance of player health during games. What factors contribute to player injuries? Is it the use of breaking balls? Pitch types? Or perhaps age, pitch velocity, player appearances, and related fatigue indicators? From 2015 to 2023, Major League Baseball (MLB) recorded data on pitchers who reached a certain innings threshold, including their pitch types, velocity, age, and more. Over a pitcher’s career, multiple injuries may occur, meaning that a single pitcher can have multiple events. To address this, we used recurrent event marginal models common in survival analysis, including the Andersen-Gill model (AG), the Prentice-Williams-Peterson model (PWP), and the Wei-Lin-Weissfeld model (WLW). These models allow us to analyze the relevant variables for pitchers with multiple events, thereby giving a more comprehensive understanding of the factors affecting pitcher health and injury risk. We chose among the three models based on the characteristics of the data and compared their performance in detail. For variable selection, we grouped the variables into categories and identified the most appropriate combination to establish the optimal model for predicting injury risk.
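The three models named above differ mainly in how each pitcher's follow-up is split into risk intervals and strata. The sketch below illustrates the textbook layouts (AG with counting-process intervals and one common baseline, PWP in its total-time form stratified by event order, and WLW with every event measured from time zero). It is not the thesis's actual data preparation: the function names, the `(start, stop, event, stratum)` row format, and the example injury times are all hypothetical.

```python
from typing import List, Tuple

# One risk interval: (start, stop, event indicator, stratum).
# This row format is an illustrative assumption, not taken from the thesis.
Row = Tuple[float, float, int, int]


def ag_rows(event_times: List[float], end: float) -> List[Row]:
    """Andersen-Gill layout: consecutive counting-process intervals,
    all in a single stratum (one common baseline hazard).
    Assumes event_times is sorted and every time is <= end."""
    rows: List[Row] = []
    start = 0.0
    for t in event_times:
        rows.append((start, t, 1, 1))    # interval ending in an injury
        start = t
    if end > start:
        rows.append((start, end, 0, 1))  # final censored interval
    return rows


def pwp_rows(event_times: List[float], end: float) -> List[Row]:
    """PWP (total-time) layout: same intervals as AG, but the k-th
    interval belongs to stratum k, so a pitcher is at risk for the
    k-th injury only after the (k-1)-th has occurred."""
    return [
        (a, b, e, k)
        for k, (a, b, e, _) in enumerate(ag_rows(event_times, end), start=1)
    ]


def wlw_rows(event_times: List[float], end: float, max_events: int) -> List[Row]:
    """WLW layout: the pitcher is at risk for every event order k from
    time 0; orders never reached are censored at the end of follow-up."""
    rows: List[Row] = []
    for k in range(1, max_events + 1):
        if k <= len(event_times):
            rows.append((0.0, event_times[k - 1], 1, k))
        else:
            rows.append((0.0, end, 0, k))
    return rows
```

For a hypothetical pitcher injured at days 100 and 250 and followed until day 400, `ag_rows([100.0, 250.0], 400.0)` yields the intervals (0, 100], (100, 250], (250, 400] in one stratum, `pwp_rows` assigns those same intervals to strata 1, 2, and 3, and `wlw_rows(..., max_events=3)` yields (0, 100], (0, 250], (0, 400] in strata 1, 2, and 3. Each layout would then be passed to a Cox-type fitting routine with a robust variance estimate, a step this sketch leaves out.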
E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” Journal of the American Statistical Association, vol. 53, pp. 457–481, 1958.
D. R. Cox, “Regression models and life-tables,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 34, no. 2, pp. 187–202, 1972.
R. L. Prentice, B. J. Williams, and A. V. Peterson, “On the regression analysis of multivariate failure time data,” Biometrika, vol. 68, no. 2, pp. 373–379, 1981.
P. K. Andersen and R. D. Gill, “Cox’s regression model for counting processes: A large sample study,” The Annals of Statistics, vol. 10, no. 4, pp. 1100–1120, 1982.
L. J. Wei, D. Y. Lin, and L. Weissfeld, “Regression analysis of multivariate incomplete failure time data by modeling marginal distributions,” Journal of the American Statistical Association, vol. 84, no. 408, pp. 1065–1073, 1989.
T. Fleming and D. Harrington, Counting Processes and Survival Analysis. John Wiley and Sons, 1991.
T. M. Therneau and P. M. Grambsch, Modeling Survival Data: Extending the Cox Model. Springer-Verlag, 2000.
B. S. Leclerc, C. Bégin, E. Cadieux, L. Goulet, N. Leduc, M.-J. Kergoat, and P. Lebel, “Risk factors for falling among community-dwelling seniors using home-care services: an extended hazards model with time-dependent covariates and multiple events,” Chronic Dis Can, vol. 28, no. 4, pp. 111–120, 2008.
X. Xue, S. J. Gange, Y. Zhong, R. D. Burk, H. Minkoff, L. S. Massad, D. H. Watts, M. H. Kuniholm, K. Anastos, A. M. Levine, M. Fazzari, G. D’Souza, M. Plankey, J. M. Palefsky, and H. D. Strickler, “Marginal and mixed-effects models in the analysis of human papillomavirus natural history data,” Cancer Epidemiol Biomarkers Prev, vol. 19, no. 1, pp. 159–169, 2010.
L. S. Dalrymple, K. L. Johansen, G. M. Chertow, S.-C. Cheng, B. Grimes, E. B. Gold, and G. A. Kaysen, “Infection-related hospitalizations in older patients with ESRD,” Am J Kidney Dis, vol. 56, no. 3, pp. 522–530, 2010.
I. Sagara, R. Giorgi, O. K. Doumbo, R. Piarroux, and J. Gaudart, “Modelling recurrent events: comparison of statistical models with continuous and discontinuous risk intervals on recurrent malaria episodes data,” Malar J, vol. 13, p. 293, 2014.
S. Ullah, T. J. Gabbett, and C. F. Finch, “Statistical modelling for recurrent events: an application to sports injuries,” Br J Sports Med, vol. 48, no. 17, pp. 1287–1293, 2014.
A.-K. Ozga, M. Kieser, and G. Rauch, “A systematic comparison of recurrent event models for application to composite endpoints,” BMC Med Res Methodol, vol. 18, no. 1, p. 2, 2018.
L. M. Tesfaw and E. K. Muluneh, “Exploring and modeling recurrent birth events in Ethiopia: EMDHS 2019,” BMC Pregnancy Childbirth, vol. 22, no. 1, p. 617, 2022.
林建甫, 存活分析(二版). 雙葉書廊, 2020.