To analyze the factors contributing to delays on TTC buses, streetcars, and the subway, we conducted exploratory data analysis on four aspects: the geographic hotspots of subway delays, the most common incident types, the peak times for delays, and the distribution of delay durations for each transport type.
First, we plot the number of delays reported at each subway station in Toronto in 2024. As seen below, more delays appear to occur at the endpoints of each line, namely at Kennedy, Kipling, and Finch stations. The figure also suggests that Bloor station has the highest number of delays, at 1008, although its average delay is only 1.97 minutes. Interacting with the full map suggests that disorderly patrons account for a substantial share of the delays.
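The per-station figures quoted above (a delay count and an average delay per station) can be computed with a simple group-by. The following is a minimal sketch rather than the code used for the figure: the toy records and the column names `station` and `min_delay` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy records; the real data and its column names may differ.
delays = pd.DataFrame({
    "station": ["BLOOR", "BLOOR", "FINCH", "KENNEDY", "BLOOR", "FINCH"],
    "min_delay": [0, 3, 5, 10, 2, 4],  # delay duration in minutes
})

# Count the delays and average their duration for each station,
# then rank stations by how many delays they reported.
station_summary = (
    delays.groupby("station")["min_delay"]
    .agg(n_delays="count", mean_delay="mean")
    .sort_values("n_delays", ascending=False)
)

print(station_summary)
```

A station can rank first by count while having a short average delay, which is the pattern described for Bloor station.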
We therefore investigate the most frequent reasons for delays for each transport type, as shown in the figures below. This gives insight into how future delays might be mitigated and which specific solutions to target. The figures suggest that mechanical problems are the top reason for bus delays, whereas operations and security are the top reasons for streetcar delays. Interestingly, the top reasons for subway delays are passenger-related, including disorderly patrons, illness, and injury, which matches what the station map suggested.
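One way to rank delay reasons within each transport type is to count incidents per type and keep the most frequent ones. This is a sketch under assumptions: the column names `transport_type` and `incident`, the helper `top_incidents`, and the toy rows are all hypothetical and only mirror the categories named above.

```python
import pandas as pd

# Hypothetical toy incident log; labels mirror the categories named in the text.
incidents = pd.DataFrame({
    "transport_type": ["bus", "bus", "bus",
                       "streetcar", "streetcar", "streetcar",
                       "subway", "subway", "subway"],
    "incident": ["Mechanical", "Mechanical", "Security",
                 "Operations", "Operations", "Security",
                 "Disorderly Patron", "Disorderly Patron", "Injured or Ill Customer"],
})

def top_incidents(df, n=1):
    """Return the n most frequent incident types for each transport type."""
    counts = (
        df.groupby(["transport_type", "incident"])
        .size()
        .rename("n_delays")
        .reset_index()
    )
    # Sort so the most frequent incidents come first within each transport type.
    counts = counts.sort_values(["transport_type", "n_delays"], ascending=[True, False])
    return counts.groupby("transport_type").head(n).reset_index(drop=True)

top = top_incidents(incidents, n=1)
print(top)
```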
Next, we investigate the peak times for delays for each transport type. As shown below, the frequency of delays for the subway and streetcars is relatively consistent throughout the day, except from around 2am to 5am, when these services are usually closed, with slight peaks at 8am and 4pm, during rush hour. This led us to include a rush-hour indicator variable in our model. The bus also has a pronounced peak at 5pm, supporting this decision.
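A rush-hour indicator of this kind can be derived from the timestamp of each delay. The sketch below is illustrative only: the window boundaries (7am to 10am and 4pm to 7pm, chosen to cover the 8am, 4pm, and 5pm peaks), the column name `timestamp`, and the helper `add_rush_hour` are assumptions, not the definition used in our model.

```python
import pandas as pd

# Assumed rush-hour windows as half-open hour ranges [start, end).
RUSH_WINDOWS = [(7, 10), (16, 19)]

def add_rush_hour(df, time_col="timestamp"):
    """Add a 0/1 'rush_hour' column based on the hour of each delay."""
    hour = df[time_col].dt.hour
    in_window = pd.Series(False, index=df.index)
    for start, end in RUSH_WINDOWS:
        in_window |= (hour >= start) & (hour < end)
    out = df.copy()
    out["rush_hour"] = in_window.astype(int)
    return out

# Hypothetical example timestamps.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-08 08:15",  # morning peak
        "2024-01-08 12:00",  # midday
        "2024-01-08 17:30",  # evening peak
        "2024-01-08 03:00",  # overnight
    ])
})

flagged = add_rush_hour(sample)
print(flagged)
```

Keeping the windows in one constant makes it easy to test how sensitive a model is to the exact rush-hour definition.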
After removing outliers from the data, we investigate how the distribution of delay durations varies between the transport types in the following figures. The violin plots suggest that delay durations for buses have a larger spread than those for streetcars and the subway. The subway has a large concentration of values near 0, while the streetcar has peaks at around 0 and 10 minutes. The animated figure shows how these distributions vary by month: the number of delays appears to increase in the summer months, although the general pattern remains consistent throughout the year. This motivated us to include season as a variable of interest in our models.
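The two preprocessing steps mentioned here, outlier removal and a season variable, could look like the sketch below. The text does not state which outlier rule was applied, so the 1.5 × IQR rule is an assumption, as are the month-to-season mapping, the column names `min_delay` and `timestamp`, and both helper functions.

```python
import pandas as pd

def remove_iqr_outliers(df, col="min_delay", k=1.5):
    """Keep rows within [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not confirmed by the text)."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Assumed mapping from calendar month to season.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def add_season(df, time_col="timestamp"):
    """Add a categorical 'season' column derived from the month of each delay."""
    out = df.copy()
    out["season"] = out[time_col].dt.month.map(SEASON_BY_MONTH)
    return out

# Hypothetical example records with one extreme delay.
records = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-10", "2024-04-02", "2024-07-15",
        "2024-07-20", "2024-10-05", "2024-12-24",
    ]),
    "min_delay": [0, 2, 3, 4, 5, 120],
})

cleaned = add_season(remove_iqr_outliers(records))
print(cleaned)
```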