In this notebook, we'll explore time series forecasting using Facebook's Prophet library. Prophet is designed to make forecasting accessible to non-experts while still providing powerful capabilities for business analysts and data scientists.
Time series forecasting predicts future values based on historical patterns in data that changes over time. For ride bookings, this could help predict:
Prophet excels at handling:
Let's start by exploring our ride booking data!
Our ride booking dataset contains 150,000 rides spanning the entire year of 2024 (365 days). Here's what we discovered:
We'll focus on total daily ride demand (all booking attempts) to predict future business volume. This gives us insights into:
Next, let's aggregate the data by date to create our time series!
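The aggregation step can be sketched as follows. This is a minimal illustration rather than the notebook's actual query: the column name `booking_date` and the helper `to_daily_series` are hypothetical, and filling days without bookings with zero is an assumption. The output uses the `ds`/`y` column names that Prophet expects.

```python
import pandas as pd


def to_daily_series(rides: pd.DataFrame, date_col: str = "booking_date") -> pd.DataFrame:
    """Count booking attempts per calendar day and return a ds/y frame.

    `date_col` is a hypothetical column name; adjust it to the real schema.
    """
    # Parse timestamps and drop the time-of-day component.
    dates = pd.to_datetime(rides[date_col]).dt.normalize()
    # One row per day, ordered chronologically.
    daily = dates.value_counts().sort_index()
    # Assumption: a day with no bookings counts as zero demand, not missing data.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range, fill_value=0)
    return pd.DataFrame({"ds": daily.index, "y": daily.to_numpy()})


# Tiny illustrative input (not real data).
example_rides = pd.DataFrame({"booking_date": ["2024-01-01", "2024-01-01", "2024-01-03"]})
example_daily = to_daily_series(example_rides)
```

Each row of the result is one calendar day, so a full year of bookings collapses into a few hundred rows regardless of how many individual rides there are.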
Cell In[15] output: `NameError: name 'dataframe_5' is not defined`, raised at line 2, `df = dataframe_5.to_pandas()`, which converts the SQL result to pandas for Prophet before the date column is cast with `df['ds'] = pd.to_datetime(df['ds'])`.
Cell In[16] output: `NameError: name 'df' is not defined`, raised at line 9, `ax.plot(df['ds'], df['y'], color='#1f77b4', linewidth=2, label='Daily Ride Bookings')`, which plots the time series on a 14 x 6 figure before a trend line is added from `days_numeric = np.arange(len(df))`.
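The trend line for that plot can be sketched like this. The cell shows only `days_numeric = np.arange(len(df))`, so the least-squares fit with `np.polyfit` is an assumption about how the line was computed, and `linear_trend` is a hypothetical helper name.

```python
import numpy as np


def linear_trend(y):
    """Fit a straight line through a daily series.

    Returns (slope per day, fitted trend values).
    """
    y = np.asarray(y, dtype=float)
    # Day index 0, 1, 2, ... used as the regressor.
    days_numeric = np.arange(len(y))
    # Degree-1 least-squares fit (assumed; the original fit is not shown).
    slope, intercept = np.polyfit(days_numeric, y, deg=1)
    return slope, slope * days_numeric + intercept


# Illustrative series that grows by two bookings per day.
example_slope, example_trend = linear_trend([10, 12, 14, 16])
```

The returned trend array can be passed to `ax.plot` alongside `df['ds']` to overlay it on the daily series.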
Cell In[18] output: `NameError: name 'df' is not defined`, raised at line 16, `model.fit(df)`, the step that trains the model on the 2024 data.
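A defensive version of the training step might look like the sketch below. The notebook's model configuration is not shown, so `Prophet()` with default settings is an assumption, and `validate_prophet_input` and `fit_prophet` are hypothetical helper names. Prophet is imported inside the function so the validation helper stays usable even where Prophet is not installed.

```python
import pandas as pd


def validate_prophet_input(df: pd.DataFrame) -> pd.DataFrame:
    """Check for the ds/y columns Prophet requires and return a clean copy."""
    missing = {"ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    out = df[["ds", "y"]].copy()
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)


def fit_prophet(df: pd.DataFrame):
    """Fit a Prophet model on a validated ds/y frame."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    model = Prophet()  # default settings; the notebook's configuration is not shown
    model.fit(validate_prophet_input(df))
    return model


# Illustrative input with rows deliberately out of order.
example_clean = validate_prophet_input(
    pd.DataFrame({"ds": ["2024-01-02", "2024-01-01"], "y": [5, 3]})
)
```

Validating first turns a missing upstream dataframe or a misnamed column into one clear error at the start, instead of a chain of failures in later cells.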
Cell In[19] output: `Exception: Model has not been fit.`, raised at line 5, `future = model.make_future_dataframe(periods=60)`, which creates future dates 60 days beyond the training data. The exception comes from `Prophet.make_future_dataframe(self, periods, freq, include_history)` in `prophet/forecaster.py`, line 1864, which raises when `self.history_dates is None`.
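To make the 60-day horizon concrete, the sketch below approximates in plain pandas how the future dates are derived from the training history. It is an illustration under assumptions, not Prophet's implementation: `future_dates` is a hypothetical helper, and the daily frequency is assumed rather than inferred from the data.

```python
import pandas as pd


def future_dates(history_dates, periods: int = 60, freq: str = "D") -> pd.DatetimeIndex:
    """Return `periods` dates that follow the last date in the history."""
    last_date = pd.to_datetime(history_dates).max()
    # Generate one extra period, then drop the last training date itself.
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    return dates[dates > last_date][:periods]


# Illustrative history ending on 2024-12-31 (an assumed end date).
example_history = pd.Series(pd.date_range("2024-12-01", "2024-12-31", freq="D"))
example_future = future_dates(example_history, periods=60)
```

With a fitted model, the equivalent Prophet calls are `future = model.make_future_dataframe(periods=60)` followed by `forecast = model.predict(future)`. The first call only works after `model.fit(...)` has succeeded, which is why an unfitted model raises `Model has not been fit.`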
Cell In[20] output: `NameError: name 'forecast' is not defined`, raised at line 8, `historical_data = forecast[forecast['ds'] <= df['ds'].max()]`, which separates the historical rows from `future_data = forecast[forecast['ds'] > df['ds'].max()]` before plotting historical actual vs. predicted values.
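The historical/future split can be wrapped in a small helper, sketched here. The name `split_forecast` is hypothetical, and the frame below is a stand-in for the `forecast` frame that `model.predict` returns, with made-up `yhat` values.

```python
import pandas as pd


def split_forecast(forecast: pd.DataFrame, last_train_date):
    """Split a forecast frame into in-sample and out-of-sample rows."""
    cutoff = pd.Timestamp(last_train_date)
    historical = forecast[forecast["ds"] <= cutoff]  # dates the model was trained on
    future = forecast[forecast["ds"] > cutoff]       # genuinely new dates
    return historical, future


# Stand-in forecast frame with illustrative values.
example_forecast = pd.DataFrame(
    {
        "ds": pd.date_range("2024-12-30", periods=4, freq="D"),
        "yhat": [400.0, 410.0, 405.0, 415.0],
    }
)
example_hist, example_fut = split_forecast(example_forecast, "2024-12-31")
```

Keeping the cutoff as an explicit argument avoids relying on a global `df` being defined when the plotting cell runs.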
Cell In[22] output: `NameError: name 'forecast' is not defined`, raised at line 6, `sample_week = forecast[forecast['ds'] >= '2024-12-23'][0:7]`, which takes the last week of the year as a sample and labels its rows Monday through Sunday to show the weekly pattern.
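One way to read the weekly pattern from a sample week is sketched below. It derives the weekday names from the dates instead of a fixed list, so the labels stay correct if the sample week does not start on a Monday. The helper `weekly_profile` is hypothetical, the frame is a stand-in with made-up `yhat` values, and the forecast is assumed to be sorted by `ds`.

```python
import pandas as pd


def weekly_profile(forecast: pd.DataFrame, week_start: str = "2024-12-23") -> dict:
    """Map weekday name to predicted demand for the 7 days from `week_start`."""
    # Assumes `forecast` is sorted by ds, so the first 7 rows are consecutive days.
    week = forecast[forecast["ds"] >= pd.Timestamp(week_start)].iloc[:7]
    return {row.ds.day_name(): float(row.yhat) for row in week.itertuples(index=False)}


# Stand-in forecast frame with illustrative values.
example_weekly_forecast = pd.DataFrame(
    {
        "ds": pd.date_range("2024-12-20", periods=14, freq="D"),
        "yhat": [float(i) for i in range(14)],
    }
)
example_profile = weekly_profile(example_weekly_forecast)
```

2024-12-23 falls on a Monday, so for this sample week the derived names line up with a Monday-to-Sunday list.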
Cell In[23] output: `NameError: name 'df' is not defined`, raised at line 6, `historical_actual = df['y']`, in the model accuracy (historical fit) panel that compares it with `historical_predicted = historical_data['yhat']` in a scatter plot of actual vs. predicted values.
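Alongside the scatter plot, the historical fit can be summarized with a few error metrics. The choice of MAE, RMSE, and MAPE is an assumption, since the notebook's own accuracy figures are not shown, and `fit_metrics` is a hypothetical helper.

```python
import numpy as np


def fit_metrics(actual, predicted) -> dict:
    """Compute in-sample error metrics for actual vs. predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    # MAPE is undefined where actual == 0, so those days are excluded.
    nonzero = actual != 0
    mape = float(np.mean(np.abs(errors[nonzero] / actual[nonzero])) * 100)
    return {"mae": mae, "rmse": rmse, "mape": mape}


# Illustrative values only.
example_metrics = fit_metrics([100, 200], [110, 180])
```

These are in-sample numbers, so they describe how closely the model reproduces the training data rather than how well it will predict unseen days.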
We successfully demonstrated Facebook's Prophet library using real ride booking data, creating a comprehensive time series forecasting solution:
Prophet proved to be an excellent choice for this stable, business-critical time series forecasting task!