Time Series with Machine Learning (Part 8)
Part 8: Feature Importance The model has been trained. The next natural question is: out of the 142 features that went in, which ones actually drove the predictions? Tree-based models like LightGBM give a direct answer through feature importance scores. This post explains what those scores mean, why there are two different... Continue Reading →
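To make the two score types concrete, here is a minimal sketch on a hypothetical toy dataset (not the post's actual 142-feature matrix) showing the two importance types LightGBM exposes, "split" and "gain":

    import lightgbm as lgb
    import numpy as np
    import pandas as pd

    # Toy stand-in for the real feature matrix (hypothetical data).
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(500, 5)), columns=[f"f{i}" for i in range(5)])
    y = 3 * X["f0"] + X["f1"] + rng.normal(size=500)

    model = lgb.LGBMRegressor(n_estimators=50).fit(X, y)
    booster = model.booster_

    # "split": how many times a feature is used in a tree split.
    # "gain": total loss reduction contributed by that feature's splits.
    imp = pd.DataFrame({
        "feature": booster.feature_name(),
        "split": booster.feature_importance(importance_type="split"),
        "gain": booster.feature_importance(importance_type="gain"),
    }).sort_values("gain", ascending=False)
    print(imp)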
Time Series with Machine Learning (Part 7)
Part 7: Training LightGBM; Gradient Boosting, Hyperparameters, and Early Stopping The feature set is ready. The validation split is in place. The next step is fitting the model, and understanding what is actually happening during that fit. LightGBM is a gradient boosted tree framework, which means the training process is iterative: it builds one tree... Continue Reading →
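As a rough illustration of that iterative fit, here is a minimal sketch on synthetic data with illustrative hyperparameter values (not the post's actual settings), where a validation set and early stopping decide how many trees are actually kept:

    import lightgbm as lgb
    import numpy as np
    import pandas as pd

    # Hypothetical stand-ins for the real feature matrix and target.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(1000, 5)), columns=[f"f{i}" for i in range(5)])
    y = 2 * X["f0"] + rng.normal(size=1000)
    X_train, X_val = X[:800], X[800:]
    y_train, y_val = y[:800], y[800:]

    # Illustrative hyperparameters, not the post's exact values.
    model = lgb.LGBMRegressor(
        n_estimators=5000,    # upper bound; early stopping picks the real count
        learning_rate=0.05,   # shrinkage applied to each new tree
        num_leaves=31,        # complexity cap for each individual tree
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds=100)],
    )
    print("best iteration:", model.best_iteration_)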
Part 4: What Happens After Someone Clicks (Google Analytics Explained)
In the previous part, I focused on how Google sees my blog. Now I want to answer a different question: What actually happens after someone clicks? Quick note: Google Analytics 4 (GA4) is the latest version of Google Analytics. It focuses on user behavior after someone lands on your site: what they do, how long... Continue Reading →
Part 3: How Google Sees Your Blog (Search Console Explained)
After setting everything up, I started seeing metrics. But I didn't really understand them. Search Console gives you a lot of data, but without context, it's hard to interpret. So I tried to answer a simple question: What do these numbers actually mean? What Search Console really shows Search Console doesn't tell you who visited... Continue Reading →
Part 2: What actually happens after you start a blog?
When I restarted my blog, I thought the main effort would be writing. It wasn't. The confusing part started after publishing. Resetting the timeline This blog technically goes back to 2016, but that period is not meaningful. There were a few scattered posts over the years: a short "I'll start a blog" note (10.05.2016) my... Continue Reading →
Time Series with Machine Learning (Part 6)
Part 6: Measuring the Right Thing, Splitting the Right Way The Cost Function and the Validation Split Before training a model, two decisions need to be locked in. First, what does "good performance" actually mean? Which metric will be used to judge the model? Second, how will the available data be split so that the... Continue Reading →
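For a concrete picture of a time-based split, here is a minimal sketch on hypothetical daily data; the cutoff date and the SMAPE metric shown are illustrative choices, not necessarily the post's exact ones:

    import numpy as np
    import pandas as pd

    def smape(y_true, y_pred):
        # Symmetric MAPE, one common choice for sales data
        # (shown as an example metric, not necessarily the post's).
        denom = (np.abs(y_true) + np.abs(y_pred)) / 2
        return np.mean(np.abs(y_true - y_pred) / np.where(denom == 0, 1, denom)) * 100

    # Hypothetical frame with a date column; split by time, never randomly,
    # so the validation set sits strictly after the training set.
    df = pd.DataFrame({
        "date": pd.date_range("2015-01-01", periods=1095, freq="D"),
        "sales": np.random.default_rng(0).poisson(20, size=1095).astype(float),
    })
    cutoff = pd.Timestamp("2017-10-01")   # illustrative cutoff
    train, valid = df[df["date"] < cutoff], df[df["date"] >= cutoff]

    # Naive baseline: predict the training mean for every validation day.
    baseline = np.full(len(valid), train["sales"].mean())
    print(f"baseline SMAPE: {smape(valid['sales'].to_numpy(), baseline):.2f}%")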
Time Series with Machine Learning (Part 5)
Part 5: The Final Preparations Before Modelling The feature engineering is almost done. Four families of features now cover calendar position, lagged history, rolling averages, and exponentially weighted averages. Two housekeeping steps remain before the model can be trained: encoding the categorical columns properly and transforming the target variable. One-Hot Encoding The dataset still contains... Continue Reading →
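A minimal sketch of those two housekeeping steps on a hypothetical frame (the column names are made up for illustration):

    import numpy as np
    import pandas as pd

    # Hypothetical frame with one categorical column and the raw target.
    df = pd.DataFrame({
        "day_of_week": ["Mon", "Tue", "Mon", "Sun"],
        "sales": [12.0, 7.0, 15.0, 30.0],
    })

    # One-hot encode the categorical column into 0/1 indicator columns.
    df = pd.get_dummies(df, columns=["day_of_week"])

    # log1p compresses the target's right tail; invert with np.expm1
    # after predicting to get back to the original sales scale.
    df["sales"] = np.log1p(df["sales"])
    print(df.head())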
Time Series with Machine Learning (Part 4)
Exponentially Weighted Mean Features The rolling mean from the previous post computes a simple average over a fixed window of past values. Every observation inside the window contributes equally: a sale from 364 days ago counts just as much as a sale from 2 days ago. The triangular window improved on this slightly by weighting... Continue Reading →
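Here is a minimal pandas sketch of such a feature on a hypothetical store/item frame; the alpha value and column names are illustrative, not the post's exact setup:

    import pandas as pd

    # Hypothetical long-format frame: one row per (store, item, day).
    df = pd.DataFrame({
        "store": [1, 1, 1, 1, 1, 1],
        "item":  [1, 1, 1, 1, 1, 1],
        "sales": [10.0, 12.0, 9.0, 14.0, 11.0, 13.0],
    })

    # Exponentially weighted mean: with alpha=0.95, each older day's weight
    # shrinks by a factor of (1 - alpha). shift(1) keeps the feature strictly
    # historical, so the current day never leaks into its own feature.
    df["sales_ewm_a95_lag1"] = (
        df.groupby(["store", "item"])["sales"]
          .transform(lambda s: s.shift(1).ewm(alpha=0.95).mean())
    )
    print(df)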
Time Series with Machine Learning (Part 3)
The lag features from the previous post handed the model raw historical sales values: the exact number sold on a specific day, 91 or 364 days ago. That is precise, but precision is not always what you want. A single day's sales figure is noisy. It might have been a public holiday, an unusual promotion,... Continue Reading →
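For reference, a minimal pandas sketch of a flat rolling mean next to the triangular variant (window size and data are illustrative; win_type="triang" needs scipy installed):

    import pandas as pd

    # Hypothetical daily sales for a single series.
    s = pd.Series([10.0, 12.0, 9.0, 14.0, 11.0, 13.0, 8.0, 15.0])

    # Plain rolling mean: every day in the window counts equally.
    flat = s.shift(1).rolling(window=4).mean()

    # Triangular window: pandas applies a symmetric triangular weight
    # (peaking near the window's center), so observations at the window's
    # edges count less than a flat average would give them.
    tri = s.shift(1).rolling(window=4, win_type="triang").mean()

    print(pd.DataFrame({"sales": s, "roll_flat": flat, "roll_triang": tri}))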
Roller Skating (Day 10)
Plow Stop After trying the toe stop drag, I wanted to learn a stopping method that feels a bit more controlled. So today I worked on the plow stop (also known as plough stop). The plow stop starts from a position that already feels quite different from normal skating. Feet are wide apart, knees are... Continue Reading →
