<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Code &amp; Curiosity: A Developer&apos;s Notebook</title>
    <description>A personal blog documenting my experiences in data science, statistics, and programming.</description>
    <link>https://hlyons4.github.io/</link>
    <atom:link href="https://hlyons4.github.io/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Wed, 16 Apr 2025 04:20:16 +0000</pubDate>
    <lastBuildDate>Wed, 16 Apr 2025 04:20:16 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator><item>
        <title>How Weather Shapes Crime: A Visual Exploration</title>
        <description>&lt;p&gt;If you read the last post, you know I started wondering whether extreme weather actually affects crime. Turns out… it &lt;em&gt;does&lt;/em&gt;! But raw numbers can only say so much. The real story came through the visuals.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;To bring everything to life, I built a Streamlit app, which is basically a way to turn data into something you can actually interact with. Instead of scrolling through a bunch of static charts, the app lets you click around, filter by crime type, toggle storm days, and see how things shift with temperature. I added tabs, dropdowns, and live-updating graphs so even someone with zero data experience could explore and uncover patterns for themselves.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Let me walk you through some of the visuals that stood out!&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Picture this: It’s late July, the air is heavy, everyone’s sweating through their clothes, and the city feels like it’s on edge. You’ve got sirens in the distance, overheated crowds, long daylight hours. Then the numbers come in: assaults are up. Robberies, too.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;People get irritable in the heat. Mix that with crowded public spaces and long summer nights, and… well, the numbers speak for themselves.&lt;br /&gt;&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/assault_vs_temp.png&quot; alt=&quot;Scatterplot of average assault count vs temperature&quot; /&gt;
  &lt;figcaption&gt;Figure 2. - Assault counts increase with temperature.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Storms are a different story. Outdoor crime drops when the weather turns, but indoor crimes? They keep going. Narcotics offenses, domestic violence, and other indoor incidents don’t seem to care about the weather.&lt;br /&gt;&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/location_vs_storm.png&quot; alt=&quot;Bar chart of crimes by storm status and location type&quot; /&gt;
  &lt;figcaption&gt;Figure 1. - Crime by location type during storm vs clear weather.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;So no, storms don’t stop crime. They just push it indoors. Which kind of makes sense. When the streets clear out, life keeps going behind closed doors.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;What surprised me most was how consistent that shift was. It wasn’t a one-off: the storm days looked noticeably different every time.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Filtering by crime type shows each crime’s trend line against temperature, and those lines are simple but powerful. Robbery climbs in the heat. Burglary doesn’t care much about weather. This feature really helped me think differently about how some crimes are more influenced by the environment than others.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Sometimes I’d pick a crime expecting a pattern and there wasn’t one. Other times, something unexpected would show up, and I’d just sit with it for a minute. It made the data feel personal, like I was learning how these behaviors played out in real life.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Even without reading any numbers, you can tell when crime is at its peak. The calendar heatmap gives a seasonal snapshot that says a lot in one quick glance.&lt;br /&gt;&lt;/p&gt;
&lt;figure&gt;
  &lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/crime_heatmap_calendar.png&quot; alt=&quot;Heatmap of daily crime counts by month and day&quot; /&gt;
  &lt;figcaption&gt;Figure 4. - Year-round crime patterns shown by calendar heatmap.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This one was more for fun, but it became one of my favorites. It helped me step back and look at the whole year in one image, and seeing those bursts of color in the summer months made everything I’d noticed in the other graphs feel even more real.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Streamlit honestly made it click. Watching the patterns shift with just a few filters felt less like research and more like discovery.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;What started as a small question became a really eye-opening look into human behavior and how the environment around us shapes it.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;If you’re curious, the &lt;a href=&quot;https://crime-weather-app-2cekkmprr2p9urjmdzzcrg.streamlit.app/&quot; target=&quot;_blank&quot;&gt;app&lt;/a&gt; is live. Try it out. Choose a crime. Slide through the weather conditions. See what stories the data tells you. You might just fall into the same rabbit hole I did.&lt;/p&gt;
</description><pubDate>Tue, 15 Apr 2025 00:00:00 +0000</pubDate>
        <link>https://hlyons4.github.io/blog/crime-weather-visuals/</link>
        <guid isPermaLink="true">https://hlyons4.github.io/blog/crime-weather-visuals/</guid></item><item>
        <title>As the Storm Approaches, Crime Rates Surge</title>
        <description>&lt;h3&gt;Wonder why most crime in movies happens in dreary, stormy weather?&lt;/h3&gt;
&lt;p&gt;It’s not just a cinematic trick to set the mood—data suggests there’s real-world truth behind it. As the weather worsens, crime rates tend to rise. From heatwaves that spark tempers to storms that create the perfect cover for illegal activities, extreme weather and crime seem to go hand in hand.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;But why does this happen? Does extreme heat make people more aggressive? Do storms create opportunities for crime, or do they deter it? These questions drive the heart of this investigation. Using real crime and weather data, we’ll explore whether there’s a measurable link between extreme weather events—heatwaves, cold spells, storms—and violent crime rates. Let’s dive into the data and see what patterns emerge.&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Data Collection and Processing&lt;/h3&gt;

&lt;p&gt;To explore this question, I gathered crime data from the FBI Uniform Crime Reporting (UCR) Program and city-specific open data portals like NYC Open Data and the Chicago Data Portal.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;For weather data, I used the OpenWeatherMap API, specifically the current weather data endpoint, as the historical weather API isn’t included in my plan.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;I ensured ethical data collection by only using publicly available datasets and following good scraping practices where necessary. If you’re interested in conducting a similar study, these sources are great starting points!&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Summary of Data Collection:&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Crime Data:&lt;/strong&gt; Type of crime, date, time, location, crime severity.&lt;br /&gt;
&lt;strong&gt;Weather Data:&lt;/strong&gt; Temperature, humidity, wind speed, and weather conditions.&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Exploratory Data Analysis (EDA)&lt;/h3&gt;
&lt;p&gt;After cleaning and merging the datasets, I conducted some initial analyses. Here are a few highlights:&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heatwaves &amp;amp; Crime:&lt;/strong&gt; During heatwaves (days above the 90th percentile for temperature), violent crimes increased by ~12%.&lt;br /&gt;
&lt;strong&gt;Cold Spells &amp;amp; Crime:&lt;/strong&gt; During extremely cold days (below the 10th percentile), crime rates dropped by ~8%.&lt;br /&gt;
&lt;strong&gt;Storms &amp;amp; Crime:&lt;/strong&gt; Heavy storms seemed to reduce outdoor crimes like assault but had little effect on crimes that occur indoors.&lt;br /&gt;&lt;/p&gt;
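&lt;p&gt;To make the percentile definitions above concrete, here is a minimal sketch of how hot-day and cold-day cutoffs could be derived and compared with typical days. It is not the code behind the figures in this post: the function name extreme_day_effect and the sample records are hypothetical, and it uses only Python’s standard library.&lt;/p&gt;

```python
from statistics import mean, quantiles


def extreme_day_effect(days):
    """Compare crime on very hot and very cold days against typical days.

    `days` is a list of (temperature, crime_count) pairs. Returns the percent
    change in mean crime count for days above the 90th and below the 10th
    temperature percentile, relative to the days in between.
    """
    temps = [temp for temp, _ in days]
    deciles = quantiles(temps, n=10)  # nine cut points: 10th, 20th, ..., 90th
    cold_cut, hot_cut = deciles[0], deciles[-1]

    hot = [count for temp, count in days if temp > hot_cut]
    cold = [count for temp, count in days if cold_cut > temp]
    typical = [count for temp, count in days if hot_cut >= temp >= cold_cut]

    baseline = mean(typical)

    def pct_change(group):
        return 100 * (mean(group) - baseline) / baseline

    return pct_change(hot), pct_change(cold)


# Hypothetical records, made up for illustration: (daily high, crime count).
sample_days = [(temp, 100) for temp in range(30, 90, 3)]
sample_days += [(95, 120), (98, 110), (5, 85), (10, 95)]

heat_effect, cold_effect = extreme_day_effect(sample_days)
print(f"Hot days: {heat_effect:+.1f}%  Cold days: {cold_effect:+.1f}%")
```

&lt;p&gt;With real data, the pairs would come from the merged crime and weather table, one pair per day.&lt;/p&gt;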

&lt;figure&gt;
	&lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/daily_crimes.jpg&quot; alt=&quot;&quot; /&gt; 
	&lt;figcaption&gt;Figure 1. - Daily crime counts in Chicago over the study period.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure&gt;
	&lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/daily_crime_trend.jpg&quot; alt=&quot;&quot; /&gt; 
	&lt;figcaption&gt;Figure 2. - Violent crime rates compared to concurrent temperatures.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Methodology&lt;/h3&gt;
&lt;p&gt;To analyze relationships, I used:&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Regression Analysis:&lt;/strong&gt; To quantify the impact of temperature on crime rates.&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Time Series Analysis:&lt;/strong&gt; To examine trends over seasons.&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Control Variables:&lt;/strong&gt; Time of day, location, and day of the week to account for confounding factors.&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
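
&lt;p&gt;The regression with control variables can be sketched roughly as follows. This is an illustrative, dependency-free version, not the model used for this post: the helper names, the single weekend indicator standing in for the controls, and the synthetic counts are all assumptions. In practice a library such as statsmodels would do the fitting.&lt;/p&gt;

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs stay untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.

    An intercept column of ones is prepended to each row of predictors.
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic example: (temperature, is_weekend) per day. The counts are built
# from known coefficients (40 + 0.5 * temp + 6 * weekend), so the fit should
# recover those same numbers. None of this is real crime data.
predictors = [(62, 0), (75, 0), (88, 0), (70, 1), (93, 1), (55, 0), (81, 1)]
counts = [40 + 0.5 * temp + 6 * weekend for temp, weekend in predictors]

intercept, temp_effect, weekend_effect = fit_ols(predictors, counts)
print(f"intercept={intercept:.2f}, per-degree={temp_effect:.2f}, weekend={weekend_effect:.2f}")
```

&lt;p&gt;The point of the extra column is that the temperature coefficient is estimated with the weekend effect held fixed, which is what controlling for a confounder means here.&lt;/p&gt;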

&lt;h3&gt;Findings &amp;amp; Interpretation&lt;/h3&gt;
&lt;p&gt;The data suggests a significant relationship between extreme weather and crime:&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Heat is associated with more violent crime.&lt;/strong&gt; Hotter days correlated with higher rates of assault and robbery.&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cold weather is associated with less crime overall.&lt;/strong&gt; Extreme cold appears to act as a deterrent.&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Storms disrupt crime patterns.&lt;/strong&gt; Outdoor crimes drop during severe storms, but indoor crimes remain steady.&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These results align with existing theories about weather and crime, though social and economic factors also play a crucial role.&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Challenges &amp;amp; Limitations&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Limited Weather Data:&lt;/strong&gt; Due to the absence of historical weather data in my OpenWeatherMap plan, the weather data used may not perfectly align with crime timestamps. This limitation likely affects the precision of the analysis.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing Crime Report Details:&lt;/strong&gt; Some crime reports lacked precise timestamps, complicating the matching process with weather data.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Potential Biases:&lt;/strong&gt; Crimes may be underreported during extreme weather events, especially severe storms or extreme cold, which could skew results.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;City-Specific Trends:&lt;/strong&gt; This study focuses on Chicago, so findings may not generalize to other cities or regions.&lt;br /&gt;&lt;/p&gt;

&lt;h3&gt;Conclusion &amp;amp; Implications&lt;/h3&gt;
&lt;p&gt;So, does bad weather increase crime, or is it just a Hollywood trope? The data tells a nuanced story. Heatwaves correlate with higher violent crime, supporting the theory that extreme heat increases aggression. Cold weather tends to suppress crime, likely due to reduced outdoor activity. Storms shift crime patterns rather than eliminate them.&lt;br /&gt;
These insights can help cities allocate resources more effectively. For example, increasing police presence during heatwaves or adjusting safety strategies during winter months could enhance public safety.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Despite data limitations, this study underscores the value of integrating weather variables into crime prevention strategies.&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Curious to learn more? Here are some great resources:&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Crime Data Sources:&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.fbi.gov/services/cjis/ucr&quot; target=&quot;_blank&quot;&gt;FBI Uniform Crime Reporting (UCR) Program&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://data.cityofnewyork.us/&quot; target=&quot;_blank&quot;&gt;NYC Open Data&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://data.cityofchicago.org/&quot; target=&quot;_blank&quot;&gt;Chicago Data Portal&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weather Data Sources:&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.ncdc.noaa.gov/&quot; target=&quot;_blank&quot;&gt;NOAA&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://openweathermap.org/api&quot; target=&quot;_blank&quot;&gt;OpenWeatherMap API&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.weather.gov/&quot; target=&quot;_blank&quot;&gt;National Weather Service&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Sun, 16 Mar 2025 00:00:00 +0000</pubDate>
        <link>https://hlyons4.github.io/blog/crime_rates_surge/</link>
        <guid isPermaLink="true">https://hlyons4.github.io/blog/crime_rates_surge/</guid></item><item>
        <title>Crack The Code of Predictions</title>
        <description>&lt;p class=&quot;intro&quot;&gt;&lt;span class=&quot;dropcap&quot;&gt;E&lt;/span&gt;ver wondered how companies predict sales based on ad spending? If a company spends $175 on TV ads, how much can they expect in sales? Linear regression helps answer questions like these by identifying relationships between variables. It is one of the simplest yet most powerful tools in data science, commonly used in business, economics, and various analytical fields. &lt;/p&gt;

&lt;h3 id=&quot;step-1-setting-up-your-environment&quot;&gt;Step 1: Setting Up Your Environment&lt;/h3&gt;

&lt;p&gt;Before we begin, ensure you have the required Python libraries installed. If you haven’t already, run the following in your terminal (not inside the Python interpreter):&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;pip&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;scikit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;learn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matplotlib&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;p&gt;Now, import the necessary libraries:&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sklearn.model_selection&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train_test_split&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sklearn.linear_model&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LinearRegression&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sklearn.metrics&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r2_score&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h3 id=&quot;step-2-loading-and-exploring-data&quot;&gt;Step 2: Loading and Exploring Data&lt;/h3&gt;

&lt;p&gt;For this tutorial, we’ll create a simple dataset representing TV ad spending and the corresponding sales generated.&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;TV_Ad_Spend&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;230&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;44&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;151&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;180&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;57&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;199&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;60&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;Sales&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;18&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;11&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h4 id=&quot;quick-data-check&quot;&gt;Quick Data Check&lt;/h4&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Displays the first five rows
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;describe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Summary statistics&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h3 id=&quot;step-3-preparing-the-data&quot;&gt;Step 3: Preparing the Data&lt;/h3&gt;

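&lt;p&gt;We define our independent variable (TV_Ad_Spend) and dependent variable (Sales), then split the data into training and testing sets. With only ten rows, the 20% test set holds just two observations, so treat the evaluation in later steps as illustrative.&lt;/p&gt;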
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;X&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;TV_Ad_Spend&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]]&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Predictor variable
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Sales&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Target variable
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# Split into training (80%) and testing (20%) datasets
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;X_test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_test&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;train_test_split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;test_size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random_state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h3 id=&quot;step-4-training-the-linear-regression-model&quot;&gt;Step 4: Training the Linear Regression Model&lt;/h3&gt;
&lt;p&gt;Now, let’s create and train a Linear Regression model.&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LinearRegression&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_train&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h4 id=&quot;understanding-the-model-coefficients&quot;&gt;Understanding the Model Coefficients&lt;/h4&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Slope (Coefficient): &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;coef_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Intercept: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;intercept_&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Slope (Coefficient): Represents how much sales increase per additional dollar spent on TV ads.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Intercept: The expected sales when no money is spent on ads.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
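
&lt;p&gt;To see where these two numbers come from, here is a hand-rolled sketch of the least-squares formulas for a single predictor. One assumption to flag: it fits all ten rows rather than the 80% training split, so its values will differ somewhat from what model.coef_ and model.intercept_ print. It also answers the opening question about a $175 spend. The function name least_squares is mine, not Scikit-Learn’s.&lt;/p&gt;

```python
tv_ad_spend = [230, 44, 17, 151, 180, 8, 57, 120, 199, 60]
sales = [22, 10, 8, 18, 20, 5, 9, 15, 21, 11]


def least_squares(xs, ys):
    """Return (slope, intercept) of the best-fit line for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares(tv_ad_spend, sales)
predicted = intercept + slope * 175

print(f"Sales = {intercept:.2f} + {slope:.4f} * TV_Ad_Spend")
print(f"Predicted sales for a spend of 175: {predicted:.2f}")
```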

&lt;h3 id=&quot;step-5-making-predictions&quot;&gt;Step 5: Making Predictions&lt;/h3&gt;

&lt;p&gt;We now use our trained model to make predictions on the test set.&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X_test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h3 id=&quot;step-6-evaluating-model-performance&quot;&gt;Step 6: Evaluating Model Performance&lt;/h3&gt;

&lt;p&gt;To measure how well our model fits the data, we use the R² score:&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r2_score&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_test&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;R² Score: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;r2&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;ul&gt;
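  &lt;li&gt;R² Score: At most 1, with higher values indicating a better model fit. A score of 0 means the model does no better than always predicting the mean, and the score can be negative on test data when it does worse.&lt;/li&gt;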
&lt;/ul&gt;
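
&lt;p&gt;For intuition, R² compares the model’s squared errors with those of always predicting the mean. The sketch below is a plain-Python version of that formula (the function name r_squared is mine, not Scikit-Learn’s). Note that on held-out data the value can drop below zero when the model does worse than the mean.&lt;/p&gt;

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of the baseline that always predicts the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


actual = [1, 2, 3]
print(r_squared(actual, [1, 2, 3]))  # perfect predictions: 1.0
print(r_squared(actual, [2, 2, 2]))  # always predicting the mean: 0.0
print(r_squared(actual, [3, 2, 1]))  # worse than the mean: -3.0
```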

&lt;h3 id=&quot;step-7-visualizing-the-regression-line&quot;&gt;Step 7: Visualizing the Regression Line&lt;/h3&gt;

&lt;p&gt;A visual representation helps us understand how well the model predicts sales.&lt;/p&gt;

&lt;figure&gt;
	&lt;img src=&quot;https://hlyons4.github.io/my-blog/assets/img/regression_plot.png&quot; alt=&quot;&quot; /&gt; 
	&lt;figcaption&gt;Figure 1. - Actual sales (points) and the fitted regression line.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;graph-interpretation&quot;&gt;Graph Interpretation&lt;/h4&gt;

&lt;p&gt;The scatter plot shows actual sales values, while the regression line represents predicted sales. If most points are close to the line, our model makes accurate predictions.&lt;/p&gt;

&lt;h3 id=&quot;wrapping-up-whats-next&quot;&gt;Wrapping Up: What’s Next?&lt;/h3&gt;
&lt;p&gt;Congratulations! You just built your first linear regression model in Python.&lt;/p&gt;

&lt;p&gt;You learned how to:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Load and explore data&lt;/li&gt;
  &lt;li&gt;Train a linear regression model using Scikit-Learn&lt;/li&gt;
  &lt;li&gt;Interpret the slope and intercept&lt;/li&gt;
  &lt;li&gt;Evaluate model performance using R²&lt;/li&gt;
  &lt;li&gt;Visualize the results with a regression line&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;next-steps&quot;&gt;Next Steps:&lt;/h4&gt;
&lt;p&gt;If you’re looking for a more statistical approach similar to R, consider using statsmodels. It provides additional features like p-values and confidence intervals, which can help assess the significance of your predictors. If these are important for your analysis, statsmodels is a great alternative to Scikit-Learn for regression modeling. Check out &lt;a href=&quot;https://www.statsmodels.org/stable/index.html&quot; target=&quot;_blank&quot;&gt;Statsmodels’ documentation&lt;/a&gt;.&lt;/p&gt;
</description><pubDate>Tue, 04 Feb 2025 00:00:00 +0000</pubDate>
        <link>https://hlyons4.github.io/blog/crack-the-code-of-predictions/</link>
        <guid isPermaLink="true">https://hlyons4.github.io/blog/crack-the-code-of-predictions/</guid></item></channel>
</rss>
