Functional Programming × Data Science

F# Applications in Data Science

Exploring the Palmer Penguins dataset through the lens of functional programming. Type-safe data pipelines, elegant transformations, and interactive visualizations.

344 Data Points
3 Species
4 Visualizations
100% F#
Program.fs
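open PenguinData.DataLoad
open PenguinData.DataClean
open PenguinData.Visualization

[<EntryPoint>]
let main (argv: string[]) =
    // Load the raw CSV, then drop rows missing too many measurements
    let raw = loadPenguins "penguins.csv"
    let cleaned = cleanData raw

    // Render and save the four charts from the cleaned frame
    scatterFlipperVsMass cleaned
    billLengthBoxplot cleaned
    histBodyMass cleaned
    scatterBillAreaVsMass cleaned

    0 // exit code: success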

Functional Advantages for Data Science

F# brings unique strengths that complement traditional data science tools

Type Safety

Catch type errors at compile time rather than at runtime. Static typing rules out entire categories of bugs, such as passing text where a number is expected.
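
One way to push those checks further is F# units of measure, which let the compiler reject arithmetic that mixes incompatible units. The sketch below is illustrative only: the Measurements record and the massPerFlipperMm function are hypothetical and not part of the project.

```fsharp
// Units of measure are checked at compile time and erased at runtime
[<Measure>] type mm
[<Measure>] type g

// Hypothetical record: one penguin's flipper length and body mass
type Measurements =
    { FlipperLength: float<mm>
      BodyMass: float<g> }

// Grams of body mass per millimetre of flipper; the result type is float<g/mm>
let massPerFlipperMm (m: Measurements) : float<g/mm> =
    m.BodyMass / m.FlipperLength

let sample = { FlipperLength = 200.0<mm>; BodyMass = 4000.0<g> }
let ratio = massPerFlipperMm sample

// Uncommenting the next line gives a compile error: grams and millimetres cannot be added
// let bad = sample.BodyMass + sample.FlipperLength
```

Here ratio is 20.0<g/mm>. The units cost nothing at runtime, so they can be added to existing numeric code gradually.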

Pipeline Operators

Chain transformations naturally with |>. Data flows through functions in a readable, logical sequence.
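
A small illustration of that flow with plain F# lists; the body masses are made-up values, not rows from the dataset.

```fsharp
// Made-up body masses in grams; None marks a missing measurement
let bodyMasses = [ Some 3750.0; Some 3800.0; None; Some 3250.0; Some 4700.0 ]

// Each |> passes the value on its left as the last argument of the function on its right
let meanOfHeavier =
    bodyMasses
    |> List.choose id                     // drop missing values
    |> List.filter (fun m -> m >= 3500.0) // keep masses of at least 3500 g
    |> List.average                       // mean of what remains
```

meanOfHeavier is the mean of 3750, 3800 and 4700 g, roughly 4083.3 g. Each step can be commented out or inspected on its own, which is what makes pipelines easy to debug.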

Immutability

Transformations return new values instead of modifying data in place. Code without hidden side effects is safer, more predictable, and easier to reason about.
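
A minimal sketch of what immutability looks like in practice; the Penguin record below is hypothetical and only used for illustration.

```fsharp
// Hypothetical record type for illustration
type Penguin = { Species: string; BodyMassG: float }

let original = { Species = "Adelie"; BodyMassG = 3750.0 }

// Copy-and-update expression: builds a new record, `original` is not modified
let corrected = { original with BodyMassG = 3800.0 }

// List.map likewise returns a new list and leaves its input unchanged
let massesG = [ 3750.0; 3800.0 ]
let massesKg = massesG |> List.map (fun m -> m / 1000.0)
```

After these bindings, original still holds 3750.0 and massesG is unchanged, so any earlier step of an analysis can be revisited without worrying that a later step altered it.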

Pattern Matching

Elegant handling of missing data and edge cases. Express complex logic clearly and concisely.
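
As a sketch, a match over an optional measurement handles the missing case and several edge cases in one expression. The function name and the thresholds are hypothetical, chosen only to show the shape of the code.

```fsharp
// Hypothetical classifier for a body mass that may be missing.
// The thresholds are illustrative, not taken from the dataset.
let describeMass (mass: float option) =
    match mass with
    | None -> "missing"
    | Some m when m <= 0.0 -> "invalid"
    | Some m when m < 3500.0 -> "light"
    | Some m when m < 4500.0 -> "medium"
    | Some _ -> "heavy"
```

For example, `[ Some 3250.0; None; Some 5000.0 ] |> List.map describeMass` gives `["light"; "missing"; "heavy"]`. Removing the None case would make the compiler warn about an incomplete match, which is how missing data stays visible.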

REPL Support

Interactive exploration via F# Interactive. Test hypotheses and iterate quickly on analysis.

.NET Ecosystem

Access to mature libraries and enterprise integration. Leverage the full power of the .NET platform.

From Raw Data to Insights

A functional approach to data transformation

01

Load

CSV ingestion into a Deedle data frame

Frame.ReadCsv("penguins.csv") → 344 rows × 9 columns
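
To show what the load step has to deal with, here is a standard-library-only sketch that reads CSV text into typed records and maps missing cells to None. It is an illustration, not how the project loads data: the PenguinRow type and both functions are hypothetical, and it assumes missing values appear as "NA" and that no field is quoted.

```fsharp
open System
open System.Globalization

// Hypothetical record holding a few of the dataset's columns
type PenguinRow =
    { Species: string
      BillLengthMm: float option
      BodyMassG: float option }

// Assumption: missing cells are written as "NA"; text that does not parse as a number becomes None
let parseFloat (s: string) =
    match Double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, v -> Some v
    | _ -> None

// Minimal parser: comma-separated, no quoted fields, first line is the header
let parseRows (csv: string) =
    let lines =
        csv.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
        |> Array.map (fun line -> line.Trim())
        |> Array.filter (fun line -> line <> "")
    let header = lines.[0].Split(',')
    // Look columns up by name so their order in the file does not matter
    let col name = Array.findIndex ((=) name) header
    let iSpecies = col "species"
    let iBill = col "bill_length_mm"
    let iMass = col "body_mass_g"
    lines
    |> Array.skip 1
    |> Array.map (fun line ->
        let cells = line.Split(',')
        { Species = cells.[iSpecies]
          BillLengthMm = parseFloat cells.[iBill]
          BodyMassG = parseFloat cells.[iMass] })
```

In the pipeline above, Frame.ReadCsv takes care of this step; the sketch only makes the missing-value handling explicit.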
02

Clean

Handle missing values & validate ranges

df |> Frame.filterRows (…) |> Frame.mapCols (…) → 342 valid rows
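
Range validation can be expressed as a lookup table plus a pattern match. This is a sketch under stated assumptions: the bounds in plausibleRanges are hypothetical and should be chosen for the data at hand, and rows are modeled as simple (column, value) lists rather than Deedle frames.

```fsharp
// Hypothetical plausible ranges per column; choose bounds that suit your data
let plausibleRanges =
    Map.ofList
        [ "bill_length_mm", (25.0, 65.0)
          "flipper_length_mm", (160.0, 240.0)
          "body_mass_g", (2500.0, 6500.0) ]

// Present values must fall inside their column's range.
// Missing values and columns without a range pass; the missing-value rule handles gaps.
let inRange (column: string) (value: float option) =
    match value, Map.tryFind column plausibleRanges with
    | Some v, Some (low, high) -> v >= low && v <= high
    | _ -> true

// A row is valid only if every (column, value) pair is in range
let validRow (row: (string * float option) list) =
    row |> List.forall (fun (column, value) -> inRange column value)
```

Keeping the bounds in data rather than in code makes them easy to review and adjust without touching the validation logic.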
03

Engineer

Create derived features

df?bill_length_mm * df?bill_depth_mm → bill_area_mm²
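
The same derived feature can be sketched with plain options, so that a missing length or depth produces a missing area instead of a misleading number. billArea and the sample values are illustrative, not the project's code.

```fsharp
// Hypothetical helper mirroring the derived bill area feature (length × depth).
// Option.map2 returns None when either input is missing, so gaps propagate.
let billArea (lengthMm: float option) (depthMm: float option) =
    Option.map2 (fun length depth -> length * depth) lengthMm depthMm

let lengths = [ Some 39.1; None; Some 40.3 ]
let depths = [ Some 18.7; Some 17.4; None ]

// Pairs the two lists element by element
let areas = List.map2 billArea lengths depths
```

Only the first pair is complete, so areas holds one value (about 731.17 mm²) followed by two None entries.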
04

Visualize

Interactive Plotly.NET charts

Chart.Scatter(…) |> Chart.saveHtml(…) → 4 visualizations

Interactive Visualizations

Explore the relationships in penguin morphological data

Flipper Length vs Body Mass

Scatter Plot

Strong positive correlation between flipper length and body mass across all penguin species.
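
The strength of such a relationship is usually summarized with Pearson's r: the covariance of x and y divided by the product of their standard deviations. A minimal sketch over two aligned arrays; it assumes missing values were already removed so that xs.[i] and ys.[i] describe the same penguin.

```fsharp
// Pearson correlation coefficient of two aligned samples
let pearson (xs: float[]) (ys: float[]) =
    if xs.Length <> ys.Length || xs.Length < 2 then
        invalidArg "ys" "samples must have equal length and at least two points"
    let meanX = Array.average xs
    let meanY = Array.average ys
    // Sum of cross-products and sums of squared deviations from the means
    let sxy = Array.fold2 (fun acc x y -> acc + (x - meanX) * (y - meanY)) 0.0 xs ys
    let sxx = xs |> Array.sumBy (fun x -> (x - meanX) * (x - meanX))
    let syy = ys |> Array.sumBy (fun y -> (y - meanY) * (y - meanY))
    sxy / sqrt (sxx * syy)
```

For perfectly linear data such as `pearson [| 1.0; 2.0; 3.0 |] [| 2.0; 4.0; 6.0 |]` the result is 1.0; values near 1 indicate a strong positive linear relationship.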

Bill Length Distribution

Box Plot

Bill length distribution showing median, quartiles, and potential outliers in the dataset.
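
A box plot's ingredients can be computed directly. This sketch uses linear interpolation between closest ranks for the quartiles and the common 1.5 × IQR rule for outliers; charting libraries offer several quartile conventions, so the numbers may differ slightly from those drawn in the chart. It assumes a non-empty input.

```fsharp
// Quantile by linear interpolation between closest ranks (one common convention).
// Assumes `sorted` is non-empty and sorted ascending.
let quantile (p: float) (sorted: float[]) =
    let h = p * float (sorted.Length - 1)
    let lo = int (floor h)
    let hi = min (lo + 1) (sorted.Length - 1)
    sorted.[lo] + (h - float lo) * (sorted.[hi] - sorted.[lo])

// Quartiles, median, and values beyond the 1.5 × IQR fences
let boxSummary (values: float[]) =
    let sorted = Array.sort values
    let q1 = quantile 0.25 sorted
    let median = quantile 0.5 sorted
    let q3 = quantile 0.75 sorted
    let iqr = q3 - q1
    let lowerFence, upperFence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    let outliers = sorted |> Array.filter (fun v -> v < lowerFence || v > upperFence)
    q1, median, q3, outliers
```

For the values 1 through 9 plus 100, this gives quartiles 3.25 and 7.75, a median of 5.5, and flags 100 as the only outlier.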

Body Mass Distribution

Histogram

Frequency distribution of body mass revealing distinct clusters corresponding to species.
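
Under the hood, a histogram assigns each value to a fixed-width bin and counts the bins. A minimal sketch with an illustrative helper and made-up values; the bin width is an assumption, and the chart library may pick bins differently.

```fsharp
// Assign each value to a fixed-width bin and count the bins, as a histogram does.
// Bins are keyed by their lower edge; empty bins are simply absent from the result.
let histogram (binWidth: float) (values: float list) =
    values
    |> List.countBy (fun v -> floor (v / binWidth) * binWidth)
    |> List.sortBy fst
```

For example, `histogram 500.0 [ 3250.0; 3750.0; 3800.0; 4700.0; 4900.0 ]` returns `[(3000.0, 1); (3500.0, 2); (4500.0, 2)]`. Changing the bin width can merge or split apparent clusters, so it is worth trying more than one.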

Bill Area vs Body Mass

Scatter Plot

Derived feature (bill area) showing its relationship with body mass, an example of feature engineering.

Code Highlights

Elegant functional patterns in action

DataClean.fs — Missing Value Handling
module PenguinData.DataClean

open Deedle

/// Drops rows that are missing more than one of the four numeric measurements.
let cleanData (df: Frame<int,string>) =
    df
    |> Frame.filterRows (fun _ row ->
        // TryGetAs returns an OptionalValue<float>; HasValue is false for a missing cell
        let billLength = row.TryGetAs<float>("bill_length_mm")
        let billDepth = row.TryGetAs<float>("bill_depth_mm")
        let flipperLength = row.TryGetAs<float>("flipper_length_mm")
        let bodyMass = row.TryGetAs<float>("body_mass_g")

        // Keep rows with at least 3 out of 4 measurements
        let validCount =
            [billLength.HasValue; billDepth.HasValue;
             flipperLength.HasValue; bodyMass.HasValue]
            |> List.filter id
            |> List.length
        validCount >= 3
    )

Functional Missing Value Handling

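TryGetAs<T> returns an OptionalValue<T>, so a potentially missing measurement is represented explicitly and checked with HasValue before it counts toward the row. The pipeline operator feeds the frame straight into Frame.filterRows, so the filtering logic reads top to bottom.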
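
The same rule can be sketched without Deedle using plain F# options, where missingness is part of each field's type. The Measurements record and the helper names are hypothetical.

```fsharp
// Hypothetical row type: the four numeric measurements, each possibly missing
type Measurements =
    { BillLength: float option
      BillDepth: float option
      FlipperLength: float option
      BodyMass: float option }

// Same rule as cleanData: keep a row if at least 3 of its 4 measurements are present
let hasEnoughData (m: Measurements) =
    let present =
        [ m.BillLength; m.BillDepth; m.FlipperLength; m.BodyMass ]
        |> List.filter Option.isSome
        |> List.length
    present >= 3

let cleanRows (rows: Measurements list) = rows |> List.filter hasEnoughData
```

A row with one gap is kept, while a row with every measurement missing is dropped.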

Visualization.fs — Chart Generation
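module PenguinData.Visualization

open System.IO
open Deedle
open Plotly.NET

let scatterFlipperVsMass (df: Frame<int,string>) =
    // Keep only rows where both columns are present so x and y stay aligned
    let paired =
        df
        |> Frame.sliceCols ["flipper_length_mm"; "body_mass_g"]
        |> Frame.dropSparseRows
    let flipperData = paired?flipper_length_mm |> Series.values
    let massData = paired?body_mass_g |> Series.values

    // Make sure the output folder exists before saving
    Directory.CreateDirectory "plots" |> ignore

    Chart.Scatter(
        x = flipperData,
        y = massData,
        mode = StyleParam.Mode.Markers
    )
    |> Chart.withTitle "Flipper Length vs Body Mass"
    |> Chart.withXAxisStyle "Flipper Length (mm)"
    |> Chart.withYAxisStyle "Body Mass (g)"
    |> Chart.saveHtml("plots/flipper_vs_mass.html")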

Declarative Visualization

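Plotly.NET builds a chart as a value that is piped through styling functions and finally saved, all in one expression. The ? operator looks up a column by name and returns it as a float series; because the name is a string resolved at runtime, a misspelled column is not caught by the compiler.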
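
F# lets a library define the ? operator itself: frame?name is compiled into a call that passes the string "name". A toy definition over a Map shows the mechanism; ToyFrame and this operator are illustrative and are not how Deedle implements it.

```fsharp
// Toy "frame": column name -> values. Not Deedle's implementation.
type ToyFrame = Map<string, float list>

// Defining (?) makes frame?column_name call this function with "column_name"
let (?) (frame: ToyFrame) (column: string) : float list =
    match Map.tryFind column frame with
    | Some values -> values
    | None -> failwithf "No column named '%s'" column // detected only at runtime

let frame : ToyFrame =
    Map.ofList
        [ "flipper_length_mm", [ 181.0; 186.0 ]
          "body_mass_g", [ 3750.0; 3800.0 ] ]

let flippers = frame?flipper_length_mm
```

This convenience is the trade-off behind dynamic access: it reads naturally, but a typo such as frame?flipper_lenght_mm compiles and only fails when it runs.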

Technology Stack

F# / .NET 9: functional-first language
Deedle 3.0: data frames and series
Plotly.NET: interactive visualizations
FSharp.Data: type providers