Exploring the Palmer Penguins dataset through the lens of functional programming. Type-safe data pipelines, elegant transformations, and interactive visualizations.
open PenguinData.DataLoad
open PenguinData.DataClean
open PenguinData.Visualization
let main argv =
let raw = loadPenguins "penguins.csv"
let cleaned = cleanData raw
scatterFlipperVsMass cleaned
billLengthBoxplot cleaned
histBodyMass cleaned
scatterBillAreaVsMass cleaned
0
F# brings unique strengths that complement traditional data science tools
Catch data type errors at compile time, not runtime. Strong typing prevents entire categories of bugs.
Chain transformations naturally with |>. Data flows through functions in a readable, logical sequence.
Data transformations without side effects. Safer, more predictable code that's easier to reason about.
Elegant handling of missing data and edge cases. Express complex logic clearly and concisely.
Interactive exploration via F# Interactive. Test hypotheses and iterate quickly on analysis.
Access to mature libraries and enterprise integration. Leverage the full power of the .NET platform.
A functional approach to data transformation
CSV ingestion with Deedle DataFrames
Frame.ReadCsv("penguins.csv")
344 rows × 9 columns
Handle missing values & validate ranges
Frame.filterRows |> Frame.mapCols
342 valid rows
Create derived features
df?bill_length * df?bill_depth
+ bill_area_mm²
Interactive Plotly.NET charts
Chart.Scatter |> Chart.saveHtml
4 visualizations
Explore the relationships in penguin morphological data
Strong positive correlation between flipper length and body mass across all penguin species.
Bill length distribution showing median, quartiles, and potential outliers in the dataset.
Frequency distribution of body mass revealing distinct clusters corresponding to species.
Derived feature (bill area) showing relationship with body mass—a feature engineering example.
Elegant functional patterns in action
let cleanData (df: Frame<int,string>) =
df
|> Frame.filterRows (fun _ row ->
let billLength = row.TryGetAs<float>("bill_length_mm")
let billDepth = row.TryGetAs<float>("bill_depth_mm")
let flipperLength = row.TryGetAs<float>("flipper_length_mm")
let bodyMass = row.TryGetAs<float>("body_mass_g")
// Keep rows with at least 3 out of 4 measurements
let validCount =
[billLength.HasValue; billDepth.HasValue;
flipperLength.HasValue; bodyMass.HasValue]
|> List.filter id
|> List.length
validCount >= 3
)
Using TryGetAs<T> with OptionalValue provides compile-time safe handling of potentially missing data. The pipeline operator chains the filtering logic clearly.
let scatterFlipperVsMass (df: Frame<int,string>) =
let flipperData = df?flipper_length_mm |> Series.values |> Seq.map float
let massData = df?body_mass_g |> Series.values |> Seq.map float
Chart.Scatter(
x = flipperData,
y = massData,
mode = StyleParam.Mode.Markers
)
|> Chart.withTitle "Flipper Length vs Body Mass"
|> Chart.withXAxisStyle "Flipper Length (mm)"
|> Chart.withYAxisStyle "Body Mass (g)"
|> Chart.saveHtml("plots/flipper_vs_mass.html")
Plotly.NET enables fluent chart construction. The ? operator provides dynamic column access, while pipelines configure and save the chart in one expression.