06. Examples & Tutorials
Example: SPXY Split on Custom Data
using DataSplits, Distances, DataFrames

# Suppose `df` is a DataFrame of features and `y` is the target vector
df = DataFrame(rand(100, 5), :auto)
y = rand(100)

# SPXY uses both features and target, so pass them together as a tuple
train, test = split((Matrix(df), y), SPXYSplit(0.75))
Example: Custom Splitter Implementation
Suppose you want to create a splitter that always assigns the first 80% of samples to the training set and the remainder to the test set:
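using DataSplits

# Strategy type with no fields; dispatch on the type selects this splitter
struct First80Split <: DataSplits.SplitStrategy end

# Returns train and test positions for N samples.
# `s`, `rng`, and `data` are accepted but unused here.
function first80(N, s, rng, data)
    cut = floor(Int, 0.8 * N)
    train_pos = 1:cut
    test_pos = (cut + 1):N
    return train_pos, test_pos
end

# Qualify `_split` so this method extends DataSplits instead of defining a new local function
function DataSplits._split(data, ::First80Split; rng=nothing)
    DataSplits.split_with_positions(data, nothing, first80; rng=rng)
end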
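The cut-point arithmetic in the splitter above can be checked on its own, without loading DataSplits. The following is a minimal sketch; `first80_positions` is a hypothetical helper name used only for illustration and is not part of the package.

```julia
# Hypothetical standalone helper (not part of DataSplits):
# the first 80% of positions go to train, the remainder to test.
function first80_positions(N::Integer)
    cut = floor(Int, 0.8 * N)
    return 1:cut, (cut + 1):N
end

train_pos, test_pos = first80_positions(10)

# The two ranges are disjoint and together cover every position.
@assert isempty(intersect(train_pos, test_pos))
@assert length(train_pos) + length(test_pos) == 10
```

Because `floor` rounds down, any leftover sample goes to the test set when `0.8 * N` is not an integer.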
Example: Group-aware Splitting
using DataSplits, Clustering

# `ClusteringResult` is an abstract type and cannot be constructed directly,
# so obtain one from a clustering algorithm such as `kmeans`.
# Clustering.jl treats each column of `X` as one sample.
X = rand(5, 100)
res = kmeans(X, 5)

train, test = split(X, ClusterShuffleSplit(res, 0.7))
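To illustrate what "group-aware" means in general, the sketch below shuffles group labels and assigns whole groups to the training side until roughly the requested fraction of samples is reached. It is an illustration of the idea only, not the implementation used by `ClusterShuffleSplit`; the function name `group_shuffle_positions` and its arguments are hypothetical.

```julia
using Random

# Hypothetical helper (not part of DataSplits): assign whole groups to train
# until at least `frac` of the samples are covered.
function group_shuffle_positions(groups::AbstractVector, frac::Real;
                                 rng=Random.default_rng())
    ids = shuffle(rng, unique(groups))      # random order of group labels
    target = frac * length(groups)          # desired number of training samples
    train_ids = Set{eltype(groups)}()
    n_train = 0
    for id in ids
        n_train >= target && break
        push!(train_ids, id)
        n_train += count(==(id), groups)
    end
    train_pos = findall(g -> g in train_ids, groups)
    test_pos = findall(g -> !(g in train_ids), groups)
    return train_pos, test_pos
end

groups = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]
train_pos, test_pos = group_shuffle_positions(groups, 0.7; rng=MersenneTwister(42))

# No group label appears on both sides of the split.
@assert isempty(intersect(groups[train_pos], groups[test_pos]))
```

Since groups are moved as whole units, the realised training fraction can overshoot the requested one when group sizes are uneven.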