Data Pipelines: ETL, Lakehouse & Polars
The ETL vs. ELT paradigms, Airflow DAGs, lakehouse architecture with Delta Lake and Apache Iceberg, Polars lazy evaluation, Spark fundamentals, and data quality with Great Expectations.
Feature Store: Feast & Online/Offline
The training/serving skew problem, Feast architecture, feature view definitions, batch materialization, online serving, time-travel queries, and feature monitoring.
Data Labeling: Label Studio & Weak Supervision
Labeling costs, Label Studio setup, NER and bounding-box annotation, inter-annotator agreement, programmatic labeling with Snorkel, and the active learning loop.
Data Versioning: DVC & LakeFS
DVC setup for ML reproducibility, remote storage, dvc.yaml pipelines, experiment tracking, data lake branch/merge with LakeFS, and data contracts.
Streaming ML: Kafka, Flink & Drift Detection
Batch vs. streaming inference, Kafka fundamentals, the Flink DataStream API, Python-first streaming with Bytewax, concept/data/model drift detection, and retraining triggers.