{"templateId":"markdown","sharedDataIds":{"sidebar":"sidebar-sidebars.yaml"},"props":{"metadata":{"markdoc":{"tagList":["admonition"]},"redocly_category":"Products","product_name":"Machine Learning","type":"markdown"},"seo":{"title":"ML Experiment Tracking and Model Management","description":"Treasure Data Product Documentation · Collect and Unify · Segment and Activate · Experiment and Analyze · Decisioning Automate with AI Scale and Trust.","siteUrl":"https://docs.treasuredata.com","lang":"en-US","llmstxt":{"hide":false,"sections":[{"title":"Table of contents","includeFiles":["**/*"],"excludeFiles":[]}],"excludeFiles":[]}},"dynamicMarkdocComponents":[],"compilationErrors":[],"ast":{"$$mdtype":"Tag","name":"article","attributes":{},"children":[{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"ml-experiment-tracking-and-model-management","__idx":0},"children":["ML Experiment Tracking and Model Management"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["ML experiment tracking is the process of organizing, recording, and analyzing the results of machine learning experiments. 
This document explains how to create a workflow to enable ML experiment tracking."]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can find the complete ML experiment tracking workflow code in ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/ml_experiment.dig"},"children":["Treasure Boxes"]},"."]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":[{"$$mdtype":"Tag","name":"strong","attributes":{},"children":["Table of Contents"]}]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#track-ml-experiments"},"children":["Track ML Experiments"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#record-evaluation-results-for-each-model"},"children":["Record Evaluation Results for each Model"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"#detect-drift-in-model-performance-over-time"},"children":["Detect Drift in Model Performance over Time"]}]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"track-ml-experiments","__idx":1},"children":["Track ML Experiments"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["As a best practice, track each ML experiment in an end-to-end data processing workflow by adding a ",{"$$mdtype":"Tag","name":"em","attributes":{},"children":["track_experiment"]}," task after the train task. The ",{"$$mdtype":"Tag","name":"em","attributes":{},"children":["track_experiment"]}," task issues a SQL query that records the ML experiment information and the model name in a TD table named \"automl_experiments\". 
Sample workflow code is as follows:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"+create_db_tbl_if_not_exists:\n  td_ddl>: null\n  create_databases:\n    - '${output_database}'\n  create_tables:\n    - automl_experiments\n    - automl_eval_results\n+train:\n  ml_train>:\n    docker:\n      task_mem: 128g\n    notebook: gluon_train\n    model_name: 'gluon_model_${session_id}'\n    input_table: '${input_database}.${train_data_table}'\n    target_column: '${target_column}'\n    time_limit: '${fit_time_limit}'\n    share_model: true\n    export_leaderboard: '${output_database}.leaderboard_${train_data_table}'\n    export_feature_importance: '${output_database}.feature_importance_${train_data_table}'\n+track_experiment:\n  td>: queries/track_experiment.sql\n  insert_into: '${output_database}.automl_experiments'\n  last_executed_notebook: '${automl.last_executed_notebook}'\n  user_id: '${automl.last_executed_user_id}'\n  user_email: '${automl.last_executed_user_email}'\n  model_name: 'gluon_model_${session_id}'\n  shared_model: '${automl.shared_model}'\n  task_attempt_id: '${attempt_id}'\n  session_time: '${session_local_time}'\n  engine: presto\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The above workflow code generates the following example content in the ",{"$$mdtype":"Tag","name":"em","attributes":{},"children":["automl_experiments"]}," 
table:"]},{"$$mdtype":"Tag","name":"div","attributes":{"className":"md-table-wrapper"},"children":[{"$$mdtype":"Tag","name":"table","attributes":{"className":"md"},"children":[{"$$mdtype":"Tag","name":"thead","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"task_attempt_id"},"children":["task_attempt_id"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"session_time"},"children":["session_time"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"user_id"},"children":["user_id"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"user_email"},"children":["user_email"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"model_name"},"children":["model_name"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"shared_model"},"children":["shared_model"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"notebook_url"},"children":["notebook_url"]}]}]},{"$$mdtype":"Tag","name":"tbody","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["849779333"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 7:19:18"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["7776"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["xxx@treasure-data.com"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161722236"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["b4a568da-e6f3-4057-b694-e2e19bf0e924"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["https://console.treasuredata.com/app/workflows/automl/notebook/4a3c431b3aea4705b32a47d85ca46368"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["849772621"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 
7:08:30"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["7776"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["xxx@treasure-data.com"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161721046"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["94ad5d0e-89ac-4836-99c4-2bc8f975ccbe"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["https://console.treasuredata.com/app/workflows/automl/notebook/b390b932d4a64fd3a2dc3b75503430fb"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["849768123"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 7:01:13"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["7777"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["yyy@treasure-data.com"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161720337"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["4f2351a3-dd8c-418e-8057-4c8ec9a90cbe"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["https://console.treasuredata.com/app/workflows/automl/notebook/e8b3319c982345a48ff74db0003d7c9c"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["849760942"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 
6:49:50"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["7776"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["xxx@treasure-data.com"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161718676"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["93e68b09-1a2f-4049-bb89-2bfe596ca9b3"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["https://console.treasuredata.com/app/workflows/automl/notebook/b02959b1469e4b9c86ec6c6809acc5ff"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["849753199"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 6:36:36"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["7776"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["xxx@treasure-data.com"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161717236"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["a7e456d3-8fcf-4173-afb7-f2d58bb985cd"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["https://console.treasuredata.com/app/workflows/automl/notebook/d3dcbbab99774bd594106a496ec2b2ab"]}]}]}]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["In the table, each record contains the model name, details of the user who created the model, the session time when the model was created, and a link to the generated notebook."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"record-evaluation-results-for-each-model","__idx":2},"children":["Record Evaluation Results for each Model"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can optionally record each model's quality using an evaluation dataset. 
The following workflow is an example recording model quality that uses ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://en.wikipedia.org/wiki/Receiver_operating_characteristic"},"children":["AUROC"]},", a standard evaluation measure for classification problems. The ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["record_evaluation"]}," task records evaluation results in the automl_eval_results table."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"+predict:\n  ml_predict>:\n    docker:\n      task_mem: 64g\n    notebook: gluon_predict\n    model_name: 'gluon_model_${session_id}'\n    input_table: '${input_database}.${test_data_table}'\n    output_table: '${output_database}.predicted_${test_data_table}_${session_id}'\n+evaluation:\n  td>: queries/auc.sql\n  table: '${output_database}.predicted_${test_data_table}_${session_id}'\n  target_column: '${target_column}'\n  positive_class: ' >50K'\n  store_last_results: true\n  engine: hive\n+record_evaluation:\n  td>: queries/record_evaluation.sql\n  insert_into: '${output_database}.automl_eval_results'\n  engine: presto\n  model_name: 'gluon_model_${session_id}'\n  test_table: '${input_database}.${test_data_table}'\n  session_time: '${session_local_time}'\n  auc: '${td.last_results.auc}'\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Treasure Data's Hive execution engine supports Hivemall, which supports a number of evaluation measures. 
See the ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://hivemall.github.io/eval/binary_classification_measures.html"},"children":["Hivemall documentation"]}," for details."]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Example content in the \"automl_eval_results\" table:"]},{"$$mdtype":"Tag","name":"div","attributes":{"className":"md-table-wrapper"},"children":[{"$$mdtype":"Tag","name":"table","attributes":{"className":"md"},"children":[{"$$mdtype":"Tag","name":"thead","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"session_time"},"children":["session_time"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"model_name"},"children":["model_name"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"test_table"},"children":["test_table"]},{"$$mdtype":"Tag","name":"th","attributes":{"data-label":"auroc"},"children":["auroc"]}]}]},{"$$mdtype":"Tag","name":"tbody","attributes":{},"children":[{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-06-06 6:21:40"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_164947310"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["ml_datasets.gluon_test"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["0.9226243033"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-06-14 6:49:22"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_166350110"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["ml_datasets.gluon_test"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["0.9299335758"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-06-15 
7:35:30"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_166532223"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["ml_datasets.gluon_test"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["0.9300292252"]}]},{"$$mdtype":"Tag","name":"tr","attributes":{},"children":[{"$$mdtype":"Tag","name":"td","attributes":{},"children":["2023-05-18 7:19:18"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["gluon_model_161722236"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["ml_datasets.gluon_test"]},{"$$mdtype":"Tag","name":"td","attributes":{},"children":["0.9238149699"]}]}]}]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"detect-drift-in-model-performance-over-time","__idx":3},"children":["Detect Drift in Model Performance over Time"]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["\"Drift\" is a term used in machine learning to describe how the performance of a machine learning model gradually degrades or becomes stale over time. There are two main types of drift: data drift and ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://en.wikipedia.org/wiki/Concept_drift"},"children":["concept drift"]},". Both data drift and concept drift can lead to a decline in the performance of a machine learning model."]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Using the following workflow tasks, you can record each model's accuracy and quality to detect drift in data and model performance. You can use a scheduled workflow job to keep track of model performance and issue a warning if the model performance drifts."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["There are several schemes for drift detection. The following example workflow identifies a degradation in ML model performance using an evaluation measure. 
When drift is detected, you can trigger an alert email as follows:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"# timezone: PST\n# schedule:\n#  daily>: 07:00:00\n+evaluation:\n  td>: queries/auc.sql\n  table: '${output_database}.predicted_${test_data_table}_${session_id}'\n  target_column: '${target_column}'\n  positive_class: ' >50K'\n  store_last_results: true\n  engine: hive\n+alert_if_drift_detected:\n  if>: '${td.last_results.auc < 0.93}'\n  _do:\n    mail>:\n      data: 'Detect drift in model performance. AUC was ${td.last_results.auc}.'\n    subject: Drift detected\n    to:\n      - me@example.com\n    bcc:\n      - foo@example.com\n      - bar@example.com\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://docs.digdag.io/scheduling_workflow.html?highlight=schedule"},"children":["schedule workflow executions"]}," for drift detection. 
When drift is detected, you can send an alert email or rebuild the model using a ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://docs.digdag.io/operators/if.html"},"children":["conditional operator"]},"."]}]}]},"headings":[{"value":"ML Experiment Tracking and Model Management","id":"ml-experiment-tracking-and-model-management","depth":1},{"value":"Track ML Experiments","id":"track-ml-experiments","depth":1},{"value":"Record Evaluation Results for each Model","id":"record-evaluation-results-for-each-model","depth":1},{"value":"Detect Drift in Model Performance over Time","id":"detect-drift-in-model-performance-over-time","depth":1}],"frontmatter":{"seo":{"title":"ML Experiment Tracking and Model Management"}},"lastModified":"2026-01-27T10:05:25.000Z","pagePropGetterError":{"message":"","name":""}},"slug":"/products/customer-data-platform/machine-learning/automl/advanced-topics/ml-experiment-tracking-and-model-management","userData":{"isAuthenticated":false,"teams":["anonymous"]},"isPublic":true}