
Report metrics

Use ray.train.report to push metrics from train_loop_per_worker to the run.
ray.train.report({"loss": loss.item(), "accuracy": acc, "epoch": epoch})
Reported values appear in:
  • The Ray dashboard’s Train tab
  • The Result object returned by trainer.fit()
  • Any logger callbacks attached to the run
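Putting this together, a per-epoch reporting loop might look like the following sketch. Everything except the call to ray.train.report is a placeholder: compute_epoch_metrics and run_one_epoch are hypothetical helpers standing in for your real training and evaluation code, and the Ray import is deferred into the function so the sketch stays self-contained.

```python
def compute_epoch_metrics(losses, correct, total, epoch):
    """Aggregate raw per-batch stats into the dict passed to ray.train.report."""
    return {
        "loss": sum(losses) / len(losses),
        "accuracy": correct / total,
        "epoch": epoch,
    }

def run_one_epoch():
    """Placeholder for a real training epoch: returns batch losses and counts."""
    return [0.9, 0.7, 0.5], 80, 100

def train_loop_per_worker(config):
    # Deferred import: report() only works inside a Ray Train run context.
    from ray import train

    for epoch in range(config["num_epochs"]):
        losses, correct, total = run_one_epoch()
        train.report(compute_epoch_metrics(losses, correct, total, epoch))
```

Keeping the metric aggregation in a plain function makes it easy to unit-test outside of a Ray cluster.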

Built-in loggers

from ray.train import RunConfig
from ray.train.callbacks.mlflow import MLflowLoggerCallback
from ray.train.callbacks.wandb import WandbLoggerCallback
from ray.train.callbacks.tbx import TBXLoggerCallback

run_config = RunConfig(callbacks=[
    MLflowLoggerCallback(experiment_name="finetune"),
    WandbLoggerCallback(project="ray-train-demo"),
    TBXLoggerCallback(),
])

Ray dashboard

The Ray dashboard (opened with ray dashboard, or at the URL printed by ray.init) shows:
  • Per-run metrics over time
  • Worker utilization (CPU, GPU, memory)
  • Reported checkpoints

TensorBoard

Logs land under <storage_path>/<run_name>/. Point TensorBoard at that directory:
tensorboard --logdir s3://bucket/runs/
Note that reading s3:// paths requires TensorBoard's S3 filesystem support; for a local or NFS storage_path, pass the filesystem path directly.

Custom callbacks

Subclass TrainCallback for custom integrations:
from ray.train.callbacks import TrainCallback

class SlackNotifier(TrainCallback):
    def on_trial_result(self, iteration, trials, trial, result, **info):
        # send_slack is a user-defined helper, e.g. a POST to a Slack webhook.
        if result["loss"] < 0.1:
            send_slack(f"Trial {trial.trial_id} hit loss < 0.1")

run_config = RunConfig(callbacks=[SlackNotifier()])
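The notification check in a callback like the one above can be factored into plain functions and unit-tested without a Ray cluster. A minimal sketch, where should_notify, format_alert, and their defaults are illustrative rather than part of Ray's API:

```python
def should_notify(result, metric="loss", threshold=0.1):
    """Return True when the reported metric exists and drops below the threshold."""
    value = result.get(metric)
    return value is not None and value < threshold

def format_alert(trial_id, metric="loss", threshold=0.1):
    """Build the notification message sent to the external service."""
    return f"Trial {trial_id} hit {metric} < {threshold}"
```

Guarding on the metric's presence also keeps the callback from raising a KeyError on intermediate results that do not report that metric.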

Profiling workers

Use the dashboard’s “Stack Trace” and “py-spy” actions on a worker to capture a flame graph or stack snapshot of a running training job.

Next steps

Observability

Cluster-wide metrics, logs, and tracing.

Run config

All callback options.