skypilot-users

Is it possible to install a custom version of Ray in a SkyPilot environment?

Ananth is trying to install a custom version of Ray in a SkyPilot environment to run xgboost-ray. He ran into version-compatibility issues and problems with the dashboard display, and is concerned that the setup commands create multiple Ray clusters. An example configuration and setup script are provided in the discussion thread.

Ananth G

Asked on Feb 16, 2024

Yes, it is possible to install a custom version of Ray in a SkyPilot environment.

Here is a complete, working YAML example:

num_nodes: 3
resources:
  ports:
    # ray ports
    - 8265-8266
  
setup: |
  echo "Running setup."
  # Install Ray in its own conda environment, separate from the Ray that SkyPilot installs.
  conda deactivate
  conda activate custom_ray
  if [ $? -ne 0 ]; then
    conda create -n custom_ray python=3.10 -y
    conda activate custom_ray
    conda install pip -y
    pip install 'ray[default]==2.8.1'
  fi

run: |
  num_nodes=`echo "$SKYPILOT_NODE_IPS" | wc -l`
  head_ip=`echo "$SKYPILOT_NODE_IPS" | head -n1`
  conda deactivate
  conda activate custom_ray
  if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --head  --disable-usage-stats --port 6379
    sleep 10
  else
    sleep 15
    ps aux | grep ray | grep 6379 &> /dev/null || ray start --address $head_ip:6379 --disable-usage-stats
  fi
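
The run section above takes the first entry of SKYPILOT_NODE_IPS as the head node. As a minimal sketch of that step, the two helpers below derive the head address and the node count from a newline-separated IP list. The function names `ray_head_address` and `count_nodes` are hypothetical, and the default port 6379 simply mirrors the example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they are not part of SkyPilot or Ray.

# Print "<first-ip>:<port>" for a newline-separated IP list.
# $1: IP list (e.g. "$SKYPILOT_NODE_IPS"), $2: port (defaults to 6379).
ray_head_address() {
  local ips="$1" port="${2:-6379}"
  local head_ip
  head_ip=$(printf '%s\n' "$ips" | head -n1)
  printf '%s:%s\n' "$head_ip" "$port"
}

# Count non-empty lines, so a trailing newline does not inflate the count.
count_nodes() {
  printf '%s\n' "$1" | grep -c .
}
```

Inside a run section this could be called as `ray start --address "$(ray_head_address "$SKYPILOT_NODE_IPS")" --disable-usage-stats` on the worker nodes.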

Here are some key points to consider:

  1. Use a setup script to install the custom Ray version in its own conda environment:

conda deactivate
conda activate custom_ray
if [ $? -ne 0 ]; then
  conda create -n custom_ray python=3.10 -y
  conda activate custom_ray
  conda install pip -y
  pip install 'ray[default]==2.8.1'
fi

  2. Activate that environment before any Ray command, in both setup and run, so the custom Ray does not conflict with the Ray version that SkyPilot installs.

  3. Use a conditional on SKYPILOT_NODE_RANK so that the Ray head is started only on the rank-0 node:

if [ "$SKYPILOT_NODE_RANK" == "0" ]; then
  ps aux | grep ray | grep 6379 &>/dev/null || ray start --head --disable-usage-stats --port 6379
  sleep 10
fi

  4. Avoid creating multiple Ray clusters by gating the Ray start commands on node rank: only rank 0 runs ray start --head, and the other nodes join it with ray start --address.

  5. Install Ray with the default extra, which the dashboard needs:

pip install 'ray[default]'

  6. Monitor log messages for IP address conflicts and confirm that each node connects to the intended Ray cluster.

  7. Adjust the setup script and installation commands as needed to get the desired custom Ray version working in the SkyPilot environment.
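
The example relies on fixed sleeps (10 seconds on the head, 15 on the workers) to give the head time to come up. One possible alternative, shown here only as a sketch, is a small retry helper that re-runs a command until it succeeds. The function name `retry` is hypothetical, and using it with `ray start` assumes that `ray start --address` exits non-zero when the head is not yet reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper; it is not part of SkyPilot or Ray.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible use on a worker node (assumption: head_ip is set as in the run section):
# retry 10 5 ray start --address "$head_ip:6379" --disable-usage-stats
```

The helper is generic, so it can be tried locally with any command before it is placed in a run section.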

Feb 17, 2024