In this blog, I will share my experience with a Proof of Concept (POC) lab deployment of VMware Cloud Foundation (VCF) on VxRail 5.1.1, where I encountered a failure during the VxRail bring-up process. This post covers the issue I faced, the troubleshooting steps I took, and the eventual resolution, to help others who may face similar challenges.
Before diving into the issue, here's a quick overview of my lab environment:
VCF Version: 5.1.1
VxRail Version: 8.0.210 (used in the VCF deployment)
Hardware: Dell VxRail nodes
Network: Fully configured with required VLANs for management, vSAN, vMotion, and NSX-T
The goal was to deploy VCF on VxRail using SDDC Manager and the VxRail Manager to orchestrate the deployment of a new VxRail cluster.
The Issue: VxRail Bring-Up Failure
During the initial deployment, everything seemed to be proceeding as expected. However, the VxRail bring-up process failed at the node validation step. The error message displayed was generic, offering little insight into what went wrong:
"Failed to complete the VxRail bring-up. Please check the logs for more details."
This failure halted the entire VCF deployment process, requiring immediate troubleshooting.
Troubleshooting Steps
Log Analysis: The first step was to dig into the logs to understand the root cause. The relevant log is on the VxRail Manager VM at /var/log/microservice_log/dayone.log.
The log showed that, although the bring-up had been reported as failed, the instance was still running in the background.
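As a minimal sketch of the log check above, a small shell helper can pull the most recent error-like lines out of the day-one log. The function name recent_errors and the search pattern (error, fail, exception) are my own assumptions for illustration; the actual wording in dayone.log may differ, so adjust the pattern to what you see.

```shell
# Hypothetical helper (not part of VxRail): print the most recent
# error-like lines from a log file.
# $1 = log file (defaults to the day-one log mentioned in this post)
# $2 = how many matching lines to show (defaults to 20)
recent_errors() {
  local log_file="${1:-/var/log/microservice_log/dayone.log}"
  local count="${2:-20}"
  # The pattern is an assumption; case-insensitive match on common keywords
  grep -iE 'error|fail|exception' "$log_file" | tail -n "$count"
}

# Run against the default log only if it is readable on this machine
if [ -r /var/log/microservice_log/dayone.log ]; then
  recent_errors
fi
```

Calling recent_errors with a different path and count, for example recent_errors /tmp/copy-of-dayone.log 50, works the same way on a copy of the log.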
We then checked the VxRail database and found the following:
psql -U postgres vxrail
vxrail=# select id ,state from system.operation_status;
id | state
----+---------
1 | FAILED
2 | FAILED
3 | STARTED
Resolution and Successful Deployment
Updated the state of the stuck operation (id 3, still shown as STARTED) to FAILED in the VxRail database.
Restarted the marvin service (service vmware-marvin restart).
The deployment then completed successfully.
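For reference, the database update described above could look roughly like the sketch below. The table and column names come from the query output shown earlier, but the exact statement, the extra state = 'STARTED' guard, and the helper name build_fix_sql are my assumptions, not an official procedure. The helper only prints the statement so it can be reviewed before anything is changed.

```shell
# Hypothetical helper: build the UPDATE statement for one stuck operation id.
# It prints SQL only; nothing is sent to the database by this function.
build_fix_sql() {
  local op_id="$1"
  # Accept plain integers only, so nothing unexpected ends up in the SQL
  case "$op_id" in
    ''|*[!0-9]*) echo "operation id must be numeric" >&2; return 1 ;;
  esac
  # Assumed statement: only touch the row if it is still marked STARTED
  printf "UPDATE system.operation_status SET state = 'FAILED' WHERE id = %s AND state = 'STARTED';\n" "$op_id"
}

# Print the statement for the operation that was stuck in this lab (id 3)
build_fix_sql 3

# To apply it on the VxRail Manager VM, pipe the statement into psql:
#   build_fix_sql 3 | psql -U postgres vxrail
# then restart the service as described above:
#   service vmware-marvin restart
```

Re-running the select against system.operation_status afterwards is an easy way to confirm that no operation is left in the STARTED state.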
Deploying VCF on VxRail 5.1.1 is a complex task that requires careful planning and attention to detail. While failures can be frustrating, they provide valuable learning experiences. By systematically troubleshooting the issues and resolving them, I was able to successfully bring up the VxRail cluster in my POC lab, paving the way for a successful VCF deployment.
If you encounter similar issues, I hope this blog post provides useful guidance and helps you troubleshoot and resolve them efficiently.