Dataset Overview
The dataset comprises 39 hotels, each described by six numerical features: Comfort, Room Count, Cuisine Quality, Sports Facilities, Beach Access, and Price. Each hotel's traditional star rating (0-5 stars) served as a baseline for comparison.
Correlation Analysis
Pairwise correlations between the features showed the following:
- Strongest correlation: Cuisine quality and Price (0.57)
- Nearly as strong: Comfort and Cuisine quality (0.56)
- Weakest correlation: Room count and Price (-0.03)
- Also close to zero: Beach access and Comfort (-0.05)
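The pairwise comparison above can be sketched as follows. This is a minimal illustration, not the original analysis code: it assumes Pearson correlation (the source does not name the coefficient), and the array `X` holds random placeholder values rather than the real hotel data. The helper `ranked_pairs` is a hypothetical name introduced here.

```python
import numpy as np

FEATURES = ["Comfort", "Room Count", "Cuisine Quality",
            "Sports Facilities", "Beach Access", "Price"]

# Placeholder data: 39 rows x 6 columns of random numbers.
# These are NOT the hotel measurements, only a stand-in with the same shape.
rng = np.random.default_rng(0)
X = rng.normal(size=(39, len(FEATURES)))

# Pearson correlation matrix (assumed coefficient);
# rowvar=False treats each column as one variable.
corr = np.corrcoef(X, rowvar=False)

def ranked_pairs(corr, names):
    """Return (name_a, name_b, r) for every feature pair, largest |r| first."""
    n = len(names)
    pairs = [(names[i], names[j], corr[i, j])
             for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

pairs = ranked_pairs(corr, FEATURES)
for a, b, r in pairs:
    print(f"{a} vs {b}: {r:+.2f}")
```

With six features there are 15 distinct pairs, so the first and last entries of `pairs` correspond to the strongest and weakest correlations in the list above.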
Principal Component Analysis
PCA was applied to reduce dimensionality while preserving as much variance as possible:
- Features were standardized before PCA so that each contributes equally regardless of its scale
- The first two principal components together captured 67.1% of total variance
- PC1 explained 43.6% of variance (primarily comfort, cuisine, price)
- PC2 explained 23.6% of variance (sports facilities, beach access)
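The standardize-then-project procedure described above can be sketched with NumPy alone. This is an illustrative implementation under stated assumptions, not the code used in the analysis: it performs PCA through a singular value decomposition, the function name `pca` is introduced here, and `X` again holds random placeholder values, so the printed ratios will not match the 43.6% and 23.6% reported above.

```python
import numpy as np

def pca(X, n_components=2):
    """Standardize the columns of X, then project onto the top components."""
    # z-score each feature so all columns have mean 0 and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal axes,
    # returned in order of decreasing singular value
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Share of total variance carried by each component
    explained_ratio = S**2 / np.sum(S**2)
    # Coordinates of each row in the reduced space
    scores = Z @ Vt[:n_components].T
    return scores, Vt[:n_components], explained_ratio

# Placeholder data with the same shape as the hotel table (39 x 6).
rng = np.random.default_rng(0)
X = rng.normal(size=(39, 6))

scores, components, ratio = pca(X)
print("Explained variance ratio (PC1, PC2):", ratio[:2])
print("Cumulative share of first two PCs:", ratio[:2].sum())
```

The `scores` array has one row per hotel and one column per retained component, which is the form needed for a two-dimensional scatter plot.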
Key Findings
Visual Cluster Patterns
The PCA visualization revealed natural groupings that didn't align perfectly with star ratings. Hotels with similar service profiles clustered together regardless of their official star classification.
This suggested that traditional star ratings might not fully capture the multidimensional nature of hotel quality and service offerings.
Feature Importance
Comfort, cuisine quality, and price were the most influential factors in the first principal component, while sports facilities and beach access dominated the second component.
Room count showed minimal impact on the overall segmentation, which suggests it may be less relevant to perceived hotel quality than the other features.
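One way to read off which features dominate a component is to rank them by the absolute value of their loadings. The sketch below assumes loadings are taken as the eigenvectors of the correlation matrix, which are the principal axes of PCA on standardized data. The function name `top_features` is hypothetical, and `X` is random placeholder data, so the resulting ranking says nothing about the real hotels.

```python
import numpy as np

FEATURES = ["Comfort", "Room Count", "Cuisine Quality",
            "Sports Facilities", "Beach Access", "Price"]

def top_features(X, names, component=0):
    """Rank features by |loading| on the chosen principal component."""
    corr = np.corrcoef(X, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    axis = eigvecs[:, order[component]]      # unit-length loading vector
    # The sign of an eigenvector is arbitrary, so rank by absolute value
    return sorted(zip(names, axis), key=lambda t: abs(t[1]), reverse=True)

# Placeholder data, NOT the hotel measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(39, len(FEATURES)))

ranking = top_features(X, FEATURES, component=0)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

A feature whose loading stays near zero on every retained component contributes little to the reduced representation, which is the pattern reported for room count.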