The Problem
The Department of Families, Housing, Community Services and Indigenous Affairs (FaHCSIA) aims to improve the lives of Australians by creating opportunities for economic and social participation by individuals, families and communities. To help families and individuals manage their money, the Department recognised that it needed to better understand people's financial behaviour and to identify potential obstacles to changing that behaviour.
The results of the 2008 ANZ Survey of Adult Financial Literacy in Australia were available, but they comprised almost 800 variables containing responses to a 140-page questionnaire. How could meaningful information be extracted from so many variables?
The Data Analysis Australia Approach
In the marketing world, segmentation analysis is often used to show where marketing efforts should be focussed, for example by identifying the people most likely to buy a product. Here, a segmentation approach was taken to identify groups of people who were more likely to need assistance with money management, as well as groups who were already coping well. Once the segments had been defined, The Social Research Centre created detailed profiles of each segment, giving FaHCSIA the information it needed to develop individualised strategies to best assist each one.
Segmentation analysis is not a statistical technique in itself, but rather an art that combines several statistical techniques. These may include some or all of the following:
- Principal component analysis (PCA);
- Factor analysis;
- Cluster analysis;
- Classification and regression trees (CART);
- Discriminant analysis;
- Correspondence analysis;
- Random forests; and
- many others.
While some people may believe that segmentation analysis can be standardised by choosing a single technique, such as CART, and applying it every time, Data Analysis Australia's experience has been that there is no one-size-fits-all approach to segmentation. Every dataset is different, and an approach that works for one dataset may give meaningless results for another. Exploring the data is therefore one of the most important stages of the analysis.
Data Analysis Australia explored a number of approaches, using various combinations of PCA and cluster analysis, to determine the most appropriate one for this particular dataset. A key consideration was robustness: relatively minor changes in the input variables should not significantly change the final outcomes. Another consideration was the size of the resulting segments. One approach that was explored (and discarded) separated respondents into six segments. However, approximately 90% of respondents fell into just two of those segments, leaving the remaining four so small as to be meaningless.
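The article does not say how robustness or segment size were measured. As a minimal sketch, assuming the Adjusted Rand Index (ARI) is used to compare segment labels from a full run with labels from a run on slightly perturbed inputs, both checks could look like this in plain Python. The helper names `segment_shares` and `adjusted_rand_index`, the 5% minimum share and all of the labels below are hypothetical illustrations, not survey results.

```python
from collections import Counter
from math import comb


def segment_shares(labels):
    """Proportion of respondents in each segment, largest first."""
    counts = Counter(labels)
    n = len(labels)
    return sorted((c / n for c in counts.values()), reverse=True)


def adjusted_rand_index(a, b):
    """Agreement between two segmentations of the same respondents.

    Returns 1.0 for identical partitions and roughly 0 for chance-level agreement.
    """
    n = len(a)
    # Pairs of respondents placed together in both segmentations.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one segment in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical six-segment solution: two segments hold 90% of 1,000 respondents.
six = [0] * 480 + [1] * 420 + [2] * 30 + [3] * 30 + [4] * 25 + [5] * 15
shares = segment_shares(six)
print(round(sum(shares[:2]), 2))  # 0.9
print(min(shares) < 0.05)         # True: hypothetical 5% floor is breached

# Hypothetical robustness check: labels from all input variables versus
# labels from a run with a few input variables dropped.
full = [0, 0, 0, 1, 1, 1, 2, 2, 2]
perturbed = [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(round(adjusted_rand_index(full, perturbed), 2))  # 0.64
```

An ARI close to 1 suggests the segmentation survives the perturbation; a value near 0 means the two sets of segments agree no better than chance.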
The final approach was to first use PCA and factor analysis to create a small number of new variables, each a linear combination of many of the original variables. Focussing on the first few principal components eliminated much of the 'noise' in the data while retaining its underlying structure.
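The dimension-reduction idea can be illustrated with PCA alone. The sketch below assumes NumPy is available, standardises each variable and computes principal component scores from the singular value decomposition. The data are synthetic (a hypothetical 500 respondents answering 40 items driven by three latent attitudes plus noise), the helper `pca_scores` is hypothetical, and the factor-analysis part of the real analysis (for example any rotation of the factors) is not shown.

```python
import numpy as np


def pca_scores(X, n_components):
    """Standardise each variable, then project onto the leading principal components.

    Returns the component scores and the share of total variance explained
    by every component (in decreasing order).
    """
    # Put all items on a common scale so no single item dominates.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes; singular values come out sorted.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained


# Hypothetical synthetic survey: 3 latent attitudes drive 40 observed items.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 40))
X = latent @ loadings + 0.3 * rng.normal(size=(500, 40))

scores, explained = pca_scores(X, n_components=3)
print(scores.shape)  # (500, 3)
print(explained[:3].sum())
```

Because the synthetic items are built from three latent variables, the first three components capture most of the variance and the remaining components are largely noise, which is why keeping only the leading components discards noise but keeps structure.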
Cluster analysis was then performed on the factor scores, identifying five segments of respondents, each reporting particular attitudes and behaviours. Because the factors used in the cluster analysis were themselves interpretable, the resulting segments were meaningful.
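The article does not name the clustering algorithm. Assuming k-means (a common choice, swapped in here purely for illustration), clustering component scores into five segments and producing a simple size-and-mean profile of each could be sketched with NumPy as follows. The helpers `kmeans_pp_init` and `kmeans` are hypothetical, and the scores are synthetic stand-ins for real factor scores.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread the initial centres across the data."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centre.
        diffs = X[:, None, :] - np.array(centres)[None, :, :]
        d2 = (diffs**2).sum(-1).min(axis=1)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)


def kmeans(X, k, n_restarts=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_restarts):
        centres = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # Move each centre to its members' mean (keep it if the cluster is empty).
            new_centres = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                for j in range(k)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


# Hypothetical component scores: 5 synthetic groups of 100 respondents each.
rng = np.random.default_rng(42)
true_centres = np.array([[0, 0, 0], [20, 0, 0], [0, 20, 0], [0, 0, 20], [20, 20, 20]])
true_group = np.repeat(np.arange(5), 100)
scores = true_centres[true_group] + rng.normal(size=(500, 3))

labels, inertia = kmeans(scores, k=5)
# Simple profile: segment size and mean score on each component.
for j in range(5):
    members = scores[labels == j]
    print(j, len(members), members.mean(axis=0).round(1))
```

Restarting from several seeds and keeping the lowest within-segment sum of squares guards against a poor starting configuration; with real survey scores, the segment means on interpretable components are what turn clusters into describable segments.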