By Ray Rivera, Director, Solutions Management, Workforce Planning and Analytics, SAP
For the past year I have been commuting to the Palo Alto office from Monterey, a distance of 90 miles. The journey often is completed in two segments, one of which is by train from Gilroy to Palo Alto on Caltrain, a commuter rail line serving San Francisco and Silicon Valley. During the first six months I tended to gaze out at scenery in the distance and be taken by the beauty of the golden foothills decked by stately oak trees.
But lately my attention has been focused on the untidy region near the tracks, where there is much evidence of vigorous, edge-of-society activity, and a lot of wisdom to be gained by watching with an analytical eye.
Sometimes the beautiful is not nearly as instructive as the ugly. And likewise, the grungy world of rail lines reveals some worthwhile wisdom about analytics.
Who is sleeping here?
One of the few areas in the U.S. where the unemployment rate has been falling in 2012 is Silicon Valley and the Bay Area in general. As Caltrain commuter ridership has increased accordingly, trains have become crowded, so much so that Caltrain has increased midday service so that non-commuters have more options. Yet as Bay Area employment picks up there has also been a curious increase of homeless persons camping regularly at the Gilroy station.
Surprise! They are not unemployed; they are the working poor. Broadly reduced unemployment cuts across all income levels, and in the Bay Area subminimum-wage jobs and well-paid skilled tech sector jobs have been increasing simultaneously.
Inhabiting the Gilroy station also facilitates labor force participation. The station is a regional transit center served by local buses, Greyhound, and even charters to Mexico, enabling station inhabitants to be mobile. Inhabitants can head to another town to find employment on a moment’s notice. At the railway station they can be easily fetched for day labor (Gilroy is a highly productive agricultural community, noted especially for garlic and mushrooms), and they can develop a symbiotic relationship with taxi drivers, who may offer local transport or information in exchange for tasks. The station also is patrolled regularly by the local police and Amtrak employees, and is therefore safer than most other areas where they could camp.
Lesson 1: Don’t uncritically accept two observations appearing together as fact, or as evidence that one relates to the other in a meaningful way. Let the data tell you what the relationships are.
Who is living here?
After traveling about 20 miles northbound, just past Coyote Creek, homeless encampments begin to be visible from the tracks. Continuing northward to Communications Hill in south San Jose on through to Tamien and past the HP Pavilion downtown, numerous grimy tents and disheveled campsites can be seen in fields, ravines, oleander thickets, among scrub in vacant lots and groves in freeway landscaping, against bridge abutments, and even crammed between fence segments or tangled in chaparral. From the view of a train passenger they all look the same: people living in squalor, foolishly hoarding trash, and consuming whatever handouts can be found.
Not quite. Those who live under bridges and against abutments are much different from those who live in fields or ravines. The key differences are access to water and safety of location. Higher-functioning homeless people tend to locate near water sources and in sheltered, inconspicuous areas, while those more ravaged by drugs and psychiatric conditions tend to camp in more exposed areas, away from water sources, and at much greater risk.
And those overloaded shopping carts are often not evidence of shiftlessness or derangement, but rather mobility and self-sufficiency. The shopping carts both transport and guard their few possessions, indicating some degree of wherewithal. But they also are used to transport scrap metal for recycling, or items found in dumpsters that can be repaired and sold to consignment shops, or in some cases illegal drugs.
Lesson 2: Things that look the same can actually be qualitatively different. Use analytics to train the eye, and classify things that look the same but in truth are very different.
Who is murdering here?
For many years San Jose has been one of the safest large cities in the United States. Yet during the summer of 2012, the city experienced its highest homicide rate in 15 years, with numerous incidents believed to be gang-related violence.
Like many cities, San Jose is operating at a substantial budget deficit, exacerbated by highly visible pension obligations to retired public safety personnel. In order to remain fiscally sound in the short term, San Jose has been forced to lay off over 200 officers. Of course, it is very tempting to attribute a precipitous rise in violent crime to recent police layoffs. However, the correlation between number of officers and violent crime is a controversial, inconclusive, and politically charged measure. More practically it is a coarse figure, unlikely to have much utility in predicting future risks, or determining where the remaining resources are best allocated.
Could the patterns of homelessness near the Caltrain tracks provide better data for predicting violent crime in San Jose?
The homeless are highly vulnerable to maltreatment, so sudden movements of shantytowns could indicate areas that are becoming dangerous to inhabit. In the months leading up to the increase in homicides, many homeless encampments in the Tamien neighborhood and northwest toward the Interstate 880 overpass were abandoned or dismantled voluntarily, with personal possessions left unretrieved. In some neighborhoods, encampment sites showed evidence of uncontrolled fires or vandalism, while in other neighborhoods illegally dumped residential refuse remained unscavenged. In all these neighborhoods, graffiti on the fences, walls and buildings adjacent to the Caltrain tracks were updated frequently, sometimes with highly ornate, mural-sized efforts. In analytics terms, the graffiti could have supplied material for a text analysis to accompany the spatial analytics of encampment movements.
Lesson 3: Don’t approach analytics with the expected answer already in mind. And in particular don’t turn the input measure into the output measure, especially if the measure carries a lot of political baggage. According to Goodhart’s Law, doing so will neutralize the information content of the measure, and according to Campbell’s Law, likely corrupt the analysis and the processes that you are trying to improve.
What are we watching here?
Long train rides provide a lot of data and unique perspectives to analytical minds. For those of us riding Caltrain, it is easy to fix our gaze on the eternal California foothills and generate reflections from a detached, philosophical view, while ignoring everything else that is going on in the complex, imperfect, grubby world of human industry. But getting down into the grit and grime along the tracks does not grant anyone special access to insight, or impart some virtue that somehow reveals the bare truth of things. Even in the “real world” we are just as likely to miss important relationships, or let our own cleverness get in the way.