Blog Archive

24 February 2013

On data access and digital dissent


Jeremy Hammond on Aaron Swartz and the Criminalization of Digital Dissent.

A statement released on Feb 20th by Jeremy Hammond’s lawyer. This is Jeremy Hammond in his own words, from solitary confinement.
The tragic death of internet freedom fighter Aaron Swartz reveals the government’s flawed “cyber security strategy” as well as its systematic corruption involving computer crime investigations, intellectual property law, and government/corporate transparency. In a society supposedly based on principles of democracy and due process, Aaron’s efforts to liberate the internet, including free distribution of JSTOR academic essays, access to public court records on PACER, stopping the passage of SOPA/PIPA, and developing the Creative Commons, make him a hero, not a criminal. It is not the “crimes” Aaron may have committed that made him a target of federal prosecution, but his ideas – elaborated in his “Guerrilla Open Access Manifesto” – that the government has found so dangerous. The United States Attorney’s aggressive prosecution, riddled with abuse and misconduct, is what led to the death of this hero. This sad and angering chapter should serve as a wake-up call for all of us to acknowledge the danger inherent in our criminal justice system.
Aaron’s case is part of the recent aggressive, politically-motivated expansion of computer crime law where hackers and activists are increasingly criminalized because of alleged “cyber-terrorist” threats. The United States Attorney for the Southern District of New York, Preet Bharara, whose office is prosecuting me and my co-defendants in the Lulzsec indictment, has used alarmist rhetoric such as the threat of an imminent “Pearl Harbor like cyber attack” to justify these prosecutions. At the same time the government routinely trains and deploys their own hackers to launch sophisticated cyber attacks against the infrastructure of foreign countries, such as the Stuxnet and Flame viruses, without public knowledge, oversight, declarations of war, or consent from international authorities. DARPA, US Cyber Command, the NSA, and numerous federally-contracted private corporations openly recruit hackers to develop defensive and offensive capabilities and build Orwellian digital surveillance networks, designed not to enhance national security but to advance U.S. imperialism. They even attend and speak at hacker conferences, such as DEFCON, offer to bribe hackerspaces for their research, and created the insulting “National Civic Hacker Day” – efforts which should be boycotted or confronted every step of the way.
Aaron is a hero because he refused to play along with the government’s agenda; instead, he used his brilliance and passion to create a more transparent society. Through the free software movement, open publishing and file sharing, and development of cryptography and anonymity technology, digital activists have revealed the poverty of neo-liberalism and intellectual property. Aaron opposed reducing everything to a commodity to be bought or sold for a profit.
The rise in effectiveness of, and public support for, movements like Anonymous and Wikileaks has led to an expansion of computer crime investigations – most importantly enhancements to 18 U.S.C. § 1030, the Computer Fraud and Abuse Act (CFAA). Over the years the CFAA has been amended five times and has gone through a number of important court rulings that have greatly expanded what the act covers concerning “accessing a protected computer without authorization.” It is now difficult to determine exactly what conduct would be considered legal. The definition of a “protected computer” has been incrementally expanded to include any government or corporate computer in or outside the U.S. “Authorization,” not explicitly defined by the CFAA, has also been expanded to be so ambiguous that any use of a website, network, or PC that is outside of the interest, agenda, or contractual obligations of a private or government entity could be criminalized. In Aaron’s case and others the government has defined violating a service’s Acceptable Use Policy (AUP), Terms of Service (TOS), or End-User License Agreement (EULA) as illegal. Every time you sign up for a service like Gmail, Hotmail, or Facebook and click the “I agree” button that follows a long contract that no one ever reads, you could be prosecuted under the CFAA if you violate any of the terms.
The sheer number of everyday computer users who could be considered criminals under these broad and ambiguous definitions enables the politically motivated prosecution of anyone who voices dissent. The CFAA should be found unconstitutional under the void-for-vagueness doctrine of the due process clause. Instead, Congress proposed bills last year which would double the statutory maximum sentences and introduce mandatory minimum sentences, similar to the excessive sentences imposed in drug cases which have been widely opposed by many federal and state judges.
The “Operation Payback” case in San Jose, California is another miscarriage of justice where 16 suspected Anonymous members (including a 16-year-old boy) allegedly participated in a denial-of-service action against PayPal in protest of its financial blockade of Wikileaks. Denial-of-service does not “exceed authorized access,” as it is virtually indistinguishable from standard web requests. It is more akin to an electronic sit-in protest, overloading the website’s servers making it incapable of serving legitimate traffic, than a criminal act involving stolen private information or destruction of servers. PayPal’s website was only slow or unavailable for a matter of hours, yet these digital activists face prison time of more than 10 years, $250,000 in fines, and felony convictions because the government wants to criminalize this form of internet protest and send a warning to would-be Wikileaks supporters.
Another recent case is that of Andrew “Weev” Auernheimer, who last November was convicted under the CFAA. Andrew discovered that AT&T was publishing customer names and email addresses on its public-facing website, without password protection, encryption, or firewalls. Instead of acknowledging their own mistake in violating customer privacy, AT&T sought prison time for Andrew. Andrew has defended his actions saying, “We have not only a right as Americans to analyze things that corporations publish and make publicly accessible but perhaps a moral obligation to tell people about it.”
I am currently facing multiple computer hacking conspiracy charges due to my alleged involvement with Anonymous, LulzSec, and AntiSec, groups which have targeted and exposed corruption in government institutions and corporations such as Stratfor, The Arizona Department of Public Safety, and HBGary Federal. My potential sentence is dramatically increased because the Patriot Act expanded the CFAA’s definition of “loss.” This allowed Stratfor to claim over 5 million dollars in damages, including the exorbitant cost of hiring outside credit protection agencies and “infosec” corporations, purchasing new servers, 1.6 million dollars in “lost potential revenue” for the time their website was down, and even the cost of a 1.3 million dollar settlement for a class action lawsuit filed against them. Coupled with use of “sophisticated means” and “affecting critical infrastructure” sentence enhancements, if convicted at trial I am facing a sentence of 30-years-to-life.
Dirty trial tactics and lengthy sentences are not anomalies but are part of a fundamentally flawed and corrupt two-tiered system of “justice” which seeks to reap profits from the mass incarceration of millions, especially people of color and the impoverished. The use of informants who cooperate in exchange for lighter sentences is not just utilized in the repressive prosecutions of protest movements and manufactured “terrorist” Islamophobic witch-hunts, but also in most drug cases, where defendants face some of the harshest sentences in the world.
For Aaron Swartz, himself facing 13 felony CFAA charges, it is likely that it was this intense pressure from relentless and uncompromising prosecutors, who, while being aware of Aaron’s psychological fragility, continued to demand prison time, that led to his untimely death.
Due to widespread public outrage, there is talk of congressional investigations into the CFAA. But since the same Congress had proposed increased penalties not even one year ago, any efforts at reform are unlikely to be more than symbolic. What is needed is not reform but total transformation; not amendments but abolition. Aaron is a hero to me because he did not wait for those in power to realize his vision and change their game; he sought to change the game himself, and he did so without fear of being labeled a criminal and imprisoned by a backwards system of justice.
We the people demand free and equal access to information and technology. We demand transparency and accountability from governments and big corporations, and privacy for the masses from invasive surveillance networks.
The government will never be forgiven. Aaron Swartz will never be forgotten.

Event page for Courthouse Support Rally https://www.facebook.com/events/567165369962801/
Get involved with the Jeremy Hammond Support Network
FreeHammond.org
On Facebook https://www.facebook.com/supporthammond
Twitter https://twitter.com/Free_Hammond
The address to mail postcards to Jeremy is as follows:
Jeremy Hammond – #18729-424, Metropolitan Correctional Center, 150 Park Row, New York, New York, 10007

23 February 2013

Algos for big data

http://www.industryweek.com/emerging-technologies/algorithms-silent-game-changer-big-data


Algorithms: The Silent Game Changer in Big Data

By Radhika Subramanian, CEO of Emcien.

To keep pace with the Big Data explosion, we must take the human element out of the equation and let the computers take over.

Hype is synonymous with Big Data. But data is nothing new.
What’s new is that manufacturers are finally learning how to draw critical insights from that data for competitive advantage. The key to that, of course, is having the knowledge and understanding to spend wisely on knowledge discovery and getting the data to talk to us.
According to the results of a recent Gartner study, enterprise data is on pace to grow 650% over the next few years, with 80% of that data in unstructured form. This is more than any data scientist -- or person -- can manage.
People’s ability to process data hasn’t changed much over the years, but the technology we employ to help us process it changes constantly. Therein lies the promise of Big Data. Specifically, Big Data analytics that rely on algorithms are our only hope of keeping up with the data explosion.

Algorithms and Big Data

Every time we have faced technology challenges that are computationally intensive, algorithms have come to our rescue, helped us to overcome the barriers and we have pushed the frontier of innovation. A few good examples are:
  • Operating systems are algorithms that enable computers to interact with humans. Without operating systems there would be no PCs, laptops or smart phones.
  • An algorithm that defines how content is stored and transmitted enables the Internet, or the World Wide Web. These were world-changing algorithms and today we have more than 200 billion web sites at our disposal.
  • Google’s ranked search is an algorithm that lets us quickly search for highly relevant content on the web.
And yes, there are many more. What they all share in common is the ability to break through barriers that limit our ability to reach the next frontier of innovation. Big data poses a ton of such barriers -- and smart algorithms will help us to overcome these barriers.
The biggest problem with a traditional approach to data is that it’s typically an extension of old thinking and hence time-consuming because it may fail to consider the barriers facing us today -- the volume of data and the cost and time to sift it.
It’s time for an entirely new approach that addresses these growing needs.
This new approach demands a paradigm shift that focuses on the following:
• A fundamental change in the role played by analysts from data-miners to insight-evaluators.
• Fast and efficient methods that automatically convert data to insight for analysts to evaluate and operationalize.
• Continual improvement of these automation methods to keep up with the speed of data and critical need for timely insights.
The next wave of challenges is upon us. How can we scale these efforts to automate getting the gold from the data?
The data-scientist approach is labor intensive, time and cost prohibitive -- and most importantly cannot scale to the speed of data. The new approaches need to automate the process just like processes before them such as automated mining of gold and other precious metals from dirt. The analogy is so very appropriate.
Manufacturing companies are finally looking to harness the power of Big Data because it drives enormous opportunity for business improvement; however, this is still just the first inning of the game, and I would caution against investing too heavily in approaches that rely on hardware and staffing increases.
If a method you’re considering is intrusive, watch out! It’s not sustainable.
Ask yourself how much training the new tool is going to require, and whether or not it demands hiring a lot of experts to manage. Do you really need a team of data scientists to surface insight from data? How smoothly will it fit into your current business to help drive intelligent decisions with insight from data? The best approaches are ones that are sustainable.
And let me leave you with one more question to ponder as you go about your busy day -- Why is it that we have so much data, yet less than 1% of data is analyzed? It is because we have been so consumed, due to our hoarding mentality, with saving all this data while not thinking enough about how to get to the gold in them thar hills!

Emcien CEO Radhika Subramanian is a seasoned entrepreneur with decades of experience helping organizations utilize the insight buried within their data. Tweet to @RadhikaAtEmcien & join her Big Data Apps LinkedIn group.


14 February 2013

Data Scientist: The Sexiest Job of the 21st Century

1 Jan 2015: Here is a link to a more recent article on data science: http://insidebigdata.com/2014/03/14/death-data-science-greatly-exagerated/

---

http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/pr

Data Scientist: The Sexiest Job of the 21st Century

by Thomas H. Davenport and D.J. Patil
When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.”
Goldman, a PhD in physics from Stanford, was intrigued by the linking he did see going on and by the richness of the user profiles. It all made for messy data and unwieldy analysis, but as he began exploring people’s connections, he started to see possibilities. He began forming theories, testing hunches, and finding patterns that allowed him to predict whose networks a given profile would land in. He could imagine that new features capitalizing on the heuristics he was developing might provide value to users. But LinkedIn’s engineering team, caught up in the challenges of scaling up the site, seemed uninterested. Some colleagues were openly dismissive of Goldman’s ideas. Why would users need LinkedIn to figure out their networks for them? The site already had an address book importer that could pull in all a member’s connections.
Luckily, Reid Hoffman, LinkedIn’s cofounder and CEO at the time (now its executive chairman), had faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high degree of autonomy. For one thing, he had given Goldman a way to circumvent the traditional product release cycle by publishing small modules in the form of ads on the site’s most popular pages.
Through one such module, Goldman started to test what would happen if you presented users with names of people they hadn’t yet connected with but seemed likely to know—for example, people who had shared their tenures at schools and workplaces. He did this by ginning up a custom ad that displayed the three best new matches for each user based on the background entered in his or her LinkedIn profile. Within days it was obvious that something remarkable was taking place. The click-through rate on those ads was the highest ever seen. Goldman continued to refine how the suggestions were generated, incorporating networking ideas such as “triangle closing”—the notion that if you know Larry and Sue, there’s a good chance that Larry and Sue know each other. Goldman and his team also got the action required to respond to a suggestion down to one click.
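The “triangle closing” idea described above can be illustrated with a minimal sketch. It is not LinkedIn’s actual implementation: the function name `suggest_connections`, the toy network, and the choice to rank candidates by their number of mutual connections are all assumptions made for illustration. The default of three suggestions mirrors the three matches per user mentioned in the article.

```python
from collections import Counter


def suggest_connections(graph, user, top_n=3):
    """Rank people `user` is not yet connected to by number of mutual connections.

    `graph` maps each person to the set of people they are connected to.
    This is a hypothetical illustration of triangle closing, not a real API.
    """
    direct = graph.get(user, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already connected.
            if candidate != user and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical toy network (undirected, so each edge appears in both directions).
graph = {
    "you": {"larry", "sue"},
    "larry": {"you", "dana"},
    "sue": {"you", "dana", "omar"},
    "dana": {"larry", "sue"},
    "omar": {"sue"},
}

print(suggest_connections(graph, "larry"))  # ['sue']
print(suggest_connections(graph, "you"))    # ['dana', 'omar']
```

In this toy network Larry and Sue both know “you” and Dana, so Sue is the top suggestion for Larry. A production system would presumably combine this signal with others, such as the shared schools and workplaces mentioned above.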
It didn’t take long for LinkedIn’s top managers to recognize a good idea and make it a standard feature. That’s when things really took off. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.
A New Breed
Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.
Much of the current enthusiasm for big data focuses on technologies that make taming it possible, including Hadoop (the most widely used framework for distributed file system processing) and related open-source tools, cloud computing, and data visualization. While those are important breakthroughs, at least as important are the people with the skill set (and the mind-set) to put them to good use. On this front, demand has raced ahead of supply. Indeed, the shortage of data scientists is becoming a serious constraint in some sectors. Greylock Partners, an early-stage venture firm that has backed companies such as Facebook, LinkedIn, Palo Alto Networks, and Workday, is worried enough about the tight labor pool that it has built its own specialized recruiting team to channel talent to businesses in its portfolio. “Once they have data,” says Dan Portillo, who leads that team, “they really need people who can manage it and find insights in it.”
Who Are These People?
If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive. None of those tasks is as straightforward as it is with other, established organizational roles. Start with the fact that there are no university programs offering degrees in data science. There is also little consensus on where the role fits in an organization, how data scientists can add the most value, and how their performance should be measured.
The first step in filling the need for data scientists, therefore, is to understand what they do in businesses. Then ask, What skills do they need? And what fields are those skills most readily found in?
More than anything, what data scientists do is make discoveries while swimming in data. It’s their preferred method of navigating the world around them. At ease in the digital realm, they are able to bring structure to large quantities of formless data and make analysis possible. They identify rich data sources, join them with other, potentially incomplete data sources, and clean the resulting set. In a competitive landscape where challenges keep changing and data never stop flowing, data scientists help decision makers shift from ad hoc analysis to an ongoing conversation with data.
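As a rough illustration of the “join, then clean” step described above, here is a minimal sketch in plain Python. The record layout, the field names (`user_id`, `name`, `logins`) and the cleaning rules are hypothetical and chosen only to make the example concrete; real pipelines would vary widely.

```python
def join_and_clean(profiles, activity):
    """Left-join two lists of records on 'user_id', then clean the result.

    Hypothetical cleaning rules: drop rows with no usable name, normalize
    name capitalization, and treat missing login counts as zero.
    """
    activity_by_id = {row["user_id"]: row for row in activity}
    cleaned = []
    for profile in profiles:
        # The second source may be incomplete, so fall back to an empty record.
        extra = activity_by_id.get(profile["user_id"], {})
        merged = {**profile, **extra}
        name = (merged.get("name") or "").strip()
        if not name:
            continue  # unusable row: no name to report on
        merged["name"] = name.title()
        if merged.get("logins") is None:
            merged["logins"] = 0
        cleaned.append(merged)
    return cleaned


# Hypothetical sample data: the activity source covers only one user.
profiles = [
    {"user_id": 1, "name": " ada lovelace "},
    {"user_id": 2, "name": ""},
    {"user_id": 3, "name": "alan turing"},
]
activity = [{"user_id": 1, "logins": 12}]

for row in join_and_clean(profiles, activity):
    print(row)
```

The left join keeps every profile even when the second source has nothing for it, which is one simple way to cope with the incomplete sources the paragraph mentions.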
Data scientists realize that they face technical limitations, but they don’t allow that to bog down their search for novel solutions. As they make discoveries, they communicate what they’ve learned and suggest its implications for new business directions. Often they are creative in displaying information visually and making the patterns they find clear and compelling. They advise executives and product managers on the implications of the data for products, processes, and decisions.
Given the nascent state of their trade, it often falls to data scientists to fashion their own tools and even conduct academic-style research. Yahoo, one of the firms that employed a group of data scientists early on, was instrumental in developing Hadoop. Facebook’s data team created the language Hive for programming Hadoop projects. Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and refined the tool kit.
What kind of person does all this? What abilities make a data scientist successful? Think of him or her as a hybrid of data hacker, analyst, communicator, and trusted adviser. The combination is extremely powerful—and rare.
Data scientists’ most basic, universal skill is the ability to write code. This may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both.
But we would say the dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested. This often entails the associative thinking that characterizes the most creative scientists in any field. For example, we know of a data scientist studying a fraud problem who realized that it was analogous to a type of DNA sequencing problem. By bringing together those disparate worlds, he and his team were able to craft a solution that dramatically reduced fraud losses.
Perhaps it’s becoming clear why the word “scientist” fits this emerging role. Experimental physicists, for example, also have to design equipment, gather data, conduct multiple experiments, and communicate their results. Thus, companies looking for people who can work with complex data have had good luck recruiting among those with educational and work backgrounds in the physical or social sciences. Some of the best and brightest data scientists are PhDs in esoteric fields like ecology and systems biology. George Roumeliotis, the head of a data science team at Intuit in Silicon Valley, holds a doctorate in astrophysics. A little less surprisingly, many of the data scientists working in business today were formally trained in computer science, math, or economics. They can emerge from any field that has a strong data and computational focus.
It’s important to keep that image of the scientist in mind—because the word “data” might easily send a search for talent down the wrong path. As Portillo told us, “The traditional backgrounds of people you saw 10 to 15 years ago just don’t cut it these days.” A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed. A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data—and also not at actually analyzing the data. And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.
Roumeliotis was clear with us that he doesn’t hire on the basis of statistical or analytical capabilities. He begins his search for data scientists by asking candidates if they can develop prototypes in a mainstream programming language such as Java. Roumeliotis seeks both a skill set—a solid foundation in math, statistics, probability, and computer science—and certain habits of mind. He wants people with a feel for business issues and empathy for customers. Then, he says, he builds on all that with on-the-job training and an occasional course in a particular technology.
Several universities are planning to launch data science programs, and existing programs in analytics, such as the Master of Science in Analytics program at North Carolina State, are busy adding big data exercises and coursework. Some companies are also trying to develop their own data scientists. After acquiring the big data firm Greenplum, EMC decided that the availability of data scientists would be a gating factor in its own—and customers’—exploitation of big data. So its Education Services division launched a data science and big data analytics training and certification program. EMC makes the program available to both employees and customers, and some of its graduates are already working on internal big data initiatives.
As educational offerings proliferate, the pipeline of talent should expand. Vendors of big data technologies are also working to make them easier to use. In the meantime one data scientist has come up with a creative approach to closing the gap. The Insight Data Science Fellows Program, a postdoctoral fellowship designed by Jake Klamka (a high-energy physicist by training), takes scientists from academia and in six weeks prepares them to succeed as data scientists. The program combines mentoring by data experts from local companies (such as Facebook, Twitter, Google, and LinkedIn) with exposure to actual big data challenges. Originally aiming for 10 fellows, Klamka wound up accepting 30, from an applicant pool numbering more than 200. More organizations are now lining up to participate. “The demand from companies has been phenomenal,” Klamka told us. “They just can’t get this kind of high-quality talent.”
Why Would a Data Scientist Want to Work Here?
Even as the ranks of data scientists swell, competition for top talent will remain fierce. Expect candidates to size up employment opportunities on the basis of how interesting the big data challenges are. As one of them commented, “If we wanted to work with structured data, we’d be on Wall Street.” Given that today’s most qualified prospects come from nonbusiness backgrounds, hiring managers may need to figure out how to paint an exciting picture of the potential for breakthroughs that their problems offer.
Pay will of course be a factor. A good data scientist will have many doors open to him or her, and salaries will be bid upward. Several data scientists working at start-ups commented that they’d demanded and got large stock option packages. Even for someone accepting a position for other reasons, compensation signals a level of respect and the value the role is expected to add to the business. But our informal survey of the priorities of data scientists revealed something more fundamentally important. They want to be “on the bridge.” The reference is to the 1960s television show Star Trek, in which the starship captain James Kirk relies heavily on data supplied by Mr. Spock. Data scientists want to be in the thick of a developing situation, with real-time awareness of the evolving set of choices it presents.
Considering the difficulty of finding and keeping data scientists, one would think that a good strategy would involve hiring them as consultants. Most consulting firms have yet to assemble many of them. Even the largest firms, such as Accenture, Deloitte, and IBM Global Services, are in the early stages of leading big data projects for their clients. The skills of the data scientists they do have on staff are mainly being applied to more-conventional quantitative analysis problems. Offshore analytics services firms, such as Mu Sigma, might be the ones to make the first major inroads with data scientists.
But the data scientists we’ve spoken with say they want to build things, not just give advice to a decision maker. One described being a consultant as “the dead zone—all you get to do is tell someone else what the analyses say they should do.” By creating solutions that work, they can have more impact and leave their marks as pioneers of their profession.
Care and Feeding
Data scientists don’t do well on a short leash. They should have the freedom to experiment and explore possibilities. That said, they need close relationships with the rest of the business. The most important ties for them to forge are with executives in charge of products and services rather than with people overseeing business functions. As the story of Jonathan Goldman illustrates, their greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes.
LinkedIn isn’t the only company to use data scientists to generate ideas for products, features, and value-adding services. At Intuit data scientists are asked to develop insights for small-business customers and consumers and report to a new senior vice president of big data, social design, and marketing. GE is already using data science to optimize the service contracts and maintenance intervals for industrial products. Google, of course, uses data scientists to refine its core search and ad-serving algorithms. Zynga uses data scientists to optimize the game experience for both long-term engagement and revenue. Netflix created the well-known Netflix Prize, given to the data science team that developed the best way to improve the company’s movie recommendation system. The test-preparation firm Kaplan uses its data scientists to uncover effective learning strategies.
There is, however, a potential downside to having people with sophisticated skills in a fast-evolving field spend their time among general management colleagues. They’ll have less interaction with similar specialists, which they need to keep their skills sharp and their tool kit state-of-the-art. Data scientists have to connect with communities of practice, either within large firms or externally. New conferences and informal associations are springing up to support collaboration and technology sharing, and companies should encourage scientists to become involved in them with the understanding that “more water in the harbor floats all boats.”
Data scientists tend to be more motivated, too, when more is expected of them. The challenges of accessing and structuring big data sometimes leave little time or energy for sophisticated analytics involving prediction or optimization. Yet if executives make it clear that simple reports are not enough, data scientists will devote more effort to advanced analytics. Big data shouldn’t equal “small math.”
The Hot Job of the Decade
Hal Varian, the chief economist at Google, is known to have said, “The sexy job in the next 10 years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”
If “sexy” means having rare qualities that are much in demand, data scientists are already there. They are difficult and expensive to hire and, given the very competitive market for their services, difficult to retain. There simply aren’t a lot of people with their combination of scientific background and computational and analytical skills.
Data scientists today are akin to Wall Street “quants” of the 1980s and 1990s. In those days people with backgrounds in physics and math streamed to investment banks and hedge funds, where they could devise entirely new algorithms and data strategies. Then a variety of universities developed master’s programs in financial engineering, which churned out a second generation of talent that was more accessible to mainstream firms. The pattern was repeated later in the 1990s with search engineers, whose rarefied skills soon came to be taught in computer science programs.
One question raised by this is whether some firms would be wise to wait until that second generation of data scientists emerges, and the candidates are more numerous, less expensive, and easier to vet and assimilate in a business setting. Why not leave the trouble of hunting down and domesticating exotic talent to the big data start-ups and to firms like GE and Walmart, whose aggressive strategies require them to be at the forefront?
The problem with that reasoning is that the advance of big data shows no signs of slowing. If companies sit out this trend’s early days for lack of talent, they risk falling behind as competitors and channel partners gain nearly unassailable advantages. Think of big data as an epic wave gathering now, starting to crest. If you want to catch it, you need people who can surf.
Thomas H. Davenport is a visiting professor at Harvard Business School, a senior adviser to Deloitte Analytics, and a coauthor of Judgment Calls (Harvard Business Review Press, 2012). D.J. Patil is the data scientist in residence at Greylock Partners, was formerly the head of data products at LinkedIn, and is the author of Data Jujitsu: The Art of Turning Data into Product (O’Reilly Media, 2012).