
What Is the State of Data Science Today? – Columbia University

The book points out that the term “data science” only came to be used widely in 2010. What current use of data science could you not have imagined in 2010?

Wing: The most obvious answer is deep neural networks, an artificial intelligence approach to building a computer inspired by modeling the neural connections in the brain. Deep neural networks have a plethora of applications and are having a disruptive and transformative impact on almost every sector. Only in 2012, with the advent of big data and big compute, did the research community and then the private sector see how these networks could “solve” AI tasks such as speech recognition and image classification that had been studied since the 1960s. The breakthrough came about because of enormous amounts of digital data, data used to train deep neural networks.

Wiggins: To this, I’ll add the real pervasiveness of data science across different industries. The job description “data scientist” became prominent at LinkedIn and Facebook in the first decade of the new millennium; William Cleveland of AT&T earlier used the term in a paper in 2001 to propose a new field. But in 2010 it was an aspiration that making sense of data in a way that transforms your business could be possible not just for “big tech” companies like AT&T, Facebook, or LinkedIn, but for a wide variety of companies. It has certainly been transformative at The New York Times. Similarly, a wide variety of academic fields are now transformed by machine learning. In 2010 it was clear that machine learning was having a huge impact in a few branches of natural science, like computational biology, but now almost every academic field has a locus of research activity around how machine learning is opening up new questions and answers!

Your book outlines some of the major promises and perils of data science. If you had to name the single biggest promise of data science, something that isn’t happening yet and that you’re most excited about, what would it be?

Wing: The biggest promise of data science is to address societal challenges like health care and climate change. We can use medical images, health records, and genetic data to better predict whether someone will get a disease or even how someone might respond to a specific treatment. We can use machine learning and physics-based simulations to build better climate models. While we are seeing early forays into using AI and data science for these challenges, so much more can be done.

The biggest challenge is addressing the issue of fairness. For example, an individual judge may rule differently depending on the time of day and different judges may rule differently depending on their own biases. Using automated tools, one hopes to smooth out those differences in judgment. However, current AI techniques, such as deep neural networks, rely on large amounts of data to build such an automated decision system. If historical data is used to produce this system, then it will capture and reflect the same biased human judgments of the past. What we’ve discovered is that it is difficult technically and philosophically to build “fair” systems.
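
The point about historical data can be made concrete with a toy sketch. It is not from the interview and is far simpler than a deep neural network: it fits one score cutoff per group to synthetic past decisions. The groups "A" and "B", the cutoffs 5 and 7, and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def make_history():
    """Synthetic past decisions: group B needed a higher score to be approved."""
    biased_cutoff = {"A": 5, "B": 7}  # hypothetical bias, chosen for illustration
    return [(g, s, s >= biased_cutoff[g]) for g in ("A", "B") for s in range(10)]


def fit_thresholds(records):
    """For each group, pick the score cutoff that best matches past decisions."""
    by_group = defaultdict(list)
    for group, score, approved in records:
        by_group[group].append((score, approved))

    learned = {}
    for group, rows in by_group.items():
        def errors(cutoff):
            # Count past decisions that this cutoff would get wrong.
            return sum((score >= cutoff) != approved for score, approved in rows)

        learned[group] = min(range(11), key=errors)
    return learned


def approval_rate(cutoffs, group, scores=range(10)):
    """Share of applicants in `group` that the fitted cutoffs would approve."""
    decisions = [s >= cutoffs[group] for s in scores]
    return sum(decisions) / len(decisions)


history = make_history()
model = fit_thresholds(history)
rates = {g: approval_rate(model, g) for g in ("A", "B")}
print(model, rates)
```

Under these assumptions the fitted cutoffs come out as 5 for group A and 7 for group B, so applicants with identical scores are still treated differently. The fitting step matches the historical decisions perfectly, which is exactly why it carries their bias forward.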

I am currently advocating a research agenda called “Trustworthy AI,” which is a call to arms for three computer science communities (the AI community, the cybersecurity community, and the formal methods community) to work together to address both the promise and perils of AI.

What are you each teaching this year at Columbia?

Wing: In spring 2019 I taught a graduate-level course on privacy-preserving technologies. Based on my work at Microsoft, I wanted our students to know that there exist industry-strength point solutions to point problems in privacy. These scalable computational solutions draw on hardware, cryptography, statistics, and mathematics. These ideas made it into chapter 10 of our book.

Wiggins: In the fall I teach the capstone course for applied mathematics majors, working with groups of students to do original research on topics of their own interest, and to present to their peers. Over the decades I’ve taught this class, more and more projects have been around data, machine learning, and the impact of data. This term we had presentations on gerrymandering and mathematical modeling of migration, for example. Students are able to do analyses they couldn’t have done years ago, with great open-source machine learning methods; what’s more, students are far more aware of the ethical consequences of these methods. It’s continually a class in which the students teach me the future.

In the spring, Professor Matt Jones and I will teach our “Data: Past, Present, and Future” course again. Developing this class has really opened my eyes to a historical appreciation for data, and how our world came to be shaped by data and data-empowered algorithms. One lesson here is that the future is in our hands, with no fate but what we make. In class, we discuss it as an unstable three-player game among corporations, governments, and the individuals who provide the data and talent to these corporations. I’m optimistic about how our students, technologists and humanists alike, are so engaged with understanding data and our role in shaping data’s future.

Source: https://news.google.com/__i/rss/rd/articles/CBMiPGh0dHBzOi8vbmV3cy5jb2x1bWJpYS5lZHUvbmV3cy93aGF0LXN0YXRlLWRhdGEtc2NpZW5jZS10b2RhedIBAA?oc=5
