Data Skeptic

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Website
http://dataskeptic.com
Description
Data Skeptic alternates between short mini episodes with the host explaining concepts from data science to his non-data scientist wife, and longer interviews featuring practitioners and experts on interesting topics related to data, all through the eye of scientific skepticism.
Language
🇬🇧 English
last modified
2019-03-15 16:51
last episode published
2019-03-15 15:00
publication frequency
6.81 days
Contributors
Kyle Polich author   owner  
Explicit
false
Number of Episodes
259
Categories
Technology Education Science & Medicine Higher Education

Episodes

Date Title & Description Contributors
15.03.2019

Simultaneous Translation at Baidu

While at NeurIPS 2018, Kyle chatted with Liang Huang about his work with Baidu research on simultaneous translation, which was demoed at the conference.
Kyle Polich with guest Liang Huang author
8.03.2019

Human vs Machine Transcription

Machine transcription (the process of translating audio recordings of language to text) has come a long way in recent years. But how do the errors made during machine transcription compare to the errors made by a human transcriber? Find out in this epi...
Kyle Polich with guest Andreas Stolcke author
1.03.2019

seq2seq

A sequence to sequence (or seq2seq) model is a neural architecture used for translation (and other tasks) which consists of an encoder and a decoder. The encoder/decoder architecture has obvious promise for machine translation, and has been successfully ...
22.02.2019

Text Mining in R

Kyle interviews Julia Silge about her path into data science, her book Text Mining with R, and some of the ways in which she's used natural language processing in projects both personal and professional. Related Links https://stack-survey-2018.glitch....
Kyle Polich with guest Julia Silge author
15.02.2019

Recurrent Relational Networks

One of the most challenging NLP tasks is natural language understanding and reasoning. How can we construct algorithms that are able to achieve human level understanding of text and be able to answer general questions about it? This is truly an open pr...
Kyle Polich with guest Rasmus Berg Palm author
8.02.2019

Text World and Word Embedding Lower Bounds

In the first half of this episode, Kyle speaks with Marc-Alexandre Côté and Wendy Tay about Text World.  Text World is an engine that simulates text adventure games.  Developers are encouraged to try out their reinforcement learning skills building age...
1.02.2019

word2vec

Word2vec is an unsupervised machine learning model which is able to capture semantic information from the text it is trained on. The model is based on neural networks. Several large organizations like Google and Facebook have trained word embeddings (t...
Kyle Polich and Linh Da Tran author
25.01.2019

Authorship Attribution

In a recent paper, Leveraging Discourse Information Effectively for Authorship Attribution, authors Su Wang, Elisa Ferracane, and Raymond J. Mooney describe a deep learning methodology for predicting which of a collection of authors was the author of a gi...
Kyle Polich with guests Elisa Ferracane and Su Wang author
18.01.2019

Very Large Corpora and Zipf's Law

The earliest efforts to apply machine learning to natural language tended to convert every token (every word, more or less) into a unique feature. While techniques like stemming may have cut the number of unique tokens down, researchers always had to f...
11.01.2019

Semantic search at GitHub

GitHub is many things besides source control. It's a social network, even though not everyone realizes it. It's a vast repository of code. It's a ticketing and project management system. And of course, it has search as well. In this episode, Kyle inter...
Kyle Polich with guest Hamel Husain author
4.01.2019

Let's Talk About Natural Language Processing

This episode reboots our podcast with the theme of Natural Language Processing for the next few months. We begin with introductions of Yoshi and Linh Da and then get into a broad discussion about natural language processing: what it is, what some of th...
Kyle Polich with guest Lucy Park author
28.12.2018

Data Science Hiring Processes

Kyle shares a few thoughts on mistakes he has observed job applicants make, along with a few procedural insights that listeners at early stages in their careers might find valuable.
25.12.2018

Holiday Reading - Epicac

Epicac by Kurt Vonnegut.
21.12.2018

Drug Discovery with Machine Learning

In today's episode, Kyle chats with Alexander Zhebrak, CTO of Insilico Medicine, Inc. Insilico self describes as artificial intelligence for drug discovery, biomarker development, and aging research. The conversation in this episode explores the ways i...
14.12.2018

Sign Language Recognition

At the NeurIPS 2018 conference, Stradigi AI premiered a training game which helps players learn American Sign Language. This episode brings the first of many interviews conducted at NeurIPS 2018. In this episode, Kyle interviews Chief Data Scientist Ca...
Kyle Polich with guest Carolina Bessega author
7.12.2018

Data Ethics

This week, Kyle interviews Scott Nestler on the topic of Data Ethics. Today, no ubiquitous, formal ethical protocol exists for data science, although some have been proposed. One example is the INFORMS Ethics Guidelines. Guidelines like this are rathe...
Kyle Polich with guest Scott Nestler author
30.11.2018

Escaping the Rabbit Hole

Kyle interviews Mick West, author of Escaping the Rabbit Hole: How to Debunk Conspiracy Theories Using Facts, Logic, and Respect about the nature of conspiracy theories, the people that believe them, and how to help people escape the belief in false in...
Kyle Polich with guest Mick West author
23.11.2018

Theorem Provers

Fake news attempts to lead readers/listeners/viewers to conclusions that are not descriptions of reality. It does this most often by presenting false premises, but sometimes by presenting flawed logic. An argument is only sound and valid if the conclu...
16.11.2018

Automated Fact Checking

Fake news can be responded to with fact-checking. However, it's easier to create fake news than to fact check it. Full Fact is the UK's independent fact-checking organization. In this episode, Kyle interviews Mevan Babakar, head of automated fact-chec...
Kyle Polich with guest Mevan Babakar from Full Fact author
9.11.2018

Single Source of Truth

In mathematics, truth is universal.  In data, truth lies in the where clause of the query. As large organizations have grown to rely on their data more significantly for decision making, a common problem is not being able to agree on what the data is. ...
Kyle Polich and Linh Da Tran author
2.11.2018

Detecting Fast Radio Bursts with Deep Learning

Fast radio bursts are an astrophysical phenomenon first observed in 2007. While many observations have been made, science has yet to explain the mechanism for these events. This has led some to ask: could it be a form of extra-terrestrial communication...
Kyle Polich with guest Gerry Zhang author
26.10.2018

Being Bayesian

This episode explores the root concept of what it is to be Bayesian: describing knowledge of a system probabilistically, having an appropriate prior probability, knowing how to weigh new evidence, and following Bayes's rule to compute the revised distribu...
Kyle Polich and Linhda Tran author
19.10.2018

Modeling Fake News

This is our interview with Dorje Brody about his recent paper with David Meier, How to model fake news. This paper uses the tools of communication theory and a sub-topic called filtering theory to describe the mathematical basis for an information chan...
Kyle Polich with guest Dorje Brody author
12.10.2018

louvain

Without getting into definitions, we have an intuitive sense of what a "community" is. The Louvain Method for Community Detection is one of the best known mathematical techniques designed to detect communities. This method requires typical graph data i...
5.10.2018

Cultural Cognition of Scientific Consensus

In this episode, our guest Dan Kahan discusses his research into how people consume and interpret science news. In an era of fake news, motivated reasoning, and alternative facts, important questions need to be asked about how people understand new info...
Kyle Polich with guest Dan Kahan author
28.09.2018

False Discovery Rates

A false discovery rate (FDR) is a methodology that can be useful when struggling with the problem of multiple comparisons. In any experiment, if the experimenter checks more than one dependent variable, then they are making multiple comparisons. Natura...
21.09.2018

Deep Fakes

Digital videos can be described as sequences of still images and associated audio. Audio is easy to fake. What about video? A video can easily be broken down into a sequence of still images replayed rapidly in sequence. In this context, videos are simp...
Kyle Polich with guest Siwei Lyu author
14.09.2018

Fake News Midterm

In this episode, Kyle reviews what we've learned so far in our series on Fake News and talks briefly about where we're going next.
7.09.2018

Quality Score

Two weeks ago we discussed click through rates or CTRs and their usefulness and limits as a metric. Today, we discuss a related metric known as quality score. While that phrase has probably been used to mean dozens of different things in different cont...
31.08.2018

The Knowledge Illusion

Kyle interviews Steven Sloman, Professor in the school of Cognitive, Linguistic, and Psychological Sciences at Brown University. Steven is co-author of The Knowledge Illusion: Why We Never Think Alone and Causal Models: How People Think about the World...
Kyle Polich with guest Steven Sloman author
24.08.2018

Click Through Rates

A Click Through Rate (CTR) is the proportion of clicks to impressions of some item of content shared online. This terminology is most commonly used in digital advertising but applies just as well to content websites might choose to feature on their hom...
17.08.2018

Algorithmic Detection of Fake News

The scale and frequency with which information can be distributed on social media makes the problem of fake news a rapidly metastasizing issue. To do any content filtering or labeling demands an algorithmic solution. In today's episode, Kyle interviews...
Kyle Polich with guests Mike Tamir and Kai Shu author
10.08.2018

Ant Intelligence

If you prepared a list of creatures regarded as highly intelligent, it's unlikely ants would make the cut. This is expected, as on an individual level, ants do not generally display behavior that most humans would regard as intelligence. In fact, it mi...
Kyle Polich with guest Deborah Gordon author
3.08.2018

Human Detection of Fake News

With publications such as "Prior exposure increases perceived accuracy of fake news", "Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning", and "The science of fake news", Gordon ...
Kyle Polich with guest Gordon Pennycook author
27.07.2018

Spam Filtering with Naive Bayes

Today's spam filters are advanced data driven tools. They rely on a variety of techniques to effectively and often seamlessly filter out junk email from good email. Whitelists, blacklists, traffic analysis, network analysis, and a variety of other tool...
20.07.2018

The Spread of Fake News

How does fake news get spread online? It's not just a matter of manipulating search algorithms. The social platforms for sharing play a major role in the distribution of fake news. But how significant of an impact can there be? How significantly can bot...
Kyle Polich and Filippo Menczer author
13.07.2018

Fake News

This episode kicks off our new theme of "Fake News" with guests Robert Sheaffer and Brad Schwartz. Fake news is a new label for an old idea. For our purposes, we will define fake news as information created to deliberately mislead while masquerading as a ...
Kyle Polich with Robert Sheaffer and A. Brad Schwartz author
11.07.2018

Dev Ops for Data Science

We revisit the 2018 Microsoft Build in this episode, focusing on the latest ideas in DevOps. Kyle interviews Cloud Developer Advocates Damien Brady, Paige Bailey, and Donovan Brown to talk about DevOps and data science and databases. For a data scienti...
Kyle Polich with Damien Brady, Paige Bailey, and Donovan Brown author
6.07.2018

First Order Logic

Logic is a fundamental of mathematical systems. Its roots are the values true and false and its power is in what its rules allow you to prove. Propositional logic provides its user variables. This episode gets into First Order Logic, an extension t...
29.06.2018

Blind Spots in Reinforcement Learning

An intelligent agent trained in a simulated environment may be prone to making mistakes in the real world due to discrepancies between the training and real-world conditions. The areas where an agent makes mistakes, which are hard to find, are known as "blind spo...
Kyle Polich with guest Ramya Ramakrishnan author
22.06.2018

Defending Against Adversarial Attacks

In this week’s episode, our host Kyle interviews Gokula Krishnan from ETH Zurich, about his recent contributions to defenses against adversarial attacks. The discussion centers around his latest paper, titled “Defending Against Adversarial Attacks by L...
15.06.2018

Transfer Learning

On a long car ride, Linhda and Kyle record a short episode. This discussion is about transfer learning, a technique used in machine learning to leverage training from one domain to have a head start learning in another domain. Transfer learning has so...
Kyle Polich author
8.06.2018

Medical Imaging Training Techniques

Medical imaging is a highly effective tool used by clinicians to diagnose a wide array of diseases and injuries. However, it often requires exceptionally trained specialists such as radiologists to interpret accurately. In this episode of Data Skeptic,...
1.06.2018

Kalman Filters

Thanks to our sponsor Galvanize. A Kalman Filter is a technique for taking a sequence of observations about an object or variable and determining the most likely current state of that object. In this episode, we discuss it in the context of tracking our...
Kyle Polich and Linh Da Tran author
25.05.2018

AI in Industry

There's so much to discuss on the AI side, it's hard to know where to begin. Luckily,  Steve Guggenheimer, Microsoft’s corporate vice president of AI Business, and Carlos Pessoa, a software engineering manager for the company’s Cloud AI Platform, talke...
18.05.2018

AI in Games

Today's interview is with the authors of the textbook Artificial Intelligence and Games.
11.05.2018

game-theory

Kyle Polich and Linh Da Tran author
4.05.2018

The Experimental Design of Paranormal Claims

In this episode of Data Skeptic, Kyle chats with Jerry Schwarz from the Independent Investigations Group (IIG)'s SF Bay Area chapter about testing claims of the paranormal. The IIG is a volunteer-based organization dedicated to investigating paranormal...
Kyle Polich with guest Jerry Schwartz from the Independent Investigations Group author
27.04.2018

Winograd Schema Challenge

Our guest this week, Hector Levesque, joins us to discuss an alternative way to measure a machine’s intelligence, called the Winograd Schema Challenge. The challenge was proposed as a possible alternative to the Turing test during the 2011 AAAI Spring Sym...
Kyle Polich with guest Hector Levesque author
20.04.2018

The Imitation Game

This week on Data Skeptic, we begin with a skit to introduce the topic of this show: The Imitation Game. We open with a scene in the distant future. The year is 2027, and a company called Shamony is announcing their new product, Ada, the most advanced ...