The Global Startup Ecosystem: Automating Understanding with John Bohannon
9 Jul 2021
John Bohannon, Director of Science at Primer AI, discusses his time as the “Indiana Jones of Journalism” and explains how he helps build machines to read and write – automating intelligent understanding of large datasets.
Hosted by Jeff Kavanaugh, VP and Head of the Infosys Knowledge Institute.
“I brought my skills to bear from science journalism. I put myself in the customer's shoes, and I had a wonderful team to work with to try and create new algorithms to process text in order to help people find the information they need.”
“There is this expression that software is eating the world. Well, machine learning is eating software.”
- John Bohannon
- In 2017, natural language processing tools just were not there yet. Everything was built from scratch, mainly with rule-based heuristics. Engineers had to be the machine and figure out the step-by-step process for an algorithm to make sense of text. There was no fancy machine learning. The machine looked for patterns that the engineer had worked out for it.
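To make the rule-based approach concrete, here is a minimal sketch of the kind of hand-written heuristic classifier described above. Every pattern and category label here is invented for illustration; the point is that the engineer, not the machine, encodes the patterns.

```python
import re

# Toy rule-based text classifier: the patterns are written by hand.
# All rules and categories here are invented for illustration.
RULES = [
    (re.compile(r"\b(acquire[sd]?|merger|buyout)\b", re.I), "M&A"),
    (re.compile(r"\b(earnings|revenue|quarterly)\b", re.I), "FINANCIALS"),
    (re.compile(r"\b(hire[sd]?|appoint(s|ed)?|CEO)\b", re.I), "PERSONNEL"),
]

def classify(text: str) -> str:
    """Return the first category whose hand-written pattern matches."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "OTHER"

print(classify("Acme acquires Widget Co in an all-stock buyout"))  # M&A
print(classify("Quarterly revenue beat expectations"))             # FINANCIALS
```

Every new document type or phrasing the rules have not anticipated falls through to the default, which is exactly the brittleness that pushed the field toward machine-learned models.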
- Now, across the data science space, people are switching out all of the old heuristic-based natural language processing approaches. They are replacing them with machine learning. There is this expression that software is eating the world. Well, machine learning is eating software.
- Primer developed a tool called saliency, which gives people a peek into the machine learning black box. With saliency, the machine takes a huge document and focuses on the small pieces of information that were most useful in its classification decision. This is especially helpful for engineers conducting root-cause analysis of the model's errors, which they can then remedy.
- If you are solving a problem at global scale, it will involve a large volume and diversity of information, and the context in which the problem exists is less predictable and defined. That presents challenges you do not worry much about with small-scale problems: you cannot define in advance the range of inputs the algorithm will see, or the range of contexts.
- Knowledge bases should be self-updating. They should be systems that listen to the world across all the information streams that matter, passively keeping track of everything a person learns and cares about. To this day, people still have the tedious job of entering data into a system bit by bit and tidying it. It soaks up the life force of people who should be free to synthesize and create.
Jeff introduces John
How did you go from getting a Ph.D. in molecular biology from Oxford to becoming an investigative data journalist embedded with NATO in Afghanistan?
What do you do at Primer?
What's your thought process behind focusing on machine learning, natural language processing, et cetera?
John compares machine model learning to the black box concept.
What's the difference about solving problems at [a global] scale, versus something that's a little smaller, almost at a toy level?
What are some of the other kinds of challenges that you're looking to solve at Primer that you see now and you see around the corner?
Looking at the corporate world today, are there any specific challenges that you think should be solved, and maybe that's around the corner, beyond the tools that you have today, that maybe businesses in general should be solving, or you're excited about helping them solve, maybe in the next year or two?