When algorithms surprise us

Machine learning algorithms are not like other computer programs. In the usual sort of programming, a human programmer tells the computer exactly what to do. In machine learning, the human programmer merely gives the algorithm the problem to be solved, and through trial-and-error the algorithm has to figure out how to solve it.

This often works really well - machine learning algorithms are widely used for facial recognition, language translation, financial modeling, image recognition, and ad delivery. If you’ve been online today, you’ve probably interacted with a machine learning algorithm.

But it doesn’t always work well. Sometimes the programmer will think the algorithm is doing really well, only to look closer and discover it’s solved an entirely different problem from the one the programmer intended. For example, I looked earlier at an image recognition algorithm that was supposed to recognize sheep but learned to recognize grass instead, and kept labeling empty green fields as containing sheep.

Source: Letting neural networks be weird • When algorithms surprise us

There are so many really interesting examples she has collected here, and they show us the power and danger of black boxes. In a lot of ways machine learning is just an extreme case of all software. People tend to write software along the optimistic path, and ship it once it looks like it's doing what they intended. When it doesn't, we call that a bug.

The difference between traditional programming and machine learning is that debugging machine learning is far harder. You can't just add an extra if condition, because the logic that produces an answer isn't expressed that way. It's expressed in 100,000 weights in a 4-layer convolutional network. That means QA is much harder, and machine learning is far more likely to surprise you with unexpected wrong answers on edge cases.
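To make that concrete, here's a minimal sketch of the contrast. The shapes, thresholds, and feature names are illustrative assumptions, not anything from the sheep example:

```python
import numpy as np

# Traditional code: the decision logic is explicit, so a bug fix is
# just an extra branch.
def is_sheep_rule_based(greenness: float, wool_texture: float) -> bool:
    if wool_texture > 0.5:   # the logic lives in readable conditions
        return True
    return False

# Machine learning: the "logic" is smeared across the weights. A toy
# linear classifier stands in here for the 100,000-weight network.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2,))  # learned from data, not written by hand
bias = 0.1

def is_sheep_learned(features: np.ndarray) -> bool:
    # There is no single line to edit if this mislabels empty green
    # fields; the grass-vs-sheep distinction, if it exists at all, is
    # distributed across every weight.
    return float(features @ weights + bias) > 0.0
```

The point: in the rule-based version a mislabeled field is a one-line fix, while in the learned version there is no line to fix, only more or better training data.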

Does the CAFE standard of 55 mpg seem high? It's not the real number, and the real number is a lot more interesting.

If automakers complied with the rules solely by improving the fuel economy of their engines, new cars and light trucks on the road would average more than 50 miles per gallon by 2025 (the charts here break out standards for cars and light trucks separately). But automakers in the United States have some flexibility in meeting these standards. They can, for instance, get credit for using refrigerants in vehicle air-conditioning units that contribute less to global warming, or get credit for selling more electric vehicles.

Once those credits and testing procedures are factored in, analysts expected that new cars and light trucks sold in the United States would have averaged about 36 miles per gallon on the road by 2025 under the Obama-era rules, up from about 24.7 miles per gallon in 2016. Automakers like Tesla that sold electric vehicles also would have benefited from the credit system.

Source: How U.S. Fuel Economy Standards Compare With the Rest of the World’s - The New York Times

This is one of those areas where most reporting on the CAFE standard rollback has been terrible. You tell people the new CAFE standard is 55 mpg, they look at their SUV, and they say: that's impossible. With diesel off the table after the VW scandal, only the best hybrids today are in that 55 mpg range. How could that be the average?

But it's not the average; it's 55 mpg equivalent. Automakers get credit for lots of other things: EVs in the fleet, doing a better job on the refrigerant switchover. If the rules were kept in place, 2025 would see a real fleet average of around 36 mpg.
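A quick back-of-the-envelope using the numbers from the article makes the gap visible. Treating the whole difference as "credits" is a simplification of mine; the real accounting is far more involved:

```python
# Figures from the NYT piece quoted above.
nominal_target_mpge = 55.0   # headline CAFE "equivalent" figure
estimated_real_world = 36.0  # analysts' on-road estimate for 2025
baseline_2016 = 24.7         # on-road average in 2016

# The gap between the headline number and the road number is roughly
# what credits (EVs, refrigerants, test procedures) absorb.
credit_gap = nominal_target_mpge - estimated_real_world      # ~19 mpg-e
real_improvement = estimated_real_world / baseline_2016 - 1  # ~46%

print(f"Credits account for roughly {credit_gap:.0f} mpg-e of the headline figure")
print(f"Real-world improvement over 2016: about {real_improvement:.0%}")
```

So the actual ask was a ~46% real-world improvement over a decade, not the implausible-sounding 55 mpg on the sticker.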

More important is that rolling back this standard is going to make US car companies less competitive. The rest of the world is moving in this direction, and companies that don't hit these marks will face a shrinking global market.

The future of scientific papers

The more sophisticated science becomes, the harder it is to communicate results. Papers today are longer than ever and full of jargon and symbols. They depend on chains of computer programs that generate data, and clean up data, and plot data, and run statistical models on data. These programs tend to be both so sloppily written and so central to the results that it’s contributed to a replication crisis, or put another way, a failure of the paper to perform its most basic task: to report what you’ve actually discovered, clearly enough that someone else can discover it for themselves.

Perhaps the paper itself is to blame. Scientific methods evolve now at the speed of software; the skill most in demand among physicists, biologists, chemists, geologists, even anthropologists and research psychologists, is facility with programming languages and “data science” packages. And yet the basic means of communicating scientific results hasn’t changed for 400 years. Papers may be posted online, but they’re still text and pictures on a page.

Source: The Scientific Paper Is Obsolete. Here's What's Next. - The Atlantic

The scientific paper is definitely being strained in its ability to vet ideas. The article gives a nice narrative through the invention of Mathematica and then Jupyter as the path forward. The digital notebook is an incredibly useful way to share data analysis, as long as the data sets are made easily available. The DAT project has some thoughts on making that easier.
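For anyone who hasn't worked in one, a notebook-style analysis looks something like this: the data pull, the cleanup, and the plot all travel together, so a reader can re-run every step. The dataset URL and column names here are placeholder assumptions, not a real archive:

```python
import pandas as pd
import matplotlib.pyplot as plt

DATA_URL = "https://example.org/study-42/measurements.csv"  # hypothetical

df = pd.read_csv(DATA_URL)  # raw data pulled from a shared archive
cleaned = df.dropna()       # every cleanup step is visible, not hidden

# The figure is regenerated from source data on each run, rather than
# pasted in as a static image.
cleaned.plot(x="dose", y="response", kind="scatter")  # assumed columns
plt.title("Dose vs. response (re-runnable by any reader)")
plt.show()
```

The replication argument is exactly this: when the chain of programs that cleaned and plotted the data ships with the paper, "discover it for yourself" becomes running the notebook.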

The one gripe I've got with it is that it could have been clearer that Mathematica was never going to be the future here. Wolfram has tons of great ideas, and Mathematica is some really great stuff. I loved using it in college 20 years ago on SGI Irix systems. But one of the critical parts of science is sharing and longevity, and doing that on top of a proprietary software platform is not a foundation for building the next 400 years of science. A driving force behind Jupyter is that, being open source all the way down, it's reasonably future-proof.