Sunnyday Technologies

Computational alloy design has had receipts for twenty-five years. Here’s why I’m bringing the methodology to cement.

[Image: A basketball-sized concrete pill sitting on a metallurgist's workbench, surrounded by an open notebook of handwritten phase diagrams, a small steel sample, and a glowing screen showing a ternary phase diagram.]


Abstract

Rich Sutton’s 2019 essay The Bitter Lesson, which holds that general methods leveraging computation outperform hand-crafted human knowledge across long horizons, has been validated in computational materials engineering for twenty-five years, and the broader field has barely adopted it. We trace the discipline from Kaufman’s 1970 CALPHAD method through Sundman’s thermodynamic database infrastructure, Lukas’s phase-diagram optimization, and Olson’s 1990s synthesis of Materials by Design (then at Northwestern, now at MIT). Industrial proof points span U.S. Air Force aerospace alloys, Apple product alloys, Tesla giga-press aluminum, SpaceX Starship stainless, and NASA’s recent ICME-designed GRX-810 high-temperature alloy. Documented timelines compress clean-sheet design to qualified flight service from a conventional fifteen to twenty years down to four to seven, with platform-level reductions in physical testing burden of up to 70%.

We argue that 3D-printable cements, the Additive Construction subset of the cement industry, are the right next chapter for the methodology, on three grounds. First, substrate and process complexity exceed those of metals. Second, the pump-to-solid rheology transition of 3DCP introduces a class of multi-axis interactions that exceeds the reach of regression analysis (the methodology that operators including Alcemy have publicly noted suffices for conventional cement) and requires ICME. Third, Additive Construction supports worldwide open infrastructure in a way adjacent materials industries do not, specifically through the Open3DCP open data standard and the M3-CRETE open-source 3D concrete printer.

Sunnyday Technologies’ CEMFORGE platform, anchored on the LOGiMIX formulation engine and open-source materials data not previously aggregated, is situated relative to Concrete.ai, AICrete, Giatec, Alcemy, Meta with Amrize, and Cemex. We distinguish the regression-tractable conventional-cement problem from the multi-axis 3DCP problem that ICME methodology is sized for, and close with a working-method statement that distributes labor between the human (first-order effects, empirical results, conventional logic checks) and the model (second-, third-, and nth-order perturbations), arbitrated by hypothesis testing with controlled variables and statistical significance.


Rich Sutton’s 2019 essay The Bitter Lesson is the most useful 1,200 words in AI. The argument: across seventy years of AI research, the methods that won out weren’t the ones that encoded human cleverness. They were the ones that scaled with computation. Search and learning. The hand-crafted approaches plateaued. The general ones kept improving.

He called it bitter because the people learning the lesson were the people whose cleverness got beaten. The bitter part isn’t about computers. It’s about us.

I want to talk about what that means in materials science. Specifically why I’m applying it to cement and not metals, and what it asks of all of us about how we work with computers. The metals industry has been a controlled experiment for Sutton’s argument for twenty-five years. The results are not encouraging if you hoped the field would self-correct.

Why cement, not metals

I’m not running from metallurgy. I helped build its data foundation. Three decades ago I developed test plans and collected data for what eventually got named ICME, Integrated Computational Materials Engineering. Test matrices. Coupons. The data the first computational alloy designers needed in order to have anything to compute against.

I’ve stayed in metallurgy ever since. So when I look at the modern computational-materials field, I’m looking at it from inside. I watched the foundation get poured. I watched ICME mature into industrial use. I watched the methodology rack up receipts in alloy after alloy. The discipline has been available for two and a half decades. Most of the field has not moved into it.

The discipline grew out of work by several foundational figures. Larry Kaufman formally introduced CALPHAD in 1970 with H. Bernstein in Computer Calculation of Phase Diagrams. Bo Sundman at the Royal Institute of Technology in Stockholm built much of the thermodynamic database and software infrastructure that made CALPHAD operational. Hans Leo Lukas contributed the phase-diagram optimization tools that turned databases into design instruments. Greg Olson, then at Northwestern and now at MIT, synthesized those threads into Materials by Design in the 1990s and inspired the discipline that became Integrated Computational Materials Engineering. It’s not new. It has receipts.

The receipts are public. Multiple computer-designed alloys have moved from clean-sheet design to qualified U.S. military flight service in under seven years. The conventional time-to-flight for a new aerospace alloy is closer to fifteen or twenty. The first of those alloys is older than most of the people now talking about AI for materials.

The same family of methods has since shown up across an unusually wide industrial range. The earliest applications came out of U.S. Air Force aerospace programs in the 1990s, where computer-designed alloys went into qualified flight service decades before most of the field caught on. Apple’s product alloys followed quietly in the 2010s. More visibly in Tesla’s giga-press aluminum bodies, where alloys engineered for strength, ductility, and castability without post-cast heat treatment make a one-piece underbody casting physically possible. More visibly still in the proprietary stainless SpaceX has iterated for Starship. The choice to weld a launch vehicle out of steel instead of carbon fiber doesn’t actually make sense until you accept that they were going to develop the steel. Most recently, NASA’s GRX-810 high-temperature alloy was designed by ICME (thermodynamic models plus density functional theory) and 3D-printed for rocket-engine hot sections. The creep, tensile, and oxidation gains have nothing to do with intuition.

Defense aerospace. Consumer electronics. Automotive die-casting. Commercial launch. Civil space. Five industries with almost nothing in common except the methodology that designed the materials underneath them.

The timelines are what the methodology is actually about. The published cases land in the four-to-seven-year range from clean sheet to qualified flight, including alloys that went into U.S. Navy aircraft service in roughly six years and U.S. Air Force service in under seven. The more aggressive cases compressed from over a decade of conventional development to about three years. At the platform level, the published claim is up to a 70% reduction in physical testing burden.

That last number is the one that matters.

Testing is the part that costs the money and the years. Testing also has the worst-shaped shortcomings. Sample-to-sample variation. Environmental conditions you can’t fully reproduce in the lab. Scale effects that don’t show up in a coupon. The current state still requires testing to meet qualification. You don’t get to skip the press. The asymptote is a future where prediction is efficient enough that you do less of it.

That’s the actual Bitter Lesson direction in materials. Not “AI replaces the test bench.” Closer to “computation eats the part of the test bench that’s been holding everything else up.”

Outside that handful of operators, most metals R&D is still done largely the way it was in 1990. The methodology has been right for a quarter-century and the field, as a field, hasn’t adopted it. A small number of ambitious shops run the playbook well. The rest are nodding politely and going back to what they know.

None of those operators is doing pure AI either. They sit on top of decades of computational metallurgy plus increasingly data-driven optimization. The point isn’t what to call it. The point is that compute-leveraged search has been outperforming intuition-driven alloy design for two and a half decades, and the broader field has barely budged.

That’s what makes the Bitter Lesson actually bitter. The methods have worked in public for decades. Most of the experts have continued doing what they were doing before. Sutton predicted exactly that. Domain knowledge feels safer than search-and-learning, even when search has the receipts.

The metals story is the closest thing we have to a controlled experiment for what the Bitter Lesson predicts in physical materials. The experiment is not encouraging if your hope was that the field would self-correct quickly.

So here’s the framing I’ve settled into. Cement is not the next chapter because metals is finished. Metals isn’t finished. It may not be for another twenty years. Cement is the right chapter because the methods are mature, the compute is cheap, and the substrate is more analytically complicated than the metals I worked on.

Cement is multi-phase and multi-scale. Chemistry from nanometers to aggregate to meter-scale structure. Time-dependent across decades. Hydration kinetics in hours, early-age behavior in days, durability across the life of a building. Environmentally sensitive in ways you can’t fully specify. Temperature, humidity, water chemistry, curing regime. Trace admixtures interact non-linearly. Regional supply-chain variation that no global formulation can paper over.

Then there is the process. The same material has to flow as a liquid through a pump and a nozzle, and then within minutes develop enough early-age strength to hold its shape, and eventually a significant portion of its cured strength, under the load of the beads going down on top of it. Curing has its own variables. Mixing has its own. Pumping has its own. Every axis interacts with every other. Process complexity stacked on material complexity. Both multi-dimensional. Both interacting nonlinearly.
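One way the 3DCP literature formalizes that pump-to-solid transition is a linear structural build-up (thixotropy) model of the kind popularized by Roussel, paired with a simplified buildability check. The sketch below is illustrative only; the structuration rate and layer geometry are assumed inputs, not CEMFORGE parameters.

```latex
% Static yield stress recovers roughly linearly with resting time t_rest
% once the shearing in the pump and nozzle stops:
\tau_0(t_{\mathrm{rest}}) = \tau_{0,0} + A_{\mathrm{thix}}\, t_{\mathrm{rest}}

% Simplified buildability check: the bottom bead of a stack of height H(t)
% must not yield under the weight of the layers above it:
\tau_0(t) \;\geq\; \frac{\rho\, g\, H(t)}{\sqrt{3}},
\qquad H(t) = h_{\mathrm{layer}} \cdot \frac{t}{\Delta t_{\mathrm{layer}}}
```

Setting the two sides equal gives the minimum structuration rate a mix needs for a given print speed, which is exactly the kind of first-order check a human runs before anything higher-order is asked of a model.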

That’s the shape of problem where heuristic-driven design plateaus and search-and-learning starts to win.

Why cement and not metals? I have practiced metallurgy for thirty years and I am still practicing it. That experience tells me exactly where compute-leveraged search has the most upside, and it is not in steel. The metals chapter is being written by operators who have spent decades inside Olson’s discipline, and they are doing it well. The next chapter is cement, and cement is the right fit for the methodology by every measure that matters. Substrate complexity. Process complexity. Search-space size. Market position. Competitive opening. CEMFORGE is built where the methodology compounds fastest. We picked cement on purpose.

What this asks of you

The instinct, when you read Sutton, is to nod along and then ignore him. Of course general methods win. Of course you should scale compute. Now back to the spreadsheet where you’re encoding your team’s hard-won judgment as features.

Taking the Bitter Lesson seriously is harder than nodding at it. It asks three uncomfortable things.

First, stop encoding your cleverness as constraints. The natural reflex, when you see a model wander, is to add a constraint that prevents the wander. Sometimes that constraint is real physics. Often it’s your taste, dressed up as physics. Adding it makes the model’s outputs look more like what you would have produced. Which is exactly the reason it’s bad. If you wanted what you would have produced, you wouldn’t have built the model.

Second, validate empirically, not intuitively. “It feels wrong” is a signal worth listening to once. After that it graduates into a falsifiable test or it retires. Conventional knowledge belongs in the logic check, where it asks whether a candidate violates known chemistry or just violates what the field is used to seeing. It does not belong in the veto. The model’s job is to suggest things you can’t derive. If your gut can erase any of them on feel, the model is decoration. The test bench decides, and the test bench has to be run with controls and statistics that mean something.

Third, pick problems where your priors are weakest, when you have the luxury. This is rarely a clean choice. Usually you take the problem in front of you. But when you do have a meaningful pick between the field you know best and a field you know less, the Bitter Lesson points at the second. It’s counterintuitive because the first one feels safer. You can tell when the model is wrong. But “telling when the model is wrong” includes a lot of false positives shaped exactly like your old training. The field where you can’t easily tell is the field where the model has the most room to be useful.

My metallurgy practice is current and active. CEMFORGE isn’t a metallurgy company, by design. We are deploying the methodology where it has the most room to work and the largest competitive opening. Conventional knowledge is the logic check. The test bench, with cylinders and a press, is the arbiter, and the press is unimpressed by anyone’s training.

What this asks of computers

The Bitter Lesson has a second edge, less often talked about. It changes what we should expect from computers themselves.

For most of computing history, a computer was a thing that did exactly what you said. If you wrote if x > 5, it checked whether x was greater than 5. The contract was deterministic, line-by-line, human-legible. Every line of code was an exact promise. We got good at being precise because that’s what the machine demanded. And we got really precise about things where the precision didn’t actually buy us much.

Modern AI systems break that contract on purpose. They’re statistical, not deterministic. They produce outputs whose justifications are not derivable from any single line. The contract has shifted from logical to empirical. The system is correct because, on a held-out test set and then on the world, it does the thing. We know it works the way we know aspirin works. We can show repeatedly that it does, without being able to point at the molecule and say “this atom is the one doing the helping.”

Working with these systems requires giving up an instinct that decades of programming taught us to cherish. We have to stop demanding that the computer show its work in a form we can audit by hand. We have to start asking whether the work passes the tests we care about.

That’s a different relationship with a computer. It asks a different discipline of the human in the chair. You have to write better tests than you used to, because tests have moved from “did the code do what I told it” to “did the model do something I wanted.” The standard for what counts as evidence has gone up. The standard for what counts as derivable understanding has gone down. Or rather, it’s moved off the page and into the experiment.

The hardware is following. The chips winning right now are the ones tuned for the math the Bitter Lesson favors. Large dense matrix multiplies. Low-precision arithmetic. Predictable memory access at scale. Computer architecture is reorganizing itself around statistical workloads the way it once reorganized around branchy serial control flow.

In another decade I suspect what we mean by “computer” will be closer to what we mean today by “lab full of physical assays” than to what we mean today by “spreadsheet that does what I tell it.” Computers themselves are also being asked to give up some of their old contract. The deterministic instruction-follower is becoming one component of the machine, not the whole of it.

That’s not a demotion. It’s a division of labor. The deterministic computer is still the right tool for “make this contract execute exactly as written.” The statistical computer is the right tool for “find me a thing in a space too large for either of us to enumerate.” We’re going to spend the next twenty years figuring out which jobs go where. The people who refuse to learn the second tool are going to spend it doing the work the second tool did better.

The future is here, just not evenly distributed

Anthropic’s Claude Mythos surfaced more than 2,000 previously unknown software vulnerabilities in seven weeks. One of them had been sitting in OpenBSD for twenty-seven years. Others had been in Firefox for more than fifteen. Software is the most well-trodden ground on Earth, audited by some of the most security-conscious engineers we have, and the model found what they had missed, at a scale no human audit has matched.

And technologists are not immune to the Bitter Lesson either. Tesla spent years building Full Self-Driving on roughly 300,000 lines of hand-written C++ rules, with neural networks confined to perception. The system did not really work. In 2023 Elon Musk publicly stated that v12 of FSD would replace essentially all of that hand-coded logic with an end-to-end neural network trained on millions of hours of human driving video. He put it directly on a livestream: “there’s no line of code that says there is a roundabout.” The most aggressive AI-adopting company in the auto industry had to put down its own hand-crafted cleverness when the Bitter Lesson caught up to it. The lesson does not ask whether you are an AI enthusiast. It asks whether your current approach scales with compute.

If AI can do that in software, it is going to find what materials scientists, physicists, chemists, and electronics engineers have missed in their fields too. And what it finds will not be limited to defects. New discoveries are in the same envelope. We are working with language models today, applied to language. Next: physics, chemistry, electronics, materials science. Or now, depending on where you look. The future is here. It is just not evenly distributed.

This technology is not yet fully appraised in materials, and that is the natural state of things at this stage. The applications that would prove the methodology are limited by the modeling capability for these materials. The modeling capability is limited by the available data. And the available data has been thin. That is the wheel Sunnyday Technologies decided to start spinning in 2022, and we have not stopped since. LOGiMIX is the current spoke.

The broader AI-for-cement field is real and well-funded. Concrete.ai out of UCLA, AICrete, Giatec Scientific with its SmartMix platform, Alcemy out of Berlin (now running in roughly a third of German cement plants), Meta’s BOxCrete program in partnership with the University of Illinois and Amrize, and Cemex with its model-based optimization work are operators with millions in funding and substantial teams, working primarily on conventional cement and concrete production. None of them works exclusively in 3D-printed concrete.

CEMFORGE does. That distinction matters, and we do not position ourselves as one of these operators. They are solving the production-quality and emissions-reduction problems for the conventional concrete supply chain, and for that problem the relationships between mix and outcome are tractable enough that regression analysis carries much of the predictive weight. CEMFORGE is solving a different problem on a different substrate.

The material-times-process complexity of 3DCP is qualitatively greater than that of conventional concrete. The thixotropy of a 3DCP mix, the time-dependent shear behavior that lets the same material flow through a pump and then stiffen enough to hold a bead at rest, is different in kind. The processing equipment is different. The environment is different. The interactions between mix chemistry, rheology, processing path, and curing environment cross more axes than a regression handles cleanly. And the ingredient count goes up, not down. Conventional concrete typically lands in a five-to-six ingredient range. UHPC, the high-performance subset, runs to seven or eight. 3D-printable cements land in roughly the same range as UHPC. That parallel is the right cost anchor for anyone trying to size what 3DCP costs to deliver. The same jump in ingredient count that makes UHPC expensive compared to conventional concrete is the jump 3DCP carries.

The design surface scales with it. Every additional pair of components is a new interaction to characterize against the properties the mix has to deliver: pumpability, early-age strength, thixotropic stiffening, cured strength, durability, surface finish. The design problem is to coordinate this expanded ingredient palette across a wide and nonlinear process envelope. This is the substrate where ICME methodology becomes load-bearing. Same field, different subset, different methodology fit.
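To put rough numbers on that design surface, here is a minimal counting sketch. It uses scikit-learn only as a term counter for a plain second-order (quadratic plus pairwise-interaction) regression; the variable counts are illustrative assumptions, not a census of any real mix design.

```python
# Minimal sketch: how fast a full second-order regression grows as mix and
# process variables are added. Variable counts are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def quadratic_term_count(n_vars: int) -> int:
    """Number of terms in a full degree-2 polynomial model of n_vars inputs."""
    dummy = np.zeros((1, n_vars))  # we only care how many features get generated
    return PolynomialFeatures(degree=2, include_bias=True).fit_transform(dummy).shape[1]

scenarios = {
    "conventional concrete (~6 mix variables)": 6,
    "UHPC / 3DCP mix alone (~8 mix variables)": 8,
    "3DCP mix + pump/nozzle/curing axes (~8 + 6)": 14,
}

for label, n in scenarios.items():
    k = quadratic_term_count(n)
    # Rule of thumb: several characterized batches per fitted coefficient.
    print(f"{label}: {k} coefficients -> on the order of {5 * k} characterization batches")
```

At six variables the batch count is an ordinary test matrix. At the mix-plus-process scale of 3DCP it is not something a single lab characterizes empirically, which is where a physics-informed search over the space, rather than a fitted surface, earns its keep.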

Open language, open hardware

There is something specific about Additive Construction that the polymers and metals industries don’t share. The instruments to create and measure 3D-printable cements have been available for a long time. What has not been available is a way to aggregate the results worldwide in a form the methodology can actually use. Materials labs publish inconsistent names, inconsistent units, inconsistent test protocols across continents. Combining datasets has been a manual reformatting job. The Bitter Lesson does not run on data locked behind a hundred incompatible formats.

That is why Sunnyday Technologies built Open3DCP.org as an open data standard for 3D-printable concrete, aligned with ASTM, RILEM, and NIST conventions. It is the common language. M3-CRETE.com is the open-source 3D concrete printer that runs it. The common equipment. Open language plus open hardware is the infrastructure that lets every lab in the world contribute data the methodology can actually use.
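To illustrate what a common language buys in practice, here is a hypothetical, simplified record of the kind an open data standard makes mergeable across labs. The field names, units, and protocol identifier are illustrative only, not the actual Open3DCP schema.

```python
# Hypothetical, simplified example of a harmonized test record -- illustrative
# field names and units only, NOT the actual Open3DCP schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class PrintableMixRecord:
    lab_id: str                      # who ran the test
    binder_type: str                 # named per a shared convention, e.g. "CEM I 52.5R"
    water_binder_ratio: float        # dimensionless
    static_yield_stress_pa: float    # one agreed unit (Pa), not "kPa here, psi there"
    compressive_strength_28d_mpa: float
    test_protocol: str               # reference to the shared test method

record = PrintableMixRecord(
    lab_id="lab-042",
    binder_type="CEM I 52.5R",
    water_binder_ratio=0.32,
    static_yield_stress_pa=1800.0,
    compressive_strength_28d_mpa=92.5,
    test_protocol="RILEM-TC-XXX",    # placeholder identifier
)

# Because every lab emits the same fields in the same units, aggregation becomes
# a concatenation problem instead of a manual reformatting job.
print(json.dumps(asdict(record), indent=2))
```

The specifics of the real standard matter less than the property the sketch shows: records from different continents that line up column-for-column are records the methodology can actually learn from.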

That is the unprecedented part. Additive Construction, uniquely among the materials industries with mature computational methodology in adjacent fields, can be approached through worldwide open infrastructure rather than walled proprietary stacks. The downstream prize is affordable, efficient, and ecologically defensible building materials. The human-scale prize is reducing the dull, dirty, and dangerous methods that produce concrete today.

What this looks like inside CEMFORGE

LOGiMIX is the hub. Given a project’s location, LOGiMIX assembles the supply-chain conditions specific to that site. Which cements, aggregates, and admixtures are actually available locally. At what cost. With what processing pathways already in place. Those conditions feed into the CEMFORGE formulation engine. CEMFORGE specifies a mix that delivers the required performance using the ingredients with the least shipping and processing burden. In theory, that is the lowest cost-to-market path.
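A minimal sketch of the kind of site-specific selection problem described above, with hypothetical names and a toy cost model rather than LOGiMIX's actual interface:

```python
# Toy sketch of site-specific ingredient selection: from what is actually
# available near a project, pick the candidate mix with the lowest delivered
# cost that still meets the performance targets. Names and numbers are
# hypothetical; this is not LOGiMIX's interface.
from dataclasses import dataclass

@dataclass
class LocalIngredient:
    name: str
    cost_per_tonne: float        # cost at the source
    haul_km: float               # distance from source to site

@dataclass
class CandidateMix:
    ingredients: dict[str, float]    # ingredient name -> mass fraction
    predicted_strength_mpa: float    # from the formulation engine's model
    predicted_pumpable: bool

def delivered_cost(mix: CandidateMix, catalog: dict[str, LocalIngredient],
                   freight_per_tonne_km: float = 0.15) -> float:
    """Material cost plus a simple distance-based freight penalty, per tonne of mix."""
    total = 0.0
    for name, fraction in mix.ingredients.items():
        ing = catalog[name]
        total += fraction * (ing.cost_per_tonne + freight_per_tonne_km * ing.haul_km)
    return total

def pick_mix(candidates: list[CandidateMix], catalog: dict[str, LocalIngredient],
             min_strength_mpa: float) -> CandidateMix:
    """Cheapest candidate that meets the performance floor and is pumpable."""
    feasible = [m for m in candidates
                if m.predicted_strength_mpa >= min_strength_mpa and m.predicted_pumpable]
    return min(feasible, key=lambda m: delivered_cost(m, catalog))
```

The real system's cost model and feasibility checks are far richer, but the shape of the decision is the same: constrained search over what the local supply chain can actually deliver.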

Most AI-for-concrete systems work against the generality of specification-approved ingredients. They predict against the recipe that is supposed to work on average, across the supply chains the spec writer imagined. The generality is part of why prediction is hard. Average materials behave on average. The projects that actually get built use specific materials at specific sites with specific equipment.

CEMFORGE works against specifics. Exact product components paired with the specific equipment they will run through, the specific operating conditions on site, and the detailed material characterization generated during development with each customer. That level of characterization turns prediction from a population estimate into a project-specific answer. The precision is in the data we hold, not only in the methods we apply.

The workflow rests on the science, not on the AI. The model suggests. We check the logic against conventional understanding. Does this combination violate known chemistry, or does it just violate what the field is used to seeing? Then we test the hypothesis the right way. Controlled variables. Hold everything else equal. Is the candidate actually better, and is the improvement statistically significant? That part hasn’t changed since I started in materials engineering, and it isn’t going to. The model expands the search space we can cover. The science decides what is real.
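For the hypothesis-testing step, the statistics are deliberately unexotic. A minimal sketch of the check described above, comparing cylinder breaks for a baseline mix and a candidate mix with everything else held equal; the strength values are made up for illustration:

```python
# Minimal sketch of "is the candidate actually better, and is it significant?"
# Compare cylinder compressive strengths for a baseline and a candidate mix,
# everything else held equal. Values are made-up illustrations.
import numpy as np
from scipy import stats

baseline_mpa = np.array([61.2, 59.8, 62.5, 60.4, 61.9, 60.7])   # control batch, same curing
candidate_mpa = np.array([64.1, 63.2, 65.0, 62.8, 64.6, 63.5])  # model-suggested mix

# Welch's t-test: does not assume equal variance between the two batches.
t_stat, p_value = stats.ttest_ind(candidate_mpa, baseline_mpa, equal_var=False)

print(f"mean gain: {candidate_mpa.mean() - baseline_mpa.mean():.1f} MPa")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05 and candidate_mpa.mean() > baseline_mpa.mean():
    print("Improvement is statistically significant at the 5% level; keep the candidate.")
else:
    print("Not distinguishable from the baseline; the candidate retires.")
```

The press decides; the statistics just keep us honest about what the press said.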

The workload divides along the order of the perturbations. The human works first-order. Trust clear empirical results. Track the obvious effects. Predict what conventional understanding predicts. The model tracks the second-, third-, and nth-order. Those interactions live across coupled axes no senior engineer can hold in their head at once, and the field has never had a tool that could look at them all together. Hidden dangers in a candidate mix live in those higher orders. So do the unexpected upsides. The model is the expert eye watching that space while the human runs the first-order science. That is the optimal division of labor for what we are doing.
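One standard way to formalize that split between first-order effects and the higher-order perturbations is the variance (Sobol-style) decomposition of a response. A sketch, where Y stands for any property the mix has to deliver:

```latex
% Total variance of a response Y (strength, yield stress, ...) split by order:
\operatorname{Var}(Y) \;=\; \sum_i V_i \;+\; \sum_{i<j} V_{ij} \;+\; \sum_{i<j<k} V_{ijk} \;+\; \cdots

% First-order share of factor i (the part a human tracks directly):
S_i \;=\; \frac{V_i}{\operatorname{Var}(Y)}

% Everything above first order for factor i (the model's territory):
S_{T_i} - S_i \;=\; \frac{\text{sum of all interaction terms involving } i}{\operatorname{Var}(Y)}
```

In that language, the human's job is the S_i terms; the model earns its keep in the long tail of interaction terms that nobody can track by hand.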

We will be wrong in public, occasionally. That is what running search-and-learning honestly looks like. We will publish what works and what doesn’t, because suppressing failures is how systems start lying about their own reliability. The methodology survives wrong answers. What it does not survive is intuition vetoing it on feel, or hypothesis tests run without controls.

The alternative is the alternative. Incremental refinement on the heuristics we already have, by people whose intuitions are tuned to the materials we already use. It works. It produces concrete. It does not produce what comes after concrete. And if there’s one thing the last thirty years of AI research should have taught all of us, it’s that the methods we’re most comfortable with are not the methods that scale.

The bitter pill

The image up top is AI-generated. Gemini, watermark left visible on purpose. We are not pressing concrete into pharmaceutical capsules, no matter how snappy the metaphor. We considered cropping the watermark out. We decided we’d rather you trust everything else we say.

It’s a joke. It’s also the most honest illustration of the argument. The Bitter Lesson tastes like swallowing something you would have preferred not to. The people most equipped to chew on it are the ones it asks the most of. Cement is what we’re choosing to taste it in.

The bitter aftertaste

The Bitter Lesson is bitter because the people most equipped to receive it, the experts with thirty years of pattern-matching in their heads, are exactly the people it asks the most of. It doesn’t ask them to leave. It asks them to stop using their intuition as the final layer of validation. It asks them to make their priors testable instead of governing. It asks them, in plain English, to be less sure.

That is the discipline. We are building CEMFORGE in cement on purpose. Cement is the substrate where the methodology compounds fastest, where the search space is largest, and where the competitive opening is widest. That is what the Bitter Lesson tells us to spend compute on.

The metals chapter has been open for twenty-five years and most of the field still hasn’t walked through the door. A small number of operators have. The alloys underneath their products are the receipts of what the methodology can do. Cement is even more open. The methods we want to bring to it are more validated than the field’s resistance suggests.

We are applying the same disciplines. Search and learning over a constraint surface. Conventional knowledge as a logic check, not a veto. The test bench as the arbiter, with controls and statistics that actually mean something. The substrate makes the disciplines do even more work because cement is more complicated than the metals those methods proved themselves on. The compute is better than it was when Olson started. The methods are better. The competitive opening is wider. That is the point.

We are not asking whether the methodology works. Twenty-five years of metals receipts already answered that. We are asking how fast we can deploy it where it does the most work. Cement is the answer. The Bitter Lesson is patient. So are we.

Sunnyday Technologies / CEMFORGE


A formal whitepaper version of this position paper is available as a PDF: The Bitter Lesson in Materials Science — Whitepaper v1.5 (PDF).


References

Foundational Works

  • Sutton, R. (2019). The Bitter Lesson. http://www.incompleteideas.net/IncIdeas/BitterLesson.html
  • Kaufman, L., & Bernstein, H. (1970). Computer Calculation of Phase Diagrams with Special Reference to Refractory Metals. Academic Press.
  • National Research Council. (2008). Integrated Computational Materials Engineering: A Transformational Discipline for Improved Competitiveness and National Security. National Academies Press.
  • Olson, G.B. — Research group at MIT, with prior work at Northwestern University. https://olson-research-group.mit.edu/
  • Thermo-Calc Software — CALPHAD-based computational thermodynamics platform. https://thermocalc.com/

Industrial Validation

AI in Software (analog)

Bitter Lesson in Industrial Deployment (technologists are not immune)

AI for Cement and Concrete — Active Operators

Sunnyday Technologies Infrastructure
