I'm a radiologist but can't really weigh in without seeing the full 3D MRI dataset. Regarding this point:
> They performed shockwave therapy on my shoulder even though a recent clinical practice guideline says clinicians should not use or recommend shockwave therapy for rotator-cuff tendinopathy without calcification; I was told during ultrasound that there was no calcification.
Ultrasound isn't a great way to assess for calcification. It'll find large calcifications but easily miss small ones. A plain radiograph would be more helpful, but the MRI may have revealed it as well. Either way, shockwave therapy isn't harmful in the absence of calcification--it's just not helpful.
Edit: when a radiology report says something isn't present, there's always an implicit caveat that the finding isn't present within the context of the modality and images obtained. So an ultrasound report can state there are no calcifications while a plain radiograph can report the presence of calcifications without being inconsistent. Obviously very confusing to patients and people unfamiliar with medical jargon, but clarifying this in reports would make them sound even more qualified, "hedgey", and annoying to read than they already are.
Ironically, I think the AI era may make university degrees a better signal of the intellectual abilities of students due to the presence of pre-computer infrastructure like large lecture halls, industrial-scale copiers, etc.
There are other commenters saying this is a good practice they've also done for other injuries. You are saying you are an actual radiologist and immediately clock the problems with its advice.
I have seen this pattern over and over again. Anytime someone is an actual expert at anything, AI output appears insufficient or incomplete or outright misleading. Only when you do not understand what the AI is being asked to do are you likely to find the output helpful.
This is itself alarming to me, but no one else seems to find it damning for the AI services being offered, preferring instead to be wowed by the convenience and speed with which they can be delivered unreviewed and unproven information.
The implication that tokenmaxxing was an intentional and thoughtfully considered approach rather than blind hype-following by an overpaid manager class who are too far removed from value to understand the downsides of LLMs is hysterical beyond belief.
The first requires being able to overwrite binaries in the Swift tool directory. Yes, if you overwrite binaries executed by Ghidra, you can trigger code execution. This is not a surprise.
The second, idk, I'm not familiar with TraceRMI (but it's probably worth noting that "RMI" stands for Remote Method Invocation).
The third is not a vulnerability in the slightest; they just demonstrate that native 7zip parsing code is reachable. Maybe there is a bug in the 7zip parser, but without one the finding is meaningless.
If we taught systems thinking in schools, things like internet age verification would never get past being an idea on the back of a napkin. People struggle to consider the second-, third-, and nth-order effects of anything, so asking them to consider what else might happen if we bring in laws and technical mechanisms to 'protect the children' is unfortunately too big a leap for a lot of them. Most people are bad at spotting causal links between parts of a system, and people who are good at it exploit that.
I have taken another look at these open models after the fiasco of Fable and GPT 5.6 this weekend and... GLM-5.2 truly is a good workhorse model for daily programming. I consider myself a heavy user of LLMs and a seasoned developer. A typical session for me with GPT is usually over a hundred dollars...
This weekend I programmed a matrix bot with encryption and a Rust agent with some tools. Because I need one and OpenClaw just felt... not what I wanted. Two days later and 20 dollars poorer I have what I need: a multimodal agent written in Rust that has access to my homelab.
Nothing felt off with GLM. It did what I wanted, was fast, had a decent, not very annoying personality and was much cheaper than Opus or GPT.
I used it unquantized through Fireworks, but there are multiple other providers too.
> There's something incredibly peaceful about being in the hands of an expert you trust. [...] AI can absolutely shatter that feeling in an uncomfortable way [...] but I don't know if I can fully trust AI either.
This really is key. We know we can't trust the AI, but at the same time we're also more comfortable asking the AI for clarifications or confronting it. Not having a time-bound appointment or paying by the hour helps a lot. But even then, more information doesn't necessarily help!
I once brought my 11-year-old car, a Civic with 150k miles, to multiple garages. I figured I'd play the "second opinion" game to correlate what the garages recommended to decide on what needed to be done...
I got 3 completely unrelated recommendations, including one that I knew was invalid! I felt worse off than when I started!
The solution to uncertain information isn't more information, which the AI can certainly provide, it's better information, and AI cannot currently provide that.
An alarming number of people don't understand that LLMs work via purely stochastic processes, so I'm happy to see in-depth pieces like this. I'm looking for a job and maybe this is why it's so hard to get a callback these days: resumes are just dumped in some LLM black hole and no one really knows how it works. The author says:
> temperature 0.1 — low, supposedly nudging the model toward deterministic outputs
This is not correct (and is briefly touched on later in the piece when he sets temperature to 0). Temperature is not some kind of "deterministic" switch; rather, it affects the sampling distribution, which becomes more "spiky" but is still very much a distribution.
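To make that concrete, here is a minimal sketch of temperature-scaled softmax, the standard way temperature is usually described as entering sampling. The logits are made-up values for three hypothetical candidate tokens, and the function name is mine, not from the article or any particular inference library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities, scaled by temperature."""
    if temperature <= 0:
        # T = 0 is typically special-cased as greedy argmax, not computed here.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Hypothetical logits: two near-tied tokens and one unlikely token.
    logits = [2.0, 1.9, 0.5]
    for t in (1.0, 0.1):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

With these assumed logits, temperature 0.1 pushes the top token from roughly 47% to roughly 73%, but the runner-up is still sampled about 27% of the time. The distribution gets spikier; it does not collapse to a single outcome unless the candidates were far apart to begin with.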
Anybody else think it is weird that all Western countries suddenly want to lock down the internet to "protect the children"? There is surely an international special interest group lobbying for this?
You can do this now: change the file permissions such that the user you run codex as can't read them, or run codex in a container without those files mounted.
If you don't do that, the agent will be able to incidentally upload them. What if the model runs "rg foo", and one of those files contains the string "foo"? It uploads the tool output, which includes the file contents.
And so, the only solution is to make it so the codex process is unable to access those files, hence using a container, or unix permissions, or deleting the files. Which you can already do.
I imagine this isn't resolved primarily because people expect it to apply to bash tool use, not just the "read" and "edit" tools, and people also expect those files to still be accessible, e.g. if the agent invokes "make", which makes it impossible to solve perfectly.
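A rough sketch of the unix-permissions route, assuming a POSIX system and assuming the agent runs as a different user than the one who owns the sensitive files (if it runs as the owner, this does nothing and you need the container approach instead). The helper names are hypothetical.

```python
import os
import stat

def restrict_to_owner(path):
    """Clear all group/other permission bits; return the resulting mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IRWXG | stat.S_IRWXO))
    return stat.S_IMODE(os.stat(path).st_mode)

def readable_by_others(path):
    """True if the group or other read bit is still set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))

if __name__ == "__main__":
    import tempfile
    # Demo on a throwaway file standing in for a secrets file.
    fd, secret = tempfile.mkstemp()
    os.close(fd)
    os.chmod(secret, 0o644)
    print(readable_by_others(secret))      # other users can read it
    print(oct(restrict_to_owner(secret)))  # group/other bits cleared
    print(readable_by_others(secret))
    os.remove(secret)
```

This only covers the file mode bits. ACLs, parent-directory permissions, and anything the agent can reach through sudo are outside what this checks.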
There's a weird incuriosity in the responses here for a place that calls itself Hacker News. "This doesn't happen to me" is about the least interesting or useful response you could have to someone telling you something happens to them. Someone is telling you the world works differently for them than it does for you, which means you've got an opportunity to learn something new about the world and expand your model. Every good hack comes from understanding the world well enough to see the hack in the first place - someone telling you about their lived experience of the world is a gift.
It's not increasingly bizarre, really, if you just allow for the possibility of one thing:
There's something else worse that they know could be in such a book, but isn't yet, and it is so bad that it is worth doing this.
Perhaps they know that Wynn-Williams could have put it in the book and didn't. Perhaps they know that someone else — someone else British, say? — could write such things in a book and so far hasn't.
Once you assume their motivation is grounded in real fear, it gets easier to see why this isn't bizarre at all; it's inevitable.
Get ready for this to become a common theme. Boardrooms are still engaged in the fever-dream promise that AI will solve all their problems, particularly those involving pesky humans. The simple lesson of "AI is another tool" will be a hard-learned one. Some industries, such as software, will take more time to mop themselves into a corner before they discover that velocity should never be a first-class concern. Speed should only come as a side-effect of quality.
selectively giving away free money to big business is straight corruption. there is no other way to put it. everyone involved should lose re-election and get investigated by the financial crimes unit.
but i don't think "leave it up to the market" is a better idea. investments like this just need to be transparent and open to everyone, with strict punishment for stealing the money, including prison for executives.
if they wanted to actually create jobs they would support small companies and set up open competitive programs based on project quality. or start a state investment bank giving super low interest loans so factories can expand without cutting profitable divisions like in china.
I know less about the airframe differences across the -400 and -8, but I can say the 747-8 represented a major upgrade in Flight Management Software.
I rewrote the Central Maintenance System (the portion inside the FMS) in C from scratch because no one had the original detailed design documents. The original -400 code was written in Pascal, if I'm remembering correctly. I gleaned what I could from the source and relied on unit tests to get the rest of the way there based on what I knew of the protocol itself.
The entire FMS software was completely rewritten in C++ using modern (at the time) object-oriented patterns. Probably the most fun I've had over my now 20-ish year career. Of course Boeing was pissed about the delays this caused because the airframe wasn't a major change. I'll quote a Boeing (from MD originally) executive as saying "Meeting this project deadline is more important than your child dying."
Sadly this was also the time I remember Boeing's engineering ranks began to thin out. Personal opinion, this was a large part of what led to the MAX situation.
This is the root of AI psychosis. There’s a lot to unpack here, and I won’t go too deep because you can’t really have a discussion with affected folks: their fundamental basis is not evidence, it’s belief.
It is weirdly religious in a way, because if you were to present contrary evidence (e.g. experts in a field weighing in about how plausible sounding responses are bunk), you would only be told you don’t believe enough in the long term potential and capabilities.
Don’t get me wrong, I think we all agree capabilities will eventually improve (and farther-future capabilities could reasonably surpass experts), but it really is unclear whether the current transformer architectures, with their probabilistic/hallucinatory outputs, will plateau before they surpass current experts’ abilities in all promised fields.
I tried the Fugu models with some real world tasks in C# and Unity using MCP and open code. I exhausted the $20 plan's 5 hour window in one prompt to review my theme system and plan some color changes. So I upgraded to the $100 plan to see the implementation and result. Well, the result was worse than Opus and incredibly slow. I ended up exhausting the new 5 hour window and have used 35% of the weekly limit, and it barely produced what Opus was able to do in a fraction of the time and cost.
Do what you wish with this info, but it seems to be a complete waste of $$.
Everyone: For a moment forget everything you know about computers and wonder if perhaps 99% of normies are just following the directions on the package of their $19 Chinese IP camera. They have no idea what a firewall is, or what the "public internet" even means.
There's also a difference between your neighbor not closing her blinds and you using a telescope to look inside her apartment, which is what sites like this are.
Went over a few of these with a pretty keen eye, and they aren't particularly interesting. The Docker one is just a weird bug; it's not a vulnerability, and certainly not a "0-day" (which is a pretty loaded term that makes people expect bad stuff to happen).
The nghttp2 nghttpx one is more interesting, and could potentially be used for phishing, but it's very hard to line up properly because the request queue is non-deterministic, so it's basically impossible to target a specific victim (assuming proxy traffic).
The VLC one is just a straight-up crash/bug. And VLC crashes all the time when using weird codecs, so that's nothing new.
A few years ago (before the AI craze), I was misdiagnosed with tuberculosis. I had a chronic cough, and an outsourced radiologist at a clinic found signs of tuberculosis. The findings were sent to the city's tuberculosis hospital, as required by the country's law. The doctors there took the radiologist's conclusion at face value and required me to stay at their hospital for at least 8 months under a strict, prison-like regime. There was no option to say no, because I was considered some kind of biohazard, and by law I had to comply.
Before I was admitted, I quickly found another radiologist, who diagnosed pneumonia instead. I sent his report to the chief doctor at the tuberculosis hospital, and after some deliberation they concluded that the original reading was wrong. Turns out the doctors there can't read scans at all and just believe whatever a radiologist says...
The funny thing is, they had already officially put me on the tuberculosis register and didn't want to admit they had made a mistake. So instead, they simply gave me another paper saying that I had been cured of tuberculosis by them... in 7 days. I'm probably the only person in the country to defeat tuberculosis in a week :)
So if you don't trust the radiologist/doctor, maybe find another doctor if you can afford it? You can compare their conclusions and see if they match. Two unrelated doctors or radiologists saying the same thing is probably about as close to the truth as you're going to get. I'm not sure though whether I should trust AI or humans more. AI can hallucinate, but I've been misdiagnosed by humans so many times too...
Another thing that always needs pointing out: that ad-free, copyable, unencumbered, pixel perfect 4K drm-free rip with multiple language audio streams, hand crafted accurate subtitles, chapter tags, and embedded poster art cannot be bought from the movie industry at any price. That's why piracy is a product problem, not a price problem. The industry refuses to produce and offer the superior product, so regardless of the price, piracy is the only way to get it.
I have multiple LLM subscriptions at any given time, plus an array of local models.
When I ask a question outside of my domain of expertise I like to ask all of the LLMs I have access to. I also create separate sessions and ask the same question multiple ways.
It’s revealing to see how many different and contradictory answers I get, most of which are presented confidently.
The last time I ran a medical question through Claude I couldn’t even get consistent answers between sessions.
It’s also scary how easily you can lead each LLM to the answer you have in mind. When I would start asking questions about different options that other LLMs had presented, each session would drift toward that explanation.
I have seen it firsthand in the CS department here at Dartmouth. It is bad.
We're currently designing a new intro systems curriculum, and we're thinking of it as an adversarial problem. That is, we're designing the course to ensure that a student optimizing for the best grade per unit work still meets our learning objectives. That means, as everyone else is saying, paper exams, but also 1-on-1 interviews to check that students understand each assignment they turn in. These interviews feature both factual questions ("You're using this macro from that library. What does it do?", "Please describe what this function does and how it works.") and conceptual questions ("Why is this code structured this way instead of $whatever?", "How else did you try solving this?", etc.) This doesn't stop students from generating code, but at least they have to understand that code in detail.
This is not as good as writing the code yourself, but how much worse is it? For math classes, this gap is gigantic. Obviously, understanding someone else's proof is much easier than writing your own. For programming classes, I think (without evidence) that the gap is somewhat smaller.
My experience from the past is that when this kind of evaluation is made clear up front, the students know what to expect and either do fine or drop the class in the first week. If you start with take-home exams and then spring paper exams on them halfway through the course, then half the class is cheating and won't be able to recover, as we read in the article.
In general, our students are somewhat motivated by an abstract desire to learn, but are much more motivated by grades. If there exists a straightforward path through your course that leads to a good grade without doing much work, most students will take it. (Our undergrads' course review website is literally called "Layup List." They are actually this shameless.) It's our job as instructors to ensure that all paths leading to a good grade either require learning the material or are more difficult to pull off than just learning the material.
It's best not to blame the students. They are good at optimizing metrics; that's how they ended up here in the first place. We just need to better align the evaluation metrics with the outcomes that we're looking for.