Rambus at Rosenblatt’s AI Summit: Navigating Memory in AI's Future

Investing.com

Published Jun 10, 2025 07:06PM ET

On Tuesday, 10 June 2025, Rambus Inc. (NASDAQ:RMBS) took center stage at Rosenblatt’s 5th Annual Technology Summit - The Age of AI 2025. The conference call, featuring semiconductor analyst Kevin Cassidy and Rambus's Steven Wu, delved into the evolving landscape of memory technology amid AI advancements. While the tone was optimistic, highlighting Rambus's strategic innovations, challenges such as capacity limitations of high-bandwidth memory (HBM) were also discussed.

Key Takeaways

  • Rambus has introduced an MRDIMM chipset to enhance bandwidth, with a platform launch planned for 2026.
  • The company is optimistic about AI models like DeepSeek, which improve performance on existing hardware.
  • There is a growing shift towards optics in data transmission to meet increasing bandwidth demands.
  • Rambus is expanding into the PMIC market, leveraging its expertise in component integration.
  • Broader adoption of CXL is anticipated with the development of CXL 2.0 and 3.0.

Operational Updates

  • MRDIMM:

- Chipset introduced in October 2024, with a launch expected in 2026 and ramp-up in 2027.

- Initial data rates will start at 12.8 gigatransfers per second (12,800 MT/s).

  • CXL:

- Rambus is active in the CXL market with its silicon IP core.

- The market is currently fragmented; broader adoption is expected with CXL 2.0 and 3.0.

  • PMIC:

- Rambus has entered the PMIC market, focusing on integration within memory modules to provide higher quality power.

Future Outlook

  • DDR6:

- Discussions are ongoing, with expectations based on historical trends suggesting a release five to seven years post-DDR5.

  • CXL:

- Anticipates broader adoption with CXL 2.0 and 3.0, though pooling use cases are expected later.

  • MRDIMM:

- Launch is anticipated in 2026, with a ramp-up in 2027.

Q&A Highlights

  • HBM vs RDIMMs:

- HBM is used alongside DDR or LPDDR memory; typically only 20-30% of HBM capacity is allocated to the KV cache, with the remainder held in CPU-attached memory.

  • DDR5 vs DDR4:

- DDR5 offers more bandwidth and higher capacities, with two independent channels per module and additional on-module components such as PMICs, SPD hubs, and temperature sensors.

  • DeepSeek:

- Seen as a positive industry development, enhancing application performance and accelerating next-gen applications.

  • Copper vs Optics:

- Optics are utilized for longer distances, but the memory industry must keep up with bandwidth demands.

  • PMIC:

- Rambus's entry into the PMIC market is driven by its expertise in space-constrained environments, providing efficient power management.

For a detailed account of the discussion and strategic insights, refer to the full transcript.

Full transcript - Rosenblatt’s 5th Annual Technology Summit - The Age of AI 2025:

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Good afternoon, and welcome to Rosenblatt Security's fifth annual Age of AI scaling tech conference. My name is Kevin Cassidy. I'm one of the semiconductor analysts at Rosenblatt. And it's my pleasure to introduce Steven Wu. Steve is a Rambus Fellow and Distinguished Inventor and is a technology innovator with over fifteen years of experience in hardware and software performance solutions.

We have a buy rating on Rambus and an $80 twelve-month target price. We're bullish on Rambus for the company's leadership in DRAMs, and particularly server DRAM module companion chips. In our view, DRAMs are the unsung hero in the AI revolution. Larger AI models need more DRAM. It's as simple as that.

So I'll kick off the fireside chat with a few questions, and I'll take questions from the audience. To ask a question, click on the quote bubble on the graphics you see in your upper right corner, and I'll read those questions to Steve. So, again, thank you, Steve, for participating. This is their fifth year of participating and always a crowd favorite.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Well, thanks so much. It's great to be here, and thanks very much again for having me.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Steve, I thought I'd kick it off with questions I get from investors. And, you know, last year, we kicked it off with questions I got about, well, isn't HBM, isn't Grace Hopper back then, gonna take market share away from dual in-line memory module, you know, DIMM DRAM? And you showed that a server uses a CPU on one side using DIMMs and a GPU on the other side using high bandwidth memory, and it's kind of a parallel universe. So now the questions are coming up, now that AI inference systems are ramping into volume production, the question is, well, what happens when we have training systems versus AI inference systems?

That that seems to be the trend for 2025.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. It's a great question. So, you know, just let me kinda take a step back into sort of the distinction between training and inference. You know, training is is really how you make a model smarter. Right?

You know, you want this model to be very, very good and an expert. And then inference is how you use it to answer questions and ultimately to make money. Right? So if you think about certainly the largest models that we're all familiar with, things like ChatGPT, Claude, things like that, you know, those are all really trained in large data centers across many different GPUs. So that's one of the keys is that you require a lot of GPUs in part because there's a lot of data, and you can't really fit it all on one GPU.

And then it's a parallelizable problem, so we can kind of do that training across many of these engines. And it's the showcase hardware, by far the fastest hardware that's out there in terms of the GPUs and the memories. What's interesting is that, the datasets are very large, and so, not everything fits in HBM. And what we see is that HBM is used in conjunction with DDR memory or LPDDR memory on CPUs. So you'll see this kind of collection of hardware.

And I've got a slide in a minute that'll kinda go into the details a little bit more, but, you know, that's kind of the distinction on training. On inference, we really wanna do inference everywhere. So, you know, we do some of it in the data center today, but you really wanna be able to do it on your home PCs, laptops. It'd be great to, you know, do more of it on phones and things like that. So when you think about inference, you really are thinking about doing inference everywhere. That may mean I have to pare down my trained models in some way to make them fit on these things.

But for the kind of memory that's supporting inference really everywhere, you get all kinds of memory. So it could be DDR, could be LPDDR, GDDR, and and, you know, even in some cases, just doing an on chip SRAM, although that's much more rare. So you get a little bit of everything on the inference side, and, you know, training is really where the very heavy duty kind of work is going on.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Great. Yeah. And so when you're describing the HBM versus RDIMMs, what are the trade offs? Is it, the more HBM you need, then do you have to have a comparable amount of RDIMMs?

Or, I guess, how do those workloads get split up?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. So I've got a couple of slides that, I can talk a little bit about the workload, then I'll show you sort of how it ends up mapping onto real hardware. So, you know, I get asked this question a lot about, you know, how does an LLM work? So, like, ChatGPT. You know, what really happens?

And so, when it comes time to do inference, what happens is you type in a question. So you might be sitting there at your computer, and you might say, hey. Are, you know, are dogs mammals? And so the part of how we reach an answer is called the prefill phase. What happens here is that the system takes a look at the individual words or tokens that compose your question, and they start to fill this thing called a k v cache.

And the k v cache, it's it's short for key value cache, but that's gonna be used to construct the answer. And so, really, what's going on here is you're trying to figure out the context of what's going on. You know, what is the person asking about? And you're trying to think about what words or, collections of words might actually be relevant to that answer. So this stage tends to be very computational, but it also tends to be short in terms of the execution time.

By far, the majority of the time is spent producing the answer, what's called the decode phase. What happens is, it's very iterative, and one at a time, this k v cache gets queried, and it produces the token or the word in the answer. And then we iteratively go back and say, well, starting with this word, what's the next best word in my answer? And you go back to the k v cache, produce the next word, and so on. So some answers are very, very long, and if you've ever played with ChatGPT, you know it can produce paragraphs of information.

And so this process of decoding can take a long time. Turns out, you know, 80 to 90% of the time is typically spent in the decode phase, and this decode phase is very bandwidth intensive. So the way to think of it is, if I'm spending most of my execution time in the decode phase and the decode phase is very bandwidth intensive, then overall, the whole thing is bandwidth intensive. Now the one thing that is interesting to know is that the key value cache is big. We would like to put it all on HBM, but we can't.

So, typically, you know, there are people who will dedicate 20 to 30% of the capacity of the HBM to the key value cache, but that only stores a small portion of it. We actually need a lot more memory somewhere to store the rest of it, and then it becomes this process of pulling the relevant parts from memory that's a little bit further away. In this case, that would be DDR in many cases, or it could be LPDDR. And then you pull that into the HBM and use it to construct the answer. And so the way to think about this is there's now almost two tiers of memory.

The memory that's close to the GPU that's very fast, that's the HBM. But you don't have enough capacity, so you need some much larger capacity memory that's as close as possible to the HBM, and that turns out to be in the CPU. And it has to be very high bandwidth because we're constantly moving that data back and forth into the HBM memory for it to be processed. So if we take a look at kind of a modern system, this is a system that's sold by Super Micro, and you can see it's a rack full of a bunch of these 4U boxes, and the boxes are shown here. At the top, in 2U of that space, is where you typically put the HBM engines.

And so the HBM, or sorry, the GPU engine. So the GPU engines are really where a lot of the work is done, and then the bottom 2U is where the CPU is. So I think I accidentally have the arrows switched here. These pipes that are going into the chassis, they are water cooling or liquid cooling. So liquid is actually flowing through here to cool all these components.

And what you'll notice about, here on the GPU side of things is you'll have eight of the high end NVIDIA GPUs, and they're packed with as much of the state of the art HBM memory as possible. So today, that's HBM3 memory. And for this particular case, you have an x86 kinda dual socket CPU, and then you can just pack it with memory. And so in this particular case, we can have up to eight terabytes of DDR memory, and you can see it's eight times the capacity of the HBM memory. So all told, there's about nine terabytes of memory that you can get in this box, of which most of it is attached to the CPUs that are a little bit further away. So and then what we'll do is we'll just shuttle data back and, you know, from the CPU's memory to the GPU's memory and back.

And, again, you're really trying in the case of LLMs to keep the key value cache, the relevant parts, as close to the GPUs as possible. So the real trade off here is just you're trying to get as much of that super fast memory just as close as you possibly can to the GPUs. But in practice, the way we're constructing our models, you know, the key value caches just keep getting bigger and bigger. And so we're gonna need this other kind of what's called offload memory next to the CPUs, and that's really the place where, you know, DDR dominates today.
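To make the prefill/decode split concrete, here is a minimal, illustrative Python sketch of the flow described above: prefill builds a KV cache from the prompt, and decode then touches that cache for every generated token, which is why generation is dominated by memory traffic and why the part that doesn't fit in HBM spills to CPU-attached memory. The capacities, fractions, and function names are hypothetical placeholders, not Rambus or NVIDIA figures.

```python
# Illustrative two-tier KV cache model for LLM decode (hypothetical numbers).
from dataclasses import dataclass

@dataclass
class KVCache:
    total_gb: float         # full KV cache needed by the active sessions
    hbm_budget_gb: float    # share of HBM reserved for KV cache (e.g. 20-30%)

    def resident_gb(self) -> float:
        # Only what fits in the HBM budget stays close to the GPU.
        return min(self.total_gb, self.hbm_budget_gb)

    def offloaded_gb(self) -> float:
        # The remainder lives in CPU-attached DDR/LPDDR "offload memory".
        return max(0.0, self.total_gb - self.hbm_budget_gb)

def decode_traffic_gb(cache: KVCache, tokens: int, reuse_fraction: float = 0.9) -> float:
    """Return GB pulled from the offload tier into HBM during decode.

    Each generated token conceptually consults the KV cache; data already
    resident in HBM is 'free', the rest crosses the CPU<->GPU link.
    """
    per_token_fetch = cache.offloaded_gb() * (1.0 - reuse_fraction)
    return tokens * per_token_fetch

if __name__ == "__main__":
    # Hypothetical: 141 GB of HBM per GPU, 25% reserved for KV cache,
    # while the sessions' KV cache actually needs ~120 GB in total.
    cache = KVCache(total_gb=120.0, hbm_budget_gb=0.25 * 141.0)
    print(f"Resident in HBM: {cache.resident_gb():.1f} GB")
    print(f"Offloaded to DDR: {cache.offloaded_gb():.1f} GB")
    print(f"DDR->HBM traffic for 1000 tokens: {decode_traffic_gb(cache, 1000):,.0f} GB")
```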

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: And that's, you know, if I could even expand on other sessions we've had or other conferences we've talked about, just maybe, again, for investors to understand: when you say this has to be super fast, as fast as DRAM, what are the speeds of a solid state drive relative to a DRAM, and, you know, why don't the SSDs really play into this?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. You know, I think that's a great point. You know, kind of if you can't use DRAM, then your next choice is to use a solid state drive, but that's gonna be three orders of magnitude slower. And, you know, the bandwidth is much lower as well. So you really just can't service these kinds of applications out of an SSD, because the performance just takes it completely out of the running for being used.
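As a rough sanity check on that "three orders of magnitude" point, the snippet below compares generic ballpark figures, not vendor specs: DRAM access latency on the order of 100 nanoseconds versus roughly 100 microseconds for an NVMe SSD read, plus a large per-device bandwidth gap.

```python
# Ballpark latency/bandwidth comparison (illustrative round numbers, not vendor specs).
dram_latency_ns = 100        # typical DDR access, order of magnitude
ssd_latency_ns = 100_000     # typical NVMe read, order of magnitude

dram_bw_gbs = 50             # one DDR5 DIMM-class channel pair, very rough
ssd_bw_gbs = 7               # one PCIe Gen4 x4 NVMe drive, very rough

print(f"Latency ratio (SSD/DRAM): ~{ssd_latency_ns / dram_latency_ns:,.0f}x")
print(f"Bandwidth ratio (DRAM/SSD): ~{dram_bw_gbs / ssd_bw_gbs:.0f}x")
```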

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Yeah. All of this is for the response time for the user that is using the LLM. You know, I use Perplexity, I put it in. And if it takes twenty minutes, it doesn't do me any good. Right?

Steven Wu, Fellow and Distinguished Inventor, Rambus: That's right. In fact, you know, there's another interesting aspect of it, too, where, you know, as humans, we can only kind of read the information so fast. So, you know, as long as you can kinda keep up with what we can consume, life is good. The more interesting challenge, though, is when you have machine to machine communication and machine to machine kind of learning in the case of, like, digital twins or trying to train robots and things like that. Those things can go faster than humans can perceive the answer.

And in that particular case, you're just trying to go as fast as you possibly can. And so, again, you know, even with humans, an SSD is not gonna be useful. But it it's you know, if it's not useful for humans, it's definitely not gonna be useful at the speeds that machines can communicate with each other.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Right. And maybe I'll just make one other point as long as you have a nice clear graphic here: these two CPUs that you have on the right hand side are each using eight sockets, or eight channels, for DDR memory. And as we go to next generation, like Granite Rapids, what happens to the number of channels there?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Actually, here, what's going on is each CPU has actually got 16 DIMM sockets, if you can believe it. I mean, there's a very large number of DIMM sockets. Is it 16 or 12? Maybe. One, two, three, four.

16. So each CPU has 16 DIMM sockets, and then each DIMM socket supports two channels. So, really, there's 32 channels of DDR5 memory feeding each of these CPU sockets, these black sockets right here.
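A quick worked example of those channel counts and what they imply for bandwidth; the 6400 MT/s speed grade and 32-bit subchannel width below are assumed illustrative values, not numbers from the talk.

```python
# Worked example of per-socket DDR5 channel count and aggregate peak bandwidth.
# Assumed numbers: 6400 MT/s speed grade, 32-bit (4-byte) data subchannels, no ECC.
sockets = 2
dimms_per_socket = 16
subchannels_per_dimm = 2            # DDR5 splits each module into two channels
mt_per_s = 6400                     # assumed speed grade, millions of transfers/s
bytes_per_transfer = 4              # 32-bit data subchannel

channels_per_socket = dimms_per_socket * subchannels_per_dimm
bw_per_channel_gbs = mt_per_s * bytes_per_transfer / 1000      # GB/s
bw_per_socket_gbs = channels_per_socket * bw_per_channel_gbs

print(f"Channels per socket: {channels_per_socket}")            # 32
print(f"Per-channel bandwidth: {bw_per_channel_gbs:.1f} GB/s")  # 25.6
print(f"Per-socket bandwidth: {bw_per_socket_gbs:.0f} GB/s")    # ~819
print(f"Whole box ({sockets} sockets): {sockets * bw_per_socket_gbs:.0f} GB/s")
```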

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Oh, okay.

Steven Wu, Fellow and Distinguished Inventor, Rambus: And so yeah. And so, you know, the the the idea here is, you know, you're you're really just space constrained. You're packing just as much as you can get into each CPU. So, you know, what we're seeing is the demand for memory capacity and bandwidth is so high in the CPUs that they're gonna remain space constrained. They're gonna fit as many as they possibly can, you know, you know, onto that motherboard.

And you can see there's just no room on the edges here. I mean, that's that's just fully packed. So

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: And what is it that's changed in the DDR5 standard versus DDR4?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. A couple things. So in DDR5, there's more bandwidth. There's, you know, double the bandwidth or higher depending on the speed that you're running at, and the capacities are higher as well. The other thing that's really interesting from a module design standpoint is these modules here, which are shown on the right.

In DDR5, these modules have kind of a left half and a right half, and those are independent channels. In DDR4, the whole module formed one channel. So now channels are great because they're independent resources that you can then provision. You can kind of loan them out to certain cores and things like that. And those resources are independent.

They don't get interfered with by other cores that aren't using them. Well, you know, the other thing that I think is really interesting is as we've gone faster and as we've supported higher capacities, we've needed to have more power, you know, that's being supplied to the modules. And DDR5 modules now have additional silicon compared to DDR4. There's a power management IC, which delivers very high quality power, and there's other things. There's an SPD hub for management of the DIMM, and there are temperature sensors as well.

And this is all just reflective of the fact that, you know, as the performance demands have gone up for memory, in this case DDR5, we need more silicon to kind of deliver what's really needed in the system.
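A small worked comparison of per-module peak bandwidth under assumed, commonly quoted speed grades (DDR4-3200 with one 64-bit channel versus DDR5-6400 with two independent 32-bit subchannels); the figures are illustrative, not from the talk.

```python
# Per-module peak bandwidth, DDR4 RDIMM vs DDR5 RDIMM (assumed common speed grades).
def dimm_bw_gbs(transfer_rate_mts: int, channel_bits: int, channels: int) -> float:
    """Peak module bandwidth in GB/s = rate * bytes per transfer * channel count."""
    return transfer_rate_mts * (channel_bits // 8) * channels / 1000

ddr4 = dimm_bw_gbs(3200, channel_bits=64, channels=1)   # one 64-bit channel
ddr5 = dimm_bw_gbs(6400, channel_bits=32, channels=2)   # two independent 32-bit subchannels

print(f"DDR4-3200 RDIMM: {ddr4:.1f} GB/s")   # 25.6
print(f"DDR5-6400 RDIMM: {ddr5:.1f} GB/s")   # 51.2
```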

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: And then you've also announced MRDIMM. Can you say what the difference is as you go from a DDR5 RDIMM to an MRDIMM?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. So, you know, along that same theme of needing more capacity and more bandwidth, you know, the industry introduced a new standard called the MRDIMM, and here I've got a picture of it. So on the right is your standard, what we call a registered DIMM. Now what we have here, there's some interesting chips on it. So I mentioned that there's, a power management IC and an SPD hub and temperature sensors.

Really, one of the, you know, most important chips, I think, is this registered clock driver. And what it does is it takes commands from the CPU, and it fans them out to the DRAMs that are on the module. If you take a look at an MRDIMM, it's got many of the same kind of structures. There's an RCD in the middle here. There is an SPD hub and a power management IC and some temperature sensors.

The new additions are these 10 what are called data buffers, or MDBs. Basically, they're, you know, MRDIMM DBs. What they do that's really clever is, normally, in a registered DIMM, one DRAM will be transmitting across the data wires back to the host. What happens in MRDIMM is two different DRAMs are transmitting across the same data wires, and they're being combined and multiplexed into one data stream that's twice the data rate of any of the individual DRAMs. And so what we do here is we really are increasing the number of transactions that we're servicing at one time, but we're multiplexing the data back on the same data wires.

So from the host standpoint, it looks like, wow. Now I've got twice the bandwidth going in and out of this module. And from the DRAM maker standpoint, it's great because they've actually made no changes to the DRAM. It's the same DDR5 DRAMs. And, you know, making a new DRAM is a long pole in the tent.

And so if you can leverage the same DRAM that's out there and you can find a way to provide more bandwidth on the module back to the host, it's a win, and that's exactly what MRDIMM does. You know, Rambus is actually the first company to produce an MRDIMM chipset. We introduced it last October. And we also introduced a new PMIC, because you're gonna be consuming more power as you transmit more data. And so, you know, we're very happy about that.

And based on the current platform schedules that we've seen, we're anticipating that it'll launch in 2026, and the volumes will ramp after that, you know, into 2027. And the data rates, you know, we're gonna see are gonna be very high, much higher than an RDIMM can reach. So it's gonna start at 12.8 gigatransfers per second. So very, very high data rates.
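A minimal sketch of the multiplexing idea under assumed numbers: two DDR5 ranks, each running at a component rate the DRAM makers already ship, are interleaved by the data buffers into one host-facing stream at twice that rate, so 2 x 6,400 MT/s components can present 12,800 MT/s to the host without a new DRAM. The code below is a toy illustration, not an MRDIMM protocol implementation.

```python
# Illustrative MRDIMM-style multiplexing: two DRAM ranks feed one host stream
# at twice the per-rank transfer rate (numbers are assumptions, not a spec).
from itertools import chain

def mux_two_ranks(rank_a: list, rank_b: list) -> list:
    """Interleave beats from two ranks onto one host-facing data stream."""
    return list(chain.from_iterable(zip(rank_a, rank_b)))

rank_rate_mts = 6400                  # what each DDR5 component runs at (assumed)
host_rate_mts = 2 * rank_rate_mts     # what the host sees on the module pins
print(f"Host-facing data rate: {host_rate_mts} MT/s")   # 12800

# Toy data: each rank supplies its own beats; the data buffer merges them.
rank_a = [0, 2, 4, 6]
rank_b = [1, 3, 5, 7]
print(mux_two_ranks(rank_a, rank_b))  # [0, 1, 2, 3, 4, 5, 6, 7]
```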

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Right. And and, I guess, again, a limiting factor on this is a a CPU that has to be able to, use this MRDIMM.

Steven Wu, Fellow and Distinguished Inventor, Rambus: That's correct. Yeah. You know, it it's gotta be something that can absorb, you know, the the very high data rates that are there. But what we've seen is, the bandwidth demand has been going through the roof. And what's great about this is, the bandwidth demand has been growing faster than the industry is used to producing new kind of DDR DRAMs.

So in this particular case, the MDB and the MRDIMM architecture is a great way to kinda continue to leverage the infrastructure that's there to provide more of what the hosts need.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Right. And maybe even, you know, like you showed in that CPU picture, there's 32 different channels. Would every channel have to switch from RDIMM to MRDIMM, or would it be a mix and match? Or how would that configuration happen?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. I mean, really, in practice, all the DIMM sockets are gonna be MRDIMM. You know? I mean, technically, it's, you know, theoretically possible to do. But, you know, once you start needing a lot of bandwidth, you're gonna switch all the sockets over on your platform.

Because, really, you know, it becomes a management difficulty from the host side if some of the channels do one thing and some do another. And so, you know, what we've seen in the past for this type of thing is people will tend to move all the sockets over from one technology to the other.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: And even when we're talking about DDR5, I remember when it came out, there were gonna be six, or is it eight, different iterations, you know, speed upgrades. And where are we now in that upgrade cycle, and what would MRDIMMs be? What generation of DDR5?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. Yeah. So the industry, you know, has defined kind of the speed road map, and they've defined all the timing bins according to the kind of initial definition of the part. And what always happens is the industry starts to say, well, could I just squeeze one more or two more generations out? And that's just always how it is.

And people become smarter, and they become, you know, more experienced dealing with the technology. And in practice, there's usually one or sometimes two extra speed grades. But what starts to happen also is you realize, okay, I'm working so hard to get just the next little bit of performance gain. They start to fall back and say, well, with MRDIMM, I can actually use a slightly slower speed bin for the particular DRAMs.

And since I'm multiplexing two of them, the end result is I get to a data rate that's much higher than I could ever get to with the high end of one DRAM. So, you know, right now, the industry is kind of, you know, working through the speed road maps as things were defined. The platforms are adopting them and all that. But what MRDIMM really does is it gets you to a place that you just can't get to with an individual DDR DRAM.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Okay. And, again, training and inference would be used in both?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. You know, really, you know, inference is a part of the training process, and so, you know, that's really why on the largest kinds of systems, you can kinda use that hardware either for inference or for training. But, you know, we anticipate that, yeah, it would be used in both kinds of systems. And, you know, the demand for bandwidth and capacity, like I mentioned on these KV caches, it's phenomenal how much memory they want. I mean, you really can't give them what they would ideally like.

And so you're trying to just, you know, do the best you can to give them just as much as you possibly can.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Maybe, you know, I'll take a question from the audience that goes along with what we were just talking about with your different generations of DDR5. The question is, when is DDR6 due out, and is that in the pipeline?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. Yeah. So what typically happens is, you know, once a memory launches, usually there's some months, you know, it could be six months, could be a year, where a lot of effort is spent enabling the technology and then really thinking about how to get to the data rates at the end of the architectural life of the memory. So, you know, when DDR5 launched, you know, people were looking at, well, can we extend it one speed grade or two? But usually about a year after that or a year and a half after that, the industry starts to think about what's next.

And so, yeah, there are discussions going on now about, you know, what would the next thing really be? And so it's all part of the normal progression of things, and the discussions are all very early right now. Usually, people just you know, through through standards organizations like JEDEC, you know, people just think about, you know, what are the challenges and, you know, what are some interesting ideas for how to kinda get through them.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: And the, you know, your long term licensing agreements with the DRAM manufacturers, you'll be right in there with the IP for DDR6.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. I mean, our, you know, our licenses, as you know, are signed for certain durations of time. And so, yeah, depending on when DDR6 comes out, it may or may not be, you know, depending on the timing. So it's really just based on when DDR6 gets, you know, fully defined and then when it starts to come to market. So, you know, right now, I think there's no, you know, like, set date, but, you know, we all have watched the memory industry over the years, and we know that DDR, you know, generations, they typically last five to seven years before the next one comes out.

So, you know, it's hard to say exactly when DDR6 would come out. But if you just, you know, kinda follow the history of it all, then, you know, if it follows that same history, then it'll be, you know, five to seven years after DDR5.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Okay. Great. Yep. You know, one other standard that's out there is CXL. And, you know, you're saying that solid state drives aren't anywhere near fast enough, but CXL is a little bit slower and it's still DRAM. And it just seems to have been a little slow, you know, maybe in committee, and you're getting a lot of people still adding to the standard of CXL.

Maybe if you could give us an update of where we are with CXL.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. Yeah. Absolutely. So, you know, I think what's important is that we participate in the CXL market because we have our silicon IP core that is a CXL controller. So what's interesting is, you know, we're involved in the ecosystem, and we get a lot of insight based on the customers that purchase our IP core.

You know, we understand a little bit more about what they're trying to do. But, you know, I think the industry had hoped early on that there would be kind of a convergence on maybe one or two use cases that would really start to drive wide adoption. But what we're seeing is that there's still a lot of experimenting going on and a lot of kind of individual use cases that are, I guess, in some ways, fragmenting the market. And so, you know, I think the anticipated adoption, the broad adoption, is kinda pushed out a little bit. And it's really, I think, kind of normal because there's just learning that you need to do to figure out how do you use this other tier of memory.

I think also, you know, AI has been big; it has taken a lot of mindshare in the industry and a lot of dollars. And so there's been a lot of effort to focus on that as well. So, you know, what we've seen is that there's largely people that are doing kind of this expansion with CXL. It's kind of these very tailored use cases with tailored silicon right now. But we think, and certainly the people we talk to, that when CXL 2.0 really gets fully enabled and the 2.x generation kind of really gets out there, you know, we'll see, we think, more adoption.

And then, I think what a lot of people are saying is the 3.x version has a lot of really interesting features that can then drive further adoption. So I think it's really kind of based on the enabling of the standard that we think is gonna drive, you know, kind of more adoption. And some of these use cases like pooling and things like that, that was always intended to be a bit later. So expansion, you know, we think will come, and we think it'll depend a lot on, you know, kind of the individual spec adoptions, you know, how much the market uptake is based on the hardware being compliant to the 2.0 and 3.0 specs. But, you know, we're still advocates of the serial attach for memory.

We think it solves a lot of really interesting problems, and, you know, it's partly why we're still in the silicon IP business. And we're really monitoring a lot, you know, of what's going on. We're getting a lot of insight from that business, and, you know, that'll inform us on, you know, what we may wanna do on the silicon side.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Okay. And, you know, another question I get from investors is, you know, about January, and we'll just touch on this because tomorrow we have our panel discussion at 1:00 to go into more details about inference and efficient models like DeepSeek. So in January when DeepSeek came out, a lot of investors got very nervous about the whole AI cycle and how much memory you need if you have these more efficient models. And, you know, maybe just an idea of what do you think is happening with DeepSeek, and does it change anything in the market?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. I mean, when I when I saw, DeepSeek come out, I you know, at the time, and I still believe this, I thought it's great for our industry. You know, I think in so many ways, what we do in semiconductors, we've been driven by Moore's Law. How do you do more with the dollars that you're spending? You know, how do I make transistors cheaper?

How do I make, you know, silicon more capable and, less expensive? And through the decades, what that's done with cheaper, more capable silicon is it's grown the adoption. Silicon's everywhere now. You know? It's in cars and appliances, toys, things like that.

Right? And so what we've repeatedly seen is that when you can make hardware more efficient, you can make it more capable, and you can make it more performant, that just drives the adoption of more hardware, and it drives better applications and and a broader range of applications. Couple of examples. You know, when when we saw, for example, multicore CPUs, right, there was this belief that, oh, you know, gosh. If a CPU now has two cores, I'm gonna sell half as many.

Right? But what really happened is it made computing cheaper. And now everybody wants to do more things through software. There's more availability of the hardware. And once standards got in place and support for the multicore CPUs, you know, happened, then all of a sudden you had cloud computing and more and more cores.

And it's consistently driven, you know, kind of cheaper silicon and more cost effective solutions. When I look at DeepSeek, right, really what they did was they figured out how to take the hardware that they had and improve the application performance. And the techniques are very clever. They're very applicable to lots of different kinds of AI that we wanna do now and in the future. One of the things is they dedicated a lot of the cores and the hardware to communication between the engines.

And what they realized was that communication and data movement were big problems because there was so much data that they needed to move. What they've now done is they've shown the industry that, hey, with this hardware we've got, you don't necessarily have to wait for the next gen hardware. You can start doing some of your next gen application development on today's hardware. And on the next gen hardware, you can start doing the software two generations out that you were planning to wait on doing.

This is gonna accelerate things. And, I'm really confident that this is a great breakthrough for our industry, and it's gonna continue to drive, you know, wider adoption, and it's gonna accelerate the kinds of applications that we're all looking forward to in the future. So I think it was a great thing for our industry.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Good. Good perspective. Thanks. Yeah. And maybe we were talking about what these new applications are.

You know, we've heard the terms now of AI agents and agentic AI. How does that play into the memory subsystems, too, and maybe a quick description of what those are?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. Yeah. So, you know, I kinda mentioned just now that, you know, what DeepSeek does is help accelerate things. Right? And what's really interesting is it's accelerating the move towards agentic AI and, you know, really better reasoning.

Right? And so the consequence of all of this is if I can compute faster or I can arrive at my solution earlier, then I need more memory bandwidth and more memory capacity because I'm gonna need to feed the engines with more data. So that's one consequence of what DeepSeek's done: it's gonna drive the demand for better memory. But if you look at these new models like agentic AI, it's doing the same thing. So agentic AI, really, it's this ability to have AI that is more autonomous and can kind of, you know, make decisions and change its goals over time. So the way to think about it is, in generative AI, like these large language models, I type a question in, and the goal is to answer my question. Right? And that's the whole goal.

And it's trained to do that. With agentic AI, the goal may be more nebulous or more complex. Like, for example, it could be, let me know anytime a dog walks by the front of my house and turn on my sprinklers when you see that happen. Right? But but don't turn on the sprinklers too much that you overrun my water bill because I don't wanna I don't wanna pay some penalty for too much water.

And so, in in that particular case, you've got a goal. The goal's complicated. Right? And you have to be able to do something. Oops.

Sorry. It just went out in my room here. Yeah. So, you know, the goal is more complicated. And so you need to have planning, and you need to have replanning as you get new information.

And so that's what's different about agentic AI. You're gonna have a lot of different things going on at the same time, and you're gonna be fusing a lot of information together. So many of the same things we see in generative AI. In fact, agentic AI will use generative AI, but it will have other things going on on top of that. So the end result is, from a memory standpoint, you're gonna need more capacity, more bandwidth because you're moving lots more data, and you're trying to make a lot of decisions in a short amount of time.
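As a purely illustrative toy of the planning and replanning loop described above (the sprinkler scenario is the speaker's example; every function name and number below is hypothetical), an agentic system watches for events, re-plans against a constraint such as the water budget, and only then acts:

```python
# Toy agentic loop for the "dog and sprinkler" example (all names hypothetical).
import random

def detect_dog() -> bool:
    """Stand-in for a vision model watching the front of the house."""
    return random.random() < 0.2

def plan(budget_liters: float, used_liters: float) -> str:
    """Stand-in for an LLM/planner choosing an action under a water-budget constraint."""
    return "run_sprinkler" if used_liters + 10 <= budget_liters else "skip"

def run_agent(frames: int = 20, budget_liters: float = 40.0) -> None:
    used = 0.0
    for frame in range(frames):
        if not detect_dog():
            continue                        # nothing to react to; keep observing
        action = plan(budget_liters, used)  # re-plan with the latest state
        if action == "run_sprinkler":
            used += 10.0
            print(f"frame {frame}: dog seen, sprinkler on ({used:.0f} L used so far)")
        else:
            print(f"frame {frame}: dog seen, but water budget reached; skipping")

if __name__ == "__main__":
    random.seed(0)
    run_agent()
```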

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: You know, when you talked about more bandwidth, earlier today we had the NVIDIA networking VP. And, you know, the question came up from a lot of people of, you know, how long does copper last, and when do we switch over to optics? And, yeah, I would say the same is gonna be true for the copper lines going from the memory to the CPU and even within the HBM memory to the GPU. What's your view on copper versus going to co-packaged optics?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. So, you know, with optics, the idea is to provide a wider pipe to move your data. And what we've seen is that it's really a distance game. So initially, when you were moving data between data centers that were geographically very far from each other, really, the only game in town was optics. And we're starting to see that distance shrink.

Right? So within data centers, there's optics in the top of rack switches and things like that. And, you know, at some data rate, it's inevitable that you've gotta, you know, be using optics inside the server for more communication. But the thing that is also true is that optics will eventually get converted back to electrons that will move over copper wires. And maybe the copper wires are short, but there's gonna be some conversion back to the copper world.

And that's really what most electronic components operate in today. And so, you know, the kind of the next thing that will likely happen is that once optics starts to be kind of the conduit for moving data inside a server, there's still gonna be this conversion back to electronics. But the interesting part of it is, because optics has incredibly high bandwidth, you know, you only wanna put it there if you feel like you can feed it. And so it's gonna be up to the memory industry to come up with even more capacity and even more bandwidth to keep these pipes full. You don't build a freeway just to let it have one or two cars every hour.

You build a freeway because you're planning to fill the thing. And so the call to action for our industry is, hey. The pipes are getting big now. You know, figure out how are you gonna continue down this bandwidth and capacity road map to keep those things full.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Great. You know, we only have about five minutes left, but I'd like to jump topics over to, you know, looking at these companion chips that are on the modules. You know, in particular, and it comes up a lot with investors, is the PMIC, the power management IC. Yeah. Lots of companies make PMICs.

And, the question comes up, why is Rambus bothering getting into the PMIC market?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Oh, it's a great question.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Value to bring in. Yeah.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. Great question. So if you again, if you look at these modules that I've got up here, you'll notice that, you know, the DRAMs are not made by us. They're made by the DRAM manufacturers. But, again, there's a lot more silicon that's on the module these days.

And it turns out, you know, we've been doing the RCD, and we do have past experience on these kinds of data buffers. But, you know, this other silicon, it makes a lot of sense to do because, for one thing, our chips are dependent on the power that's being supplied. And so having a companion chip that's part of the full module chipset that does power makes complete sense to us. Now the reason why I think we're in a good position to do this kind of thing, one is we have a lot of experienced talent on our staff that, you know, knows how to do PMICs and things like that. You know, we've got other people that are working on temperature sensors and SPD hubs, things like that.

But the really, really hard part about a lot of this is making them all work together in this incredibly space constrained environment where, you know, you have to kinda get all these components to operate together. You're under very difficult cooling constraints. And there's a lot of other components close by that could potentially interfere, you know, electromagnetic interference or just, you know, routing difficulties and things like that. So the physical constraints of operating on a DIMM are extremely difficult. And we have thirty years of experience of operating in this environment, and that's really the experience we bring. The value add is the ability to make all of these components work together in the chipset so the customer has confidence that when they buy our components, they will work together and that we understand the environment that these are going into.

So we understand the cooling and the sockets and the reliability requirements to make sure that these things will work well in a finished product.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: So it just kinda adds on to your IP, where you do a lot of interfaces to the memory. So you understand that memory signal and how much noise that creates, and, you know, power management is very sensitive to that. Is that

Steven Wu, Fellow and Distinguished Inventor, Rambus: Absolutely. I mean, the important thing here about why the PMIC has been brought onto the module is, you know, as you go faster and faster and as you have more components, you need higher quality power. And the place where the power is coming from is the PMIC. It's right next to all of these components. That's what's called being near the point of load.

And one way to think about it is, if I was supplying the power from very far away, one thing is I'd lose some of the power because it would dissipate in the wires, because it has to travel so far, and it could be subject to noise from other electrical components. So that quality would degrade. And what we're seeing is that for the data rates we need to get to and for the efficiencies that the data centers are calling for, there's multiple reasons why that PMIC needs to be on the module. And, you know, with our experience, you know, working on that module and, of course, our components needing that power, it's really an ideal kind of additional chip to put into our chipset.
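A back-of-the-envelope illustration of the point-of-load argument; the current and resistance values below are made-up round numbers, not module specs. Both the I²R loss and the IR voltage drop scale with the resistance of the path between the regulator and the load, so putting the PMIC on the module shrinks that path.

```python
# Illustrative I^2*R loss for far vs. near point-of-load regulation (made-up numbers).
def delivery_loss(current_a: float, path_resistance_ohm: float) -> tuple:
    """Return (power lost in the delivery path in W, voltage drop in V)."""
    return current_a ** 2 * path_resistance_ohm, current_a * path_resistance_ohm

load_current_a = 15.0          # assumed DIMM rail current
far_path_ohm = 0.010           # regulator far away on the motherboard
near_path_ohm = 0.001          # PMIC on the module, next to the DRAMs

for label, r in [("far from load", far_path_ohm), ("on-module PMIC", near_path_ohm)]:
    loss_w, drop_v = delivery_loss(load_current_a, r)
    print(f"{label}: {loss_w:.2f} W lost, {drop_v * 1000:.0f} mV drop")
```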

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Great. That's nice to have one more piece of silicon to sell to your customers too.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Absolutely. Yeah. That complete chipset is is it's a good thing.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: So we got about a minute, and I'll give it one more poll if anyone from the audience wants to ask a question, last minute. Otherwise, I'll thank Steve once again, for a great presentation, and we'll talk to you tomorrow.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Thanks very much.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: I did get one question here.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Oh, okay.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Oh, here it is. Alright. There are other components that might move to the RDIMM similar to the PMIC. You know? Are there any others?

You know, what would, Rambus expand into this? So will are there gonna be more components coming onto the RDIMM to offload the motherboard is the way I think of it?

Steven Wu, Fellow and Distinguished Inventor, Rambus: Yeah. It's a great question. I think there's always these discussions about, you know, would it be possible to move something onto a DIMM? And so, you know, there's nothing to announce right now, but we continue to participate in those kinds of discussions. And on the research side of our business, we do look at that kind of thing. You know, I'm from Rambus Labs, where we do a lot of our innovation.

And, architecturally, we do look at that kind of thing. So, there are possibilities, but, you know, the industry hasn't really announced anything yet.

Kevin Cassidy, Semiconductor Analyst, Rosenblatt Security: Okay. Great. Well, again, thank you, Steve.

Steven Wu, Fellow and Distinguished Inventor, Rambus: Oh, you're welcome. Thanks for having me.

This article was generated with the support of AI and reviewed by an editor. For more information see our T&C.
