## TECH tour.TW

### Next Gen P-core: The Lion Cove Architecture

Ori Lempel

Senior Principal Engineer, P-core



#### Lion Cove uArch

# Goals

Performance & area efficiency

Optimize ST perf/watt and perf/area for client SoCs

Overhaul microarchitecture

Generational IPC improvement and future scalability

Modernize design database

Accelerate innovation going forward










## Intel® Hyper-Threading Technology

Revolutionized core computing





### Hyper-Threading Benefits

Improving IPC within the same core area footprint

Projected architecture representation of best-case benefits of Hyper-Threading feature on vs. off on a latest-generation P-core.



[Figure: typical scheduling on hybrid client]
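The appendix characterizes Hyper-Threading as adding 30% IPC (or throughput) for 20% Cdyn (or power at the same V/F). A minimal sketch of what that implies for throughput per watt, assuming power scales linearly with Cdyn at fixed voltage and frequency; the function name is illustrative and not taken from the presentation:

```python
def relative_perf_per_power(throughput_gain: float, cdyn_increase: float) -> float:
    """Throughput-per-power relative to a baseline of 1.0.

    Assumption: at a fixed V/F point, power is proportional to Cdyn,
    so a fractional Cdyn increase is an equal fractional power increase.
    """
    return (1.0 + throughput_gain) / (1.0 + cdyn_increase)


# Appendix figures for slide 5: +30% throughput for +20% Cdyn.
ht_ratio = relative_perf_per_power(throughput_gain=0.30, cdyn_increase=0.20)
print(f"HT on vs. off, throughput per watt: {ht_ratio:.3f}x")
```

Under this simple model the ratio is 1.30 / 1.20, roughly an 8% gain in multithreaded throughput per watt. That gain only applies when both hardware threads are kept busy.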





## Single-Thread Optimized





Single-thread optimized core: +30% perf/power/area
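The appendix states that removing Hyper-Threading gives comparable single-thread IPC for 15% lower Cdyn and 10% smaller area. The sketch below works through those two inputs under a naive ratio model (equal performance, power proportional to Cdyn at the same V/F). The model and the function name are assumptions for illustration, not Intel's methodology:

```python
def st_optimized_gains(cdyn_reduction: float, area_reduction: float) -> dict:
    """Relative efficiency of a core without HT vs. an HT-capable core.

    Assumptions: single-thread performance is unchanged, and power is
    proportional to Cdyn at a fixed voltage and frequency.
    """
    power = 1.0 - cdyn_reduction   # relative power of the ST-optimized core
    area = 1.0 - area_reduction    # relative area of the ST-optimized core
    return {
        "perf_per_power": 1.0 / power,
        "perf_per_area": 1.0 / area,
        "perf_per_power_area": 1.0 / (power * area),
    }


gains = st_optimized_gains(cdyn_reduction=0.15, area_reduction=0.10)
for name, ratio in gains.items():
    print(f"{name}: {ratio - 1.0:+.1%}")
```

With these inputs the combined perf/power/area ratio is 1 / (0.85 × 0.90), which is close to the +30% shown on the slide. The individual perf/power and perf/area ratios from this naive model differ slightly from the rounded figures quoted in the appendix.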








## Enhanced Power-Management

For higher sustained performance

Before: static preset thermal guardbands

New (Lion Cove): AI self-tuning controller adapting to real-time operating conditions



## Finer Clock Granularity










Overhaul Microarchitecture

## Front-end

Up to 8x larger prediction block

Wider fetch and increased decode BW

Uop cache capacity growth and read BW increase

Uop queue capacity growth

[Block diagram: uop cache (12-wide), decode (8-wide), uop queue, and allocate / rename / move elimination / zero idiom (8-wide), feeding the integer register file, integer scheduler, memory scheduler, and store data scheduler]

Overhaul Microarchitecture

## Out-of-Order Engine

Split INT & VEC domains: independent renaming and schedulers

Future expandability: grow domains independently

Power saving for domain-specific workloads

[Block diagram: integer, memory, and store data schedulers alongside a separate vector register file and vector scheduler, with vector ports for ALU, shift, shuffle, MUL, FMA, FADD, FPDIV, and x87/MMX]



Overhaul Microarchitecture

## Integer Execution

5 ► 6 integer ALUs

2 ► 3 jump units

2 ► 3 shift units

1 ► 3 mul 64×64->64





Overhaul Microarchitecture

## Vector Execution

**3 ► 4 SIMD ALUs** (256b)

**2 FMAs @ 4 cycle latency** (256b)

1 ► 2 FP dividers

with improved latency/throughput (256b)
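The FMA figures above (2 units, 4-cycle latency, 256b) are enough for a back-of-the-envelope peak-throughput estimate. This is a minimal sketch that assumes each FMA unit is fully pipelined and accepts one 256-bit operation per cycle, which the slide does not state explicitly; the constant and function names are illustrative:

```python
FMA_UNITS = 2            # from the slide: 2 FMAs
FMA_LATENCY_CYCLES = 4   # from the slide: 4-cycle latency
VECTOR_WIDTH_BITS = 256  # from the slide: 256b


def peak_flops_per_cycle(element_bits: int) -> int:
    """Peak floating-point ops per cycle per core.

    Assumption: each FMA unit starts one vector FMA every cycle.
    An FMA counts as two operations (one multiply, one add) per lane.
    """
    lanes = VECTOR_WIDTH_BITS // element_bits
    return FMA_UNITS * lanes * 2


def independent_chains_needed() -> int:
    """Independent accumulator chains needed to keep every FMA unit busy.

    A dependent chain can issue only once per latency period, so hiding
    the latency takes (units x latency) chains in flight.
    """
    return FMA_UNITS * FMA_LATENCY_CYCLES


print("FP32 FLOPs/cycle:", peak_flops_per_cycle(32))
print("FP64 FLOPs/cycle:", peak_flops_per_cycle(64))
print("Independent FMA chains to saturate:", independent_chains_needed())
```

Under these assumptions the estimate is 32 FP32 or 16 FP64 operations per cycle, and a kernel such as a dot product needs about 8 independent accumulators to approach that peak.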





**Uarch Infrastructure Overhaul** 

## Memory Subsystem

#### Redwood Cove (D-side):

| Level | Load-To-Use (cycles) | Read BW       | Capacity |
|-------|----------------------|---------------|----------|
| L1    | 5                    | 3x256b/2x512b | 48KB     |
| L2    | 16                   | 2x64B         | 2MB      |

#### Lion Cove (D-side):

| Level | Load-To-Use (cycles) | Read BW       | Capacity  |
|-------|----------------------|---------------|-----------|
| L0    | 4                    | 3x256b/2x512b | 48KB      |
| L1    | 9                    | 2x64B         | 192KB     |
| L2    | 17                   | 2x64B         | 2.5MB/3MB |
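One way to read the two tables is through an average load-to-use latency. The sketch below uses the latencies from the tables, but every hit rate and the beyond-L2 penalty are hypothetical values chosen only for illustration. They are not measurements, and they ignore the L2 capacity difference between the two cores:

```python
def average_load_latency(levels, miss_penalty_cycles):
    """Expected load-to-use latency in cycles.

    levels: list of (load_to_use_cycles, local_hit_rate) from the closest
            cache outward; local_hit_rate is the hit rate among the loads
            that actually reach that level.
    miss_penalty_cycles: latency charged to loads that miss every level.
    """
    expected = 0.0
    reach_prob = 1.0  # fraction of loads that reach the current level
    for latency, hit_rate in levels:
        expected += reach_prob * hit_rate * latency
        reach_prob *= 1.0 - hit_rate
    return expected + reach_prob * miss_penalty_cycles


BEYOND_L2_CYCLES = 60  # hypothetical penalty for loads missing the L2

# Latencies from the tables; hit rates are hypothetical.
redwood_cove = [(5, 0.90), (16, 0.80)]            # L1, L2
lion_cove = [(4, 0.90), (9, 0.50), (17, 0.60)]    # L0, L1, L2

rwc = average_load_latency(redwood_cove, BEYOND_L2_CYCLES)
lnc = average_load_latency(lion_cove, BEYOND_L2_CYCLES)
print(f"Redwood Cove: {rwc:.2f} cycles, Lion Cove: {lnc:.2f} cycles")
```

The hit rates are picked so that both hierarchies send the same 2% of loads past the L2. Any difference in the result then comes from the faster 4-cycle L0 and the new 9-cycle 192KB level, which catches loads that would otherwise pay L2 latency.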


**96 ► 128** pages DTLB

**2 ► 3** STA AGU
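The DTLB growth can be expressed as address reach. This is a small sketch assuming each DTLB entry maps one 4 KiB page; the page size is an assumption, since the slide gives only the entry counts:

```python
def dtlb_reach_kib(entries: int, page_size_kib: int = 4) -> int:
    """Memory covered by the DTLB without a miss, in KiB.

    Assumption: every entry maps exactly one page of page_size_kib.
    """
    return entries * page_size_kib


before = dtlb_reach_kib(96)   # prior generation: 96 pages
after = dtlb_reach_kib(128)   # Lion Cove: 128 pages
print(f"DTLB reach: {before} KiB -> {after} KiB")
```

With 4 KiB pages the reach grows from 384 KiB to 512 KiB. Larger page sizes would scale both numbers by the same factor.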

[Block diagram: memory scheduler and store data scheduler feeding AGU, STA, and store-data ports, backed by a 48KB L0 D-cache, a 192KB L1 D-cache, and up to 3MB of L2 cache]

#### Lion Cove IP Performance Summary

Double-digit IPC gain over the prior-generation Redwood Cove, with an emphasis on PnP at lower TDP







[Chart: performance at power, Lion Cove vs. Redwood Cove; annotated gains: >12%, >15%, >10%]








#### Modernizing P-core Database







#### Lion Cove

Our most performant yet efficient CPU core architecture

**Double digit** IPC uplift

Future scalability

State-of-the-art power management

Step function in peak ST PnP & PPA

Modern design TFM



Thank You

#### Notices & Disclaimers

The preceding presentation contains product features that are currently under development. Information shown through the presentation is based on current expectations and subject to change without notice.

Results that are based on pre-production systems and components as well as results that have been estimated or simulated using an Intel Reference Platform (an internal example new system), internal Intel analysis or architecture simulation or modeling are provided to you for informational purposes only. Results may vary based on future changes to any systems, components, specifications or configurations.

Performance varies by use, configuration and other factors. Learn more at www.intel.com/PerformanceIndex and https://edc.intel.com/content/www/us/en/products/performance/benchmarks/computex-2024/.

AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at www.intel.com/AIPC.

No product or component can be absolutely secure. Intel technologies may require enabled hardware, software or service activation.

All product plans and roadmaps are subject to change without notice.

Performance hybrid architecture combines two core microarchitectures, Performance-cores (P-cores) and Efficient-cores (E-cores), on a single processor die first introduced on 12th Gen Intel® Core™ processors. Select 12th Gen and newer Intel® Core™ processors do not have performance hybrid architecture, only P-cores or E-cores, and may have the same cache size. See ark.intel.com for SKU details, including cache size and core frequency.

Some images may have been altered or simulated and are for illustrative purposes only.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.



#### APPENDIX

| Slide | Claim | Basis |
|-------|-------|-------|
| Slide 5: Hyper-Threading Benefits | Hyper-Threading scaling characterized as adding 30% IPC (or throughput) for 20% Cdyn (or power at the same V/F) | Projected architecture representation of best-case benefits of Hyper-Threading feature on vs. off on latest-generation P-core microarchitecture. |
| Slides 7-8: Single-Thread Optimized | Removing Hyper-Threading, we get comparable single-thread IPC for 15% lower Cdyn and 10% smaller area, which translates into 15% better performance per power and 30% better performance per area vs. a single thread running on a Hyper-Threading-capable core | All figures estimated based on hypothetical comparison of an HT-capable P-core vs. an efficiency-optimized P-core. |
| Slide 19: Lion Cove IP Performance Summary | Lion Cove core delivers 14% better IPC vs. Redwood Cove core | Iso-frequency benefit estimate across components of SPECrate2017_int_base and SPECrate2017_fp_base (both estimated) running 1 copy, Cinebench R23 Single Core, Cinebench 2024 Single Core, Geekbench 5.4.5 Single-Core, Geekbench 6.2.1 Single-Core, WebXPRT 4, Speedometer |
| Slide 19: Lion Cove IP Performance Summary | Lion Cove performance at different power levels vs. Redwood Cove | Results are based on SPECrate2017_int_base (estimated) running n copies. Based on measurements on Intel internal reference validation platforms at a fixed PL1 power setting. |
