Hello everyone,

Today I want to share, step by step, an annoying hard lockup I ran into on Fedora 44 with my AMD Radeon RX 9070 XT (it looked like a kernel panic at first) and how I dealt with it. It is a classic amdgpu regression story of the kind that can hit anyone who runs a bleeding-edge Linux distribution and pairs new hardware with brand-new kernel releases.



How Did the Problem Start?

At boot, instead of the desktop (KDE Plasma/Wayland), I got a static gray screen with vertical blue lines.

Symptoms:
  • A frozen image with gray/blue lines on the screen.
  • Ctrl + Alt + F3 does not switch to a TTY (virtual console); the screen just stays black.
  • The system locks up completely (a Kernel Mode Setting, KMS, crash).

Crime Scene Investigation and Logs

Because the system was locked up, I hard-reset it with the power button, picked the previous working kernel (6.19.14) from the GRUB menu, and logged in without trouble. Then, to see what had happened in the crashed session, I opened a terminal and ran the command below. Here -b -1 selects the previous boot and -p 3 limits the output to messages of priority "err" or more severe:

Bash:
journalctl -b -1 -p 3 | grep -iE 'amdgpu|drm|kernel'

This is exactly the error output I got:

Bash:
recep@fedora:~$ journalctl -b -1 -p 3 | grep -iE 'amdgpu|drm|kernel'
May 10 07:00:59 fedora kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.I2C0], AE_NOT_FOUND (20251212/dswload2-162)
May 10 07:00:59 fedora kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20251212/psobject-220)
May 10 07:00:59 fedora kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PC00.I2C1], AE_NOT_FOUND (20251212/dswload2-162)
May 10 07:00:59 fedora kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20251212/psobject-220)
May 10 04:01:05 fedora kernel: snd_hda_Intel 0000:80:1f.3: no codecs found!
May 10 04:01:17 fedora kernel: amdgpu 0000:03:00.0: MES(0) failed to respond to msg=REMOVE_QUEUE
May 10 04:01:17 fedora kernel: amdgpu 0000:03:00.0: failed to remove hardware queue from MES, doorbell=0x1000
May 10 04:01:17 fedora kernel: amdgpu 0000:03:00.0: MES might be in unrecoverable state, issue a GPU reset
May 10 04:01:17 fedora kernel: amdgpu 0000:03:00.0: Failed to evict queue 0
May 10 04:01:17 fedora kernel: amdgpu 0000:03:00.0: Failed to evict process queues
                                               Module libdrm.so.2 from rpm libdrm-2.4.133-1.fc44.x86_64
                                               Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.133-1.fc44.x86_64

Source of the problem: As the logs make fairly clear, the trouble is in the MES (Micro Engine Scheduler). What locked up the system appeared to be the new 7.0.4-200 kernel, a major jump from the 6.19 series. This new release apparently failed to manage the RX 9070 XT's hardware scheduler properly, and the card's scheduler stopped responding. The driver called for a GPU reset, but it could not evict the process queues, and the system froze at the kernel level.
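
The failure signature described above can be pulled out of a noisy journal with a small filter. This is only a sketch: the function name mes_failures is made up for illustration, and the patterns are simply the message fragments that appear in the log excerpts in this thread, so other driver versions may word them differently.

```shell
# Keep only the amdgpu lines that signal an MES hang or a GPU reset.
# Reads journal text on stdin, so it works on live output or a saved log.
# "mes_failures" is a hypothetical helper name, not part of any tool.
mes_failures() {
    grep -E 'amdgpu .*(MES\([0-9]+\) failed to respond|MES might be in unrecoverable state|Failed to evict|GPU reset begin|VRAM is lost)' || true
}

# Example (previous boot, kernel messages only):
#   journalctl -k -b -1 --no-pager | mes_failures
```

Reading from stdin keeps the filter independent of how the log was obtained, so the same function also works on a journal exported from another machine.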

Solution: Locking the Working Kernel Version
In situations like this, the most sensible and stable route is to lock the system to the older kernel that works (6.19.14) until AMD and Fedora ship a fix for the newer kernel that triggers the problem (7.0.4). That way the rest of the system keeps receiving updates as usual, while the working kernel is protected from being removed by automatic cleanup.

1. Installing the DNF Versionlock Plugin

First we install the plugin Fedora needs for the kernel lock:

Bash:
sudo dnf install 'dnf-command(versionlock)'

2. Locking the Stable Kernel

While booted into 6.19.14 from GRUB (so that it is the currently running kernel), we enter this command to lock it:
Bash:
sudo dnf versionlock add kernel-$(uname -r) kernel-core-$(uname -r) kernel-modules-$(uname -r) kernel-modules-extra-$(uname -r)
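
The same list of package specs can be generated with a loop instead of repeating $(uname -r) four times. A minimal sketch, assuming the four kernel subpackages named in the command above; kernel_lock_specs is a hypothetical helper name:

```shell
# Print one dnf package spec per kernel subpackage for a given release string.
# The release is passed in explicitly so the function can be dry-run safely.
# "kernel_lock_specs" is an illustrative name, not a dnf feature.
kernel_lock_specs() {
    release="$1"
    for pkg in kernel kernel-core kernel-modules kernel-modules-extra; do
        printf '%s-%s\n' "$pkg" "$release"
    done
}

# Dry run for the running kernel:
#   kernel_lock_specs "$(uname -r)"
# To apply the locks (requires root):
#   sudo dnf versionlock add $(kernel_lock_specs "$(uname -r)")
```

Printing the specs first makes it easy to confirm that you are locking the kernel you are actually booted into before dnf touches anything.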

When the lock is added successfully, the output looks like this:
Bash:
Adding versionlock on "kernel = 6.19.14-300.fc44".
Adding versionlock on "kernel-core = 6.19.14-300.fc44".
Adding versionlock on "kernel-modules = 6.19.14-300.fc44".
Adding versionlock on "kernel-modules-extra = 6.19.14-300.fc44".

Conclusion and Follow-Up

By pinning the system to the 6.19 series we have worked around the problem for now. Keep in mind that while the lock is in place, dnf is expected to hold the locked kernel packages at 6.19.14, so a 7.0.5 or newer kernel will most likely not be installed until the lock is lifted. When such an update is released, lift the lock, upgrade, and boot the new kernel to test whether the MES error is gone. If the same problem shows up, hard-reset and carry on with 6.19 from the GRUB menu.

To remove the lock, for example once you are confident that a newer kernel fixes the problem, this command is enough (note that clear removes every version lock, not only the kernel ones):

Bash:
sudo dnf versionlock clear
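
Before deciding whether to lift the lock, it can help to see which installed kernels are newer than the one you are running. The sketch below is an assumption-laden helper, not part of dnf: is_newer_kernel is a made-up name, and it relies on GNU sort -V for version ordering.

```shell
# Succeeds (exit 0) when candidate is a strictly newer version than baseline.
# Uses GNU sort -V (version sort), which is an assumption about the platform.
# "is_newer_kernel" is a hypothetical helper name.
is_newer_kernel() {
    baseline="$1"
    candidate="$2"
    [ "$baseline" != "$candidate" ] &&
        [ "$(printf '%s\n%s\n' "$baseline" "$candidate" | sort -V | tail -n 1)" = "$candidate" ]
}

# Example: list installed kernels that are newer than the running one.
#   rpm -q kernel-core --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | while read -r k; do
#       is_newer_kernel "$(uname -r)" "$k" && echo "candidate to test: $k"
#   done
```

Any kernel it reports is one you can pick from the GRUB menu to retest, with 6.19.14 still available as the fallback.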

I hope this serves as a guide for anyone whose system suddenly drops to a black screen or locks up without even a TTY. Enjoy the forum!

Update: The problem is not the kernel; it comes from the graphics card. The RX 9070 XT blows up as soon as I try to run AI workloads.
 
Bash:
May 10 05:12:47 fedora kernel: ACPI: bus type drm_connector registered
May 10 05:12:47 fedora kernel: simple-framebuffer simple-framebuffer.0: [drm] Registered 1 planes with drm panic
May 10 05:12:47 fedora kernel: [drm] Initialized simpledrm 1.0.0 for simple-framebuffer.0 on minor 0
May 10 05:12:47 fedora kernel: simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
May 10 05:12:47 fedora kernel: ata1.00: supports DRM functions and may not be fully accessible
May 10 05:12:47 fedora kernel: ata1.00: supports DRM functions and may not be fully accessible
May 10 05:12:47 fedora kernel: intel_vpu 0000:00:0b.0: [drm] Firmware: intel/vpu/vpu_37xx_v1.bin, version: 20260305*MTL_CLIENT_SILICON-NVR+NN-deployment*dbda783919b77553afe5576a59881bca37fb6d7b*dbda783919b77553afe5576a59881bca37fb6d7b*dbda783919b
May 10 05:12:47 fedora kernel: intel_vpu 0000:00:0b.0: [drm] Scheduler mode: HW
May 10 05:12:47 fedora kernel: [drm] Initialized intel_vpu 1.0.0 for 0000:00:0b.0 on minor 0
May 10 05:12:49 fedora kernel: amdgpu: Virtual CRAT table created for CPU
May 10 05:12:49 fedora kernel: amdgpu: Topology: Add CPU node
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: initializing kernel modesetting (IP DISCOVERY 0x1002:0x7550 0x1043:0x061A 0xC0).
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: register mmio base: 0x8C200000
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: register mmio size: 524288
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 0 <common_v1_0_0> (soc24_common)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 1 <gmc_v12_0_0> (gmc_v12_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 2 <ih_v7_0_0> (ih_v7_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 3 <psp_v14_0_0> (psp)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 4 <smu_v14_0_0> (smu)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 5 <dce_v1_0_0> (dm)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 6 <gfx_v12_0_0> (gfx_v12_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 7 <sdma_v7_0_0> (sdma_v7_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 8 <vcn_v5_0_0> (vcn_v5_0_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 9 <jpeg_v5_0_0> (jpeg_v5_0_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: detected ip block number 10 <mes_v12_0_0> (mes_v12_0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: Fetched VBIOS from VFCT
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] ATOM BIOS: 115-G295BP00-100
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: vgaarb: deactivate vga console
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: Trusted Memory Zone (TMZ) feature not supported
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: MEM ECC is not presented.
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: SRAM ECC is not presented.
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: VRAM: 16304M 0x0000008000000000 - 0x00000083FAFFFFFF (16304M used)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] Detected VRAM RAM=16304M, BAR=16384M
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] RAM width 256bits GDDR6
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0:  16304M of VRAM memory ready
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0:  23890M of GTT memory ready.
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] GART: num cpu pages 131072, num gpu pages 131072
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] PCIE GART of 512M enabled (table at 0x00000083DAB00000).
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x0A003500
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [VCN instance 0] Found VCN firmware Version ENC: 1.12 DEC: 9 VEP: 0 Revision: 15
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: MES: vmid_mask_mmhub 0x0000ff00, vmid_mask_gfxhub 0x0000ff00
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: MES: gfx_hqd_mask 0x000000fe, compute_hqd_mask 0x0000000c, sdma_hqd_mask 0x000000fc
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: RAP: optional rap ta ucode is not available
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: SECUREDISPLAY: optional securedisplay ta ucode is not available
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: smu driver if version = 0x0000002e, smu fw if version = 0x00000033, smu fw program = 0, smu fw version = 0x00684c00 (104.76.0)
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: SMU is initialized successfully!
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] Display Core v3.2.369 initialized on DCN 4.0.1
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] DP-HDMI FRL PCON supported
May 10 05:12:49 fedora kernel: amdgpu 0000:03:00.0: [drm] DMUB hardware initialized: version=0x0A003500
May 10 05:12:49 fedora kernel: [drm] forcing DP-1 connector on
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] DP-1: PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] DP-2: PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] DP-3: PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] HDMI-A-1: PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0x4000000
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0xc000000
May 10 05:12:50 fedora kernel: amdgpu: Virtual CRAT table created for GPU
May 10 05:12:50 fedora kernel: amdgpu: Topology: Add GPU node [0x1002:0x7550]
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: SE 4, SH per SE 2, CU per SH 8, active_cu_number 64
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 9 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 10 on hub 0
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring vcn_unified_0 uses VM inv eng 0 on hub 8
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: ring jpeg_dec uses VM inv eng 1 on hub 8
May 10 05:12:50 fedora kernel: amdgpu: HMM registered 16304MB device memory
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: Using BACO for runtime pm
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] Registered 4 planes with drm panic
May 10 05:12:50 fedora kernel: [drm] Initialized amdgpu 3.64.0 for 0000:03:00.0 on minor 1
May 10 05:12:50 fedora kernel: fbcon: amdgpudrmfb (fb0) is primary device
May 10 05:12:50 fedora kernel: amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
May 10 05:12:50 fedora kernel: [drm] pre_validate_dsc:1667 MST_DSC dsc precompute is not needed
May 10 02:12:52 fedora systemd[1]: modprobe@drm.service - Load Kernel Module drm skipped, unmet condition check ConditionKernelModuleLoaded=!drm
May 10 02:12:53 fedora kernel: snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
May 10 02:13:00 fedora coolercontrold[1606]: Initialized GPU Devices: {"amdgpu":{"temps":["temp1","temp2","temp3"],"driver name":["amdgpu"],"driver version":["7.0.4-200.fc44.x86_64"],"locations":["/sys/class/hwmon/hwmon2","/sys/devices/pci0000:00/0000:00:06.0/0000:01:00.0/0000:02:00.0/0000:03:00.0","pci:v00001002d00007550sv00001043sd0000061Abc03sc00i00"],"channels":["GPU Load","fan1","freq1","freq2","power1_average"]}}
May 10 02:13:02 fedora lact[2031]: 2026-05-10T02:13:02.998218Z  INFO lact_daemon::server::handler: AMDGPU DRM initialized
May 10 02:13:03 fedora lact[2031]: 2026-05-10T02:13:03.001451Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-1043:061A-0000:03:00.0 at '/sys/class/drm/card1/device'
May 10 02:13:03 fedora kwin_wayland[2200]: No backend specified, automatically choosing drm
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:03 fedora kernel: amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: MES(0) failed to respond to msg=REMOVE_QUEUE
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: failed to remove hardware queue from MES, doorbell=0x1000
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: MES might be in unrecoverable state, issue a GPU reset
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: Failed to evict queue 0
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: Failed to evict process queues
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: GPU reset begin!. Source:  3
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: Dumping IP State
May 10 02:13:05 fedora kernel: amdgpu 0000:03:00.0: Dumping IP State Completed
May 10 02:13:06 fedora kernel: amdgpu 0000:03:00.0: MODE1 reset
May 10 02:13:06 fedora kernel: amdgpu 0000:03:00.0: GPU mode1 reset
May 10 02:13:06 fedora kernel: amdgpu 0000:03:00.0: GPU smu mode1 reset
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] PCIE GART of 512M enabled (table at 0x00000083DAB00000).
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: VRAM is lost due to GPU reset!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: PSP is resuming...
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: RAP: optional rap ta ucode is not available
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SECUREDISPLAY: optional securedisplay ta ucode is not available
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SMU is resuming...
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: smu driver if version = 0x0000002e, smu fw if version = 0x00000033, smu fw program = 0, smu fw version = 0x00684c00 (104.76.0)
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SMU is resumed successfully!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0x4000000
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0xc000000
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] DMUB hardware initialized: version=0x0A003500
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 9 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 10 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring vcn_unified_0 uses VM inv eng 0 on hub 8
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring jpeg_dec uses VM inv eng 1 on hub 8
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: GPU reset(1) succeeded!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 10 02:13:07 fedora lact[2031]: 2026-05-10T02:13:07.682042Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
May 10 02:13:07 fedora lact[2031]: 2026-05-10T02:13:07.687340Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-1043:061A-0000:03:00.0 at '/sys/class/drm/card1/device'
                                               Module libdrm.so.2 from rpm libdrm-2.4.133-1.fc44.x86_64
                                               Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.133-1.fc44.x86_64
May 10 02:13:19 fedora lact[2031]: 2026-05-10T02:13:19.682609Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
May 10 02:13:19 fedora lact[2031]: 2026-05-10T02:13:19.687762Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-1043:061A-0000:03:00.0 at '/sys/class/drm/card1/device'
May 10 02:13:19 fedora kernel: amdgpu 0000:03:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:978
May 10 02:13:20 fedora kernel: amdgpu 0000:03:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:978

If you run DeepSeek or similar large language models (LLMs) through Ollama from VS Code and suddenly see lines (artifacts) on the screen or drop straight to a black screen, the cause is most likely not a hardware fault but the model locking up the VRAM and the GPU scheduler (MES). Even on powerful RDNA cards like the RX 9070 XT, this can happen with Ollama's default settings.

Source of the Problem and Log Analysis

If you cannot even reach a TTY (Ctrl+Alt+F3) when the system locks up, the problem is inside the amdgpu kernel module. After starting the system from a Live USB or an emergency (rd.break) shell and inspecting the journalctl records of the crashed session, this is the picture:

Bash:
amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
amdgpu 0000:03:00.0: MES(0) failed to respond to msg=REMOVE_QUEUE
amdgpu 0000:03:00.0: MES might be in unrecoverable state, issue a GPU reset
amdgpu 0000:03:00.0: GPU reset begin!. Source:  3
amdgpu 0000:03:00.0: VRAM is lost due to GPU reset!

The KFD (Kernel Fusion Driver) lines here suggest that Ollama, through ROCm, is pushing an uncontrolled workload onto the GPU's compute units (ih_fifo overflow indicates that the driver's interrupt FIFO is filling faster than it is drained). Unable to cope with this load, the MES (Micro Engine Scheduler) locks up. To avoid freezing the whole system, the kernel sends the graphics card an emergency reset (MODE1 reset).

This is exactly where things go wrong: the moment the GPU is reset, everything in VRAM is wiped. The Wayland/KDE desktop and VS Code's hardware-accelerated renderer (Electron), which were using VRAM at the time, crash immediately, which produces those infamous gray lines and leaves the system unresponsive.
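
Since the driver can report a successful reset even though the desktop session did not survive it, it is worth checking how the reset actually ended. The following sketch classifies a journal excerpt using only message fragments that appear in the logs above; reset_outcome and its three labels are invented for illustration.

```shell
# Classify a journal excerpt (stdin) by how the GPU reset ended.
# Prints: "recovered", "reset-incomplete", or "no-reset".
# "reset_outcome" and the labels are hypothetical, chosen for this example.
reset_outcome() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -Eq 'GPU reset\([0-9]+\) succeeded'; then
        echo "recovered"
    elif printf '%s\n' "$log" | grep -q 'GPU reset begin'; then
        echo "reset-incomplete"
    else
        echo "no-reset"
    fi
}

# Example: journalctl -k -b -1 --no-pager | reset_outcome
```

A "recovered" result alongside a dead desktop matches the situation described above: the card came back, but the VRAM contents the compositor depended on did not.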

Solution Steps

To fix the problem at the root, we need to fit Ollama with a "brake" and set the right profile definitions for the RDNA architecture.

1. Adding Safety Limits to the Ollama Service
From the terminal, we edit Ollama's systemd service so that it cannot lock up the GPU:
Bash:
sudo systemctl edit ollama

In the file that opens (between the comment lines), we add these environment variables:
INI:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=12.0.0"
Environment="OLLAMA_KEEP_ALIVE=0"
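The first line forces a specific compute target for ROCm on recent AMD cards, while the second makes Ollama unload the model from VRAM as soon as a response finishes, handing the resources back to the desktop. Treat the override value as system-specific: 12.0.0 corresponds to the gfx1200 target, whereas the RX 9070 XT itself should report gfx1201, so check whether your ROCm build needs the override at all.

We apply the settings and restart the service: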
Bash:
sudo systemctl daemon-reload
sudo systemctl restart ollama
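
If you prefer not to go through the interactive editor, the same drop-in can be written from a script. This is a sketch under assumptions: write_ollama_dropin is a hypothetical helper, and the target directory is a parameter so you can try it in a scratch location first. The drop-in that systemctl edit ollama creates normally lives in /etc/systemd/system/ollama.service.d/, which needs root to write.

```shell
# Write the Ollama drop-in non-interactively.
# $1 is the drop-in directory; for the real service that would be
# /etc/systemd/system/ollama.service.d (root required).
# "write_ollama_dropin" is an illustrative name, not a systemd tool.
write_ollama_dropin() {
    dir="$1"
    mkdir -p "$dir" &&
    cat > "$dir/override.conf" <<'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=12.0.0"
Environment="OLLAMA_KEEP_ALIVE=0"
EOF
}

# After daemon-reload and restart, confirm what systemd actually applied:
#   systemctl show ollama --property=Environment
```

Checking the Environment property afterwards is a quick way to catch a typo in the drop-in before blaming the GPU again.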

2. Running the Model with a Safe Context Window
Models such as DeepSeek Coder v2 can fill all of the VRAM very quickly when run with huge context windows. To prevent that, we create a custom "Modelfile" that caps the context length, which in turn bounds how much memory the model's context can take.
Bash:
nano deepseek-safe.modelfile

We write the following into it and save:
Code:
FROM deepseek-coder-v2
PARAMETER num_ctx 8192
(A context of 8192 tokens is more than enough for everyday coding-assistant tasks and will not choke the VRAM.)

Then we feed this file to Ollama and have it build our own safe model:
Bash:
ollama create deepseek-safe -f deepseek-safe.modelfile
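
To experiment with different context sizes without editing the file by hand each time, the Modelfile can be generated. A minimal sketch, assuming only the two directives used above; make_modelfile is a hypothetical helper name:

```shell
# Emit a Modelfile that derives from a base model with a capped context window.
# $1 = base model name, $2 = context length in tokens.
# "make_modelfile" is an illustrative name, not an Ollama command.
make_modelfile() {
    base="$1"
    num_ctx="$2"
    printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$num_ctx"
}

# Example:
#   make_modelfile deepseek-coder-v2 8192 > deepseek-safe.modelfile
#   ollama create deepseek-safe -f deepseek-safe.modelfile
#   ollama show deepseek-safe --modelfile   # inspect the result
```

Starting from a small value such as 4096 and raising it while watching VRAM usage is a cautious way to find the largest context your card handles comfortably.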

3. VS Code Integration
Finally, open the settings of the AI assistant extension you use in VS Code and, in the model selection, pick the deepseek-safe model we just created instead of deepseek-coder-v2 directly.

After applying these three steps, neither Wayland nor VS Code crashes; the LLM keeps running stably in the background without straining the hardware.
Yet another 9070 XT blowing up today... For years the series have moved on and the architectures have changed, but AMD's poor quality-control standards and chronic hardware problems have not changed one bit. :D
 
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: VRAM is lost due to GPU reset!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: PSP is resuming...
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: RAP: optional rap ta ucode is not available
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SECUREDISPLAY: optional securedisplay ta ucode is not available
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SMU is resuming...
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: smu driver if version = 0x0000002e, smu fw if version = 0x00000033, smu fw program = 0, smu fw version = 0x00684c00 (104.76.0)
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: SMU is resumed successfully!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0x4000000
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: program CP_MES_CNTL : 0xc000000
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] DMUB hardware initialized: version=0x0A003500
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 9 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 10 on hub 0
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring vcn_unified_0 uses VM inv eng 0 on hub 8
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: ring jpeg_dec uses VM inv eng 1 on hub 8
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: GPU reset(1) succeeded!
May 10 02:13:07 fedora kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
May 10 02:13:07 fedora lact[2031]: 2026-05-10T02:13:07.682042Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
May 10 02:13:07 fedora lact[2031]: 2026-05-10T02:13:07.687340Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-1043:061A-0000:03:00.0 at '/sys/class/drm/card1/device'
                                               Module libdrm.so.2 from rpm libdrm-2.4.133-1.fc44.x86_64
                                               Module libdrm_amdgpu.so.1 from rpm libdrm-2.4.133-1.fc44.x86_64
May 10 02:13:19 fedora lact[2031]: 2026-05-10T02:13:19.682609Z  INFO lact_daemon: got kernel drm subsystem event, reloading GPUs
May 10 02:13:19 fedora lact[2031]: 2026-05-10T02:13:19.687762Z  INFO lact_daemon::server::handler: initialized amdgpu controller for GPU 1002:7550-1043:061A-0000:03:00.0 at '/sys/class/drm/card1/device'
May 10 02:13:19 fedora kernel: amdgpu 0000:03:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:978
May 10 02:13:20 fedora kernel: amdgpu 0000:03:00.0: [drm] REG_WAIT timeout 1us * 100000 tries - hubp2_set_blank_regs line:978

If you run DeepSeek or similar large language models (LLMs) through Ollama in VS Code and lines (artifacts) suddenly appear on screen, or the display drops straight to black, the cause is most likely not a hardware fault but the model locking up VRAM and the GPU scheduler (MES). This can happen with Ollama's default settings even on powerful RDNA cards such as the RX 9070 XT.

Root Cause and Log Analysis

If the system locks up so hard that you cannot even drop to a TTY (Ctrl+Alt+F3), the problem lies in the AMDGPU kernel driver. Booting from a Live USB or an emergency (rd.break) shell and inspecting the journalctl records of the crashed session shows the following:

Bash:
amdgpu 0000:03:00.0: KFD node 0 ih_fifo overflow
amdgpu 0000:03:00.0: MES(0) failed to respond to msg=REMOVE_QUEUE
amdgpu 0000:03:00.0: MES might be in unrecoverable state, issue a GPU reset
amdgpu 0000:03:00.0: GPU reset begin!. Source:  3
amdgpu 0000:03:00.0: VRAM is lost due to GPU reset!

The KFD (Kernel Fusion Driver) lines here show Ollama pushing an unchecked compute workload to the GPU through ROCm. The MES (Micro-Engine Scheduler) cannot keep up with this load and stops responding. To avoid freezing the whole system, the kernel issues an emergency reset (MODE1 reset) to the graphics card.

This is exactly where things break: the moment the GPU is reset, everything in VRAM is lost. The Wayland/KDE desktop and VS Code's hardware-accelerated renderer (Electron), both of which were using VRAM at that moment, crash immediately. That produces the familiar gray lines on screen and leaves the system unresponsive.
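
To check whether a crashed boot shows the same signature as the excerpt above, the kernel log can be reduced to a few counters. This is only a minimal sketch: the function name `mes_hang_summary` is made up for illustration, and the patterns are taken from the log lines quoted in this post, so they may not match other kernel versions.

```shell
# Summarize the amdgpu hang signature from kernel log text read on stdin.
# The function name is hypothetical; the patterns come from the log above.
mes_hang_summary() {
    awk '
        /ih_fifo overflow/                 { overflow++ }   # KFD interrupt FIFO overflow
        /MES\([0-9]+\) failed to respond/  { mes++ }        # scheduler stopped answering
        /GPU reset begin/                  { reset++ }      # kernel started a GPU reset
        /VRAM is lost/                     { vram++ }       # VRAM contents were dropped
        END {
            printf "ih_fifo_overflow=%d mes_timeout=%d gpu_reset=%d vram_lost=%d\n",
                   overflow, mes, reset, vram
        }'
}

# Usage on the previous (crashed) boot, kernel messages only:
# journalctl -k -b -1 --no-pager | mes_hang_summary
```

Non-zero values in all four fields suggest the same chain of events: overflow, MES timeout, reset, lost VRAM.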

Solution Steps

To fix the problem at its root, we need to fit Ollama with a "brake" and set the correct profile definitions for the RDNA architecture.

1. Adding Safety Limits to the Ollama Service
From the terminal, we edit Ollama's systemd service to keep it from locking up the GPU:
Bash:
sudo systemctl edit ollama

In the file that opens, add the following environment variables (between the comment lines):
INI:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=12.0.0"
Environment="OLLAMA_KEEP_ALIVE=0"
The first line forces the compute target that ROCm uses for current AMD cards, while the second makes the model release its VRAM as soon as code generation finishes, returning those resources to the desktop environment.

We apply the settings and restart the service:
Bash:
sudo systemctl daemon-reload
sudo systemctl restart ollama
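
After the restart it is worth confirming that the drop-in was actually picked up. The sketch below assumes the output format of `systemctl show --property=Environment` (a single `Environment=` line with space-separated `NAME=value` pairs); the helper name `env_is_set` is hypothetical.

```shell
# Succeeds if variable $1 appears in `systemctl show -p Environment` style
# text read on stdin, e.g. "Environment=A=1 B=2". Hypothetical helper.
env_is_set() {
    tr ' ' '\n' |                   # one NAME=value pair per line
        sed 's/^Environment=//' |   # drop the property prefix from the first pair
        grep -q "^$1="              # match the variable name exactly
}

# Usage against the running unit:
# systemctl show ollama --property=Environment | env_is_set OLLAMA_KEEP_ALIVE \
#     && echo "drop-in active"
```

If the check fails, `systemctl cat ollama` shows which override files systemd has loaded for the unit.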

2. Running the Model with a Safe Context Window
Models such as DeepSeek Coder v2 can fill all of VRAM very quickly with their huge context windows. To prevent this, we create a custom "Modelfile" that caps the context size and, with it, the VRAM the model can claim.
Bash:
nano deepseek-safe.modelfile

Put the following in it and save:
Code:
FROM deepseek-coder-v2
PARAMETER num_ctx 8192
(An 8192-token context is more than enough for everyday coding-assistant tasks and will not choke VRAM.)

Then we have Ollama build our own safe model from this file:
Bash:
ollama create deepseek-safe -f deepseek-safe.modelfile
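
To double-check that the new model really carries the smaller context window, the Modelfile that Ollama stored can be printed and the `num_ctx` value pulled out. A small sketch follows; the helper name `modelfile_num_ctx` is made up for illustration and it assumes the parameter is written on a single `PARAMETER num_ctx <value>` line, as in the file above.

```shell
# Print the num_ctx value declared in Modelfile text read on stdin.
# Hypothetical helper; expects a line of the form "PARAMETER num_ctx <value>".
modelfile_num_ctx() {
    awk 'toupper($1) == "PARAMETER" && $2 == "num_ctx" { print $3 }'
}

# Usage against the model created above:
# ollama show deepseek-safe --modelfile | modelfile_num_ctx
# While a prompt is running, `ollama ps` lists the loaded model and its size.
```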

3. VS Code Integration
Finally, open the settings of the AI assistant extension you use in VS Code and, in the model selection, pick the deepseek-safe model we just created instead of deepseek-coder-v2 directly.

After applying these three steps, neither Wayland nor VS Code crashes; the LLM keeps running stably in the background without straining the hardware.
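
One way to see how much headroom is left while the model is loaded is to read the VRAM counters that amdgpu exposes in sysfs. This is a rough sketch, assuming the card sits at `/sys/class/drm/card1/device` as in the logs above (the index can differ on other machines); `vram_percent` is a hypothetical helper name.

```shell
# Print VRAM usage as a whole-number percentage.
# $1 = used bytes, $2 = total bytes. Hypothetical helper.
vram_percent() {
    awk -v u="$1" -v t="$2" 'BEGIN {
        if (t > 0) printf "%d\n", u * 100 / t   # truncate to an integer percent
        else       print "n/a"                  # avoid dividing by zero
    }'
}

# Usage (card1 is an assumption taken from the log paths above):
# dev=/sys/class/drm/card1/device
# vram_percent "$(cat "$dev/mem_info_vram_used")" "$(cat "$dev/mem_info_vram_total")"
```

Running it once with the model idle and once during generation gives a quick picture of how close the desktop and the model come to filling VRAM together.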