Попытка исключения использования бэд блоков файловой системой в Ubuntu


Советую внимательно смотреть на название диска и его разделы.


# apt-get install -y smartmontools


# smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.19.0-42-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-9YN164
Serial Number:    S1E02CG0
LU WWN Device Id: 5 000c50 04a892e6f
Firmware Version: CC4H
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Jan 30 23:12:53 2016 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
          was completed without error.
          Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
          without error or no self-test has ever
          been run.
Total time to complete Offline
data collection: 		(  584) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
          Auto Offline data collection on/off support.
          Suspend Offline collection upon new
          command.
          Offline surface scan supported.
          Self-test supported.
          Conveyance Self-test supported.
          Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
          power-saving mode.
          Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
          General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 231) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   085   085   006    Pre-fail  Always       -       159962165
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       94
  5 Reallocated_Sector_Ct   0x0033   063   054   036    Pre-fail  Always       -       49456
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       280601610
  9 Power_On_Hours          0x0032   067   067   000    Old_age   Always       -       29241
  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       93
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   032   032   000    Old_age   Always       -       68
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       15 15 18
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   051   045    Old_age   Always       -       40 (Min/Max 31/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       91
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       7727
194 Temperature_Celsius     0x0022   040   049   000    Old_age   Always       -       40 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       128
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       128
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       29025h+28m+50.931s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       81735190510632
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       123261680378241

SMART Error Log Version: 1
ATA Error Count: 20 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 20 occurred at disk power-on lifetime: 29241 hours (1218 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 c8 8c 1c 00  Error: UNC at LBA = 0x001c8cc8 = 1871048

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c8 8c 1c e0 00      02:28:59.941  READ DMA
  c8 00 08 c0 8c 1c e0 00      02:28:59.541  READ DMA
  c8 00 08 b8 8c 1c e0 00      02:28:59.216  READ DMA
  c8 00 08 b0 8c 1c e0 00      02:28:59.151  READ DMA
  c8 00 08 a8 8c 1c e0 00      02:28:59.095  READ DMA

Error 19 occurred at disk power-on lifetime: 29238 hours (1218 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      00:01:23.261  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:23.260  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:23.260  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:23.259  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:23.259  READ DMA EXT

Error 18 occurred at disk power-on lifetime: 29238 hours (1218 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00      00:01:20.307  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:20.289  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:20.253  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:20.252  READ DMA EXT
  25 00 e0 ff ff ff ef 00      00:01:20.250  READ DMA EXT

Error 17 occurred at disk power-on lifetime: 29238 hours (1218 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00      00:01:17.009  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:17.008  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:17.008  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:17.008  READ DMA EXT
  25 00 08 ff ff ff ef 00      00:01:17.008  READ DMA EXT

Error 16 occurred at disk power-on lifetime: 29238 hours (1218 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 ff ff ff ef 00      00:01:14.031  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:14.013  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:13.970  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:13.969  READ DMA EXT
  25 00 00 ff ff ff ef 00      00:01:13.969  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Если верить написанному в интернете:

1    Raw_Read_Error_Rate
  • VALUE это текущее значение (остаток здоровья)
  • WORST - худшее количество здоровья за все время.
  • THRESH - это значение при котором диск можно считать покойником.


TYPE - тип атрибута:

  • Pre-fail - критический атрибут.
  • Old_age - некритический атрибут.


Почитать, если что:

https://www.opennet.ru/base/sys/smart_hdd_mon.txt.html


Может DD сможет забить диск нулями? И вдруг окажется, что никаких критических ошибок на диске на самом деле нет?

Хотя на таком диске ничего уже хранить важного я бы не стал. С него можно, например, торренты раздавать, хранить индийские фильмы.

Я в этом также разбираюсь, как свинья в апельсинах!

# dd if=/dev/zero of=/dev/sdb
dd: writing to ‘/dev/sdb’: Input/output error
1871049+0 records in
1871048+0 records out
957976576 bytes (958 MB) copied, 8600,9 s, 111 kB/s

Лучше запускать, чтобы было видно прогресс.

# apt-get install -y pv

# dd if=/dev/zero | pv | dd of=/dev/sdb


В общем команда dd только подтвердила, что диску не хорошо. Впринципе, она ничем и не помогла.


Попытка исключения использования бэд блоков файловой системой


# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x3ee07305.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.

Command (m for help): n
Partition type:
    p   primary (0 primary, 0 extended, 4 free)
    e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-3907029167, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-3907029167, default 3907029167):
Using default value 3907029167

Command (m for help):
Command (m for help): w
The partition table has been altered!


# mkfs.ext4 /dev/sdb1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
122101760 inodes, 488378390 blocks
24418919 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
14905 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
  32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
  4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
  102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:

done


Создаем список бедблоков:

# badblocks -s /dev/sdb1 > ~/badblocks_sdb1.txt

Пометка бэд блоков (в дальнейшем помеченные блоки будут игнорироваться):

# e2fsck -l ~/badblocks_sdb1.txt /dev/sdb1


Результаты

В одном случае получилось. В другом нет. (Мне отдали несколько дисков).

В том котором не получилось, диск натыкался на ошибку, в результате чего, он вообще переставал быть доступен системе.

Как-то не особо хотелось им заниматься и я забил.

В общем 2TB диск Seagate Barracuda ST2000DM001-9YN164 помер. Он прожил яркую жизнь без сбоев и умер совсем молодым, практически сразу после окончания гарантии (на самом деле диск бы не мой и я ничего не знаю о гарантии). Ему было всего 3.34 года (29238 часов / 24 / 365). (если я правильно понимаю и считаю).

Если было бы очень нужно его заставить работать, я бы попытался создать на нем разделы. И те разделы, в которых бы были бэд блоки я бы постарался не использовать. Но это все достаточно трудозатратно по времени.

Впрочем, если диск действительно нужен, лучше купить новый или как вариант хранить данные (не особо важные) в облаках, например майл ру. Они как-то раздавали бесплатно аккаунты с лимитом на 1TB. Сейчас, без каких-либо акций можно получить 20GB. Но что такое 20GB по сравнению с 1TB?