fafig, posted August 18, 2009

Does anyone have an idea how to figure out what in my computer is causing the constant freezes? I scanned all the memory with memtest today. I installed Fedora on another partition; it ran for 15 minutes, then froze. I booted CentOS, left it for a while, came back: frozen. No information in any log, nothing, zero. Interestingly, the CD-ROM drive started behaving strangely; I wiggled the plug and it works. Maybe the SATA connectors are worn and no longer make proper contact (though I didn't unplug them that often). Just to be sure, today I vacuumed the inside of the case, blew out the connectors, and sprayed them with contact cleaner. Maybe the disk is simply dying? I'm pretty much out of ideas. I disabled the NVIDIA drivers, so it looks like this may be a hardware issue. The same thing happens on every single kernel. I have one more hypothesis: on Fedora I got a kerneloops related to SMP, so maybe some cpufreq governor is at fault, but I'm not sure and I don't even know where to look anymore. Thanks in advance for any ideas.

smartctl logs

/dev/sda

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    3QF08V9R
Firmware Version: 3.AAD
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Aug 18 20:21:36 2009 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 115) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   094   006    Pre-fail  Always       -       14958294
  3 Spin_Up_Time            0x0003   094   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1187
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       600604945
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14744
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1226
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   054   045    Old_age   Always       -       38 (Lifetime Min/Max 37/39)
194 Temperature_Celsius     0x0022   038   046   000    Old_age   Always       -       38 (0 12 0 0)
195 Hardware_ECC_Recovered  0x001a   063   054   000    Old_age   Always       -       218204681
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       284
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 269 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 269 occurred at disk power-on lifetime: 2822 hours (117 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 1e 54 42 00 e0  Error: ICRC, ABRT 30 sectors at LBA = 0x00004254 = 16980

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 3f 33 42 00 e0 00      00:18:09.053  READ DMA EXT
  25 00 3f 3f 00 00 e0 00      00:18:09.053  READ DMA EXT
  25 00 3f 2b 44 00 e0 00      00:18:09.050  READ DMA EXT
  25 00 3f 33 42 00 e0 00      00:18:09.049  READ DMA EXT
  25 00 3f 3f 00 00 e0 00      00:18:09.048  READ DMA EXT

Error 268 occurred at disk power-on lifetime: 2822 hours (117 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 60 98 42 5e e2 00      00:17:12.231  READ MULTIPLE
  c4 00 08 90 42 5e e2 00      00:17:12.223  READ MULTIPLE
  c4 00 08 50 db f3 e2 00      00:17:12.069  READ MULTIPLE
  c4 00 08 48 db f3 e2 00      00:17:12.068  READ MULTIPLE
  c4 00 08 90 34 f1 e2 00      00:17:12.050  READ MULTIPLE

Error 267 occurred at disk power-on lifetime: 2821 hours (117 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 60 19 42 c5 e1 00      00:08:46.163  READ MULTIPLE
  c5 00 08 69 ae 07 e0 00      00:08:46.160  WRITE MULTIPLE
  c5 00 08 18 d3 fe e2 00      00:08:46.160  WRITE MULTIPLE
  c5 00 08 38 b2 fc e2 00      00:08:46.190  WRITE MULTIPLE
  c5 00 08 b8 b1 fc e2 00      00:08:46.190  WRITE MULTIPLE

Error 266 occurred at disk power-on lifetime: 2821 hours (117 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 80 00 de 0e e0 00      00:08:39.522  READ MULTIPLE
  c4 00 80 80 dd 0e e0 00      00:08:39.513  READ MULTIPLE
  c4 00 80 00 dd 0e e0 00      00:08:39.505  READ MULTIPLE
  c4 00 18 79 ea b6 e1 00      00:08:39.496  READ MULTIPLE
  c4 00 40 31 ea b6 e1 00      00:08:39.488  READ MULTIPLE

Error 265 occurred at disk power-on lifetime: 2821 hours (117 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 80 00 bb 0e e0 00      00:08:32.319  READ MULTIPLE
  ec 00 00 00 00 00 a0 02      00:08:32.316  IDENTIFY DEVICE
  ef 03 08 00 00 00 a0 00      00:08:32.316  SET FEATURES [set transfer mode]
  ec 00 00 00 00 00 a0 02      00:08:32.313  IDENTIFY DEVICE
  00 00 80 00 00 00 00 06      00:08:32.186  NOP [Abort queued commands]

SMART Self-test log structure revision number 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

/dev/sdb

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJ90QB40988
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Tue Aug 18 20:21:43 2009 CEST

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (11388) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 191) minutes.
Conveyance self-test routine
recommended polling time:        (  20) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7720
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       193
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4995
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       186
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   068   000    Old_age   Always       -       29 (Lifetime Min/Max 29/30)
194 Temperature_Celsius     0x0022   070   068   000    Old_age   Always       -       30 (Lifetime Min/Max 29/30)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       6669
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
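For readers decoding the register dumps in the sda error log: the LBA that smartctl prints for Error 269 can be reproduced from the SN, CL, CH and DH bytes. The sketch below assumes the 28-bit task-file layout, in which the low nibble of DH carries LBA bits 24 to 27; the function name is made up for illustration and is not part of smartmontools.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumption: 28-bit addressing, where SN/CL/CH hold LBA bits 0-23
    and the low nibble of DH holds LBA bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from Error 269 in the sda log: 84 51 1e 54 42 00 e0
sc, sn, cl, ch, dh = 0x1E, 0x54, 0x42, 0x00, 0xE0
lba = lba28_from_registers(sn, cl, ch, dh)

# SC is the sector count register (0x1e = 30 sectors).
print(f"{sc} sectors at LBA = 0x{lba:08x} = {lba}")
```

With these inputs the result matches the "30 sectors at LBA = 0x00004254 = 16980" line in the log.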
moorray, posted August 18, 2009
Freezes are the hardest bugs to track down... In your place I'd look into sda. Admittedly I'm no expert on this, but since the log shows Errors there, that may be the cause of the problem. If you wanted to debug the kernel, it shouldn't be hard to find out online how it's done (kernel hacking/debugging). You have to recompile the kernel with a few nice options enabled in the kernel hacking section, and off you go... Maybe you'll manage with just the symbols plus magic SysRq: hit Alt+SysRq+p and get the name of the function it is hung in. Sorry for asking, but I have to... you did test without X, right? The good news is that you might find (and fix) a bug in the Linux kernel. I'd be glad of that. Good luck.
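Before relying on Alt+SysRq+p during a hang, it may be worth checking that the key combination is enabled at all. The sketch below interprets the value of /proc/sys/kernel/sysrq, assuming the documented bitmask where 1 enables every function and bit 0x8 enables the debugging dumps that the register dump belongs to. The helper names are hypothetical.

```python
SYSRQ_ENABLE_ALL = 1       # value 1 enables every SysRq function
SYSRQ_ENABLE_DUMP = 0x0008  # bit for "debugging dumps of processes etc."


def sysrq_p_allowed(mask: int) -> bool:
    """Return True if the keyboard SysRq 'p' (show registers) is permitted.

    Assumption: 'p' is gated by the dump bit (0x8), as in the kernel's
    sysrq documentation; 0 disables SysRq entirely.
    """
    if mask == SYSRQ_ENABLE_ALL:
        return True
    return bool(mask & SYSRQ_ENABLE_DUMP)


def read_sysrq_mask(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current SysRq setting (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


# Example values only; call read_sysrq_mask() on the affected machine.
for example in (0, 1, 8, 16):
    print(example, sysrq_p_allowed(example))
```

If the check fails, writing 1 to /proc/sys/kernel/sysrq as root enables all functions until the next reboot.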
fafig (author), posted August 18, 2009
I do have one lead: on Fedora an SMP error kept popping up, so the APIC controller may be at fault. For example, with the kvm module loaded, CentOS got lockups on cores 0 and 1 in turn. It's possible that the APIC is simply acting up, even though I've never had to disable it. I always thought you only disable it once you get a kernel panic, but these are probably symptoms of APIC problems too. I read on KernelTrap that they broke something in its support as of kernel 2.6.9 and it has dragged on ever since. On the other hand, motherboard manufacturers (consumer electronics in particular) implement APIC in ways that don't follow the standard, hence the problems. I'll build my next computer on a server board, some Tyan or something, because what can you really expect from a 350 zł board with a cut-down BIOS. Heh, dmesg tells me to enable NUMA, only I wonder how, since there is no such option. You really shouldn't skimp on the motherboard... Anyway, the computer has had all its components cleaned; maybe something wasn't making contact. I've started a full SMART test on the HDD, and tomorrow I'll try testing without APIC. Some people on mailing lists report problems like this too. For example, someone even on CentOS 4 wrote that he gets such freezes at random times and under random load, and suggested it might be the hardware. I'd rather not debug the kernel for a simple reason: I'm not really capable of digging around in such things, especially since I'd have to read some book about the kernel first... By the way, I remembered that I once ran Arch on an old computer (P3 933) and it also froze, after some 15 minutes at the console. I don't know what to make of it anymore... Anyway, thanks for the response.
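Testing "without APIC" on Linux usually means booting with the `noapic` and/or `nolapic` kernel parameters. A small sketch for confirming after reboot that the flags actually reached the kernel, by inspecting the contents of /proc/cmdline; the sample command line below is invented for illustration and is not taken from the poster's machine.

```python
APIC_DISABLE_FLAGS = {"noapic", "nolapic"}


def apic_boot_flags(cmdline: str) -> list[str]:
    """Return the APIC-disabling parameters found on a kernel command line."""
    return [token for token in cmdline.split() if token in APIC_DISABLE_FLAGS]


# Hypothetical command line; on a real system read /proc/cmdline instead.
sample = "ro root=/dev/sda2 noapic nolapic quiet"
print(apic_boot_flags(sample))
```

An empty list means the machine booted with the APIC still in use, so a freeze during that run says nothing about the APIC theory.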
InIrudeBwoy, posted August 18, 2009
Are the temperatures OK? If you want to rule out disk problems, boot F11 from a USB stick and see whether it still crashes.
fafig (author), posted August 18, 2009
That's exactly what I plan to do, but first I'll finish this SMART test. Or maybe simply replacing the cables will help...
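The cable theory can be checked against the SMART output posted earlier in the thread. UDMA_CRC_Error_Count (attribute 199) generally counts transfer errors on the link between drive and controller, so it tends to point at cabling or connectors rather than the platters. The sketch below pulls the raw value out of an attribute row, assuming the ten-column layout of the smartctl table; the function name is hypothetical.

```python
def raw_value(attribute_line: str) -> int:
    """Extract RAW_VALUE from one row of the smartctl attribute table.

    Assumption: whitespace-separated columns in the order
    ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
    """
    fields = attribute_line.split()
    return int(fields[9])


# Rows copied from the logs for /dev/sda and /dev/sdb.
sda_crc = raw_value("199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 284")
sdb_crc = raw_value("199 UDMA_CRC_Error_Count 0x003e 099 099 000 Old_age Always - 5")

print(f"sda CRC errors: {sda_crc}, sdb CRC errors: {sdb_crc}")
```

The counter is cumulative, so the useful test is whether it keeps growing after the cables are swapped, not its absolute value.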