Equipment Reliability Institute
ERI News - your reliability newsletter
May 2004 - volume 15


Hello, readers

Spring! Isn’t it wonderful? A wonderful time of transition between winter and summer.

May I introduce the authors of four articles in this issue.

Allow me first to introduce Dave Pinsky, an engineering fellow at Raytheon's Reliability Analysis Lab in Lexington, Mass. Dave handles special system-level investigations and long-term reliability evaluations. Yes, electronic units sometimes fail. It behooves us to find out why a unit failed, so that we can avoid similar failures in the future. Thus we bring you Dave’s article on what you need to know about electronic failure analysis.

Allow me next to introduce Joel Newberger of Farmingdale, LI, New York. Joel’s field is keeping electronics from overheating, and thus he has written “Rules of Thumb” for Thermal Design. If you’d like to know more about Joel, his teaching and his consulting, visit his page at the ERI website.

Next, I’d like to introduce John Hess, who has written "Real-Time Monitoring and Analysis of Pneumatic Vibration Test Systems Actuators". The pneumatic actuators used in repetitive shock systems for HALT, ESS and HASS deteriorate over time. A new product, V-flare, actively monitors each actuator to determine its operational efficacy, catching issues before testing or screening is affected.

Finally, we have three more of Robert L. Renz’s “Test Lab Musings”. Hey, others among you experienced test engineers – please submit some of your thoughts for our August issue.

Best wishes,
Wayne Tustin

*******************************

Failure Analysis of Electronics Assemblies
by Dave Pinsky

Here's what you need to know before, during and after electronics assembly failure analysis.

For anyone who has been in a dispute over a failure condition, the value of an objective, independent failure analysis is apparent. Failures occur for any number of reasons: design flaws, poor quality materials, manufacturing problems, improper conditions during transport or storage, overstress during operation, etc. Most companies (and their suppliers) lack the facilities to perform the in-depth investigation that may be required to solve an error condition. Without an accurate analysis of the failure, good faith efforts to take corrective action can be futile or counterproductive.

Much more than a glorified test lab, a failure analysis lab requires specialized equipment and knowledge of cause and effect relationships to discover the root cause of a failure and the resulting chain of events. During the investigation, the analyst will know which tests to perform and in what sequence to bring the investigation to a prompt, economical conclusion. The analyst will also recommend appropriate corrective measures, based on where in the product cycle the failure occurred and the priorities of the organization. If any of these capabilities are necessary, engaging the services of a professional failure analyst will make sense.

Note that sometimes even a limited failure analysis can be highly worthwhile, such as when there is an urgent need to eliminate a potential cause. For example, knowledge that a failure was not caused by ESD can prevent wasting money on the wrong corrective action.

Things to know before a failure occurs
Laying the groundwork for a successful outcome starts before the failure analyst arrives on the scene. To avoid classic mistakes, the following precautions are worth sharing with all personnel who may find themselves dealing with a failure situation.

Preserve the physical evidence.
If you don't know why something is broken, leave it alone. Re-flowing a fractured solder joint or replacing a failed component may be expected in a manufacturing culture, but it's a disaster to the failure analyst, because it could mean destroying evidence of a larger problem. (Obviously, people aren't expected to stop production and call the failure analyst for every random event. But if there's a trend, the worst thing would be to keep fixing defective parts and sending them along.)

Don't put fractured surfaces back together to see how well they fit.
If something is broken in two, it must be human nature to try to jam the pieces together. On a microscopic level, this sliding and mashing together of the surfaces destroys valuable evidence. If provided with a clean, well-preserved fracture surface, the failure analyst can almost always tell you exactly why it broke - as well as whether the parts that were shipped last week need to be recalled.

Avoid electrical testing beyond what is minimally required for troubleshooting.
Don't keep testing an intermittent until the circuit becomes completely open. Once you blow up the failure site, there's not much analysis that can be done, other than to confirm that someone tried to ram a lot of power through the circuit.

Protect materials from heat, humidity, vibration, ESD and other forces.
Use recommended protective coverings and containers.

Collect all existing data from the failure.
What may seem irrelevant to the person on the scene may turn out to be crucial. Remember that it is better to err on the side of too much data rather than too little. Also be sure to record, photograph and otherwise document any test set-ups that were used.

Don't overlook human sensory input.
Take note of unusual - or even slightly different - sounds, smells and sights. Whenever possible, interview the people who first noticed the failure. Their answers often differ from those of somebody who wasn't actually present.

Physical analysis
If the evidence has been preserved, the failure analyst can tease a great deal of information from a failure site, aided by specialized equipment not available in traditional test labs. For example, the electron microscope at Raytheon's Reliability Analysis Laboratory allows examination at up to 200,000×. In addition to knowing how to interpret the tests correctly, the analyst will understand what tests are truly necessary - and which aren't - to obtain a clear view of all forces affecting the failed part.

The investigation begins with an external examination of the suspect component. Depending on the device type and specific observations, the analyst may employ a variety of techniques: testing for thermal resistance, emission microscopy, hermeticity, acoustic microscopy or infrared imagery. The analysis then proceeds to the internal examination, which brings into play one of the failure analyst's special skills: the ability to take things apart without destroying crucial evidence.

Taking apart delicate components is as much an art as a science. For example, a cross section involves more than simply slicing an object in half. The component is cast in epoxy to protect it from disintegrating, then ground and polished to a mirror finish for examination under a microscope. If the analyst is investigating a separation between metals, an electron microscope coupled with an X-ray emission spectrometer can map the distribution of atomic species and identify exactly where the separation occurred. This information is essential for the process people, who are putting down layer after layer of metallization.

Disassembly can take a variety of forms: chemical, thermal or mechanical. For example, if the component under investigation is a plastic encapsulated microcircuit, it will have to be decapsulated. This process involves dissolving the plastic, leaving behind the semiconductor device and the wires for examination. There also will be times where disassembly is not feasible: the problem may lie in the way parts fit together, there may be a need to look at the entire unit, or the part simply may be too valuable to destroy. In these situations, the analyst can use non-invasive techniques such as ultrasound and X-ray to examine what's inside.

Sorting out primary and secondary failures
Most failures involve a series of cascading events. The failure analyst interprets physical signatures to determine the correct sequence, often creating a failure tree (Figure 1) to help guide the investigation. (A complex failure cannot be solved without one.) Branches list all possible reasons why a failure has occurred and indicate where additional information is needed to support or refute each potential cause. The analyst will work his way down through the branches, investigating each possibility based on its degree of plausibility. At the end, the analyst has not only found the root cause, but also mapped the entire sequence back to the initial symptom.



Figure 1 - Sample fault analysis tree diagram
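
For readers who keep their investigation notes in software, the failure tree idea can be sketched in a few lines of Python. This is only an illustration, not the method used at RAL: the class name `Cause`, the 0-to-1 plausibility scale, and every number and branch below are assumptions invented for the example (the branch names loosely follow the power controller case discussed in this article).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One branch of a failure tree: a candidate cause and its sub-causes."""
    name: str
    plausibility: float            # analyst's estimate on a hypothetical 0.0-1.0 scale
    refuted: bool = False          # set True once evidence rules the branch out
    children: List["Cause"] = field(default_factory=list)


def investigation_order(node: Cause) -> List[str]:
    """Depth-first walk that visits the most plausible open branch first."""
    order: List[str] = []
    ranked = sorted(node.children, key=lambda c: c.plausibility, reverse=True)
    for child in ranked:
        if child.refuted:
            continue               # refuted branches (and their sub-causes) are skipped
        order.append(child.name)
        order.extend(investigation_order(child))
    return order


# Hypothetical tree; all plausibility values are made up for illustration.
tree = Cause("power controller fails", 1.0, children=[
    Cause("mechanical damage", 0.3, children=[
        Cause("die cracked during assembly", 0.25),
    ]),
    Cause("electrical overstress", 0.6, children=[
        Cause("voltage spikes from motor drive", 0.5),
    ]),
    Cause("ESD", 0.1, refuted=True),
])

order = investigation_order(tree)
print(order)
```

Marking a branch as refuted rather than deleting it keeps a record of what was considered and ruled out, which is the point of drawing the tree in the first place.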

In tracing the sequence of events, the failure analyst can recognize subtle clues that may have eluded previous investigators. In a recent case, a client came to RAL for an assessment of why a high percentage of their power controller devices were failing. The client had already sent the devices back to the manufacturer, who had determined that the failure was caused by an electrical overstress - in other words, a system flaw and the client's responsibility. The client measured voltages and found that indeed they had voltage spikes, which are actually fairly common in a motor driver controller. They performed a design change to reduce the voltage spikes and the failure rate did decrease for a time.

Eventually the failure rate shot up again. At this point, the client contacted RAL, which used failure tree analysis to assure that every possible failure mode was considered, not only electrical overstress. In fact, the parts were being mechanically damaged during assembly, leaving them susceptible to later failure. The slight improvement after the design change occurred because the "weak sisters" (the damaged parts that were primed to fail) were no longer being hit quite as hard when voltage was applied.

Figure 2 - Scanning acoustic microscopy image of die attach in a plastic IC.

The manufacturing process involved a power die mounted on a heat sink. During assembly, physical stress was being placed on the die, causing minute cracks to form. While the initial mechanical insult only started the crack, the accumulation of power and temperature cycles caused it to spread through the brittle material within the die. In some cases the cracks penetrated the active regions of the die, effectively reducing the size of the transistor. The location of a crack determined whether the part would fail when power was applied.

The manufacturer had misinterpreted the sequence of events. When the power devices failed, they essentially blew up, leaving a crater and melted debris. The fact that there also happened to be cracks in the die was not unusual in a part that had been subjected to massive thermal stress. RAL determined that the cracks occurred first, causing the part to blow up - rather than that the thermal stress caused the cracks, as the manufacturer had asserted. The cracks were not readily visible - they were within material encapsulated inside the die. Using a scanning acoustic microscope, RAL was able to detect cracks in parts that had not yet blown up, confirming its theory. Eventually RAL worked with the manufacturer to modify the way the parts were mounted, reducing the bending stress that led to the failure.

Determining the "people-centric" root cause
Of course, the physical root cause didn't just occur on its own; ultimately the actions of people made it happen. While examining the human factors is beyond the scope of a narrow technical analysis, it may turn out to be the most valuable step. Is there a fundamental breakdown in communications or supervision? Are organizations and people receiving proper incentives for their work? Are operators trained properly? At some point the physical has to transition to the organizational.

Figure 3 - A failure analysis lab requires specialized equipment and knowledge of cause-and-effect relationships.

Part of the failure analyst's role involves determining the best way to work with the people involved to make sure that something similar doesn't happen again. For example, suppose the problem is excessive heat on a solder joint. The easy solution is to tell technician #124 that he put too much heat on the joint and to be more careful next time. However, a more valuable corrective action might be to revise training programs to spread awareness of the damage that can be caused by too much heat.

Identifying and implementing corrective actions
After pinpointing the root cause, the analyst will propose corrective measures, which could include improvements in manufacturing processes, using suppliers with higher levels of quality control, design changes, revised specifications and other recommendations. The analyst will also suggest ways to deal with the current problem depending on the stage of the development cycle. For example, if a capacitor failure is discovered before product assembly, the solution is easy - don't install those components. If the same failure is discovered after manufacturing has started, the analyst and client will have to weigh the alternatives. While the client may want to remove the bad components and install good ones, the cost and risk associated with rework may be prohibitive. An alternative might be to screen the manufactured parts by running them through an environmental test to weed out the bad ones.

One of the most difficult failure situations is when defective product has already gone out the door. Are you going to conduct a recall? Are you going to go out and replace every bad part in the field? For companies in this predicament, reliability testing can determine the odds that a defective component will fail, the possible consequences and the appropriate course of action. A reliability analysis often follows from a failure analysis and is used to determine the reliability of deficient material. This is the kind of information you won't find in a textbook, because no one would intentionally design and test systems constructed from damaged goods. And when they are forced to perform such tests, they never publish the results!

The failure analyst will conduct experiments on a sample bad part, such as accelerated environmental stress testing, to determine the probability of failure within a given time period. The result will be numbers that managers can deal with to make business decisions. For example, if the product is due for scheduled maintenance in 18 months, the degree of risk may not justify a special retrofit before then.
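
As a rough illustration of the kind of number such testing can produce, the sketch below evaluates a Weibull life model at the 18-month maintenance point. The article does not say which statistical model is used; the Weibull distribution is simply a common choice in reliability work, and the characteristic life (`eta`) and shape (`beta`) values here are hypothetical placeholders, not test results.

```python
import math


def weibull_failure_probability(t: float, eta: float, beta: float) -> float:
    """Probability that a unit has failed by time t under a Weibull life model.

    eta is the characteristic life and beta the shape parameter; t and eta
    must be in the same time units.
    """
    return 1.0 - math.exp(-((t / eta) ** beta))


# Hypothetical parameters, as if fitted from accelerated stress testing:
# characteristic life of 120 months, shape parameter of 1.5 (wear-out).
p18 = weibull_failure_probability(18.0, eta=120.0, beta=1.5)
print(f"Estimated probability of failure within 18 months: {p18:.1%}")
```

A manager could then compare that probability, multiplied by the number of fielded units and the cost of a field failure, against the cost of a special retrofit.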

Validating corrective actions
The final step in the failure analysis is to verify that the corrective measures were effective. The type of testing obviously depends on what was fixed and how. If it's a production yield issue, the test is obvious: run the process and see if yield has improved. If a connector has failed after years in the field, the analyst might devise an accelerated aging test to simulate long-term exposure. If the failure involved corrosion, the analyst might run a salt spray test. There is a whole science and art that goes into designing test conditions to simulate various environments.

During this phase, the analyst's expertise will help the client avoid common mistakes. For example, when evaluating a thermal cycling failure, most people assume that thermal shock must be more severe than a slow thermal cycle. Therefore, they'll run a thermal shock test, without bothering with a slow thermal cycle test. (The shock test also takes less time to complete, which may also have something to do with the decision.) However, in certain failure conditions, slow thermal cycling can actually be more stressful than shock. Sometimes essential tests are neglected because they seem unnecessary or counter-intuitive to the layman.

Success without fanfare
Most companies engage a failure analyst when disaster is looming and they need an expert who can save the day at the last minute. An alternative relationship is to consult with the failure analyst during the design stage, when considering approaches where the outcome is not certain. For example, someone may want to try a new technique or technology because it seems better/cheaper/faster. Another red flag would be when familiar technology is to be used in a new environment. For example, an internal void within a component is normally not an issue, unless it's at the bottom of the sea, where the part could be crushed.

Through early intervention, the failure analyst can review the proposed design with an eye toward reliability and potential for failure, identifying potential flaws before they rise to the level of a problem. The analyst will then advise the design team on unexpected chemical, electrical and mechanical stresses, and how the design can excite the various failure modes known to exist. By identifying pitfalls in advance, the analyst can recommend changes that will avoid the need for a full-scale failure investigation down the road. It's a less dramatic approach, but more efficient and predictable, and certainly easier on the blood pressure.

Dave Pinsky handles special system-level investigations and long-term reliability evaluations as an engineering fellow at Raytheon's Reliability Analysis Lab in Lexington, Mass. He can be reached at david_a_pinsky@raytheon.com and (781) 860-3330.
ERI has been granted permission to publish the above article from Reed Business.


*******************************

"Rules of Thumb" for Thermal Design
by Joel Newberger

There would be no “rules of thumb” if everyone knew the Old English origins of thumb rules! Nevertheless, keeping up with engineering traditions, the following “rules of thumb” relate to safe, reliable, thermal design practice.

Table 1 summarizes allowable power density levels associated with frequently encountered packaged electronics physical designs. All power density levels shown in Table 1 are based on spatially invariant heat loads (i.e., uniformly distributed), and a maximum temperature rise of 20 °C (36 °F) above surrounding bulk fluid temperature (air, or liquid where indicated). Areal (A.) and Volumetric (B.) power density limits should not be treated as “maximum-don’t-ever-exceed” levels, but rather, if noticeably exceeded, the designer must be prepared to incorporate thermal enhancements such as fans and/or heat sinks.

As an example, an ambient-cooled 9.0 x 6.0 inch circuit board dissipates 38.0 watts. The power density is 38.0/(9.0 x 6.0) = 0.7 watts/sq in. This design is clearly not a candidate for natural cooling, which Table 1 limits to 0.15 watts/sq in. Forced cooling will require multiple chip heat sinks and/or higher air velocity, but not too high, or else acoustic levels will be unacceptable. And finally, liquid immersion cools everything!

A second example is a metallic, totally enclosed package of 500 cu. in. (6-inch height). At a thermal load of 30.0 watts, the power density is 30.0/500 = 0.06 watts/cu in. This configuration can be readily designed to reliably operate in an indoor environment. However, if used outdoors, “beware” of solar illumination and nocturnal radiation to the sky “vault” (mechanism for internal moisture condensation, and ultimate corrosion).

Finally, consider a 300.0 cu. in. plastic desk top unit, six inches high with 30.0 watts dissipation. Rules A and B of Table 1 do not apply to plastics. However, at a six inch enclosure height, the required vent opening (inlet perf=outlet perf) is 30/2=15 sq. in. opening, minimum. The more vent opening, the better! Openings can be rectangular in shape, but no opening should be less than 0.08 inches.

It is left as an exercise to research the etymological origins of “Rules of Thumb.” Good Luck.

Table 1: Rules of Thumb

A - Areal Power Density; Watts/Sq In
  1. Air, Naturally Cooled Vertical Surfaces (Height < 12 Inches): 0.15
  2. Air, Forced Cooled Surfaces (Velocity = 500 Ft/Min): 0.25-0.30
  3. Liquid Immersion: >2.0

B - Volumetric Power Density; Watts/Cu In
  Naturally Cooled, Non-Vented Metal Enclosures
    Height > 2.0 Ft: 0.03
    Height < 2.0 Ft: 0.07

C - “Perf” (Vent Opening) Power Density; Watts/Sq In Vent Opening
  Naturally Cooled, Vented Enclosure (Internal Flow Impedance Negligible When Compared To Impedance Associated With Equal I/O Vent Openings.)
    Enclosure Height, In.    Watts/Sq In
    6.0                      2.0
    12.0                     2.5
    60.0                     6.0
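
The three worked examples above can be checked with a short Python sketch. The limits are transcribed from Table 1; the function and constant names are made up for this illustration, and the handling of an enclosure exactly 2 ft high (which the table does not cover) is an assumption noted in the code.

```python
# Limits transcribed from Table 1.
NATURAL_AIR_AREAL_LIMIT = 0.15   # W/sq in, vertical surfaces, height < 12 in
FORCED_AIR_AREAL_LIMIT = 0.30    # W/sq in, upper end of 0.25-0.30 at 500 ft/min
PERF_LIMITS = {6.0: 2.0, 12.0: 2.5, 60.0: 6.0}  # enclosure height (in) -> W/sq in of vent


def areal_density(watts: float, length_in: float, width_in: float) -> float:
    """Power per unit board area, in watts per square inch."""
    return watts / (length_in * width_in)


def volumetric_limit(height_in: float) -> float:
    """Table 1 (B) limit for naturally cooled, non-vented metal enclosures.

    The table gives 0.07 below 2 ft and 0.03 above 2 ft. Exactly 2 ft is not
    covered, so this sketch assumes the more conservative 0.03 there.
    """
    return 0.07 if height_in < 24.0 else 0.03


def min_vent_area(watts: float, enclosure_height_in: float) -> float:
    """Minimum vent opening (inlet = outlet), in square inches, per Table 1 (C)."""
    return watts / PERF_LIMITS[enclosure_height_in]


board = areal_density(38.0, 9.0, 6.0)   # first example: circuit board
box = 30.0 / 500.0                      # second example: sealed metal package
vent = min_vent_area(30.0, 6.0)         # third example: vented plastic unit

print(f"Board: {board:.2f} W/sq in; natural cooling adequate: {board <= NATURAL_AIR_AREAL_LIMIT}")
print(f"Sealed box: {box:.2f} W/cu in; within limit: {box <= volumetric_limit(6.0)}")
print(f"Plastic unit: at least {vent:.0f} sq in of vent opening")
```

Only the three enclosure heights listed in the table are supported; interpolating between them would be a further assumption that the table itself does not make.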


Joel Newberger is President of Thermalogics, Inc., and a principal in SNA Engineering. SNA specializes in mechanical design/packaging and in thermo/structural analysis of electronic equipment used in both commercial/industrial and military applications. To contact Joel send an e-mail to newberger@equipment-reliability.com.


*******************************

Real-Time Monitoring and Analysis of Pneumatic Vibration Test Systems Actuators
by John Hess

Recently there has been growth in the acceptance and use of pneumatic vibration test systems. While these systems help to improve the reliability of tested products, they themselves are subject to failure due to wear and contamination caused by unusually harsh operational environments. Since all pneumatic vibration test system designs rely upon reciprocating actuators, they all can suffer from the same actuator failure issues. Failure or degradation of one or more actuators can lead to an imbalance or skewing of the vibratory energy on the table (i.e., hot and cold spots), as well as excessive wear on the remaining good actuators. Since the vibration control system automatically compensates for actuator failure by running the remaining operating actuators at a higher reciprocation rate, the user is unaware of the problem until the next maintenance cycle, if then. The result is under-testing or over-screening of products, and unrepeatable tests and test results.

In response to this problem, Data Flare has developed V-flare. V-flare, supporting all systems currently available, actively monitors the operation of pneumatic actuator vibration systems. Using V-flare, users proactively determine the operational efficacy of their pneumatic actuator test system, catching issues before they affect product quality or reliability.

V-flare consists of three major components: sensors, signal conditioner, and analysis/communications platform. V-flare’s sensors attach directly to the actuator, requiring no actuator disassembly and expediting system installation. These sensors, through induction, measure the velocity of the actuator piston within its housing. Since V-flare measures the motion of the piston within the actuator, it requires one sensor per actuator to accurately assess the performance of the entire vibration table.

The computational platform supports the sensor signal analysis and communications requirements of V-flare. Based upon an x86 processor, the system analyzes the filtered and amplified sensor signals from the signal conditioner, calculating velocity, acceleration, and position information of the actuator piston. Through its sensor system and analysis techniques, V-flare determines the operational frequency of actuator pistons to a resolution of less than 0.05 Hz. This ultra-fine analysis of behavior allows V-flare to identify unintended actuator fluctuations to levels previously unattainable.

V-flare identifies and determines the characteristics of operation of individual pneumatic actuators. The system uses these characteristics for each actuator, which include piston velocity, acceleration, and position, to identify statistically significant deviations, as well as for comparison against gold standards. V-flare then reduces these characteristics to four status levels: off, running, marginal, and failed. As the behavior of one or more actuators changes due to contamination or excessive wear, V-flare identifies these changes and flags them for the user.
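
To make the idea of reducing measurements to four status levels concrete, here is a minimal Python sketch. It is not V-flare's actual algorithm, which the article does not describe: the function name, the use of reciprocation frequency alone, and the 5% and 20% deviation thresholds are all assumptions chosen purely for illustration.

```python
def classify_actuator(freq_hz: float,
                      reference_hz: float,
                      commanded_on: bool,
                      marginal_frac: float = 0.05,
                      failed_frac: float = 0.20) -> str:
    """Reduce one actuator's measured frequency to a status level.

    reference_hz plays the role of a "gold standard" value for a healthy
    actuator. The thresholds are hypothetical fractions of that reference.
    """
    if not commanded_on:
        return "off"
    deviation = abs(freq_hz - reference_hz) / reference_hz
    if deviation >= failed_frac:
        return "failed"
    if deviation >= marginal_frac:
        return "marginal"
    return "running"


# Hypothetical readings from a four-actuator table with a 40 Hz reference.
readings = {"A1": 40.5, "A2": 37.0, "A3": 0.0, "A4": 40.0}
statuses = {name: classify_actuator(hz, 40.0, commanded_on=True)
            for name, hz in readings.items()}
print(statuses)
```

A real monitor would presumably combine velocity, acceleration and position statistics rather than a single frequency value, but the reduction to a small set of flags for the user follows the same pattern.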

Not only does the computational platform support V-flare’s analysis, but it also supports the underlying communications capabilities. Depending upon the platform selected, the system supports local viewing of actuator characteristics, or remote access through TCP/IP networks with an ordinary web browser. In addition, since the platform is based upon Microsoft’s .NET platform, V-flare supports actuator status streaming through web services to 3rd party and user created applications.

In a recent field test, V-flare analyzed a poorly performing vibration system. During testing, V-flare identified one failed (non-operating) and one degraded (running at lower impact levels) actuator. Replacing these actuators returned the system to specification, saving the user over $6,000 in actuator replacement costs. Incidentally, these savings do not take into account costs associated with out-of-specification tests and screening.

If the consistency and repeatability of your accelerated testing and screening is important to your organization and if you prefer a maintenance program based on the health of your system rather than the calendar, please contact Data Flare regarding V-flare.

John Hess, CTO of Data Flare, leads the technology initiatives at Data Flare. Previously, John managed and directed hardware and software development at Internet network communications, E-commerce, and Capital Equipment organizations.


*******************************

Test Lab Musings (part 4)
by Robert L. Renz

And while we're on the topic of leftover stuff, what about the old fixture that is bolted together and drilled for use on a table/expander that doesn't exist, except maybe in photos of the "Old Lab" that are hanging on the wall? At this point, throw it away, or if you still need it from time to time, it's off to the machine shop for a dose of Plastic Aluminum epoxy adhesive, and some new holes.

Even though an accelerometer has hex sides, tighten it down very gently – use a very small finger to push the wrench instead of your hand. The makers have torque specs, but you might not have a small enough torque wrench…

Murphy’s Law being what it is, the odds are that the accelerometer you screw into the fixture will wind up with the connector facing in the wrong direction. Not to worry - you can either use an accelerometer with a top connector instead of a side connector or add a shim washer, but be sure that it is as thin as possible (a 0.015-inch-thick washer will move the accelerometer about 1/2 turn if you use 10-32 accelerometer studs).
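
The half-turn figure follows from the thread pitch: a 10-32 stud has 32 threads per inch, so one full turn advances 1/32 inch. A quick check in Python (the function name is just for this illustration):

```python
def shim_rotation_turns(shim_thickness_in: float, threads_per_inch: float) -> float:
    """Turns by which a shim of the given thickness changes the seated position.

    One full turn advances the stud by one thread pitch (1 / threads_per_inch).
    """
    return shim_thickness_in * threads_per_inch


turns = shim_rotation_turns(0.015, 32)   # 10-32 stud: 32 threads per inch
print(f"{turns:.2f} turn, or about {turns * 360:.0f} degrees")
```

The same arithmetic tells you what shim thickness to pick for a different rotation: for a quarter turn on a 10-32 stud, 0.25/32 is roughly 0.008 inch.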

Robert L. Renz is with General Dynamics - Advanced Information Systems in Bloomington, Minnesota.



Free sample of Chapter 1


If you would like to request a free sample of Chapter 1 - "What are vibration and shock?", from Wayne's new book "(...) Random Vibration and Shock Testing", please visit our website. Fill out the quick form and submit it to us. We will then e-mail you a PDF file of Chapter 1.

 
New Fixture Design course


"Vibration and Shock Test Fixture Design" is the newest course offered by ERI. It will meet October 12-14, 2004, in Pomona, California.

You can read about instructor Steve Brenner at our website.
Steve admits to having designed some poor fixtures. Fortunately, he was able to gain some theoretical understanding of structural responses to vibration and shock, to study those unsatisfactory fixtures, to improve them, and to avoid most mistakes on future designs. Here he will teach what he has learned on this subject. Most of Steve's presentation will use PowerPoint slides. A highlight on Day #3 will be a visit to a professional manufacturer of fixtures, Baughn Engineering in LaVerne, California, to see fixtures evolving from raw materials into finished, ready-to-ship fixtures.

 

ERI Overseas Courses


If you cannot attend courses in the US, ERI offers two overseas courses. ERI Specialist Deepak Jariwala will teach in Singapore, and Markus Dumelin will teach in Switzerland. Dates and locations are shown below.

July 13-15, 2004,
at Singapore

October 5-7, 2004,
at Zug, Switzerland

 
Vibration and Shock courses coming up


Wayne Tustin will teach short courses in vibration testing, shock testing, measurement, analysis, calibration, HALT, ESS and HASS at the following locations:

August 24-26, 2004
Santa Barbara, California

October 5-7, 2004
Littleton, Massachusetts

November 1-3, 2004
Detroit, Michigan

December 7-9, 2004
Marietta, Georgia

If none of these locations and dates meets your needs, perhaps you’d like to have customized training presented at your facility for your designers and test specialists.

 
Announcements


Safety Critical Systems Conference
19-20 May, 2004 Dissecting The Latest Tools & Techniques For Increasing
Effectiveness In Safety Critical Systems (London, UK)

Please click here to get more information about this event.

 
Contact information


ERI - Equipment Reliability Institute
1520 Santa Rosa Ave.
Santa Barbara - CA - 93109
Tel: (805) 564-1260
Our fax number:
(805) 966-7875

Wayne Tustin tustin@equipment-reliability.com

Webmaster webmaster@equipment-reliability.com

Websites
http://www.equipment-reliability.com

http://vibrationandshock.com

Copyright © 2000-2004 Equipment Reliability Institute. All rights reserved.

 
Free Newsletter


Subscribe
If you would like to subscribe to ERI News, go to either website, fill in the form "Free Newsletter" and hit the Submit button. Subscribe now!

Recommend
If you enjoy reading ERI News and want to recommend it to a friend, just hit "forward" on the menu of your e-mail program or tell your friend to subscribe at our website.

Previous issues
Missed the previous issues? It is not a problem. Just visit our newsletter archives section to find all issues of ERI News.

Unsubscribe
If you do not want to receive ERI's quarterly newsletter, please send us an e-mail using the same e-mail address that brought you the newsletter, with "remove" as subject.

 


Visit www.equipment-reliability.com
Visit www.vibrationandshock.com