|
Equipment Reliability
Institute
ERI News - your reliability newsletter
May 2004 - volume
15
|
| Hello,
readers
Spring! Isn’t it wonderful? A wonderful time
of transition between winter and summer.
May I introduce the authors of four articles in this
issue.
Allow me first to introduce Dave
Pinsky, an engineering fellow at Raytheon's Reliability Analysis
Lab in Lexington, Mass. Dave handles special system-level investigations
and long-term reliability evaluations. Yes, electronic units sometimes
fail. It behooves us to find out why a unit failed, so that we can
avoid similar failures in the future. Thus we bring you Dave’s
article on what you need to know about electronic failure analysis.
Allow me next to introduce Joel Newberger of Farmingdale,
LI, New York. Joel’s field is keeping electronics from overheating,
and thus he has written “Rules of Thumb”
for Thermal Design. If you’d like to know more about Joel,
his teaching and his consulting, visit his
page at the ERI website.
Next, I’d like to introduce
John Hess, who has written "Real-Time
Monitoring and Analysis of Pneumatic Vibration Test Systems Actuators".
The pneumatic actuators used in repetitive shock systems for
HALT, ESS and HASS deteriorate over time. A new product, V-flare,
actively monitors each actuator to determine its operational efficacy,
catching issues before testing or screening is affected.
Finally, we have three more of Robert
L. Renz’s “Test Lab Musings”.
Hey, others among you experienced test engineers – please
submit some of your thoughts for our August issue.
Best wishes,
Wayne Tustin
|
| ******************************* |
| Failure
Analysis of Electronics Assemblies
by Dave Pinsky
Here's what you need to know before,
during and after electronics assembly failure analysis.
For anyone who has been in a dispute over
a failure condition, the value of an objective, independent failure
analysis is apparent. Failures occur for any number of reasons:
design flaws, poor quality materials, manufacturing problems, improper
conditions during transport or storage, overstress during operation,
etc. Most companies (and their suppliers) lack the facilities to
perform the in-depth investigation that may be required to solve
an error condition. Without an accurate analysis of the failure,
good faith efforts to take corrective action can be futile or counterproductive.
Much more than a glorified test lab, a failure analysis lab requires
specialized equipment and knowledge of cause and effect relationships
to discover the root cause of a failure and the resulting chain
of events. During the investigation, the analyst will know which
tests to perform and in what sequence to bring the investigation
to a prompt, economical conclusion. The analyst will also recommend
appropriate corrective measures, based on where in the product cycle
the failure occurred and the priorities of the organization. If
any of these capabilities are necessary, engaging the services of
a professional failure analyst will make sense.
Note that sometimes even a limited failure
analysis can be highly worthwhile, such as when there is an urgent
need to eliminate a potential cause. For example, knowledge that
a failure was not caused by ESD can prevent wasting money on the
wrong corrective action.
Things to know before a failure
occurs
Laying the groundwork for a successful outcome starts before the
failure analyst arrives on the scene. To avoid classic mistakes,
the following precautions are worth sharing with all personnel who
may find themselves dealing with a failure situation.
Preserve the physical evidence.
If you don't know why something is broken, leave it alone. Re-flowing
a fractured solder joint or replacing a failed component may be
expected in a manufacturing culture, but it's a disaster to the
failure analyst, because it could mean destroying evidence of a
larger problem. (Obviously, people aren't expected to stop production
and call the failure analyst for every random event. But if there's
a trend, the worst thing would be to keep fixing defective parts
and sending them along.)
Don't put fractured surfaces back
together to see how well they fit.
If something is broken in two, it must be human nature to try to
jam the pieces together. On a microscopic level, this sliding and
mashing together of the surfaces destroys valuable evidence. If
provided with a clean, well-preserved fracture surface, the failure
analyst can almost always tell you exactly why it broke - as well
as whether the parts that were shipped last week need to be recalled.
Avoid electrical testing beyond
what is minimally required for troubleshooting.
Don't keep testing an intermittent fault until the circuit becomes completely
open. Once you blow up the failure site, there's not much analysis
that can be done, other than to confirm that someone tried to ram
a lot of power through the circuit.
Protect materials from heat, humidity,
vibration, ESD and other forces.
Use recommended protective coverings and containers.
Collect all existing data from
the failure.
What may seem irrelevant to the person on the scene may turn out
to be crucial. Remember that it is better to err on the side of
too much data rather than too little. Also be sure to record, photograph
and otherwise document any test set-ups that were used.
Don't overlook human sensory input.
Take note of unusual - or even slightly different - sounds, smells
and sights. Whenever possible, interview the people who first noticed
the failure. Often you get different answers than from somebody
who wasn't actually present.
Physical analysis
If the evidence has been preserved, the failure analyst can tease
a great deal of information from a failure site, aided by specialized
equipment not available in traditional test labs. For example, the
electron microscope at Raytheon's Reliability Analysis Laboratory
allows examination at up to 200,000×. In addition to knowing
how to interpret the tests correctly, the analyst will understand
what tests are truly necessary - and which aren't - to obtain a
clear view of all forces affecting the failed part.
The investigation begins with an external
examination of the suspect component. Depending on the device type
and specific observations, the analyst may employ a variety of techniques:
testing for thermal resistance, emission microscopy, hermeticity,
acoustic microscopy or infrared imagery. The analysis then proceeds
to the internal examination, which brings into play one of the failure
analyst's special skills: the ability to take things apart without
destroying crucial evidence.
Taking apart delicate components is as
much an art as a science. For example, a cross section involves
more than simply slicing an object in half. The component is cast
in epoxy to protect it from disintegrating, then ground and polished
to a mirror finish for examination under a microscope. If the analyst
is investigating a separation between metals, an electron microscope
coupled with an X-ray emission spectrometer can map the distribution
of atomic species and identify exactly where the separation occurred.
This information is essential for the process people, who are putting
down layer after layer of metallization.
Disassembly can take a variety of forms:
chemical, thermal or mechanical. For example, if the component under
investigation is a plastic encapsulated microcircuit, it will have
to be decapsulated. This process involves dissolving the plastic,
leaving behind the semiconductor device and the wires for examination.
There also will be times where disassembly is not feasible: the
problem may lie in the way parts fit together, there may be a need
to look at the entire unit, or the part simply may be too valuable
to destroy. In these situations, the analyst can use non-invasive
techniques such as ultrasound and X-ray to examine what's inside.
Sorting out primary and secondary
failures
Most failures involve a series of cascading events. The failure
analyst interprets physical signatures to determine the correct
sequence, often creating a failure tree (Figure 1) to help guide
the investigation. (A complex failure cannot be solved without one.)
Branches list all possible reasons why a failure has occurred and
indicate where additional information is needed to support or refute
each potential cause. The analyst will work his way down through
the branches, investigating each possibility based on its degree
of plausibility. At the end, the analyst has not only found the
root cause, but also mapped the entire sequence back to the initial
symptom.

Figure 1 - Sample fault analysis tree diagram
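The failure tree described above can be sketched as a small data structure. This is a minimal illustration only, not the tool RAL uses: the `Cause` class, the numeric `plausibility` score and the three status labels are all hypothetical, chosen to show how branches can be ranked, ruled out, and finally traced from the symptom to a confirmed root cause.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cause:
    """One branch of a failure tree: a candidate cause and its sub-causes."""
    name: str
    plausibility: float          # hypothetical 0..1 score assigned by the analyst
    status: str = "open"         # hypothetical labels: "open", "refuted", "confirmed"
    children: List["Cause"] = field(default_factory=list)


def investigation_order(node: Cause) -> List[str]:
    """List the branches still worth examining, most plausible sibling first."""
    order: List[str] = []
    ranked = sorted(node.children, key=lambda c: c.plausibility, reverse=True)
    for child in ranked:
        if child.status == "refuted":
            continue  # evidence has ruled this branch out, so skip its subtree
        order.append(child.name)
        order.extend(investigation_order(child))
    return order


def confirmed_chain(node: Cause) -> Optional[List[str]]:
    """Return the path from the symptom down to a confirmed leaf, if one exists."""
    if node.status == "confirmed" and not node.children:
        return [node.name]
    for child in node.children:
        chain = confirmed_chain(child)
        if chain is not None:
            return [node.name] + chain
    return None


if __name__ == "__main__":
    # Hypothetical tree for illustration; the names are not from a real case.
    tree = Cause("unit fails at power-up", 1.0, children=[
        Cause("ESD damage", 0.2, status="refuted"),
        Cause("electrical overstress", 0.6),
        Cause("mechanical damage", 0.4, children=[
            Cause("part cracked during assembly", 0.5, status="confirmed"),
        ]),
    ])
    print(investigation_order(tree))
    print(confirmed_chain(tree))
```

Keeping refuted branches in the tree, rather than deleting them, preserves the record of what was considered and why it was ruled out.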
In tracing the sequence of events, the failure analyst
can recognize subtle clues that may have eluded previous investigators.
In a recent case, a client came to RAL for an assessment of why
a high percentage of their power controller devices were failing.
The client had already sent the devices back to the manufacturer,
who had determined that the failure was caused by an electrical
overstress - in other words, a system flaw and the client's responsibility.
The client measured voltages and found that indeed they had voltage
spikes, which are actually fairly common in a motor driver controller.
They performed a design change to reduce the voltage spikes and
the failure rate did decrease for a time.
Eventually the failure rate shot up again. At this
point, the client contacted RAL, which used failure tree analysis
to assure that every possible failure mode was considered, not only
electrical overstress. In fact, the parts were being mechanically
damaged during assembly, leaving them susceptible to later failure.
The slight improvement after the design change occurred because
the "weak sisters" (the damaged parts that were primed
to fail) were no longer being hit quite as hard when voltage was
applied.
Figure 2 - Scanning acoustic microscopy image of die attach
in a plastic IC.
The manufacturing process involved a power die mounted
on a heat sink. During assembly, physical stress was being placed
on the die, causing minute cracks to form. While the initial mechanical
insult only started the crack, the accumulation of power and temperature
cycles caused it to spread through the brittle material within the
die. In some cases the cracks penetrated the active regions of the
die, effectively reducing the size of the transistor. The location
of a crack determined whether the part would fail when power was
applied.
The manufacturer had misinterpreted the sequence
of events. When the power devices failed, they essentially blew
up, leaving a crater and melted debris. The fact that there also
happened to be cracks in the die was not unusual in a part that
had been subjected to massive thermal stress. RAL determined that
the cracks occurred first, causing the part to blow up - rather
than that the thermal stress caused the cracks, as the manufacturer
had asserted. The cracks were not readily visible - they were within
material encapsulated inside the die. Using a scanning acoustic
microscope, RAL was able to detect cracks in parts that had not
yet blown up, confirming its theory. Eventually RAL worked with
the manufacturer to modify the way the parts were mounted, reducing
the bending stress that led to the failure.
Determining the "people-centric"
root cause
Of course, the physical root cause didn't just occur on its own;
ultimately the actions of people made it happen. While examining
the human factors is beyond the scope of a narrow technical analysis,
it may turn out to be the most valuable step. Is there a fundamental
breakdown in communications or supervision? Are organizations and
people receiving proper incentives for their work? Are operators
trained properly? At some point the physical has to transition to
the organizational.
Figure 3 - A failure analysis lab requires specialized equipment and knowledge
of cause-and-effect relationships.
Part of the failure analyst's role involves determining
the best way to work with the people involved to make sure that
something similar doesn't happen again. For example, suppose the
problem is excessive heat on a solder joint. The easy solution is
to tell technician #124 that he put too much heat on the joint and
to be more careful next time. However, a more valuable corrective
action might be to revise training programs to spread awareness
of the damage that can be caused by too much heat.
Identifying and implementing corrective
actions
After pinpointing the root cause, the analyst will propose corrective
measures, which could include improvements in manufacturing processes,
using suppliers with higher levels of quality control, design changes,
revised specifications and other recommendations. The analyst will
also suggest ways to deal with the current problem depending on
the stage of the development cycle. For example, if a capacitor
failure is discovered before product assembly, the solution is easy
- don't install those components. If the same failure is discovered
after manufacturing has started, the analyst and client will have
to weigh the alternatives. While the client may want to remove the
bad components and install good ones, the cost and risk associated
with rework may be prohibitive. An alternative might be to screen
the manufactured parts by running them through an environmental
test to weed out the bad ones.
One of the most difficult failure situations is
when defective product has already gone out the door. Are you going
to conduct a recall? Are you going to go out and replace every bad
part in the field? For companies in this predicament, reliability
testing can determine the odds that a defective component will fail,
the possible consequences and the appropriate course of action.
A reliability analysis often follows from a failure analysis and
is used to determine the reliability of deficient material. This
is the kind of information you won't find in a textbook, because
no one would intentionally design and test systems constructed from
damaged goods. And when they are forced to perform such tests, they
never publish the results!
The failure analyst will conduct experiments on
a sample bad part, such as accelerated environmental stress testing,
to determine the probability of failure within a given time period.
The result will be numbers that managers can deal with to make business
decisions. For example, if the product is due for scheduled maintenance
in 18 months, the degree of risk may not justify a special retrofit
before then.
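As a rough illustration of how test results can become "numbers that managers can deal with", the sketch below computes the probability of failure within a time window. It assumes a Weibull life model, which the article does not specify; the parameter names `eta` (characteristic life) and `beta` (shape) and all values in the usage example are hypothetical, not results from any real test.

```python
import math


def weibull_failure_probability(t: float, eta: float, beta: float) -> float:
    """Probability that a part has failed by time t under an assumed Weibull model.

    F(t) = 1 - exp(-(t / eta) ** beta), with t and eta in the same time units.
    """
    if t < 0 or eta <= 0 or beta <= 0:
        raise ValueError("t must be >= 0 and eta, beta must be > 0")
    return 1.0 - math.exp(-((t / eta) ** beta))


def expected_failures(units_in_field: int, t: float, eta: float, beta: float) -> float:
    """Expected number of fielded units that fail by time t."""
    return units_in_field * weibull_failure_probability(t, eta, beta)


if __name__ == "__main__":
    # Hypothetical inputs: 18 months to scheduled maintenance, with eta and beta
    # as they might be fitted from accelerated test data.
    p = weibull_failure_probability(t=18.0, eta=120.0, beta=1.0)
    print(f"Probability of failure before maintenance: {p:.3f}")
    print(f"Expected failures among 1000 units: {expected_failures(1000, 18.0, 120.0, 1.0):.1f}")
```

A manager can weigh a figure like this against the cost of a special retrofit before the scheduled maintenance date.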
Validating corrective actions
The final step in the failure analysis is to verify that the corrective
measures were effective. The type of testing obviously depends on
what was fixed and how. If it's a production yield issue, the test
is obvious: run the process and see if yield has improved. If a
connector has failed after years in the field, the analyst might
devise an accelerated aging test to simulate long-term exposure.
If the failure involved corrosion, the analyst might run a salt
spray test. There is a whole science and art that goes into designing
test conditions to simulate various environments.
During this phase, the analyst's expertise will
help the client avoid common mistakes. For example, when evaluating
a thermal cycling failure, most people assume that thermal shock
must be more severe than a slow thermal cycle. Therefore, they'll
run a thermal shock test, without bothering with a slow thermal
cycle test. (The shock test also takes less time to complete, which
may also have something to do with the decision.) However, in certain
failure conditions, slow thermal cycling can actually be more stressful
than shock. Sometimes essential tests are neglected because they
seem unnecessary or counter-intuitive to the layman.
Success without fanfare
Most companies engage a failure analyst when disaster is looming
and they need an expert who can save the day at the last minute.
An alternative relationship is to consult with the failure analyst
during the design stage, when considering approaches where the outcome
is not certain. For example, someone may want to try a new technique
or technology because it seems better/cheaper/faster. Another red
flag would be when familiar technology is to be used in a new environment.
For example, an internal void within a component is normally not
an issue, unless it's at the bottom of the sea, where the part could
be crushed.
Through early intervention, the failure analyst
can review the proposed design with an eye toward reliability and
potential for failure, identifying potential flaws before they rise
to the level of a problem. The analyst will then advise the design
team on unexpected chemical, electrical and mechanical stresses,
and how the design can excite the various failure modes known to
exist. By identifying pitfalls in advance, the analyst can recommend
changes that will avoid the need for a full-scale failure investigation
down the road. It's a less dramatic approach, but more efficient
and predictable, and certainly easier on the blood pressure.
Dave Pinsky handles special system-level investigations and
long-term reliability evaluations as an engineering fellow at Raytheon's
Reliability Analysis Lab in Lexington, Mass. He can be reached at
david_a_pinsky@raytheon.com
and (781) 860-3330.
ERI has been granted permission to publish the above article from
Reed Business.
|
| ******************************* |
| "Rules
of Thumb" for Thermal Design
by Joel Newberger
There would be no “rules
of thumb” if everyone knew the Old English origins of thumb
rules! Nevertheless, keeping up with engineering traditions, the
following “rules of thumb” relate to safe, reliable,
thermal design practice.
Table 1 summarizes allowable
power density levels associated with frequently encountered packaged
electronics physical designs. All power density levels shown in
Table 1 are based on spatially invariant heat loads (i.e., uniformly
distributed), and a maximum temperature rise of 20°C (36°F) above
surrounding bulk fluid temperature (air, or liquid where indicated).
Areal (A.) and Volumetric (B.) power density limits should not be
treated as “maximum-don’t-ever-exceed” levels;
rather, if they are noticeably exceeded, the designer must be prepared
to incorporate thermal enhancements such as fans and/or heat sinks.
As an example, an ambient-cooled 9.0 x 6.0 inch circuit
board dissipates 38.0 watts. The power density is 38.0/(9.0 x 6.0) = 0.7 watts/sq
in. This design is clearly not a candidate for natural cooling.
Forced cooling will require multiple chip heat sinks and/or higher
air velocity, but not too high, or else acoustic levels
will be unacceptable. And finally, liquid immersion cools everything!
A second example is a metallic, totally enclosed package at 500
cu. in. (6” height). At a thermal load of 30.0 watts, power
density = 30.0/500 = 0.06 watts/cu in. This configuration can be readily designed
to reliably operate in an indoor environment. However, if used outdoors,
“beware” of solar illumination and nocturnal radiation
to the sky “vault” (a mechanism for internal moisture
condensation, and ultimate corrosion).
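The two worked examples above can be reproduced with a few lines of code. This is only a sketch: the limits are copied from Table 1, but the function names and the three category labels are assumptions made for illustration, and the limits are screening thresholds rather than hard maxima.

```python
# Limits copied from Table 1 (rules A and B).
NATURAL_AIR_LIMIT = 0.15   # W/sq in, rule A.1, vertical surfaces under 12 inches high
FORCED_AIR_LIMIT = 0.30    # W/sq in, rule A.2, upper end of 0.25-0.30 at 500 ft/min


def areal_power_density(watts: float, width_in: float, height_in: float) -> float:
    """Watts per square inch of board area, as computed in the article."""
    return watts / (width_in * height_in)


def volumetric_power_density(watts: float, volume_cu_in: float) -> float:
    """Watts per cubic inch of enclosure volume."""
    return watts / volume_cu_in


def sealed_metal_limit(height_in: float) -> float:
    """Rule B limit for naturally cooled, non-vented metal enclosures.

    Table 1 gives 0.07 below 2.0 ft and 0.03 above 2.0 ft; using the more
    conservative 0.03 at exactly 2.0 ft is an assumption.
    """
    return 0.07 if height_in < 24.0 else 0.03


def board_cooling_category(density: float) -> str:
    """Hypothetical labels showing how the rule A limits might be used as a screen."""
    if density <= NATURAL_AIR_LIMIT:
        return "natural convection"
    if density <= FORCED_AIR_LIMIT:
        return "forced air"
    return "forced air with thermal enhancements"


if __name__ == "__main__":
    board = areal_power_density(38.0, 9.0, 6.0)
    print(f"Board: {board:.2f} W/sq in -> {board_cooling_category(board)}")
    box = volumetric_power_density(30.0, 500.0)
    print(f"Enclosure: {box:.2f} W/cu in, limit {sealed_metal_limit(6.0)} W/cu in")
```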
Finally, consider a 300.0 cu. in. plastic desktop
unit, six inches high, with 30.0 watts dissipation. Rules A and
B of Table 1 do not apply to plastics. However, rule C allows 2.0
watts/sq in of vent opening at a six-inch enclosure height, so the
required vent opening (inlet perf = outlet perf) is 30.0/2.0 = 15
sq. in., minimum. The more vent opening, the better! Openings
can be rectangular in shape, but no opening should be less than
0.08 inches across.
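The vent sizing step can be written as a lookup against rule C of Table 1. This sketch deliberately handles only the three tabulated enclosure heights; interpolating between them would be an assumption the article does not make. The function and dictionary names are illustrative.

```python
# Rule C of Table 1: allowable watts per square inch of vent opening,
# keyed by enclosure height in inches.
PERF_LIMIT_W_PER_SQ_IN = {6.0: 2.0, 12.0: 2.5, 60.0: 6.0}


def min_vent_area_sq_in(watts: float, enclosure_height_in: float) -> float:
    """Minimum vent opening for a naturally cooled, vented enclosure.

    Per the article, the inlet and outlet openings are taken as equal.
    """
    try:
        limit = PERF_LIMIT_W_PER_SQ_IN[enclosure_height_in]
    except KeyError:
        raise ValueError(
            "Table 1 lists only 6.0, 12.0 and 60.0 inch enclosure heights"
        ) from None
    return watts / limit


if __name__ == "__main__":
    # The plastic desktop unit from the article: 30 watts, six inches high.
    print(min_vent_area_sq_in(30.0, 6.0))
```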
It is left as an exercise to research the
etymological origins of “Rules of Thumb.” Good Luck.
Table 1: Rules of Thumb
| A - Areal Power Density; Watts/Sq In |
| 1. Air, Naturally Cooled Vertical Surfaces | 0.15 (Height < 12 Inches) |
| 2. Air, Forced Cooled Surfaces | 0.25-0.30 (Velocity = 500 Ft/Min) |
| 3. Liquid Immersion | >2.0 |
| B - Volumetric Power Density; Watts/Cu In |
| Naturally-Cooled, Non Vented Metal Enclosures | 0.03, Height > 2.0 Ft. |
| | 0.07, Height < 2.0 Ft. |
| C - “Perf” (Vent Opening) Power Density; Watts/Sq In Vent Opening |
| | Enclosure Height, In. | Watts/Sq In |
| Naturally Cooled, Vented Enclosure (Internal Flow Impedance Negligible When Compared To Impedance Associated With Equal I/O Vent Openings.) | 6.0 | 2.0 |
| | 12.0 | 2.5 |
| | 60.0 | 6.0 |
Joel Newberger is President of Thermalogics, Inc., and a principal
in SNA Engineering. SNA specializes in mechanical design/packaging
and in thermo/structural analysis of electronic equipment used in
both commercial/industrial and military applications. To contact
Joel send an e-mail to newberger@equipment-reliability.com.
|
| ******************************* |
| Real-Time
Monitoring and Analysis of Pneumatic Vibration Test Systems Actuators
by John Hess
Recently there has been growth in the acceptance
and use of pneumatic vibration test systems. While these systems
help to improve the reliability of tested products, they themselves
are subject to failure due to wear and contamination caused by unusually
harsh operational environments. Since all pneumatic vibration test
system designs rely upon reciprocating actuators, they all can suffer
from the same actuator failure issues. Failure or degradation of
one or more actuator(s) can lead to an imbalance or skewing of the
vibratory energy on the table, i.e. hot and cold spots, as well
as excessive wear on the remaining good actuators. Since the vibration
control system automatically compensates for actuator failure by
running the remaining operating actuators at a higher reciprocation
rate, the user is unaware of the problem until the next maintenance
cycle, if then. The result is under-testing or over-screening of
products, and unrepeatable tests and test results.
In response to this problem, Data Flare has developed
V-flare. V-flare, supporting all systems currently available, actively
monitors the operation of pneumatic actuator vibration systems.
Using V-flare, users proactively determine the operational efficacy
of their pneumatic actuator test system, catching issues before
they affect product quality or reliability.
V-flare consists of three major components: sensors,
signal conditioner, and analysis/communications platform. V-flare’s
sensors attach directly to the actuator, requiring no actuator disassembly
and expediting system installation. These sensors, through induction,
measure the velocity of the actuator piston within its housing.
Since V-flare measures the motion of the piston within the actuator,
it requires one sensor per actuator to accurately assess the performance
of the entire vibration table.
The computational platform supports the sensor signal analysis and communications
requirements of V-flare. Based upon an x86 processor, the system
analyzes the filtered and amplified sensor signals from the signal
conditioner, calculating velocity, acceleration, and position information
of the actuator piston. Through its sensor system and analysis techniques,
V-flare determines the operational frequency of actuator pistons
to a resolution of less than 0.05 Hz. This ultra-fine analysis of
behavior allows V-flare to identify unintended actuator fluctuations
to levels previously unattainable.
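The article does not describe how V-flare derives acceleration and position from the measured piston velocity. One common approach, shown here purely as an assumption, is to differentiate and integrate the uniformly sampled velocity signal numerically. The function names and sample values are illustrative.

```python
from typing import List, Sequence


def acceleration_from_velocity(v: Sequence[float], dt: float) -> List[float]:
    """Differentiate velocity samples: central differences inside, one-sided at the ends."""
    n = len(v)
    if n < 2 or dt <= 0:
        raise ValueError("need at least two samples and a positive sample interval")
    accel = [(v[1] - v[0]) / dt]
    accel += [(v[i + 1] - v[i - 1]) / (2.0 * dt) for i in range(1, n - 1)]
    accel.append((v[-1] - v[-2]) / dt)
    return accel


def position_from_velocity(v: Sequence[float], dt: float, x0: float = 0.0) -> List[float]:
    """Integrate velocity samples with the trapezoidal rule, starting from position x0."""
    if dt <= 0:
        raise ValueError("sample interval must be positive")
    x = [x0]
    for i in range(1, len(v)):
        x.append(x[-1] + 0.5 * (v[i - 1] + v[i]) * dt)
    return x


if __name__ == "__main__":
    # Hypothetical samples: velocity rising linearly, sampled every 0.5 time units.
    velocity = [0.0, 1.0, 2.0, 3.0]
    print(acceleration_from_velocity(velocity, 0.5))
    print(position_from_velocity(velocity, 0.5))
```

In practice the raw signal would be filtered first, as the article notes, because differentiation amplifies high-frequency noise.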
V-flare identifies and determines the characteristics
of operation of individual pneumatic actuators. The system uses
these characteristics for each actuator, which include piston velocity,
acceleration, and position, to identify statistically significant
deviations, as well as for comparison against gold standards. V-flare
then reduces these characteristics to four status levels: off, running,
marginal, and failed. As the behavior of one or more
actuators changes due to contamination or excessive wear, V-flare
identifies these changes and flags them for the user.
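A reduction to the four status levels might look like the sketch below. The article names the levels but not the decision rules, so everything else here is hypothetical: the use of a z-score against a healthy baseline, the `marginal_z` and `failed_fraction` thresholds, and the choice of peak piston velocity as the compared quantity.

```python
import statistics
from typing import Sequence


def actuator_status(
    commanded_on: bool,
    reading: float,
    baseline: Sequence[float],
    marginal_z: float = 3.0,       # hypothetical threshold for a significant deviation
    failed_fraction: float = 0.2,  # hypothetical fraction of baseline treated as failed
) -> str:
    """Classify one actuator as "off", "running", "marginal" or "failed".

    `reading` is a current measurement (for example, peak piston velocity) and
    `baseline` holds at least two measurements taken from a known-good actuator.
    """
    if not commanded_on:
        return "off"
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if reading < failed_fraction * mean:
        return "failed"
    if stdev > 0:
        z = abs(reading - mean) / stdev
    else:
        z = 0.0 if reading == mean else float("inf")
    return "marginal" if z > marginal_z else "running"


if __name__ == "__main__":
    gold = [10.0, 10.2, 9.8, 10.1, 9.9]  # hypothetical gold-standard readings
    for value in (10.1, 8.0, 1.0):
        print(value, actuator_status(True, value, gold))
    print(actuator_status(False, 0.0, gold))
```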
Not only does the computational platform support
V-flare’s analysis, but it also supports the underlying communications
capabilities. Depending upon the platform selected, the system supports
local viewing of actuator characteristics, or remote access through
TCP/IP networks with an ordinary web browser. In addition, since
the platform is based upon Microsoft’s .NET platform, V-flare
supports actuator status streaming through web services to 3rd party
and user created applications.
In a recent field test, V-flare analyzed a poorly
performing vibration system. During testing, V-flare identified
one failed (non-operating) and one degraded (running at lower impact
levels) actuator. Replacing these actuators returned the system
to specification, saving the user over $6,000 in actuator replacement
costs. Incidentally, these savings do not take into account costs
associated with out of specification tests and screening.
If the consistency and repeatability of your accelerated
testing and screening are important to your organization and if you
prefer a maintenance program based on the health of your system rather
than the calendar, please contact Data Flare regarding V-flare.
John Hess, CTO of Data Flare, leads the technology initiatives at Data Flare. Previously,
John managed and directed hardware and software development at Internet
network communications, E-commerce, and Capital Equipment organizations.
|
| ******************************* |
Test Lab Musings (part 4)
by Robert L. Renz
And while we're on the topic of leftover stuff, what about the old fixture
that is bolted together and drilled for use on a table/expander
that doesn't exist, except maybe in photos of the "Old Lab"
that are hanging on the wall? At this point, throw it away, or if
you still need it from time to time, it's off to the machine shop
for a dose of Plastic Aluminum epoxy adhesive, and some new holes.
Even though an accelerometer has hex sides, tighten it down very gently
– use a very small finger to push the wrench instead of your
hand. The makers have torque specs, but you might not have a small
enough torque wrench…
Murphy’s Law being what it is, the odds are that the accelerometer you screw
into the fixture will wind up with the connector facing in the wrong
direction. Not to worry - you can either use an accelerometer with
a top connector instead of a side connector or add a shim washer,
but be sure that it is as thin as possible (a 0.015-inch-thick washer
will move the accelerometer about 1/2 turn if you use 10-32 accelerometer
studs).
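The shim arithmetic follows from the thread pitch: a 10-32 stud has 32 threads per inch, so one full turn advances the accelerometer 1/32 inch. A minimal sketch of the calculation, with illustrative function names:

```python
def turns_from_shim(thickness_in: float, threads_per_inch: float) -> float:
    """Turns by which a shim of the given thickness changes the seated orientation."""
    return thickness_in * threads_per_inch


def shim_for_rotation(turns: float, threads_per_inch: float) -> float:
    """Shim thickness in inches needed to change the seated orientation by `turns`."""
    return turns / threads_per_inch


if __name__ == "__main__":
    # The washer and stud from the article: 0.015 inch on a 10-32 thread.
    print(f"{turns_from_shim(0.015, 32):.2f} turn")
    # Thickness for an exact half turn on the same thread.
    print(f"{shim_for_rotation(0.5, 32):.4f} inch")
```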
Robert L. Renz is with General Dynamics - Advanced Information
Systems in Bloomington, Minnesota.
|
|
| Free
sample of Chapter 1 |
|
If you would like to request a free sample of Chapter 1 - "What
are vibration and shock?", from Wayne's new book "(...)
Random Vibration and Shock Testing", please visit our website.
Fill out the quick form and submit it to us. We will then e-mail
you a PDF file of Chapter 1.
|
| |
| New
Fixture Design course |
|
"Vibration
and Shock Test Fixture Design" is the newest course offered
by ERI. It will meet on October 12-14, 2004, in Pomona, California.
You can read about instructor Steve Brenner at our website.
Steve admits to having designed some poor fixtures. Fortunately,
he was able to gain some theoretical understanding of structural
responses to vibration and shock, to study those unsatisfactory fixtures,
to improve them, and to avoid most mistakes on future designs. Here he will
teach what he has learned on this subject. Most of Steve's presentation
will use PowerPoint slides. A highlight on Day #3 will be a visit to
a professional manufacturer of fixtures, Baughn
Engineering in La Verne, California, to see fixtures evolving
from raw materials into finished, ready-to-ship fixtures.
|
| |
|
ERI Overseas Courses |
|
If you cannot attend courses in the US, ERI offers two overseas courses.
ERI specialist Deepak Jariwala will teach in Singapore, and Markus
Dumelin will teach in Switzerland. Dates and locations are shown
below. Click on the links for more detailed information.
July
13-15, 2004,
at Singapore
October
5-7, 2004,
at Zug, Switzerland
|
| |
| Vibration
and Shock courses coming up |
|
Wayne Tustin will teach short courses in vibration testing, shock
testing, measurement, analysis, calibration, HALT, ESS and HASS
at the following locations:
August
24-26, 2004
Santa Barbara, California
October
5-7, 2004
Littleton, Massachusetts
November
1-3, 2004
Detroit, Michigan
December
7-9, 2004
Marietta, Georgia
If none of these locations and dates meets your
needs, perhaps you’d like to have customized training
presented at your facility for your designers and test specialists.
|
| |
| Announcements |
|
Safety Critical Systems Conference
19-20 May, 2004 Dissecting The Latest Tools & Techniques For
Increasing
Effectiveness In Safety Critical Systems (London, UK)
Please click
here to get more information about this event.
|
| |
|
Contact information
|
|
ERI - Equipment Reliability Institute
1520 Santa Rosa Ave.
Santa Barbara - CA - 93109
Tel: (805) 564-1260
Fax: (805) 966-7875
Wayne Tustin tustin@equipment-reliability.com
Webmaster webmaster@equipment-reliability.com
Websites
http://www.equipment-reliability.com
http://vibrationandshock.com
Copyright © 2000-2004 Equipment Reliability Institute.
All rights reserved. |
| |
|
Free Newsletter |
|
Subscribe
If you would like to subscribe to ERI News, go to either website,
fill in the form "Free Newsletter" and hit the Submit
button. Subscribe
now!
Recommend
If you enjoy reading ERI News and want to recommend it to a friend,
just hit "forward" on the menu of your e-mail program
or tell your friend to subscribe at our website.
Previous issues
Missed the previous issues? It is not a problem. Just visit our
newsletter
archives section and find all issues of ERI News.
Unsubscribe
If you do not want to receive ERI's quarterly newsletter, please
send us an e-mail
using the same e-mail address that brought you the newsletter,
with "remove" as subject. |
|