Digital Power Supervision and Telemetry

2012-08-21

What does a Digital Power device do between start-up and shutdown? Two core functions are supervision and telemetry. Supervision is a fast-acting safety feature that prevents damage to the device and/or load. Telemetry is an ongoing quality management feature.

A recent advertisement in Bodo’s Power Systems listed the following benefits of digital power:

Optimization
Predictive Maintenance
Fault Detection

In this post we will look at a typical POL internal architecture and consider its impact on power system design.

POL Internal Structure

Figure 1 shows a simplified POL with three main blocks:

Supervisor
Monitor
Digital Processing Unit

The Digital Processing Unit is the brain. (Not shown is the core power conversion.) The Digital Processing Unit is responsible for handling the PMBus and CONTROL inputs, and for asserting the POWER_GOOD and FAULT/ outputs.

Most devices have several supervisors and monitors, but for simplicity we can stick to input and output.

Supervision

Supervision is performed by a fast-acting single comparator or window comparator. Generally, the output bypasses state machines in the Digital Processing Unit and can directly stop power conversion and assert FAULT/. The Digital Processing Unit is updated afterward so that a PMBus host can query the device’s fault register.

The purpose of a Supervisor is to protect the load and the device, so it trades off some accuracy for speed. The HI/LO values are typically stored in NVROM or programmed over the PMBus through commands like VOUT_UV_FAULT_LIMIT. Fault behaviors are also stored in NVROM, and include things like whether to retry, the delay between retries, etc.
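As a rough behavioral sketch of the window comparator described above (not any particular device's implementation), the HI/LO check can be modeled in a few lines of Python. The class name, the field names, and the 1.0 V rail with a ±10% window are hypothetical; only VOUT_UV_FAULT_LIMIT comes from the text, and VOUT_OV_FAULT_LIMIT is its standard PMBus counterpart.

```python
from dataclasses import dataclass


@dataclass
class WindowSupervisor:
    """Behavioral model of a window comparator with programmable limits."""
    uv_limit: float  # volts; plays the role of VOUT_UV_FAULT_LIMIT
    ov_limit: float  # volts; plays the role of VOUT_OV_FAULT_LIMIT

    def check(self, vout: float) -> str:
        """Return 'UV', 'OV', or 'OK' for one instantaneous voltage."""
        if vout < self.uv_limit:
            return "UV"
        if vout > self.ov_limit:
            return "OV"
        return "OK"


# Hypothetical 1.0 V rail with a +/-10% window (values are assumptions).
sup = WindowSupervisor(uv_limit=0.9, ov_limit=1.1)
print(sup.check(1.0))  # inside the window
print(sup.check(0.0))  # output shorted to ground
```

In real hardware this comparison is continuous and analog, which is where the speed comes from; the model only shows the decision being made.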

Monitoring

Monitoring is a high accuracy measurement via an ADC. The Digital Processing Unit is typically implemented as a state machine or software loop that polls ADC output data and makes it available by PMBus. Monitoring data can also be used in a very accurate Digital Processing Unit servo loop to improve output accuracy.
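To make the servo idea concrete, here is a minimal sketch of an integrating loop that nudges a trim value until the monitored reading matches the target. Everything in it is an assumption made for illustration: the function names, the gain, the step count, and the fake rail that sits 20 mV low. Real devices implement this internally and differently.

```python
def servo(target, read_vout, write_trim, gain=0.5, steps=20):
    """Integrating servo: accumulate a trim until the reading matches target."""
    trim = 0.0
    for _ in range(steps):
        error = target - read_vout()   # high-accuracy monitor reading
        trim += gain * error           # integrate a fraction of the error
        write_trim(trim)
    return trim


class FakeRail:
    """Hypothetical rail whose output sits 20 mV below its 1.000 V set point."""

    def __init__(self):
        self.trim = 0.0

    def read(self):
        return 1.000 - 0.020 + self.trim

    def set_trim(self, value):
        self.trim = value


rail = FakeRail()
final_trim = servo(1.000, rail.read, rail.set_trim)
print(f"trim = {final_trim * 1000:.3f} mV, vout = {rail.read():.6f} V")
```

The loop is slow by design. It corrects steady-state error using the accurate ADC reading, and leaves fast protection to the supervisor.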

Faults

Faults can originate from a Supervisor or a Monitor. For a supervisor, a DAC provides a reference to the comparator, and the output feeds directly to the FAULT/ pin. For a monitor, the Digital Processing Unit uses a digital comparator or software conditional instruction to control the FAULT/ pin.
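The monitor path can be sketched as a software conditional applied to each ADC sample. The function name, the 1 ms sample period, and the sample values below are hypothetical, chosen only to show that a monitor-generated fault cannot be detected before the next sample arrives.

```python
def first_fault_time(samples, sample_period_s, uv_limit, ov_limit):
    """Return the time of the first out-of-window sample, or None if none."""
    for i, vout in enumerate(samples):
        if vout < uv_limit or vout > ov_limit:  # the "software conditional"
            return i * sample_period_s
    return None


# Hypothetical ADC readings taken every 1 ms; the rail collapses
# somewhere between the fourth and fifth samples.
samples = [1.00, 1.01, 0.99, 1.00, 0.00, 0.00]
t_fault = first_fault_time(samples, 1e-3, uv_limit=0.9, ov_limit=1.1)
print(t_fault)
```

A fault that occurs just after a sample waits almost a full sample period before it is seen, which is why the comparator path is preferred when speed matters.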

Trade-offs

The trade-offs a POL designer makes are pretty straightforward. Safety dictates which inputs and outputs have supervisors. Monitoring trade-offs involve accuracy and the number of channels, because ADCs, muxes, etc. take up real estate and power.

As a system designer, one must consider what their system will use the data for, and how accurate it must be. For example, typical uses are:

System bring up and debug
Efficiency monitoring
Energy monitoring
Failure prediction
Optimization (local and global)
Accuracy enhancement (servo)

Examples

Each power architecture is different, and there is no universal set of trade-offs, so I’ll provide a few examples of using Supervisors and Monitors to whet your imagination. Besides, you may find a competitive advantage once you see what is possible.

Example Supervisor Fault Generation

This example comes from an LTC2974, a supervisor/monitor device that manages four POLs. The output voltage (of a POL it is in charge of) has a window comparator based supervisor.

Figure 2. Supervisor Generated Fault

Trace 4 is the FAULT/ pin of the device, and Trace 3 is the ALERT/ pin of the device. I shorted the output to ground. The delay between grounding and FAULT/ going low is about 12 µs on this device. Very shortly after, ALERT/ is also pulled low. These are very fast because the supervisor bypasses all state machines involved in monitoring the slower ADC and directly creates the fault. It also halts power conversion.

Looking at the PMBus, the host performs an Alert Response Address (ARA) transaction. The address 0x0C is placed on the bus, and the faulted device puts 0x64 on the bus. The host shifts this right one bit to get the address 0x32. Then the host reads the fault register by placing address 0x32 on the bus followed by the command byte 0x79 (STATUS_WORD). Then a repeated start with address 0x32 is put on the bus and two data bytes are returned for a status word of 0x8041.

According to the datasheet for this device, this status word indicates an undervoltage fault.
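The host-side arithmetic in the transaction above can be sketched as follows. The helper names are hypothetical, and the bit names follow the generic PMBus STATUS_WORD layout as I understand it; a given device may leave some bits unsupported, so check the datasheet before relying on this table.

```python
# Generic PMBus STATUS_WORD bit names (assumed layout; confirm per device).
STATUS_WORD_BITS = {
    15: "VOUT",
    14: "IOUT/POUT",
    13: "INPUT",
    12: "MFR_SPECIFIC",
    11: "POWER_GOOD#",
    10: "FANS",
    9: "OTHER",
    8: "UNKNOWN",
    7: "BUSY",
    6: "OFF",
    5: "VOUT_OV_FAULT",
    4: "IOUT_OC_FAULT",
    3: "VIN_UV_FAULT",
    2: "TEMPERATURE",
    1: "CML",
    0: "NONE_OF_THE_ABOVE",
}


def ara_to_address(ara_byte):
    """The ARA reply carries the 7-bit address in its upper seven bits."""
    return ara_byte >> 1


def word_from_bytes(low, high):
    """SMBus word reads return the low byte first."""
    return (high << 8) | low


def decode_status_word(word):
    """List the names of all bits set in a status word, MSB first."""
    return [name for bit, name in STATUS_WORD_BITS.items() if word & (1 << bit)]


address = ara_to_address(0x64)
status = word_from_bytes(0x41, 0x80)
print(hex(address), hex(status), decode_status_word(status))
```

For 0x8041 the set bits are VOUT, OFF, and NONE_OF_THE_ABOVE: an output-voltage event occurred and the rail is off. In PMBus the summary bit does not say which kind of output-voltage event it was; that detail normally comes from a more specific status register such as STATUS_VOUT.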

We can also look at this with an external tool that displays the device registers and status.

Remember the use models I presented in my second post?

The Supervisor leads to a fault which supports both models. It can be processed by PMBus, or by an external tool.

Note: We will see the implementation of this in a future post, but basically the ALERT/ pin is tied to a Microcontroller interrupt.

Example Temperature Monitoring

Many devices can monitor internal die temperature and external temperature via a diode. In this example (LTC3880), I have a board manager that monitors rails via PMBus and has an LCD touch screen display.

Figure 5. Temperature Monitoring

The telemetry plot shows internal die temperature. The dip in the graph occurred when I put my finger on the device and it cooled down. Min and max on the plot are 30 and 40 degrees. You can see that the measurement is pretty good.

The device will use this temperature to protect itself, but it could also be used to detect more subtle problems. If you add in an I²C temperature monitoring device and put sensors around a PCB, between the sensors and all the PMBus devices, you can get a pretty good profile of the board. You could use this to balance temperature by controlling loads, characterize a system under different loads, or simply send a warning message to a system operator so they can replace a board and send it for service.

The same could be done for efficiency. By measuring input and output voltages and currents, you can calculate the power efficiency on the fly, and use the information for system optimization by shifting workloads to keep converters nearer their optimum efficiency. You could also look for atypical patterns that might reveal problems before they occur. Board managers typically have communication interfaces that can send out these notifications for you.
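The efficiency calculation itself is just output power divided by input power. A minimal sketch, assuming the four telemetry values have already been read and converted to volts and amps (the function name and the example numbers are hypothetical):

```python
def efficiency(vin, iin, vout, iout):
    """Return output power / input power, or None when input power is zero."""
    pin = vin * iin
    if pin <= 0:
        return None  # rail is off or the readings are not meaningful
    return (vout * iout) / pin


# Hypothetical telemetry: 12 V at 1.0 A in, 1.0 V at 10.8 A out.
eta = efficiency(12.0, 1.0, 1.0, 10.8)
print(f"{eta:.1%}")
```

In practice the four readings are not taken at the same instant, so averaging over several samples usually gives a more trustworthy number than a single snapshot.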

Autonomous vs. Managed Systems

I would like to put these performance parameters in the context of use models:

In my first post I presented two use models:

Configure and Deploy
Monitor and Act

Another way to describe these is an Autonomous vs. Managed System. An autonomous system is one where the power converters power up and act completely independently from the PMBus, much like model 1. And a Managed System actively uses the PMBus, much like model 2.

These models have different performance implications. PMBus has its own performance limited by the 400 kHz (typical) bus clock. Supervision performance is independent from the PMBus whether implemented as an analog comparator and direct logic or slower logic in the Digital Processing Unit.

In a Managed System, such as Monitor and Act, the Act portion has the same performance as an Autonomous System until the host puts the PMBus inside a decision-making loop. Once a host has to read telemetry and apply some functional or parametric change to the device, performance is typically limited by the PMBus.

Managed System performance is also qualitatively different, because the host has to manage multiple rails, where the quantity depends on the system architecture. Suppose it takes 200 µs (400 kHz bus) to read a value and change a value in response. Say I then have 10 rails in control loops in the host; now I have 2 ms. Now, add some I²C chips for monitoring temperature. Add other functions in the host not related to PMBus, and eventually the response time of the system is slower than the Digital Processing Unit. Furthermore, if you run the bus at 100 kHz because of some slower I²C devices, things get even slower.
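The arithmetic above can be reproduced with a back-of-the-envelope model. This sketch assumes one read-word plus one write-word transaction per rail, nine clocks per byte (eight data bits plus ACK), no PEC byte, and it ignores start/stop conditions, clock stretching, and host processing time. The function names are hypothetical.

```python
def transaction_time_s(num_bytes, bus_hz):
    """Each byte on the wire costs 9 clocks: 8 data bits plus the ACK bit."""
    return num_bytes * 9 / bus_hz


def loop_time_s(rails, bus_hz):
    """Time to read one word and write one word on every rail, in seconds."""
    read_word = transaction_time_s(5, bus_hz)   # addr+W, cmd, addr+R, 2 data
    write_word = transaction_time_s(4, bus_hz)  # addr+W, cmd, 2 data
    return rails * (read_word + write_word)


for rails, bus_hz in [(1, 400_000), (10, 400_000), (10, 100_000)]:
    t_us = loop_time_s(rails, bus_hz) * 1e6
    print(f"{rails:2d} rails at {bus_hz // 1000} kHz: about {t_us:.0f} us")
```

Under these assumptions one rail costs about 200 µs at 400 kHz, ten rails about 2 ms, and dropping the bus to 100 kHz quadruples that to roughly 8 ms, before any non-PMBus work in the host is counted.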

It is for this reason that hybrid use models are used, where the critical functions are all handled by the Digital Processing Unit (and fast supervisors) and don’t depend on PMBus, and higher level functions such as energy consumption and failure prediction are handled by a PMBus host.

And for this same reason, when higher level functions are not required, devices work just fine by themselves, and PMBus is an enabler for configuration tools. In particular, a PMBus tool is very useful for board bring up. Tools display the status of all rails of a system in a dashboard format: telemetry, faults, and settings.

Review

Most Digital Power devices have Supervisors and Monitors. I have characterized the Supervisor as a fast-acting safety device, and the Monitor as a sampling device for telemetry. This is a convenient way to categorize, but be careful with terminology, especially with respect to supervisors and faults. Sometimes the word “Supervisor” is applied to a fault-generating technique using data from a Monitor, and then it typically has a larger latency than comparators.

There is nothing wrong with this. If a device already needs a monitor, and if faults do not need to occur super-fast, why pay for comparators and logic you don’t need? Just read data sheets and look at their block diagrams carefully so you understand how the device behaves. Chip designers are pretty good at making trade-offs, but only you can confirm they work for your application. But in general, you will find if it is a safety issue, there will be comparators, and if it is an accuracy issue, there will be high accuracy ADCs.

While the use of supervisors is pretty obvious, monitors are sometimes under-appreciated, until you wish you had them. It is easy to focus on sizing rails, considering transient response, and all the other analog behaviors and not think about system level opportunities. But once you have PMBus and all its controls, non-volatile memory to store settings, and software tools for configuration, consider what you can do with live data. You may find a competitive advantage with just a little more effort. You also don’t have to implement all the firmware up front. The nice thing about firmware is you can upgrade it without hardware changes.

If you can predict a failure or optimize efficiency, the payback can often be 100X the firmware development cost. All it takes is adding an FPGA or Microcontroller to the design (or using an existing one), some knowledge of your application domain, and a little imagination.

In future posts, I’ll cover issues of PMBus integration in more detail. By now you are probably thinking it will be a lot of work. However, it is not that hard at all.

Stay tuned…