Understanding AXI Protocol and Cache Coherency
The AXI protocol and cache coherency are used in almost every complex SoC today, so a working knowledge of both is a must for every engineer.
Explain AXI architecture. What are the different channels as per AXI protocol?
The AXI protocol is burst-based and defines the following independent transaction channels:
• read address
• read data
• write address
• write data
• write response.
An address channel carries control information that describes the nature of the data to be transferred.
The data is transferred between master and slave using either:
• A write data channel to transfer data from the master to the slave. In a write transaction, the slave uses the write response channel to signal the completion of the transfer to the master.
• A read data channel to transfer data from the slave to the master.
The AXI protocol:
• permits address information to be issued ahead of the actual data transfer
• supports multiple outstanding transactions
• supports out-of-order completion of transactions.
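Does AXI support existing AHB and APB interfaces?
Yes. AXI is designed to be backward compatible with existing AHB and APB interfaces, which are connected through bridges.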
Advantages of AXI over AHB protocol?
1. AXI has five independent channels that operate in parallel: read address, write address, read data, write data, and write response.
AHB, by contrast, has a single shared address bus plus one read data bus and one write data bus.
2. AXI has native support for multiple outstanding transactions.
3. AXI supports transaction IDs. A master may issue multiple outstanding transactions per transaction ID.
4. A pipeline register can be inserted anywhere in the path of any of the five channels, which helps timing closure and allows a higher operating frequency.
5. The length of a burst is always known at its start, through the AxLEN bits. In AHB, an undefined-length incrementing burst does not declare its length at the start.
6. Write strobes are supported.
7. AXI3 supports locked transfers; AXI4 does not.
What do you mean by multiple outstanding transactions? Why is it useful?
A master initiates a transaction and, without waiting for it to complete (that is, for the response to arrive), initiates another one. The first transaction is then an outstanding transaction. Because AXI supports multiple outstanding transactions, a master does not have to wait for one transaction to complete before starting the next, which hides slave latency and boosts performance.
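The benefit can be estimated with a rough model. The sketch below is only an illustration under simplifying assumptions that are not from the specification: every transaction takes a fixed `latency` in cycles, the master can issue at most one new transaction per cycle, and at most `max_outstanding` transactions may be in flight. The function name and parameters are hypothetical.

```python
def total_cycles(num_txns, latency, max_outstanding):
    """Cycle at which the last of num_txns transactions completes.

    Assumptions (illustrative only): fixed latency per transaction,
    one issue per cycle, at most max_outstanding transactions in flight.
    """
    in_flight = []   # completion cycle of every transaction still in flight
    cycle = 0
    issued = 0
    last_done = 0
    while issued < num_txns:
        # Retire transactions whose response has arrived by this cycle.
        in_flight = [done for done in in_flight if done > cycle]
        if len(in_flight) < max_outstanding:
            last_done = cycle + latency
            in_flight.append(last_done)
            issued += 1
        cycle += 1
    return last_done


print(total_cycles(4, 10, 1))  # 40: each transaction waits for the previous one
print(total_cycles(4, 10, 4))  # 13: the four latencies overlap
```

With only one transaction allowed in flight the latencies add up; with four allowed they overlap almost completely.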
Why does a read use only 2 channels?
A read has no separate response channel because both the read data and the read response flow in the same direction, from slave to master. With every beat, the slave sends the read response (RRESP) along with the data on the read data channel.
What is the minimum and maximum data bus width supported in AXI?
As per the specification, the data bus width can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits. So the minimum is 8 bits and the maximum is 1024 bits.
Why is write data channel treated as buffered?
Write data channel information is always treated as buffered so that the master can perform write transactions without slave acknowledgment of previous write transactions.
Which channels are exclusive to the slave?
Write Response and Read data channels.
As per AXI terminology, differentiate between beat, burst and transaction.
Transaction - The complete set of operations needed to perform a read or a write on the AXI bus, including the address, data, and response phases.
Burst - The payload data of a transaction, made up of one or more transfers.
Beat - A single data transfer within a burst.
Can a master assert WLAST in the middle of a burst transfer?
No, because early burst termination is not supported.
What is easy addition of register stages to provide timing closure?
Each AXI channel transfers information in only one direction, and the architecture does not require any fixed relationship between the channels.
This means a register stage can be inserted at almost any point in any channel, at the cost of an additional cycle of latency.
What is an interconnect?
An Interconnect is a component with more than one interface that connects one or more master components to one or more slave components.
What is control information?
The characteristics of a transaction (read or write), such as burst length, burst size, burst type, and atomic characteristics, are called the control information.
What are the major actions done by interconnect?
It manages the transactions between masters and slaves: routing (address decoding), arbitration, buffering, and providing responses where needed.
Topologies using Interconnect?
Shared address and data buses, shared address buses and multiple data buses, multilayer with multiple addresses and data buses.
What is meant by high latency?
If an AXI slave takes many clock cycles to respond to the master for the completion of a transfer, it is said to have high initial access latency.
which component is responsible for calculating subsequent transfers in a burst?
Slave(calculates address of subsequent transfers)
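Difference between Channel and Bus? If they are the same then why two different names?
They are not the same. In AXI, a channel is an independent group of signals that carries one kind of information in one direction under its own VALID/READY handshake. A bus is the more general term for a shared pathway of signals between components; the AXI interface as a whole is a bus made up of five channels.
What is the need for interleaving?
Data interleaving increases throughput: beats of transactions with different IDs can be mixed on the data channel, so a slow transaction does not block a fast one. Note that write data interleaving exists only in AXI3; AXI4 removed it.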
What is the meaning of point to point interconnect?
The connection between two components.
What do AxLEN and AxSIZE represent?
AxSIZE encodes the number of bytes in each transfer (beat) of the burst: Number_Bytes = 2 ^ AxSIZE.
AxLEN encodes the number of transfers in a burst: Burst_Length = AxLEN + 1.
Mention the LOW POWER INTERFACE SIGNALS supported by AXI3 AXI4 protocols?
CACTIVE, CSYSREQ, and CSYSACK.
What is the purpose of byte lane strobe ? Is strobe used for both read and write operation?
The strobe signal is used to indicate which bytes of the write data bus are valid for each transfer of data. No, it's only used in a write operation.
What's the purpose of LAST signal during a transaction? Does both read and write operation use it? If yes, which channel is used to send this signal?
This signal indicates the last transfer in a write/read burst. Yes, Write data and read data channels are used to send this signal.
Explain the basic handshaking mechanism in AXI.
1. The source uses the VALID signal to indicate when valid information is available.
2. The VALID signal must remain asserted, meaning set to high, until the destination accepts the information.
3. The destination indicates when it can accept information using the READY signal. The READY signal goes from the channel destination to the channel source.
4. This mechanism is not an asynchronous handshake and requires the rising edge of the clock for the handshake to complete.
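The handshake rules above can be sketched as a small cycle-by-cycle check. This is a toy model, not a simulator: `valid` and `ready` are assumed to be lists holding the level of each signal as sampled at successive rising clock edges, and both function names are hypothetical.

```python
def transfers(valid, ready):
    """Indices of the clock edges at which a transfer completes.

    A transfer happens only at an edge where VALID and READY are both high.
    """
    return [i for i, (v, r) in enumerate(zip(valid, ready)) if v and r]


def valid_is_stable(valid, ready):
    """Check that VALID, once asserted, stays high until the handshake."""
    for i in range(len(valid) - 1):
        # VALID was high without READY, so it must still be high next cycle.
        if valid[i] and not ready[i] and not valid[i + 1]:
            return False
    return True


valid = [0, 1, 1, 1, 0]
ready = [0, 0, 0, 1, 1]
print(transfers(valid, ready))        # [3]
print(valid_is_stable(valid, ready))  # True
```

In the example the source waits two cycles with VALID held high, and the single transfer completes at edge 3 when READY is finally asserted.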
Do the VALID and READY signals depend on each other?
The source must not wait for READY before asserting VALID. The destination, however, is allowed to wait for VALID before asserting READY. Once asserted, VALID must stay high until the handshake completes; after that it may go low, or stay high if the next transfer is already available.
When should the VALID signal go high and low?
VALID should go high when the source has valid information to send. After a successful handshake it should go low if there is no further valid information; it may stay high for back-to-back transfers.
When must the slave give write response?
The write response is generated only after the last write data transfer (WLAST) of the transaction has been accepted. In AXI4 the write address must have been accepted as well.
What is deadlock condition?
There are certain dependencies on how the handshake signals may be asserted. If they are violated, the handshake never occurs and the transfer stalls forever. This is called a deadlock.
For example, a deadlock can occur if the master waits for AWREADY before driving WVALID while the slave is waiting for WVALID before asserting AWREADY.
Why is there no write response for each beat in a write burst, while there is a separate read response for each beat in a read burst?
For read transfers, the data and the response both flow from slave to master. For a write transaction, the data and the response flow in opposite directions.
Individual responses for each write transfer would therefore cost more clock cycles and add unnecessary traffic because of the two-way flow between master and slave.
So a single response per write transaction is preferred, whereas a read can carry a response with every transfer at no extra cost.
How to ensure data integrity on AXI?
By following the channel handshaking dependencies defined by the protocol, we can ensure data integrity.
Is it possible for a read transaction to complete in one cycle?
No, because the data handshake happens at least one clock cycle after the address handshake.
What will happen if LAST is not asserted after completion of the transfer?
For a write, WLAST marks the last transfer of the write burst. If the master does not provide WLAST, the slave cannot know whether the burst has completed, so it will not be able to assert the write response.
Does AXI define any timeout condition for the channel handshake?
No, the protocol defines none, but a timeout can be added based on the user's requirement.
What is 4KB address boundary in AXI?
The granularity of address mapping in AXI is 4KB. That means the smallest block of addresses that can be assigned to a given slave or peripheral is 4KB, and all allocations are multiples of 4KB. So when you cross a 4KB boundary you are potentially going from slave A's address space to slave B's, which is why a burst must not cross a 4KB boundary.
Discarding read data that is not required can result in lost data when accessing a read-sensitive device such as a FIFO. When accessing such a device, a master must use a burst length that exactly matches the size of the required data transfer.
Importance of RRESP and BRESP?
After initiating a transaction, the master needs status information for that transaction. Sometimes the address to which a transaction is issued is not available, either because no slave is mapped there or because it is not accessible, for example due to security restrictions. Sometimes the slave may not accept the data.
In these conditions the master must be aware of the status so that it can act accordingly. That is why the response signals are important.
Types of responses?
OKAY, EXOKAY (exclusive okay), SLVERR (slave error), DECERR (decode error).
If a master sends an address that none of the slaves is mapped to, which response will you get?
Decode error (DECERR).
With respect to the assertion of VALID and READY, which order of assertion gives the most efficient handshaking?
When the source and destination both indicate, in the same cycle, that they can transfer the address, data, or control information.
In this case the transfer occurs at the first rising clock edge at which both VALID and READY are seen asserted, so no cycle is wasted waiting.
Why does the specification recommend a default state of HIGH for AWREADY?
When AWREADY is HIGH, the slave must be able to accept any valid address presented to it, and the address can be transferred in a single cycle. A default AWREADY state of LOW forces the transfer to take at least two cycles: one to assert AWVALID and another to assert AWREADY.
Can RVALID be asserted before ARVALID? Explain whether the statement is right or not.
It is incorrect. The slave must wait for both ARVALID and ARREADY to be asserted before it asserts RVALID to indicate that valid data is available.
The address phase is followed by the data transfer phase. So why must a master not wait for AWREADY to be asserted before driving WVALID?
The address and data channels are independent. The address and control information sent on the address channel lets the slave configure itself to receive the data.
Since both the address and the data originate from the master, it can assert their VALID signals without waiting for the slave. This also avoids a deadlock with a slave that waits for WVALID before asserting AWREADY.
What are the rules governing the use of bursts as per AXI protocol?
1. For wrapping bursts, the burst length must be 2, 4, 8, or 16.
2. A burst must not cross a 4KB address boundary.
3. Early termination of bursts is not supported.
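The 4KB rule is easy to check before a burst is issued. Here is a minimal sketch for INCR bursts; the function name `crosses_4kb` is hypothetical, and the arithmetic follows the usual AXI definitions (2 ^ AxSIZE bytes per transfer, AxLEN + 1 transfers).

```python
def crosses_4kb(start_addr, axsize, axlen):
    """True if an INCR burst would cross a 4KB address boundary."""
    num_bytes = 1 << axsize            # bytes per transfer
    burst_len = axlen + 1              # number of transfers
    aligned = (start_addr // num_bytes) * num_bytes
    last_byte = aligned + num_bytes * burst_len - 1
    # The first and last byte must fall in the same 4KB page.
    return (start_addr // 4096) != (last_byte // 4096)


print(crosses_4kb(0xFF0, 2, 3))  # False: 4 x 4 bytes end at 0xFFF
print(crosses_4kb(0xFF0, 2, 4))  # True: the 5th transfer starts at 0x1000
```

A master or testbench that finds a crossing would typically split the request into two bursts at the boundary.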
What is the significance of the AxBURST signal? What are the different burst types supported in AXI?
AxBURST gives the burst type. The burst type, together with the size information, determines how the address for each transfer within the burst is calculated.
FIXED, INCR, and WRAP are the burst types supported in AXI.
Who usually generates decode error?
An interconnect component, to indicate that there is no slave at the transaction address.
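A minimal sketch of that decode step is shown below. The address map, the slave names, and the `decode` function are all made up for illustration; a real interconnect would return the DECERR code on the response channel rather than a string.

```python
# Hypothetical address map: (base address, size in bytes, slave name).
ADDRESS_MAP = [
    (0x0000_0000, 0x1000, "uart"),
    (0x0000_1000, 0x2000, "sram"),
]


def decode(addr):
    """Return the name of the slave that owns addr, or "DECERR" if none does."""
    for base, size, name in ADDRESS_MAP:
        if base <= addr < base + size:
            return name
    return "DECERR"   # no slave at this address: the interconnect responds itself


print(decode(0x0010))  # uart
print(decode(0x2FFF))  # sram
print(decode(0x3000))  # DECERR
```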
How to proceed with the further transfers if the start address issued by the master is unaligned?
The first transfer uses the unaligned start address, with only the valid byte lanes used. From the second transfer onward, the address is aligned to the transfer size and the burst continues from there.
What happens in the case of a WRAP burst if the first address is higher than the wrap boundary?
The transaction starts at that first address. When the address reaches the upper wrap boundary (Wrap_Boundary + Number_Bytes x Burst_Length), it wraps back to the wrap boundary and continues until all AxLEN + 1 transfers are done.
What happens when an unaligned address is given for the WRAP burst type?
It is a protocol violation: the start address of a WRAP burst must be aligned to the size of each transfer, so unaligned addresses are not supported.
Should VALID and READY be deasserted after every successful handshake? If yes, why?
VALID should be deasserted after the handshake only if there is no new information to send in the next clock cycle. READY need not be deasserted.
Where can we use INCR and FIXED bursts?
An INCR burst is used for sequential memory, and a FIXED burst is used for a FIFO.
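What is the maximum number of bytes that can be transferred in a single burst?
By burst length and size alone: AXI3 allows 16 * 128 = 2048 bytes and AXI4 allows 256 * 128 = 32768 bytes. In practice a burst must not cross a 4KB boundary, so no AXI4 burst can move more than 4096 bytes.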
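The arithmetic can be written out as a short helper. This is only a sketch: `max_burst_bytes` is a hypothetical name, and the figures assume the widest 1024-bit (128-byte) data bus, the maximum INCR burst length of each protocol version, and the rule that a burst must not cross a 4KB boundary.

```python
MAX_BEATS = {"AXI3": 16, "AXI4": 256}   # maximum INCR burst length in transfers
MAX_BYTES_PER_BEAT = 128                # widest data bus: 1024 bits
PAGE_BYTES = 4096                       # a burst must stay inside one 4KB page


def max_burst_bytes(protocol):
    """Largest payload a single burst can carry for the given protocol."""
    raw = MAX_BEATS[protocol] * MAX_BYTES_PER_BEAT
    return min(raw, PAGE_BYTES)


print(max_burst_bytes("AXI3"))  # 2048
print(max_burst_bytes("AXI4"))  # 4096
```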
What is the restriction on the size of any transfer?
The size of a transfer must not exceed the data bus width.
What is upper byte lane and lower byte lane?
The byte lane of the highest addressed byte of a transfer is the upper byte lane and the lowest addressed byte of a transfer is the lower byte lane.
AXLEN = 4 then burst length?
Burst_length = AxLEN+1 so 5.
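AXI supports a burst length of 1-256 transfers for the INCR burst type and 1-16 transfers for FIXED and WRAP. (True/False)
True for AXI4: only INCR bursts may be 1-256 transfers long. FIXED and WRAP bursts are limited to 1-16 transfers, and a WRAP burst must be exactly 2, 4, 8, or 16 transfers. Note that burst length counts transfers, not bytes. In AXI3 all burst types are limited to 1-16 transfers.
How is an address defined as aligned or unaligned?
If (start address % number of bytes per transfer == 0) the address is aligned; otherwise it is unaligned.
Is a 0110 strobe valid?
It depends on the transfer size. For a 16-bit transfer on a 32-bit bus, no: the transfer uses either the lower two or the upper two byte lanes, and 0110 mixes both halves. For a full 32-bit transfer, all four lanes belong to the transfer and a master may leave some of them disabled, so a sparse pattern such as 0110 is legal.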
If the AXI bus is wider than the burst size, how is the transfer done? What is a narrow transfer and how is it performed?
A transfer that is narrower than the data bus is called a narrow transfer. When a master generates a narrow transfer, the address and control information determine which byte lanes the transfer uses:
• in incrementing or wrapping bursts, different byte lanes are used on each beat of the burst.
• in a fixed burst, the same byte lanes are used on each beat.
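To see how the byte lanes move from beat to beat, the sketch below computes a WSTRB-style lane mask for each beat of an INCR burst. The function name `byte_lanes` is hypothetical, and the code assumes the transfer size does not exceed the bus width (`bus_bytes`). The lower lane is the beat address modulo the bus width; the first beat may start at an unaligned address.

```python
def byte_lanes(start_addr, axsize, axlen, bus_bytes):
    """Lane mask (bit n = byte lane n) for every beat of an INCR burst."""
    num_bytes = 1 << axsize
    aligned = (start_addr // num_bytes) * num_bytes
    masks = []
    for n in range(axlen + 1):
        if n == 0:
            addr = start_addr                  # first beat may be unaligned
            upper = aligned + num_bytes - 1
        else:
            addr = aligned + n * num_bytes     # later beats are aligned
            upper = addr + num_bytes - 1
        lo = addr % bus_bytes                  # lowest active byte lane
        hi = upper % bus_bytes                 # highest active byte lane
        mask = 0
        for lane in range(lo, hi + 1):
            mask |= 1 << lane
        masks.append(mask)
    return masks


# Four 1-byte beats on a 4-byte bus: the active lane walks across the bus.
print([format(m, "04b") for m in byte_lanes(0x0, 0, 3, 4)])
# ['0001', '0010', '0100', '1000']
```

For a FIXED burst the address does not change, so every beat would reuse the mask of the first beat.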
What is the value of WSTRB when WVALID is LOW?
WSTRB can take any value, but it is recommended that the strobes are either driven LOW or held at their previous value.
What is WRAP and How to Calculate Address in WRAP Burst?
These equations determine the addresses of transfers within a burst:
• Start_Address = AxADDR
• Number_Bytes = 2 ^ AxSIZE
• Burst_Length = AxLEN + 1
• Aligned_Address = (INT(Start_Address / Number_Bytes)) × Number_Bytes
This equation determines the address of the first transfer in a burst:
• Address_1 = Start_Address
For an INCR burst, and for a WRAP burst for which the address has not wrapped, this equation determines the address of any transfer after the first transfer in a burst:
• Address_N = Aligned_Address + (N - 1) × Number_Bytes
For a WRAP burst, the Wrap_Boundary variable defines the wrapping boundary:
• Wrap_Boundary = (INT(Start_Address / (Number_Bytes × Burst_Length))) × (Number_Bytes × Burst_Length)
For a WRAP burst, if Address_N = Wrap_Boundary + (Number_Bytes × Burst_Length), then:
• use this equation for the current transfer:
  - Address_N = Wrap_Boundary
• use this equation for any subsequent transfers:
  - Address_N = Start_Address + ((N - 1) × Number_Bytes) - (Number_Bytes × Burst_Length)
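The equations above can be turned into a small address generator. This is an illustrative sketch, not reference code: `burst_addresses` is a hypothetical name, the burst type is passed as a plain string, and a WRAP burst is assumed to have a legal length (2, 4, 8, or 16) and a start address aligned to the transfer size.

```python
def burst_addresses(start_addr, axsize, axlen, burst):
    """Address of every transfer in a FIXED, INCR, or WRAP burst."""
    num_bytes = 1 << axsize                      # Number_Bytes
    burst_len = axlen + 1                        # Burst_Length
    aligned = (start_addr // num_bytes) * num_bytes
    if burst == "FIXED":
        return [start_addr] * burst_len
    if burst == "INCR":
        # Address_1 = Start_Address, Address_N = Aligned_Address + (N-1)*Number_Bytes
        return [start_addr] + [aligned + (n - 1) * num_bytes
                               for n in range(2, burst_len + 1)]
    if burst == "WRAP":
        total = num_bytes * burst_len
        wrap_boundary = (start_addr // total) * total
        addrs = [start_addr]
        addr = start_addr
        for _ in range(burst_len - 1):
            addr += num_bytes
            if addr == wrap_boundary + total:    # reached the upper limit
                addr = wrap_boundary             # wrap back to the boundary
            addrs.append(addr)
        return addrs
    raise ValueError("unknown burst type: " + burst)


print([hex(a) for a in burst_addresses(0x18, 2, 7, "WRAP")])
# ['0x18', '0x1c', '0x0', '0x4', '0x8', '0xc', '0x10', '0x14']
print([hex(a) for a in burst_addresses(0x1, 2, 2, "INCR")])
# ['0x1', '0x4', '0x8']
```

The second call shows the unaligned case: only the first transfer uses the unaligned address, and every later transfer is aligned to the transfer size.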
What do you understand by outstanding transactions?
The transactions which are yet to be completed are called outstanding transactions.
for example: Let us say we have 10 writes initiated from the Master component. Out of 10, only 3 of them have received an OKAY response from slaves. In such a case, the rest of the 7 writes whose responses are yet to be received are called outstanding transactions.
What does high initial latency devices mean?
If the AXI slave component is taking more time (in terms of clock cycles) in responding back to the master for the completion of the transfer then such components are said to be having high initial access latency.
What is a byte strobe?
The AXI protocol provides a signal called WSTRB, with one bit per byte of the write data bus. Each bit indicates whether the corresponding byte lane carries valid data to be written.
What is an out of order response?
Responses from the slave may be returned in a different order from the one in which the requests were received. The restriction is on transaction IDs: transactions with the same ID must complete in the order they were issued, while transactions with different IDs may complete in any order.
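The AXI ordering rule is tied to transaction IDs: responses for the same ID must come back in issue order, while different IDs may overtake each other. A scoreboard-style check of that rule could look like the sketch below, where `ordering_ok` and the `(txn_id, tag)` tuples are hypothetical names chosen for the example.

```python
def ordering_ok(issued, responses):
    """Check AXI ordering: same-ID responses must arrive in issue order.

    issued and responses are lists of (txn_id, tag) tuples, where tag is
    any label that identifies one particular transaction.
    """
    pending = {}                       # txn_id -> tags still awaiting a response
    for txn_id, tag in issued:
        pending.setdefault(txn_id, []).append(tag)
    for txn_id, tag in responses:
        queue = pending.get(txn_id)
        if not queue or queue[0] != tag:
            return False               # unknown ID or overtaking within one ID
        queue.pop(0)
    return True


issued = [(0, "a"), (1, "b"), (0, "c")]
print(ordering_ok(issued, [(1, "b"), (0, "a"), (0, "c")]))  # True
print(ordering_ok(issued, [(0, "c"), (0, "a"), (1, "b")]))  # False
```

In the first response order, ID 1 overtakes ID 0, which is allowed. In the second, "c" overtakes "a" even though both use ID 0, which is not.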
Can we generate address information from slave?
No. Addresses (read/write) are generated only from the AXI Master side only. It is the READ data and write response channels that are owned by AXI slave.
The slave will only be sending READ Data, READ response, WRITE Responses.
Both the read data channel and the write data channel also include a LAST signal to indicate when the transfer of the final data item within a transaction takes place. Elaborate this statement.
The statement means that WRITE has an associated WLAST signal and READ has an associated RLAST signal, each of which marks the last data item of the transaction.
Explain the significance of AWSIZE.
The AWSIZE signal denotes how many bytes are carried in a single transfer of the burst (2 ^ AWSIZE). The maximum value is 128 bytes.
Difference between RVALID, ARADDR, ARVALID?
AXI has 5 channels, and each channel has its own VALID and READY signals.
The write operation has both address and data channels:
AWADDR: write address; source is master
AWVALID: write address valid; source is master
AWREADY: write address ready; source is slave
WDATA: write data; source is master
WVALID: write data valid; source is master
WREADY: write data ready; source is slave
The read operation has both address and data channels:
ARADDR: read address; source is master
ARVALID: read address valid; source is master
ARREADY: read address ready; source is slave
RDATA: read data; source is slave
RVALID: read data valid; source is slave
RREADY: read data ready; source is master
The write response channel is owned by the slave:
BVALID: write response valid; source is slave
BREADY: write response ready; source is master
What is the maximum amount of allowable data that can be sent in a single Write transaction from an AXI Master as per the protocol?
In AXI3, the maximum transfer size (AWSIZE) is 128 bytes and the maximum burst length is 16,
so the product is 128*16 = 2048 bytes.
Explain how a WRAP burst is an example of cache line access?
Let us take the following system scenario: an L2 cache sits in the path between the processor and the interconnect. Any transfer that can access the cache checks the cache contents (a cache lookup) before potentially accessing the downstream memory, which in this case is DDR memory.
INCR is the simplest burst type: it starts at a lower address and steps sequentially up through memory to higher addresses. This burst type can also be used for a cache linefill, but the problem is that you might need to complete most of the linefill before the data you want is stored in the cache and made available to the processor. This is where the WRAP burst has an advantage.
A WRAP burst fetches the important data first (the data the processor actually wants) and then completes the cache linefill around it. In system-level terminology, this important data is called the "critical word".
As an example, with an 8-word cache line and a processor that wants to read address 0x18 (the 7th entry of the cache line), an INCR burst would need to fetch 0x00, 0x04, 0x08, 0x0C, 0x10 and 0x14 before finally getting the 0x18 data the processor wants (at which point the processor is no longer stalled), and then the final 0x1C entry is filled.
With a WRAP burst, the burst can start at 0x18 (so the processor is no longer stalled), and the cache line then fills up around this critical word, with accesses to 0x1C, 0x00, 0x04, 0x08, 0x0C, 0x10 and 0x14.
There are still 8 memory accesses to perform the cache linefill, but in most cases the WRAP burst stalls the requesting processor for fewer cycles than the INCR burst.
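The difference in stall time can be counted directly. The sketch below compares how many beats pass before the critical word arrives, assuming the INCR linefill starts at the base of the cache line and the WRAP linefill starts at the critical word. The function name `beats_until` and its parameters are hypothetical.

```python
def beats_until(word_addr, line_bytes, word_bytes, burst):
    """Number of beats until the word at word_addr has been fetched."""
    line_base = (word_addr // line_bytes) * line_bytes
    n_words = line_bytes // word_bytes
    if burst == "INCR":
        # Fill the line from its lowest address upward.
        order = [line_base + i * word_bytes for i in range(n_words)]
    else:
        # WRAP: start at the critical word and wrap around the line.
        start = (word_addr - line_base) // word_bytes
        order = [line_base + ((start + i) % n_words) * word_bytes
                 for i in range(n_words)]
    return order.index(word_addr) + 1


# 8-word (32-byte) cache line, critical word at 0x18.
print(beats_until(0x18, 32, 4, "INCR"))  # 7
print(beats_until(0x18, 32, 4, "WRAP"))  # 1
```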
Is EBT supported in AXI? How can the AXI master disable further writing of the transfer? How can this be handled in a READ transfer?
No. Early burst termination (EBT) is not supported in AXI. An AXI master can disable further writing by deasserting all the write strobes, but it must still complete the remaining transfers of the burst. For a read, the master can discard data it does not need, but it must still complete the burst; discarding read data can result in lost data when accessing a read-sensitive device such as a FIFO.
What is the simple definition of cache coherency?
Cache coherency means that all caches in the system hold a consistent view of shared data, so that every master sees the most recent value of a location. It can be maintained by system software or in hardware, using the additional extensions provided by the AMBA AXI4 ACE (AXI Coherency Extensions) protocol.
The L1 cache is specific to each core.
The L2 cache is specific to the processor subsystem.
Example: each core has its own L1 cache, and all the cores in a subsystem share one L2 cache.
What is the role of system software with respect to the cache address allotment?
System s/w will decide which address is cacheable & which address is non-cacheable.
Accordingly, the processor will generate the signal AWCACHE in such a way that the address will be cached.
What will happen if the address is not present in the cache?
The processor will go and create an entry in the cache and will fetch the data & put it into the cache.
Explain the need for cache coherency
If the address is not present in the cache, then the processor will go and create an entry in the cache and will fetch the data & put it into the cache. During this process, there is a chance that L1 and L2 may go out of sync.
For example, there is an address 'h1000 present in the DDR memory, L1 and L2. In a case where the L1 cache address got updated and L2 is NOT updated, there should be a mechanism to make them in sync. Such a mechanism is called cache coherency.
Explain cache prefetching.
Prefetching refers to retrieving & storing data into buffer memory (cache) before the processor requires the data. When the processor wants to process the data, it is readily available and can be processed within a short period of time.
Without a cache, the processor would have to fetch the data directly from memory every time, which adds delay.
Cache prefetching is a speed-up technique used by the processors where instructions/data are fetched before they are needed.
AWCACHE[1]:- For writes this means that number of writes can be merged together.
ARCACHE[1]:- For reads, this means that the location can be prefetched or can be fetched just once for multiple read transactions.
System s/w will decide which address is cacheable & which address is non-cacheable. Accordingly, the processor will drive the necessary attributes on the AWCACHE/ARCACHE signals to tell system-level caches how to handle each transaction.
What is the purpose of RA and WA?
RA (read-allocate) and WA (write-allocate) are bits of the AxCACHE signal.
RA: if high, a read transfer that misses in the cache may be allocated a cache line.
WA: if high, a write transfer that misses in the cache may be allocated a cache line.
How is a protection mechanism provided in the AXI protocol?
Through the AxPROT signals (AWPROT/ARPROT), which mark each access as:
a. privileged or normal (unprivileged)
b. secure or non-secure
c. instruction or data
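Master1 performs an EX-READ to a slave address. Before Master1's EX-WRITE, another master, Master2, performs an EX-READ on the same address of the same slave. What will happen in this scenario in terms of the EX access result?
The slave monitors the address together with the ARID, so the two exclusive reads are tracked separately as long as the masters use different IDs. Whichever master completes its EX-WRITE first succeeds with EXOKAY; that write updates the location, so the other master's later EX-WRITE fails with an OKAY response. If the same master (same ARID) issues a second EX-READ without completing the write portion of the first, that read changes the address that is being monitored for exclusivity.
M1 performs an EX-RD to a slave address and M2 then performs a normal WR to the same address. What will happen if M1 tries an EX-WR on the same location later?
The EX-WR fails: the slave returns OKAY instead of EXOKAY and does not update the location, because the monitored address was written after M1's EX-RD. This is the fundamental advantage of exclusive access in AXI: M1 learns that its read value is stale and can retry, without the bus being locked for other masters.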
How does the slave treat the EX RD operation initiated by the master?
AXI slave will start monitoring the ADDRS on which EXREAD operation has been initiated and also the ARID provided by the master until either a write occurs to that location or until another EX READ with the same ARID value resets the EX ACCESS monitoring logic in the slave to a different address.
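The monitoring behaviour can be modelled in a few lines. The class below is a simplified sketch, not a description of any real slave: it tracks one monitored address per ID, ignores the size of the monitored region, and uses hypothetical method names. A failed exclusive write returns OKAY and leaves memory untouched.

```python
class ExclusiveMonitor:
    """Toy exclusive-access monitor: one monitored address per transaction ID."""

    def __init__(self):
        self.monitored = {}            # txn_id -> monitored address

    def ex_read(self, txn_id, addr):
        # A new exclusive read with the same ID moves the monitor.
        self.monitored[txn_id] = addr

    def write(self, addr):
        # Any write to a monitored address clears every monitor on it.
        self.monitored = {i: a for i, a in self.monitored.items() if a != addr}

    def ex_write(self, txn_id, addr):
        if self.monitored.get(txn_id) == addr:
            self.write(addr)           # the store happens and clears the monitors
            return "EXOKAY"
        return "OKAY"                  # exclusive access failed, nothing written


mon = ExclusiveMonitor()
mon.ex_read(1, 0x1000)                 # M1 exclusive read
mon.write(0x1000)                      # M2 normal write to the same address
print(mon.ex_write(1, 0x1000))         # OKAY: M1's exclusive write fails
```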
What are the restrictions applied to WRAP bursts?
The burst length must be 2, 4, 8, or 16, and the start address must be aligned to the size of each transfer (no unaligned transfers).
In the response signalling mechanism, what is the difference between the responses for READ & WRITE?
IN WRITE: there is just one response given for the entire burst but not for each and every individual data item within the burst.
FOR READ: the slave can provide different responses for different transfers within a burst.
For example: in a burst of 16 read transfers, the slave might return an OKAY response for 15 of them and a SLVERR response for the 16th item.
How does the AXI interconnect ensure that ID tags from all the masters are unique?
In a multi-master system, the interconnect appends additional information to the ID tag to ensure that ID tags from all the masters are unique. The ID tag is similar to a master number, with the extension that each master can implement multiple virtual masters within the same port by supplying an ID tag to indicate the virtual master number.
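One common way to do this is to place the master port number in extra bits above the ID the master supplied. The sketch below assumes that scheme; the function names and the fixed `id_width` are hypothetical, and a real interconnect may encode the extension differently.

```python
def extend_id(master_index, axid, id_width):
    """ID seen by the slave: master port number placed above the master's ID."""
    assert 0 <= axid < (1 << id_width), "ID does not fit in id_width bits"
    return (master_index << id_width) | axid


def split_id(ext_id, id_width):
    """Recover (master_index, original ID) to route a response back."""
    return ext_id >> id_width, ext_id & ((1 << id_width) - 1)


# Two masters both use ID 3; the slave still sees two distinct IDs.
print(hex(extend_id(0, 0x3, 4)))   # 0x3
print(hex(extend_id(2, 0x3, 4)))   # 0x23
print(split_id(0x23, 4))           # (2, 3)
```

On the way back, the interconnect uses the upper bits to route the response to the right master and strips them before handing the ID over.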
Why is write data treated as buffered?
Write data is treated as buffered so that the master can perform write transactions without slave acknowledgment of previous writes.
Must the slave itself always provide the responses to bufferable transactions in a system?
No. For bufferable transactions, the interconnect can provide the response.
Where is WLAST asserted?
WLAST is asserted for the last data item of the burst by the AXI Master.
An AXI slave MUST NOT give read data unless the read address phase completes. Is it TRUE? Explain how?
Yes. RVALID cannot be driven HIGH until both ARVALID and ARREADY have been seen HIGH. In a READ transaction, the slave cannot know what data to return until the master has supplied a valid read address, so there cannot be read data before the address handshake.
What is address boundary calculation?
Address boundary = transfer size * burst length.
Burst length: 4 transfers (AWLEN = 3)
AWSIZE: 4 bytes (32 bits)
Address boundary: 4 * 4 = 16 bytes
Address range: 0x0 - 0xF
How to set all the WSTRB bits to '1'?
By default, the WSTRB member of the master transaction is random, so its bits take random values when the transaction is randomized. To set all the bits to '1', add a constraint such as the following during randomization:
    foreach (wstrb[i])
      wstrb[i] == (1 << (1 << this.burst_size)) - 1;  // one strobe bit set for each of the 2^burst_size bytes in a beat
All the AXI and cache coherency concepts here were guided by one of the most experienced verification engineers in the industry, Mr. Rahul Bhardwaj, who has more than 15 years of experience in the ASIC verification domain, working on different products in the VLSI industry.
Thank you so much, Mr. Rahul Bhardwaj, for providing such valuable information. I know you have the busiest of schedules, and you still give your time to help engineers.
I will be back with new blog posts soon. Till then, keep on learning and keep on growing. See ya, take care :)
Hello Hardik,
I would like to add one more point to the below mentioned question.
Q: What will happen if the address is not present in the Cache?
Ans:
For any given data, the Processor sends its request to the Cache memory.
If the data is found in Cache, it can be loaded quickly into the CPU. If it is not resident in Cache, the request is forwarded to the next lower level of the hierarchy, and this process begins again.
If the data is found at this level, the whole block in which the data resides is transferred into the Cache.
If the data is not found at this level, the request is forwarded to the next lower level, and so on.
Thanks a lot, Prathyusha, for adding a more detailed view of the concept :)
Excellent Post. Thank you!
Thanks a lot Bhubaneshwar :)
It was really an easy and nice explanation. Thank you!
Hi Minal,
Thank you so much for your valuable feedback.
Regards,
Hardik
Hi,
can anyone explain how to write a test case for outstanding and out of order transactions in AXI
Hi, can any one give the information about how boot code works in soc? And how reset is handling in soc?
Regards,
Raushan
Excellent post .
Very useful information.
Thanks
Hi Raushan,
Thanks, for reading my blog post.
Thank you Hardik for such an amazing content.
I was searching for AXI interview questions with answers, and my search finally ended at your website.
Kudos to you and look forward to more such content.
Thank you so much.
Just one observation.
I found your website more readable in mobile rather than laptop.
I am using the Microsoft Edge browser; there it looked like plain text with no colouring.
Hi Ravi,
Thanks a lot for going through the blog posts and I checked in my Microsoft edge it looks fine to me. The theme of the website is just like that as I know.
Hi,
Thank you for this valuable content.
A question – AXI4 allows the write data to be sent before the write address and control information.
Can you please suggest when it might be useful, and elaborate on this subject?
Hi,
Thank you. Excellent post!!
I would like know the advantage of unaligned transfer.
Why is unaligned transfer used?
Hi Roy,
Thanks, for going through my blog posts. I think this will help you to understand better about unaligned transfers.
https://stackoverflow.com/questions/20926386/what-is-non-aligned-access-arm-keil
Regards,
Hardik