Pipeline
A pipeline is used to process output items.
In BlockchainSpider, the default pipeline saves all output data to files, e.g., CSV, JSON, etc.
The default storage path is ./data.
All built-in pipelines are defined under BlockchainSpider/pipelines.
Of course, you can also define your own pipeline.
Here is an example.
We start a trans.block.evm spider that extracts data from blocks.
Whenever a transaction in the extracted data transfers more than 1000 Ether, we print its transaction hash in a custom pipeline.
First, we define a file mypipe.py in the project root directory:
from BlockchainSpider.items import SyncItem, TransactionItem

class MyPipeline:
    def process_item(self, item, spider):
        # The `trans.block.evm` spider outputs SyncItem, which should be unpacked.
        # You can find the definition of SyncItem in `BlockchainSpider/items/sync.py`.
        if not isinstance(item, SyncItem):
            return item
        transfers = item.get(TransactionItem.__name__)
        for transfer in transfers:
            value = transfer.get('value', 0)
            amount = value / 10**18  # Convert Wei to Ether
            if amount > 1000:
                print(f"Transaction hash: {transfer.get('transaction_hash')}")
        return item
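A SyncItem behaves like a dictionary keyed by item class name (hence the lookup with TransactionItem.__name__), and the value is the list of transaction items collected for the synchronized block. Returning the item at the end of process_item hands it on to any later pipelines, so the default file-writing pipelines still receive the data. As a variant, here is a minimal sketch (not part of BlockchainSpider) that writes the matching hashes to a file instead of printing them, using Scrapy's open_spider and close_spider hooks; the class name LargeTransferFilePipeline and the output file large_transfers.txt are illustrative choices.

from BlockchainSpider.items import SyncItem, TransactionItem

class LargeTransferFilePipeline:
    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open('large_transfers.txt', 'w')

    def close_spider(self, spider):
        # Called once when the spider finishes: close the output file.
        self.file.close()

    def process_item(self, item, spider):
        # Only SyncItem carries the grouped transaction data; pass anything else along.
        if not isinstance(item, SyncItem):
            return item
        for transfer in item.get(TransactionItem.__name__, []):
            amount = transfer.get('value', 0) / 10**18  # Convert Wei to Ether
            if amount > 1000:
                self.file.write(f"{transfer.get('transaction_hash')}\n")
        return item

This variant is enabled in exactly the same way as MyPipeline in the next step.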
Next, you need to enable your pipeline in BlockchainSpider/settings.py:
ITEM_PIPELINES = {
    'mypipe.MyPipeline': 888,
}
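The number (888 here) is the Scrapy pipeline priority: pipelines with lower values run earlier, and values conventionally fall in the 0-1000 range.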
Finally, start the trans.block.evm spider:
scrapy crawl trans.block.evm \
-a providers=https://eth.llamarpc.com \
-a start_blk=19000000 \
-a end_blk=19100000
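Here, providers points to the JSON-RPC endpoint used to fetch block data, while start_blk and end_blk bound the range of blocks to crawl; within that range, every transaction transferring more than 1000 Ether will have its hash printed by MyPipeline.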